Site Reliability Engineering (SRE) Intern
We are seeking a talented and motivated SRE Intern to join our growing team, focusing on Productivity Engineering and the Developer Experience for our development platform. In this role, you will contribute to the reliability, scalability, and performance of our enterprise-level systems. The ideal candidate has a solid foundation in software engineering and a passion for infrastructure as code and for building robust, automated solutions.
Responsibilities
- Developer Platform Contribution: Contribute to the design, implementation, and maintenance of Flywire's internal PaaS ecosystem, known as Victoria, which supports the full software development lifecycle (create, build, test, deploy, run, and monitor).
- Developer Experience: Cater to the needs of over 200 engineers and make their work easier and more efficient. We solve problems by writing well-maintained, elegant software. Scripting is for proofs of concept (PoCs); our production systems are robust software engineering projects.
- Automation & Tooling: Participate in the design and development of tooling for task automation. Work on a variety of web tools, UIs, CLIs, bots, and stateless services that keep our ecosystem running smoothly.
- Continuous Integration and Delivery (CI/CD): Help advance our DevSecOps capabilities and contribute to every phase of the development lifecycle, with a strong focus on deployment and operations.
- Operational Support: Assist senior engineers in ensuring the performance, quality, and responsiveness of our cloud-native applications, help clarify system requirements, and troubleshoot production and software reliability issues.
- Platform Abstraction: Help reduce tooling complexity by implementing simple, elegant abstractions.
- Best Practices Championing: Support development teams (200+ engineers) in adopting platform best practices and internal tools.
What You Will Learn
- Software Engineering for the Cloud: How to build and run software in close contact with your clients (our other engineers). Learn to design, implement, and roll out features for existing systems, or build new systems for production use.
- AI Software Engineering Processes & Infrastructure: How to use agentic flows to build software, and how to rely on, operate, and improve our AI infrastructure.
- Infrastructure as Code (IaC): How to manage cloud resources programmatically using tools like Terraform.
- Observability & Monitoring: How to implement logging, metrics, and tracing to gain deep insights into system health.
- Container Orchestration: Hands-on experience using Docker and AWS to manage distributed services.
- Incident Management: How to respond to system failures, conduct post-mortems, and prevent recurrence in a production environment.
- Security at Scale: How to implement automated security checks and best practices within the CI/CD pipeline.