Your mission
As a Senior Site Reliability Engineer (m/f/x) on the MetaKube Accelerator team, you will leverage modern Kubernetes and cloud-native technologies to maximize reliability, scalability, and operational excellence of the MKA platform. You will solve complex platform challenges, build production-ready systems, and contribute to shared ownership and continuous improvement, shaping the evolution of MetaKube Accelerator and enhancing the reliability of our managed services.
Your tasks
- Design and implement observability solutions using Prometheus, Loki, and Mimir, including defining meaningful alerts and improving monitoring coverage
- Troubleshoot and improve custom Kubernetes controllers to ensure reliability and stability
- Develop and maintain production applications, ensuring code quality, scalability, and operational readiness
- Operate, automate, and continuously improve the MKA platform with a focus on efficiency and maintainability
- Enhance internal tooling to support automation and reduce manual effort
Requirements
- Experience operating highly available, mission-critical applications in cloud and on-prem environments, including incident leadership
- Strong Kubernetes expertise and cluster management experience
- Experience with GitOps principles for deployment and delivery workflows
- Experience with Infrastructure as Code, specifically Terraform
- Proficiency in Bash and/or Python for automation and tooling
- Understanding of CI/CD pipelines, ideally with Tekton-based workflows
- Very good German and good English skills (B2+) for technical collaboration
Nice to have
- Familiarity with ArgoCD or similar GitOps tools
- Exposure to configuration management tools such as Ansible
- Programming skills in Go
- Knowledge of Nix for development tooling and automation
- Comfort working with Helm, Make, and Git
- Additional exposure to cloud-native platforms, observability, or platform automation
What you can expect
You will gain deep hands-on Kubernetes experience and explore internals that few others get to work with. You will have the freedom to solve challenges, share knowledge, and continuously learn through team collaboration, show-and-tell sessions, and conferences such as KubeCon or Container Days.