Introduction:
You will support development teams and ensure the reliable operation of Open Science core services such as INSPIRE, SCOAP³, and CERN Analysis Preservation (CAP) by delivering modern platform tooling, automation, observability, and infrastructure. This role combines hands-on service operations with close collaboration with developers and rapid evaluation of new technologies.
Functions:
- Support development teams with best practices, deployment models, observability, and platform tooling to ensure smooth integration and reliable production operations.
- Provide and maintain developer tooling (templates, automation scripts, CI/CD workflows, development environments, platform services) to streamline development, testing, and deployment.
- Operate and improve production services (INSPIRE, CAP, SCOAP³, and related systems), ensuring reliability, performance, scalability, and security; participate in monitoring, incident response, and service lifecycle management.
- Lead and contribute to postmortems, conducting structured incident analysis, identifying root causes, defining actions, and driving reliability improvements.
- Design and automate platform components, including Kubernetes resources, GitOps workflows, Helm/Kustomize configurations, and infrastructure-as-code environments to improve reproducibility and reduce operational overhead.
- Prototype and evaluate new technologies (cloud-native tooling, operators, observability stacks, AI-related infrastructure) and integrate them when beneficial.
- Enhance developer experience by improving documentation, automation, self-service capabilities, and platform usability.