Site Reliability Engineer - Full Remote (EU only) at Jobtome | Remote in Germany

Jobtome

Remote, Switzerland · Full-time · Posted 1 month+

Location: Remote, Switzerland
Category: Software Development
Job type: Full-time
Seniority: Unspecified
Language: en

Job details

About the company

At Jobtome (https://weare.jobtome.com/), we are building a modern, cloud-native recruitment and marketing platform used at scale across multiple countries and brands. Our systems power high-traffic job distribution, integrations with external partners, and real-time data pipelines, with a strong focus on reliability, observability, and automation.

Engineering is a core function of the company: we value ownership, pragmatic decision-making, and long-term technical excellence over short-term fixes.

The role

As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our production systems.

You will work closely with Backend, Frontend, and Product teams to:

design resilient architectures
define reliability standards
improve observability and incident response
reduce operational toil through automation

This is not a pure ops role: you will contribute to codebases, collaborate on system design, and help evolve our engineering culture toward SRE best practices.

What you will do

Design, implement, and maintain reliable and scalable cloud infrastructure
Define and evolve SLIs, SLOs, and error budgets
Improve monitoring, alerting, and observability across services
Lead and participate in incident response, post-mortems, and root-cause analysis
Automate repetitive operational tasks to reduce toil
Collaborate with Backend engineers on service design, scalability, and failure modes
Improve CI/CD pipelines, deployment strategies, and release safety
Contribute to infrastructure as code and platform tooling
Act as a reliability advocate across the engineering organization

Tech stack

Cloud: Google Cloud Platform (preferred), AWS
Containers & orchestration: Docker, Kubernetes (GKE)
Infrastructure as Code: Terraform
CI/CD: GitLab CI/CD
Observability: Cloud Monitoring, Logging, Prometheus, Grafana
Languages: Go, Python, Bash
Networking & security: IAM, VPCs, service accounts, secrets management

What we expect from a senior SRE

Strong experience running production systems at scale
Solid understanding of distributed systems and failure modes
Proven experience with SLO-driven reliability
Strong coding skills
Cloud infrastructure automation experience
Ability to debug complex cross-system issues
Ownership mindset and strong communication skills
Pragmatic approach to reliability, speed, and cost trade-offs

Working model

Flexible working hours
Remote-friendly setup
Small autonomous teams
Direct collaboration with product and leadership
