About the company
At Jobtome (https://weare.jobtome.com/), we are building a modern, cloud-native recruitment and marketing platform used at scale across multiple countries and brands.
Our systems power high-traffic job distribution, integrations with external partners, and real-time data pipelines, with a strong focus on reliability, observability, and automation.
Engineering is a core function of the company: we value ownership, pragmatic decision-making, and long-term technical excellence over short-term fixes.
The role
As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our production systems.
You will work closely with Backend, Frontend, and Product teams to:
- design resilient architectures
- define reliability standards
- improve observability and incident response
- reduce operational toil through automation
This is not a pure ops role: you will contribute to codebases, collaborate on system design, and help evolve our engineering culture toward SRE best practices.
What you will do
- Design, implement, and maintain reliable and scalable cloud infrastructure
- Define and evolve SLIs, SLOs, and error budgets
- Improve monitoring, alerting, and observability across services
- Lead and participate in incident response, post-mortems, and root-cause analysis
- Automate repetitive operational tasks to reduce toil
- Collaborate with Backend engineers on service design, scalability, and failure modes
- Improve CI/CD pipelines, deployment strategies, and release safety
- Contribute to infrastructure as code and platform tooling
- Act as a reliability advocate across the engineering organization
Tech stack
- Cloud: Google Cloud Platform (preferred), AWS
- Containers & orchestration: Docker, Kubernetes (GKE)
- Infrastructure as Code: Terraform
- CI/CD: GitLab CI/CD
- Observability: Cloud Monitoring, Logging, Prometheus, Grafana
- Languages: Go, Python, Bash
- Networking & security: IAM, VPCs, service accounts, secrets management
What we expect from a senior SRE
- Strong experience running production systems at scale
- Solid understanding of distributed systems and failure modes
- Proven experience with SLO-driven reliability
- Strong coding skills
- Experience automating cloud infrastructure
- Ability to debug complex cross-system issues
- Ownership mindset and strong communication skills
- Pragmatic approach to reliability, speed, and cost trade-offs
Working model