Job details
Company: Accesa
Location: Remote, Romania
Employment type: Full-time
Seniority: Mid level
Primary category: Customer Service & Support
Posted date: 22 Apr 2026
Job description
About the project
You will join one of our ongoing projects for a leading European financial services provider, contributing to a modern, cloud-native SaaS platform built on the Microsoft Azure ecosystem.
You will be part of a dedicated team responsible for ensuring the stability, reliability, and performance of a production-critical application. The team covers end-to-end support activities, working closely with engineering, product, and customer-facing teams.
This role goes beyond traditional ticket handling: you will actively contribute to incident resolution, system reliability improvements, and the overall quality of the production environment.
Responsibilities
- Production Stability: Ensure the continuous availability and reliability of mission‑critical applications by owning production incidents end‑to‑end and keeping stakeholders informed with clear, timely communication.
- Root Cause Mastery: Diagnose complex issues across distributed systems using logs, metrics, and traces, driving permanent fixes across application, infrastructure, and data layers instead of temporary workarounds.
- Azure Cloud Resilience: Support and troubleshoot Azure environments (App Services, AKS, Functions, Storage) while working with Infrastructure as Code (Terraform, Bicep, ARM) to ensure consistent and stable deployments.
- System Visibility: Strengthen observability using Azure Monitor, Application Insights, and Log Analytics, improving alert quality and contributing to meaningful SLIs, SLOs, and reliability metrics.
- API Reliability: Safeguard integrations by debugging REST and GraphQL APIs, authentication flows, webhooks, queues, and third‑party services using tools such as Postman and curl, together with log analysis.
- Data Confidence: Investigate data‑related issues using SQL and backend code analysis to resolve inconsistencies, support corrections, and protect system integrity.
- Operational Efficiency: Reduce incident frequency by automating repetitive tasks with Python, Bash, or PowerShell and continuously improving internal tools and operational processes.
- Continuous Improvement: Proactively identify stability risks and optimize systems by refining runbooks, internal tooling, and support workflows.
- Shared Expertise: Scale team effectiveness by maintaining clear documentation, contributing to knowledge bases, and supporting onboarding and mentoring.