**Job Description:**
Are you an expert in deploying, observing, and maintaining distributed fleets of devices? Do you build infrastructure that scales effortlessly and recovers automatically from mass reconnections? Join our team to oversee the operational backbone of our edge-to-cloud ecosystem. If you love automating complex deployments and diving deep into observability metrics, you are the right fit for us!
**Project Description:**
Our project, GroundOS, is not just another screen manager. It is a next-generation Universal Display System (UDS) built to power the future of global mobility. We are building an "Operating System for Reality" that orchestrates massive, data-driven signage networks across critical infrastructure, from major international airports to sprawling public transport systems. GroundOS moves beyond static displays; it uses a state-of-the-art digital twin to process and react to real-time operational data. To guarantee continuous operation, the platform features a resilient, offline-first edge architecture that ensures screens keep running smoothly even if the network fails. Join us to blend high-performance Rust edge computing with modern TypeScript cloud services and help us set a new global standard for how hundreds of millions of passengers experience their journey.
**Tasks:**
- Manage the deployment, observability, and lifecycle of thousands of remote mini-PCs alongside cloud components.
- Execute Over-The-Air (OTA) updates reliably across a massive edge fleet.
- Configure and manage NATS JetStream, including Leaf Nodes for edge-cloud bridging, stream retention, and cluster HA.
- Set up and maintain tracing and metrics using OpenTelemetry to monitor cross-system health.
- Architect resilient systems capable of withstanding mass fleet reconnection events (thundering herd) without performance loss.
- Manage secrets, certificates, and secure mTLS communication between edge devices and the central control plane.
- Lead incident management and root-cause analysis for fleet-wide issues.
- Design scalable operations workflows to keep maintenance effort constant as the fleet grows.