About the Role
As an IT infrastructure engineer, you will be responsible for building our infrastructure from the ground up and for capacity planning, disaster recovery, and day-to-day operations. You will manage, configure, and monitor our systems, including automated backups; ensure the security and availability of resources; and work closely with engineering and operations teams to deliver a robust, scalable infrastructure that supports our AI and robotics development workflows.
Your Responsibilities
- Infrastructure Architecture & Operations: Design, implement, and maintain on-premises IT infrastructure (compute, storage, networking). Perform capacity planning and develop and execute backup and disaster recovery strategies. Maintain comprehensive infrastructure documentation.
- Physical Data Center & Cloud Infrastructure: Manage and monitor on-premises facilities (power, cooling) and server hardware. Design and provision storage, compute, and GPU infrastructure for high-performance ML/AI workloads.
- Enterprise Networking: Design and implement WAN, LAN, and Wi-Fi network topologies with proper segmentation and security controls (firewalls, IDS/IPS). Configure and manage enterprise networking equipment (switches, routers, load balancers).
- System Administration & Support: Deploy and manage Linux server infrastructure. Configure and deploy employee workstations (Linux, macOS, Windows) and manage IT equipment procurement. Provide technical troubleshooting and support. Manage user accounts with single sign-on (SSO).
- Vendor Management: Establish and manage relationships with technology vendors, negotiate contracts, and coordinate with service providers (ISPs, colocation providers).
Essential Skills
- Proven track record of building or transforming infrastructure.
- Deep expertise in enterprise networking (WAN/LAN, VLANs, routing, switching, firewalls, VPNs).
- Strong hands-on experience with server hardware assembly, configuration, and maintenance.
- Expert knowledge of storage (RAID, SAN/NAS) and backup/recovery solutions.
- Experience with Linux server administration and troubleshooting.
- Solid understanding of data center operations (power, cooling, security).
- Hands-on experience provisioning and managing GPU infrastructure.
- Scripting skills (Python, Bash) for automation.
- Experience with Infrastructure-as-Code tools (Terraform, Ansible).
- Strong problem-solving and troubleshooting skills for complex hardware and network issues.
- Excellent documentation and communication skills.
- Self-motivated and able to work independently in a fast-paced environment.