This role is responsible for designing, implementing, managing, and optimizing the infrastructure, automation pipelines, and workflows that support the entire lifecycle of software development, data processing, analytics, and machine learning model deployment. As a key technical expert, you will ensure the reliability, scalability, efficiency, and speed of our development, data, analytics, and ML operations, foster collaboration between teams, and promote best practices across the DevOps, DataOps, and MLOps domains.
Here's what you'll be doing:
- Design, build, and maintain robust CI/CD pipelines for software applications, data transformations (ETL/ELT), and machine learning models (training, validation, deployment).
- Implement and manage Infrastructure as Code (IaC) using tools like Terraform to ensure reproducible and scalable environments (cloud or on-premises).
- Develop and automate data quality checks, data pipeline monitoring, and alerting systems within the DataOps framework.
- Establish and manage MLOps workflows, including experiment tracking, model versioning, automated model retraining, and performance monitoring (drift, bias detection).
- Implement comprehensive monitoring, logging, and alerting solutions across all systems and pipelines (applications, data flows, ML models).
- Collaborate closely with software developers, data engineers, data scientists, and analysts to understand their needs and provide operational support and tooling.
- Champion and enforce best practices in security, reliability, and performance across all operational domains.
- Troubleshoot and resolve complex infrastructure, pipeline, and deployment issues.
- Evaluate and recommend new tools and technologies to improve operational efficiency and capabilities.
These responsibilities are not exhaustive and will evolve according to identified needs and current projects.