Introduction
You will play a major role in the evolution of non-relational data stores and big data platforms, based on technologies such as Hadoop and Spark. You will apply your software engineering expertise to large and long-lived data platforms, high-throughput ingestion pipelines, performance-critical access patterns, and demanding reliability requirements. Your work will directly support the operation, monitoring, and analysis of particle accelerator systems through the management of multi-petabyte datasets accumulated over many years.
Functions
- Drive the evolution of the CERN Accelerator Archival system (NXCALS).
- Design and develop the core components of the system, including ingestion pipelines (ETL), metadata services, data compaction mechanisms, data extraction algorithms, and APIs.
- Collaborate with different user communities to define and promote best practices for using NXCALS in the development of control applications for the CERN Control Centre.
- Work closely with the CERN IT department to select and validate the evolution of the underlying storage technologies (e.g. HDFS, ClickHouse).
- Contribute to the operation, maintenance, and user support of the system.
- Keep watch on relevant big data technologies and assess their applicability to NXCALS.
- Mentor and technically support junior software engineers contributing to these activities.
- Contribute to the development of other Controls data engineering platforms according to overall priorities.