Master Thesis (all genders) - Semantic 4D Occupancy Forecasting
XITASO GmbH IT & Software Solutions
Karlsruhe, Germany | Internship
Job details
Company: XITASO GmbH IT & Software Solutions
Location: Karlsruhe, Germany
Employment type: Internship
Seniority: Intern
Primary category: General and Other R&D and Science
Posted date: 5 May 2026
Job description
Abstract
Semantic 4D occupancy forecasting is vital for safe autonomous driving, allowing vehicles to anticipate future scene dynamics and geometry. However, training state-of-the-art models relies heavily on fully supervised methods that require massive, prohibitively expensive dense 3D voxel annotations.

To overcome this data bottleneck, cutting-edge research is shifting towards self-supervised and weakly-supervised paradigms that leverage pre-trained 2D foundation models (e.g., DINOv2, CLIP, or SAM). By aligning these rich, open-vocabulary 2D semantic features with 3D/4D spatial representations using advanced Transformer architectures, it is possible to achieve robust spatio-temporal understanding without dense 3D ground truth.
Building upon these breakthroughs, this Master's thesis focuses on developing a foundation-model-aligned framework for vision-based 4D occupancy forecasting. You will design an architecture that distills rich multi-view semantics into a 4D forecasting pipeline, bridging the gap between scalable camera-only inputs and high-fidelity environment prediction.
If your results are outstanding, we actively encourage and support submission to top-tier conferences.
Tasks that await you
- Develop a Transformer-based network for predicting future semantic 4D occupancy from sequential multi-view camera inputs using weak or self-supervision.
- Build and train the PyTorch pipeline, designing alignment mechanisms to distill semantic features from 2D foundation models into your 4D spatial-temporal representation.
- Benchmark against fully-supervised baselines on large-scale datasets (e.g., nuScenes, OpenOccupancy), focusing on forecasting accuracy (IoU), semantic precision, and label efficiency.
What makes you stand out
- You are registered in a master's program in computer science, artificial intelligence, robotics, or a related field.
- You have excellent programming skills in Python as well as solid experience with deep learning frameworks (especially PyTorch).
- You have a solid background in 3D computer vision. Practical experience with semantic segmentation, occupancy networks, or 3D Gaussian splatting is a major plus.
- You have knowledge of Vision Transformers (ViT), foundation models (DINO, CLIP), and self- and weakly-supervised learning paradigms.
- You work independently, are solution-oriented and highly motivated, and have very good German and English skills (at least C1 level) for clear and confident communication within the team and with our partners.
Your contact person
Daniela
+49 821 885882-0
work@xitaso.com