Data is a core layer in Luma that unlocks advanced capabilities in our foundation models. We tackle fundamental questions about how different modalities can be combined to enable new behaviors and capabilities, working on the open-ended challenge of what makes multimodal AI systems truly powerful and versatile.
Identify capability gaps and research solutions
Design datasets and data-mixture ablations to systematically improve model capabilities across vision, audio, and language
Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
Create prototypes and demonstrations that showcase new multimodal capabilities
Strong programming skills in Python and PyTorch
Experience with large-scale datasets
Experience with multimodal data processing pipelines
Understanding of computer vision, audio processing, and/or natural language processing techniques
(Preferred) Expertise working with interleaved multimodal data
(Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models