Research Scientist / Engineer – Multimodal Capabilities

Work set-up: 
Full Remote

Offer summary

Qualifications:

  • Proficiency in Python and PyTorch programming.
  • Experience with multimodal data processing and dataset curation.
  • Understanding of computer vision, audio processing, and natural language processing.
  • (Preferred) Experience with multimodal models such as Vision Language Models or Audio Language Models.

Key responsibilities:

  • Collaborate with the team to identify capability gaps and research solutions.
  • Design datasets, experiments, and methodologies to enhance multimodal model capabilities.
  • Develop evaluation frameworks and benchmarks for multimodal AI systems.
  • Create prototypes and demonstrations showcasing new multimodal functionalities.

Luma AI https://lumalabs.ai/dream-machine
11 - 50 Employees

Job description

About the Role

The Multimodal Capabilities team at Luma focuses on unlocking advanced capabilities in our foundation models through strategic research into multimodal understanding and generation. This team tackles fundamental research questions around how different modalities can be combined to enable new behaviors and capabilities, working on the open-ended challenges of what makes multimodal AI systems truly powerful and versatile.

Responsibilities
  • Collaborate with the Foundation Models team to identify capability gaps and research solutions

  • Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language

  • Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities

  • Create prototypes and demonstrations that showcase new multimodal capabilities

Experience
  • Strong programming skills in Python and PyTorch

  • Experience with multimodal data processing pipelines and large-scale dataset curation

  • Understanding of computer vision, audio processing, and/or natural language processing techniques

  • (Preferred) Expertise working with interleaved multimodal data

  • (Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models

Required profile

Spoken language(s):
English

Other Skills

• Collaboration
