The Multimodal Capabilities team at Luma focuses on unlocking advanced capabilities in our foundation models through strategic research into multimodal understanding and generation. The team tackles fundamental research questions around how different modalities can be combined to enable new behaviors, working on the open-ended challenges of what makes multimodal AI systems truly powerful and versatile.
Collaborate with the Foundation Models team to identify capability gaps and research solutions
Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language
Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
Create prototypes and demonstrations that showcase new multimodal capabilities
Strong programming skills in Python and PyTorch
Experience with multimodal data processing pipelines and large-scale dataset curation
Understanding of computer vision, audio processing, and/or natural language processing techniques
(Preferred) Expertise working with interleaved multimodal data
(Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models