Role overview

Qualifications

Strong background in machine learning research with emphasis on training dynamics and optimization
Experience training large neural networks (LLMs, multimodal models, or large sequence models)
Publication experience in ML venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, COLM, arXiv) or equivalent high-quality open research
Proficiency in Python and PyTorch; solid understanding of optimization theory and practice, backpropagation, gradient flow, and training stability

Responsibilities

Design and evaluate training optimization techniques for large models (e.g. optimization algorithms, schedulers, normalization, curriculum strategies)
Improve training efficiency and stability across long runs and large datasets
Research and implement methods such as optimizer and scheduler innovations, mixed-precision and memory-efficient training, gradient noise reduction, convergence analysis, and training-time regularization and robustness techniques
Run large-scale experiments, analyze results, and translate findings into actionable improvements
Publish results as research papers, technical reports, or blog posts
Collaborate with infrastructure and inference teams to ensure training decisions translate to real-world performance

Key facts

Remote from: Anywhere
Full time
Researcher
English

Hard skills

Convex Optimization
Engineering Optimization
Advanced Distributed Learning
Curriculum Development
Open-Channel Flow
Python (Programming Language)
Data Normalization
Analytics
Controlled Experiments
Noise Reduction
Robustness Testing
PyTorch (Machine Learning Library)
Hybrid Systems
Data-Driven Decision Making

Other skills

Report Writing
Scheduling
Collaboration
Communication
Analytical Skills

About the company

Featherless AI

Artificial Intelligence & Machine Learning Services

We enable serverless inference via our GPU orchestration and model load-balancing system. We unlock fine-tuning by letting organizations size their server fleet to their throughput needs, not to the number of models in the catalogue. See it in action on our public cloud, which offers inference for 4,200+ open-weight models.

Company details

Industry: Artificial Intelligence & Machine Learning Services

Company size: 1-10

Job description

About the Role

We’re looking for an AI Researcher focused on training optimization to help us push the efficiency, stability, and scalability of large-scale model training. You’ll work at the intersection of research and systems, developing novel techniques to reduce training cost, accelerate convergence, and improve model quality, while validating ideas through rigorous experiments and publications.

This role is ideal for someone who enjoys turning research insights into practical training wins, and who has a track record (or strong ambition) of publishing applied ML research.

What You’ll Work On

Design and evaluate training optimization techniques for large models (e.g. optimization algorithms, schedulers, normalization, curriculum strategies)
Improve training efficiency and stability across long runs and large datasets
Research and implement methods such as:
- Optimizer and scheduler innovations
- Mixed-precision, low-precision, and memory-efficient training
- Gradient noise reduction, scaling laws, and convergence analysis
- Training-time regularization and robustness techniques
Run large-scale experiments, analyze results, and translate findings into actionable improvements
Author or co-author research papers, technical reports, or blog posts
Collaborate closely with infrastructure and inference teams to ensure training decisions translate to real-world performance

What We’re Looking For

Strong background in machine learning research, with emphasis on training dynamics and optimization
Experience training large neural networks (LLMs, multimodal models, or large sequence models)
Publication experience in ML venues (e.g. NeurIPS, ICML, ICLR, ACL, EMNLP, COLM, arXiv) or equivalent high-quality open research
Solid understanding of:
- Optimization theory and practice
- Backpropagation, gradient flow, and training stability
- Distributed and large-batch training
Proficiency in Python and modern ML frameworks (PyTorch preferred)
Ability to independently design experiments and reason from data

Nice to Have

Experience with non-standard architectures (e.g. RNN variants, long-context models, hybrid systems)
Experience optimizing training on GPUs at scale (FSDP, ZeRO, custom kernels)
Contributions to open-source ML or research codebases
Comfort operating in fast-moving, ambiguous startup environments

Why This Role

Real influence over core model training decisions
Freedom to pursue and publish novel research
Direct access to large-scale experiments and real production constraints
A small, senior team that values thinking deeply and shipping thoughtfully
