Key Facts

Remote From: Anywhere

Full time

English

Hard Skills

Other Skills

• Collaboration
• Communication
• Adaptability
• Dealing With Ambiguity

Job description

About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production—turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do

Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
Implement and tune techniques such as:
- Quantization (fp16, bf16, int8, fp8)
- KV-cache optimization & reuse
- Speculative decoding, batching, and streaming
- Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads

What We’re Looking For

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfort with ownership and ambiguity in fast-moving startup environments

Nice to Have

Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Why Join Us

Real ownership over performance-critical systems
Direct impact on product reliability and unit economics
Close collaboration with research, infra, and product
Competitive compensation and meaningful equity at the Series A stage
A team that cares about engineering quality, not hype

Other jobs at Featherless AI

- Senior Software Engineer - API Gateway: Full time. Node.js (Javascript Library), Application Programming Interface (API), Kubernetes, Observability
- Developer Relations Associate/Intern (Partnerships), Boston-Based: Internship, 120 - 120K. JavaScript (Programming Language), API Testing, Python (Programming Language), Ecology, Cloud Computing
- Developer Relations (DevRel): Full time, Senior (5-10 years), 250 - 250K. Large Language Modeling, Community Design, Development Support, Customer Success Management, Business Analysis

Machine Learning Engineer — Inference Optimization
