GPU Performance Engineer

UNLIMITED HOLIDAYS - EXTRA HOLIDAYS - EXTRA PARENTAL LEAVE - LONG REMOTE PERIOD ALLOWED
Remote: Full Remote
Contract:
Work from: New York (USA)

Offer summary

Qualifications:

  • M.Sc./Ph.D. in computer science or software engineering
  • Strong GPU optimization and programming skills
  • Experience with CUDA, Triton, and MLIR

Key responsibilities:

  • Develop and optimize GPU code for real-world applications
  • Contribute to product roadmap for performance enhancements
  • Communicate effectively with remote team members
Adaptive ML | Information Technology & Services | Startup | https://www.adaptive-ml.com/
11-50 employees

Job description

Your missions

About the team

Adaptive is helping companies build singular generative AI experiences by democratizing the use of reinforcement learning. We are building the foundational technologies, tools, and products required for models to learn directly from users' interactions and for models to self-critique and self-improve from simple written guidelines. Our tightly-knit team was previously involved in the creation of state-of-the-art open-access large language models such as Falcon-180B. We have closed a $20M seed with Index & ICONIQ, and are looking forward to shipping a first version of our platform, Adaptive Engine, in early 2024.

Our Technical Staff is responsible for building the foundational technology powering Adaptive, in line with requests and requirements identified by our Product and Commercial Staff. We strive to build excellent, robust, and efficient technology, and to conduct honest research at scale, with high impact for our roadmap and customers.

About the role

As a GPU Performance Engineer in our Technical Staff, you will help ensure that our LLM stack (Adaptive Harmony) delivers state-of-the-art performance across a wide variety of settings, from latency-bound regimes where serving requests with sub-second response times is key, to throughput-bound regimes during training and offline inference. You will help build the foundational technology powering Adaptive by delivering performance improvements directly to our clients as well as to our internal workloads.

Some examples of tasks you will encounter during your work:

  • Profile and iterate on GPU inference kernels in Triton or CUDA, identifying memory bottlenecks and optimizing latency, and decide how to adequately benchmark an inference service (see the first sketch after this list);

  • Systematically identify and eliminate synchronization points between the CPU and GPU, enabling asynchronous communication of results from Python workers to our Rust backend (see the second sketch after this list);

  • Work with quantization methods to minimize the memory footprint of our models (see the third sketch after this list);

  • Modify existing kernel implementations to support requested features, and efficiently implement novel operations entirely from scratch.
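
To give a flavor of the first task, here is a minimal, hypothetical sketch (not Adaptive Harmony code): a fused add + ReLU kernel written in Triton and timed against the eager PyTorch equivalent with triton.testing.do_bench. The kernel and function names are illustrative only.

    # Hypothetical example: a fused elementwise kernel and a quick latency check.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        # Fusing the add and the ReLU saves one round trip to global memory.
        tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

    def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n_elements = out.numel()
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        fused_add_relu_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
        return out

    if __name__ == "__main__":
        x = torch.randn(1 << 24, device="cuda", dtype=torch.float16)
        y = torch.randn_like(x)
        # do_bench warms up, runs many timed iterations, and reports milliseconds.
        eager_ms = triton.testing.do_bench(lambda: torch.relu(x + y))
        fused_ms = triton.testing.do_bench(lambda: fused_add_relu(x, y))
        print(f"eager: {eager_ms:.3f} ms, fused: {fused_ms:.3f} ms")

In a real inference service, much of the work is in choosing the benchmark itself: batch sizes, sequence lengths, and whether latency or throughput is the binding constraint.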
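
The second task is about CPU-GPU synchronization points. A minimal sketch in plain PyTorch (not the actual Python-worker/Rust-backend setup, and with hypothetical helper names): calling .item() on a GPU tensor blocks the CPU on every queued kernel, while a non-blocking copy into pinned host memory plus an explicit event keeps the pipeline asynchronous.

    # Hypothetical example: removing an implicit CPU-GPU sync when reading a scalar back.
    import torch

    def read_loss_blocking(loss: torch.Tensor) -> float:
        # .item() forces the CPU to wait for every kernel queued so far.
        return loss.item()

    def read_loss_async(loss: torch.Tensor, host_buffer: torch.Tensor) -> torch.cuda.Event:
        # A non-blocking device-to-host copy into pinned memory can overlap with
        # later GPU work; the event tells us when the value has actually landed.
        host_buffer.copy_(loss.detach(), non_blocking=True)
        done = torch.cuda.Event()
        done.record()
        return done

    if __name__ == "__main__":
        loss = (torch.randn(1 << 20, device="cuda") ** 2).mean()
        pinned = torch.zeros((), dtype=torch.float32).pin_memory()
        done = read_loss_async(loss, pinned)
        # ... enqueue more GPU work here instead of stalling ...
        done.synchronize()  # one explicit synchronization, placed where it is cheap
        print(f"loss = {pinned.item():.4f}")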
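
And for the third task, a toy sketch of symmetric per-channel int8 weight quantization, which halves the footprint of fp16 weights; production LLM quantization (group-wise 4-bit formats, activation handling, dequantization fused into the matmul kernel) is considerably more involved.

    # Hypothetical example: per-channel symmetric int8 quantization of a weight matrix.
    import torch

    def quantize_int8_per_channel(w: torch.Tensor):
        # One scale per output channel (row), symmetric around zero; compute in fp32.
        w32 = w.float()
        scale = w32.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(w32 / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return (q.float() * scale).to(torch.float16)

    if __name__ == "__main__":
        w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
        q, scale = quantize_int8_per_channel(w)
        print(f"fp16 weights: {w.numel() * w.element_size() / 2**20:.0f} MiB")
        print(f"int8 weights: {q.numel() * q.element_size() / 2**20:.0f} MiB")
        err = (dequantize(q, scale) - w).abs().mean().item()
        print(f"mean abs dequantization error: {err:.5f}")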

We are looking for self-driven, intense individuals who value technical excellence, honesty, and growth.

Your responsibilities

Generally,

  • Build and maintain fast and robust GPU code, focusing on delivering performance improvements in real-world applications;

  • Contribute to our product roadmap by identifying promising trends that can improve performance;

  • Report clearly on your work to a distributed collaborative team, with a bias for asynchronous written communication.

On the engineering side,

  • Write high-quality software in CUDA and/or Triton, with a focus on performance and robustness;

  • Profile dedicated GPU kernels in CUDA or Triton, optimizing across latency- and compute-bound regimes for complex workloads.

Your (ideal) background

The background below only suggests a few pointers we believe could be relevant; we welcome applications from candidates with diverse backgrounds. Do not hesitate to get in touch if you think you could be a great fit, even if the description below doesn't fully match you.

  • An M.Sc./Ph.D. in computer science, or demonstrated experience in software engineering, preferably with a focus on GPU optimization;

  • Strong programming skills, preferably with a focus on systems and general-purpose GPU programming;

  • Contributions to relevant open-source projects, such as CUTLASS, Triton, and MLIR;

  • A track record of writing high-performance kernels, preferably with demonstrated ability to reach state-of-the-art performance on well-defined tasks;

  • Passion for the future of generative AI, and eagerness to build foundational technology that helps machines deliver more singular experiences.

Benefits
  • Comprehensive medical (health, dental, and vision) insurance;

  • 401(k) plan with 4% matching (or equivalent);

  • Unlimited PTO — we strongly encourage at least 5 weeks each year;

  • Mental health, wellness, and personal development stipends;

  • Visa sponsorship if you wish to relocate to New York or Paris.

Required profile

Experience

Spoken language(s):
English

Soft Skills

  • Self-Driven
  • Honesty
  • Growth Mindset
