Research Engineer Performance Optimization

Work set-up: 
Full Remote
Contract: 
Salary: 
$180K - $250K yearly
Work from: 

Offer summary

Qualifications:

  • Strong problem-solving skills in PyTorch, CUDA, and distributed systems.
  • Experience training large models using Python and PyTorch, including data processing and inference.
  • Proficiency in profiling and optimizing CPU and GPU code, with familiarity with tools like Nvidia Nsight.
  • Knowledge of high-performance parallel C++, Triton, and writing custom PyTorch kernels.

Key responsibilities:

  • Implement efficient models and systems for data processing, training, and deployment.
  • Optimize and troubleshoot system performance bottlenecks in memory, speed, and utilization.
  • Collaborate with research teams to ensure system efficiency from start to finish.
  • Develop tools for dataset visualization, evaluation, and filtering.

Luma AI https://lumalabs.ai/dream-machine
11 - 50 Employees

Job description

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA, and distributed systems. You will work with Research Scientists to build & train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment

  • Identify and implement optimization techniques for massively parallel and distributed systems

  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code

  • Work closely with the research team to ensure systems are planned to be as efficient as possible from start to finish

  • Build tools to visualize, evaluate and filter datasets

  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience working with the entire development pipeline, from data processing, preparation & data loading to training and inference.

  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing, etc.)

  • Experience profiling CPU & GPU code in PyTorch, using Nvidia Nsight or similar tools.

  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity with DDP, FSDP, Tensor Parallel, etc.

  • Experience writing high-performance parallel C++. Bonus if done within an ML context with PyTorch, e.g. for data loading, data processing, or inference code.

  • Experience with high-performance Triton/CUDA and writing custom PyTorch kernels. Top candidates will be able to utilize tensor cores, optimize CUDA memory usage, and apply similar skills.

  • Good to have: experience with deep learning concepts such as Transformers & multimodal generative models such as Diffusion Models and GANs.

  • Good to have: experience building inference demo prototype code (incl. Gradio, Docker, etc.)


Compensation

The pay range for this position in California is $180,000 - $250,000/yr; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

Your application is reviewed by real people.

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Problem Solving
