AI Cloud k8s Infrastructure Engineer

EXTRA HOLIDAYS - FULLY FLEXIBLE
Remote: Full Remote
Contract:
Salary: 148 - 276K yearly
Experience: Senior (5-10 years)
Work from: California (USA), Washington (USA), United States

Offer summary

Qualifications:

  • BS or MS in CS/CE/EE or equivalent experience
  • At least 2 years of k8s experience on-prem and in AWS
  • At least 4 years building automation software for large-scale computing clusters
  • Versatile with at least one programming language such as Go or Python
  • Deep knowledge of networking fundamentals

Key responsibilities:

  • Propose and create solutions to improve availability of the AI platform
  • Automate critical processes for distributed GPU clusters
  • Drive development of new SRE automation tools
  • Work on enhancing observability tooling
  • Impact the efficiency of the AV Perception team
NVIDIA XLarge http://www.nvidia.com/
10001 Employees

Job description

Your missions

We are looking for a highly motivated AI cloud k8s infrastructure engineer to join our team in the fastest-growing organization at NVIDIA. This is an excellent opportunity to architect and drive advancements in SRE automation on the largest GPU clusters! Please apply if you are passionate about Kubernetes, building infrastructure automation and deployment tools, and working on new technologies and Cloud Native applications.

What you'll be doing:

  • As part of the Maglev infrastructure and SRE team, you will propose and craft new ways to improve the availability of our Cloud Native AI Platform by automating critical processes on multiple distributed GPU clusters

  • The solutions you propose and build will directly impact the efficiency of the AV Perception team!

  • You will drive the development of new SRE automation and observability tooling

What we need to see:

  • BS or MS in CS/CE/EE or equivalent experience

  • At least 2 years of k8s experience on-prem and in AWS

  • At least 4 years building automation software APIs for large-scale computing clusters and data platforms

  • You are versatile with at least one programming language such as Go or Python

  • Complete understanding of Kubernetes and Cloud Native architecture

  • You have experience deploying services on-prem and in the cloud, and managing them

  • Deep knowledge of networking fundamentals

  • Expertise in problem solving and complexity analysis of distributed systems

  • Working experience with observability tooling such as Prometheus and Grafana

  • Proficiency with the Linux environment

  • Excellent written and verbal social skills

Ways to stand out from the crowd:

  • You are a meticulous organizer with an ever-positive, can-do attitude

  • You'll be a fun and hardworking teammate who enjoys a challenge and celebrates success

  • Previous experience building sophisticated tooling and SRE automation on large (100+ node) GPU and CPU clusters

  • You have extensive experience across a wide range of observability solutions

For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU, the engine of modern visual computing, the field has expanded to encompass video games, movie production, product design, medical diagnosis and scientific research. Today, we stand at the beginning of the new AI computing era, ignited by a new computing model: GPU deep learning. This new model, in which deep neural networks are trained to recognize patterns from extensive amounts of data, has proven to be deeply effective at solving some of the most daring problems in everyday life.

The base salary range is 148,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry:
Spoken language(s): Check out the description to know which languages are mandatory.

Soft Skills

  • Organizational Skills
  • Teamwork
  • Problem Solving
  • Non-Verbal Communication