NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We are looking to grow our company and establish teams with the most thoughtful people in the world.
NVIDIA DGX, HGX, and MGX systems deliver the world's leading solutions for enterprise AI infrastructure at scale.
We are looking for a talented and experienced RAS (Reliability, Availability, and Serviceability) Architect. You will be responsible for improving the reliability of NVIDIA GPU and Grace systems by designing, architecting, and implementing robust RAS features. You will work closely with cross-functional teams, including hardware engineers, system architects, and software developers, to create architecture that meets stringent reliability requirements and deliver exceptional customer experiences. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.
What you will be doing:
Release engineering and branch management for the software and firmware releases.
Providing tested releases of software and firmware to partners.
Development of release plans in collaboration with various stakeholders.
Documentation of release notes and effective communication to internal and external collaborators.
Management of the entire release process, including branch management and all aspects of software version control.
Development of CI pipelines for downstream firmware release processes.
What we need to see:
BS, MS, or PhD in EE/CS or a related field (or equivalent experience) with 8+ years of experience.
Proven experience in release management and release automation of product-quality system software.
Experience in software builds and build management for official downstream software release candidates and formal releases.
Excellent problem-solving skills, attention to detail, and the ability to analyze complex system-level issues.
Excellent written and oral communication skills, a strong work ethic, a deep sense of teamwork, a love of producing quality work, and a commitment to finishing your tasks every single day.
A self-starter who loves to find creative solutions to complicated problems.
Ways to stand out from the crowd:
Proven record of release engineering for multiple customers.
Familiarity with QA of platform software for server platforms.
Defect triaging and defect management experience for system software releases.
Experience releasing for x86_64 and arm64 architectures.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!
The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.