Key Facts

Remote From:

Full time

English

Hard Skills

OpenShift, Linux Administration, Kubernetes, Virtualization, Ceph (Software), Prometheus (Software), Containerization, GlusterFS, Software Maintenance, Remote Server Management

Other Skills

• Teamwork
• Customer Service
• Verbal Communication Skills
• Problem Solving

Requirements:

5+ years of advanced Linux administration and troubleshooting
5+ years managing Red Hat OpenShift Kubernetes and Virtualization clusters
5+ years of experience managing infrastructure in high-performance computing environments, including configuration and troubleshooting
Experience with HPC schedulers (e.g., SLURM, Kubernetes, PBS, Run:ai) required

Roles & Responsibilities

Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities
Plan and perform software and firmware maintenance activities
Assess customer environments for performance and design issues and propose resolutions
Work across technical teams to troubleshoot complex infrastructure issues

Job description

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.

At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD.

We are an equal opportunity employer, and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived.

We embrace all candidates who will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.

The High-Performance Computing Compute Engineer is primarily responsible for the overall health and maintenance of the physical cluster and server technologies in our Managed Services customers' environments. Our Compute Engineers are valued members of the Managed Services Infrastructure Practice, responsible for Tier 3 incident management, service request management, and change management infrastructure support for all Managed Services customers.

Principal Duties and Responsibilities

Provide enterprise-level operational support to Managed Services customers for incident, problem, and change management activities

Plan and perform software and firmware maintenance activities

Assess customer environments for performance and design issues and propose resolutions

Work across technical teams to troubleshoot complex infrastructure issues

Create and maintain detailed documentation

Serve as a subject matter expert and escalation point for compute technologies

Work with vendors to resolve compute issues

Communicate transparently with customers and internal teams

Participate in on-call rotation

Complete training and certifications as assigned to further skills and knowledge

Education and Experience

Bachelor’s degree or equivalent in Information Systems or a related field. Unique education, specialized experience, skills, knowledge, training, or certification may be substituted for education

5+ years of advanced Linux administration and troubleshooting

5+ years managing Red Hat OpenShift Kubernetes and Virtualization clusters

5+ years of expert-level experience managing infrastructure in high-performance computing environments, including configuration, troubleshooting, and best practices

2+ years of experience with Nvidia DGX preferred

Experience with HPC schedulers (e.g., SLURM, Kubernetes, PBS, Run:ai) required

Proficient in physical server environments

Experience configuring, maintaining and troubleshooting containers

Experience with storage technology (e.g., Ceph or Vast Data Platform) and distributed file systems (e.g., Lustre, GPFS, NFS, GlusterFS)

Experience with machine learning or data science workflows in HPC/AI environments

1+ years working with monitoring platforms (e.g., Prometheus, Grafana); Elastic Observability experience is a bonus

1+ years working with an enterprise ITSM system; ServiceNow experience is a bonus

Previous experience with automation tools such as Ansible, Puppet, or Chef is a plus

Managed Services or consulting experience is required

Strong background with customer service

High-level problem-solving and communication skills

Strong oral and written communication skills

Related Linux, Nvidia, Scheduler, Containerization, Virtualization, and Clustering certifications are a bonus

Why AHEAD:

Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.

We fuel growth by stacking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.

India Employment Benefits include:

Comprehensive health insurance coverage for employees, with options to extend coverage to dependents

Paid time off and company holidays, along with additional leave benefits as per policy

Flexible work arrangements, supporting work-life balance

Learning and development opportunities to support continuous growth and upskilling

Employee wellness initiatives and programs focused on physical and mental well-being

Retirement and statutory benefits in line with India regulations

Inclusive and people-first culture, with a strong focus on collaboration and ownership

Sr Engineer - Compute
