Site Reliability Engineer (SRE) - grok.com & API

Work set-up: Full Remote
Work from: United Kingdom

Offer summary

Qualifications:

  • Strong expertise in Kubernetes and container orchestration.
  • Experience with continuous deployment tools such as Buildkite and ArgoCD.
  • Proficiency in monitoring technologies such as Prometheus, Grafana, and PagerDuty.
  • Knowledge of infrastructure-as-code tools such as Pulumi or Terraform.

Key responsibilities:

  • Maintain and improve backend services powering grok.com and the API.
  • Ensure services are scalable and reliable enough to handle tens of thousands of queries per second.
  • Collaborate with team members in London and Palo Alto to coordinate development efforts.
  • Work hands-on and contribute to the team's focus on engineering excellence.

xAI: Information Technology & Services startup, https://x.ai/
11-50 employees

Job description

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the team

You will work on the team that is responsible for the backend services that power grok.com and our API. Our team is currently based primarily in London with a small but growing number of engineers located in Palo Alto. We focus on writing highly scalable and reliable services that can efficiently process tens of thousands of queries per second. The services are hosted on a number of Kubernetes clusters (on-prem & cloud).

About the role

An ideal candidate meets at least the following requirements:

  1. Expert knowledge of Kubernetes;
  2. Expert knowledge of continuous deployment systems such as Buildkite and ArgoCD;
  3. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty;
  4. Expert knowledge of infrastructure-as-code technologies such as Pulumi or Terraform.

Location

We hire engineers in London and in Palo Alto. We usually work from the office 5 days a week but allow for work-from-home days when required. Candidates joining the London team must be willing to attend late meetings at least once a week to coordinate with the rest of our team.

Interview process

After you submit your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic technical questions. If you clear the phone interview, you will enter the main process, which consists of two technical interviews.

All interviews will be conducted via Google Meet.

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and various other discounts and perks.

xAI is an equal opportunity employer.

California Consumer Privacy Act (CCPA) Notice

Required profile

Experience

Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Prioritization
  • Strong Work Ethic
  • Communication
