Senior Site Reliability Engineer

Requirements:

  • Expert in Kubernetes and multi-cloud environments, including virtual networking, load balancing, and web application firewall (WAF).
  • Backend software engineering background in Kubernetes-based systems; strong programming skills in Python, JavaScript, Go, C++, or Rust.
  • Strong understanding of networking, proxies, and content delivery networks (e.g., Cloudflare); experience with monitoring applications, infrastructure, and networks; CI/CD expertise.
  • Six+ years of hands-on experience in engineering, DevOps, or SRE; familiarity with distributed systems (queue-first architectures, sharding); demonstrated ability to gather requirements, problem-solve, and make recommendations; familiarity with security frameworks and attack vectors preferred.

Roles & Responsibilities

  • Develop and operate solutions for performance, availability, security, and cost-efficiency for large-scale, multi-cloud systems handling millions of requests per second.
  • Improve releases and deployment processes to ensure performance gains across quality, security, uptime, speed-to-deliver, threat detection, and customer engagement.
  • Collaborate across infrastructure, data, and application layers to keep systems fast and Dev teams productive; translate customer and internal feedback into prioritized improvements.
  • Make rapid, data-driven decisions and directly create value to enhance the customer experience and overall system reliability.

Job description

Intuition Machines uses AI/ML to build enterprise security products. We apply our research to systems that serve hundreds of millions of people, with a team distributed around the world. You are probably familiar with our best-known product, the hCaptcha security suite. Our approach is simple: low overhead, small teams, and rapid iteration.


As a Senior Site Reliability Engineer, you will focus on engineering solutions related to performance, availability, security, and cost-effectiveness. We consider these non-functional features to be core requirements for us and our customers. You will work at multiple layers of our internet-scale system (infrastructure, data, application logic) and build solutions at each of them.

What you will do:

  • Work with large-scale systems (handling millions of requests per second, serving millions of users, across multiple cloud providers).
  • Develop solutions to enhance performance, availability, security, and cost-effectiveness.
  • Keep us up, keep us fast, and keep our dev teams productive, ensuring that every peer release improves performance across the spectrum, including quality, security, uptime, speed-to-deliver, threat detection, and customer engagement.
  • Source improvement ideas, priorities, and capabilities from customers, the internal community, and new and existing system metrics. Make decisions rapidly.
  • Be creative, and seek out an environment where you can directly create value and be a force for improving the experience of our customers.

What we are looking for:

  • Expert in Kubernetes.
  • Expert in monitoring applications, infrastructure, and networks.
  • Background in software engineering with expertise in backend development within Kubernetes-based systems.
  • Strong programming skills in one or more of the following languages: Python, JavaScript, Go, C++, Rust.
  • Strong understanding of and experience in networking, proxies, and content delivery networks (e.g., Cloudflare).
  • Multi-cloud experience, including virtual networking, load balancing, and web application firewalls.
  • Strong experience with CI/CD.
  • Hands-on experience in development and orchestration within high-scale, high-uptime, and high-reliability environments.
  • Minimum of six years of hands-on experience in related roles (engineering, DevOps, SRE).
  • Familiarity with distributed systems, including queue-first architectures and sharding.
  • Demonstrated engineering expertise, including gathering requirements, problem-solving, and making recommendations.
  • Preferred: Familiarity with security frameworks, attack vectors, botnets, and impact analysis.


What we offer:

  • Fully remote position with flexible working hours.
  • An inspiring team of colleagues spread all over the world.
  • Pleasant, modern development and deployment workflows: ship early, ship often.
  • High impact: lots of users, happy customers, high growth, and cutting-edge R&D.
  • Flat organization, direct interaction with customer teams.

We celebrate equality of opportunity and are committed to creating an inclusive environment for all team members. 

Join us as we transform cybersecurity, user privacy, and machine learning online!

Please note that all positions require pre-employment screening, including third-party verification of work history, education, and identity, as well as a final in-person interview and identity verification step, which will be conducted in your country of residence.
