Site Reliability Engineer

fully flexible
Work set-up: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Proficiency in at least one programming language, preferably Go.
  • Experience supporting large-scale, mission-critical applications in a production environment.
  • Knowledge of infrastructure as code tools like Terraform, and container orchestration such as Kubernetes or Docker.
  • Understanding of cloud platforms (Azure, AWS, or GCP) and microservices architecture.

Key responsibilities:

  • Design and develop custom software to improve system reliability.
  • Collaborate with engineering teams to embed reliability principles.
  • Respond to critical incidents and troubleshoot production issues.
  • Develop automation tools and promote best practices for system reliability.

Okta Computer Software / SaaS XLarge https://www.okta.com
5001 - 10000 Employees

Job description

Get to know Okta

Okta is The World’s Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences. 

Join our team! We’re building a world where Identity belongs to you.

At Auth0, we provide an unparalleled authentication experience for hundreds of millions of users worldwide. Our commitment to reliability is a key foundation of our product, and exceeding customer availability expectations is a core engineering focus. As a mid-level Site Reliability Engineer, you'll join our SRE team based in Europe to ensure our production systems are not only operational but also resilient, scalable, and ready for exponential growth. This isn't just about keeping the lights on; it's about directly contributing to the platform's core resiliency and robustness. You'll be a hands-on builder, crafting solutions that make our system more reliable by design.

What you’ll do:

  • Design and build custom software in Go to enhance the platform's reliability, resiliency, and redundancy.
  • Partner with engineering teams to embed reliability principles, improving the availability, performance, and observability of our services.
  • Use your deep understanding of infrastructure and observability principles to identify opportunities for improvement within the product and implement solutions.
  • Contribute to our on-call rotation, providing rapid, effective response to critical incidents and using your expertise to troubleshoot, mitigate or accurately escalate production issues.
  • Develop and refine our SRE tooling and processes, focusing on automation and operational efficiency.
  • Define, document, and champion reliability best practices across the organisation.

What you'll need to be successful:

This role requires a unique blend of a software engineer's mindset and operational expertise. You'll thrive in this role if you have:

  • A proactive and systematic approach to problem-solving, with a high degree of ownership.
  • Proven experience in a production environment supporting large-scale, mission-critical applications with a high degree of autonomy.
  • Proficiency in at least one programming language, with a strong preference for Go. You should be comfortable writing custom applications, not just scripts.
  • Experience with infrastructure as code (Terraform), containers and container orchestration (Docker, Kubernetes), and GitOps (Argo CD).
  • Demonstrable expertise in a major cloud provider (Azure, AWS, or GCP).
  • A strong grasp of microservices architecture, databases (SQL, NoSQL), and networking fundamentals, so you can understand how custom code can solve platform-level issues.
  • An understanding of core SRE principles, including SLIs, SLOs, and error budgets.
  • Experience in an on-call rotation for a 24/7 cloud-based environment.
  • Exceptional communication and collaboration skills, with a proven ability to work effectively in a remote, distributed team, where tasks may be self-driven.

We're looking for someone who wants not just a job, but a career-defining opportunity to tackle complex challenges at massive scale. If you're a curious and motivated engineer who's passionate about building reliability directly into the platform, we'd love to hear from you.


What you can look forward to as a Full-Time Okta employee!

Okta cultivates a dynamic work environment, providing the best tools, technology and benefits to empower our employees to work productively in a setting that best and uniquely suits their needs. Each organization is unique in the degree of flexibility and mobility with which it works, so that all employees are enabled to be their most creative and successful versions of themselves, regardless of where they live. Find your place at Okta today! https://www.okta.com/company/careers/

Some roles may require travel to one of our office locations for in-person onboarding.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Okta is committed to complying with applicable data privacy and security laws and regulations. For more information, please see our Privacy Policy at https://www.okta.com/privacy-policy/

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Computer Software / SaaS
Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
