
Service Reliability Engineer

Requirements

  • Hands-on experience in programming and scripting languages such as Python, Go or Bash
  • Good understanding of at least one public cloud (AWS, Azure, GCP)
  • Exposure to observability tools such as Grafana, Datadog, New Relic, ELK Stack, Dynatrace
  • Familiar with DevOps and GitOps practices

Roles & Responsibilities

  • Provide operational support for large-scale distributed environments and debug production issues
  • Quickly diagnose and resolve issues to minimize downtime and impact to users
  • Handle production incidents, managing incident communication with clients and drafting RCA documents
  • Monitor and ensure that technical/business expectations of deliverables are met on projects

Job description

As a Consultant Service Reliability Engineer (SRE) you will take a multifaceted approach to ensure technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you cultivate a collaborative culture, enabling organizations to meet and exceed their reliability and business objectives.

Job responsibilities

  • You will provide operational support for large-scale distributed environments and debug production issues across services and stack levels.
  • You will quickly diagnose and resolve issues to minimize downtime and impact to users.
  • You will troubleshoot and investigate issues across databases, web services and applications.
  • You will handle production incidents, manage incident communication with clients and help draft RCA documents.
  • You will respond to and communicate about incidents (incident management).
  • You will monitor and ensure that technical/business expectations of deliverables are consistently met on projects.
  • You will share ideas with appropriate team members, stakeholders and leaders to facilitate further discussion and exploration.
  • You will develop and maintain positive relationships with internal peers and other colleagues in ways that help deliver strategic objectives.
  • You will suggest innovative solutions within current constraints and business policies, and adapt them as needed.

Job qualifications

Technical Skills

  • You have hands-on experience in programming and scripting languages such as Python, Go or Bash.
  • You have a good understanding of at least one public cloud (AWS, Azure, GCP).
  • You have had exposure to observability tools such as Grafana, Datadog, New Relic, ELK Stack, Dynatrace or equivalent, and you are proficient in using data from these tools to dissect and identify the root causes of system and infrastructure issues.
  • You are familiar with DevOps and GitOps practices.
  • You have a good knowledge of container-based architecture and orchestration tools such as Kubernetes, AWS EKS, Docker Swarm and Nomad.
  • You understand technical architecture and modern design patterns, including microservices, serverless functions, NoSQL and RESTful APIs, with experience in fixing bugs, analyzing logs, building metrics and operational dashboards.
  • You are familiar with creating infrastructure resources that improve system reliability and follow the principles of the cloud providers' Well-Architected Frameworks: reliability, security, cost optimization, performance efficiency and operational excellence.

Professional Skills

  • You are able to understand customer requirements with respect to infrastructure, monitoring, CI/CD, DevOps, etc.
  • You have good articulation skills and the ability to explain technical matters to non-technical individuals.
  • You are able to work closely with engineering teams to identify single points of failure and other high-risk architecture issues, and to address them by proposing and implementing resilient solutions.
  • You are interested in learning new concepts and keeping your skills and knowledge up-to-date.
  • You stay informed about new and emerging thoughts in your technical space.
  • You are willing to be part of a team that provides 24x7 support on a rotation and as-needed basis.

Other things to know

Learning & Development

There is no one-size-fits-all career path at Thoughtworks: however you want to develop your career is entirely up to you. But we also balance autonomy with the strength of our cultivation culture. This means your career is supported by interactive tools, numerous development programs and teammates who want to help you grow. We see value in helping each other be our best and that extends to empowering our employees in their career journeys.

About Thoughtworks

Thoughtworks is a dynamic and inclusive community of bright and supportive colleagues who are revolutionizing tech. As a leading technology consultancy, we're pushing boundaries through our purposeful and impactful work. For 30+ years, we've delivered extraordinary impact together with our clients by helping them solve complex business problems with technology as the differentiator. Bring your brilliant expertise and commitment to continuous learning to Thoughtworks. Together, let's be extraordinary.

#LI-Remote