Team Lead, Site Reliability Engineering

Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Minimum of 3 years' experience leading a technical team.
  • Experience with Google Cloud and Infrastructure as Code (IaC) tools like Terraform.
  • Strong knowledge of microservices, containers (Kubernetes, Docker), and networking.
  • Hands-on Linux systems administration and experience with service mesh and PKI.

Key responsibilities:

  • Lead and mentor a team of Site Reliability Engineers to ensure operational excellence.
  • Oversee incident management, SLA adherence, and workload prioritization.
  • Collaborate on designing, deploying, and maintaining large-scale distributed systems.
  • Partner with AI/ML teams to ensure infrastructure readiness for data pipelines and model training.

Pythian SME https://www.pythian.com
201 - 500 Employees

Job description

Team Lead, Site Reliability Engineering
Regular travel to Brighton required | United Kingdom | Remote | Work from Home

Why Pythian:
At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was founded in 1997 and started by ensuring the reliability and performance of mission-critical databases. We quickly earned a reputation for solving tough data challenges. We were there when the industry moved from on-premises to cloud environments, and as enterprises sought more from their data, we expanded our competencies to include advanced analytics.

Today, we empower organizations to embrace transformation and leverage advanced technologies, including AI, to stay competitive. We deliver innovative solutions that meet each client’s data goals and have built strong partnerships with Google Cloud, AWS, Microsoft, Oracle, SAP, and Snowflake. The powerful combination of our extensive expertise in data and cloud and our ability to keep on top of the latest bleeding-edge technologies makes us the perfect partner to help mid- and large-sized businesses transform to stay ahead in today’s rapidly changing digital economy.

Why You:
Pythian is building a next-generation Site Reliability Engineering team, and we’re looking for a talented and experienced Team Lead who thrives in fast-paced, problem-solving environments.

As a Team Lead, you’ll be responsible for leading a team of site reliability engineers who design, deploy, and operate large-scale distributed systems across compute, storage, networking, and AI/ML environments. You will act as the primary technical escalation point, oversee day-to-day operational delivery, mentor and coach team members, and ensure adherence to SLAs and quality standards. You may also directly contribute to delivery by leading projects from architecture to automation to intelligent monitoring, collaborating with both clients and teammates to build resilient, high-performing infrastructure.

If this is you, and you wonder what it would be like to work at Pythian, reach out to us and find out! Intrigued to see what life is like at Pythian? Check out #pythianlife on LinkedIn!


What you will be doing:
  • Team Leadership & Operational Management:
    • Lead and mentor a team of Site Reliability Engineers to ensure technical excellence, timely resolution of incidents, and professional growth of team members.
    • Oversee queue management, ticket prioritization, and workload distribution to meet SLA and utilization targets.
    • Act as the primary point of contact for critical escalations and severity-1 incidents, providing guidance and technical direction.
    • Conduct performance reviews and knowledge-sharing sessions to strengthen the team’s capabilities.
    • Collaborate with management on performance metrics, process adherence, and resource planning.
    • Set specific goals and objectives for team members as part of Pythian’s goal planning program. Provide guidance to team members regarding training opportunities as part of Pythian’s self-directed training program. Meet regularly with team members for one-on-one sessions to share information and gather feedback on opportunities for improvement.
  • Technical Responsibilities:
    • Operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems.
    • Automate workflows using Go, Python, and Shell scripting.
    • Build monitoring and observability solutions with Prometheus, Grafana, and Loki.
    • Troubleshoot complex networking, storage, and system performance issues.
    • Partner with AI/ML teams to ensure infrastructure readiness for model training and data pipelines.

What you bring:
  • A minimum of 3 years' previous experience leading a team.
  • Experience with Google Cloud, plus IaC tools (Terraform).
  • Strong knowledge of microservices, containers (Kubernetes, Docker), and networking.
  • Hands-on experience with PKI, service mesh, and Linux systems administration.
  • SRE mindset with a focus on automation, scalability, and reliability.

What you get in return:
  • Love your career: Competitive total rewards package. Blog during work hours. Hone your skills or learn new ones with our substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
  • Love your work-life balance: Work flexibly and remotely from your home, with no daily travel to an office! All you need is a stable internet connection.
  • Love your coworkers: Collaborate with some of the best and brightest in the industry!
  • Love your workspace: We give you all the equipment you need to work from home, including a laptop with your choice of OS and an annual budget to personalize your work environment!
  • Love yourself: Pythian cares about the health and wellbeing of our team. You will have an annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more). Additionally, you will receive a generous amount of paid vacation and sick days, as well as a day off to volunteer for your favorite charity.
Required profile

    Experience

    Level of experience: Senior (5-10 years)
    Spoken language(s):
    English

    Other Skills

    • Team Leadership
    • Collaboration
    • Communication
    • Mentorship
    • Problem Solving
