Domain Lead Site Reliability Management (REF4372N)

Work set-up: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from:

Offer summary

Qualifications:

  • Proven experience in site reliability engineering or related IT roles.
  • Strong understanding of enterprise-scale systems and modern cloud technologies like AWS, Docker, Kubernetes.
  • Educational background in Computer Science, Information Technology, or related fields.
  • Leadership skills with the ability to guide and mentor technical teams.

Key responsibilities:

  • Lead and develop the SRE organization to ensure system stability and reliability.
  • Implement proactive monitoring and prevention strategies using AI and observability tools.
  • Drive modernization efforts by adopting cloud-native and microservices architectures.
  • Collaborate across teams to embed a culture of reliability and continuous improvement.

Deutsche Telekom IT Solutions HU XLarge https://www.deutschetelekomitsolutions.hu/
5001 - 10000 Employees

Job description

Company Description

The largest ICT employer in Hungary, Deutsche Telekom IT Solutions (formerly ITServices Hungary, ITSH) is a subsidiary of the Deutsche Telekom Group. Established in 2006, the company provides a wide portfolio of IT and telecommunications services with more than 5000 employees. ITSH received the Best in Educational Cooperation prize from HIPA in 2019, was named the Most Ethical Multinational Company in 2019, and was recognized as one of the most attractive workplaces in PwC Hungary’s independent survey in 2021. The company continuously develops its four sites in Budapest, Debrecen, Pécs and Szeged and is looking for skilled IT professionals to join its team.

Job Description

Lead the Future of Site Reliability Engineering for Telekom IT

At Deutsche Telekom IT Solutions, we run the beating heart of the T-Systems internal software landscape, from mission-critical legacy platforms to cutting-edge cloud-native microservices. Now we’re looking for a hands-on, technology-driven leader to take charge of our entire SRE organization.

Your mission:
Build and guide a team focused on one clear goal — preventing problems before they happen. You’ll develop a proactive reliability culture across hundreds of applications, ensuring stability while driving modernization.

The road ahead:

  • AI in practice — embed intelligent detection and prevention into operations
  • Observability & reporting — transform raw data into actionable insights
  • Cloud & container orchestration — accelerate adoption of modern platforms
  • Diverse tech portfolio — from proven monoliths to 70%+ modern microservices, and growing

Who we’re looking for:
A leader who can move between strategy and execution with ease. Someone who understands enterprise-scale systems, enjoys modern technology, and knows how to engage and guide people: a facilitator, mentor, and problem-solver in one.
You’ll be comfortable working with technologies such as AWS or other hyperscalers, Docker, Kubernetes, and OpenShift; observability tools like Prometheus, Grafana, and the ELK Stack; and automation and CI/CD with Jenkins, GitLab CI/CD, Ansible, and Terraform.

Why join us:

  • Influence the reliability approach for one of Europe’s largest tech landscapes
  • Work across a varied ecosystem where no two days are the same
  • Drive meaningful change with the resources and stability of Deutsche Telekom

If you’re ready to lead SRE at scale, with modern tools, talented teams, and a clear mission, let’s talk.

Additional Information

* Please note that remote work is only available within Hungary due to European taxation regulations.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Mentorship
  • Problem Solving
  • Leadership
