Interval Group

T3 Operations & Support Specialist — Compute & OS (PID9066)

Key Facts

Remote
Fixed term
Expert & Leadership (>10 years)
English, German

Other Skills

  • Troubleshooting (Problem Solving)
  • Collaboration

Requirements

  • 5 to 10+ years in IT operations, service delivery or platform operations with demonstrated leadership in mission-critical environments
  • Proven experience implementing and leading Incident, Problem, Change and Release governance in production
  • Hands-on experience with VMware 8 virtualisation
  • Fluent English and German (C1 minimum in both)

Roles & Responsibilities

  • Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and RCA, and driving permanent fixes and preventive measures
  • Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
  • Executing and improving standard operational procedures through automation to reduce toil and improve MTTR and stability
  • Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues

Job description

This is a remote position.

  • Contract / Freelance
  • Full-time
  • Remote with travel readiness required (Germany)
  • Start: ASAP

About the role

We are working with a long-standing anchor client to source a T3 Operations & Support Specialist (Compute & OS) for a large-scale cloud-native platform programme supporting a major energy transmission operator in Germany. The platform is a service-oriented hybrid cloud environment providing application teams with self-service capabilities to develop, run and operate software products across private and public cloud infrastructure.

In this role you will provide Tier-3 operational ownership for Compute & Operating System services within Local Production (DE), handling complex incidents, deep troubleshooting and root cause analysis, and driving permanent fixes and preventive measures.

What you'll be doing

  • Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and RCA, and driving permanent fixes and preventive measures
  • Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
  • Executing and improving standard operational procedures through automation to reduce toil and improve MTTR and stability
  • Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues
  • Validating deployment artefacts from an operations perspective and enforcing quality assurance measures
  • Monitoring system health, performance metrics and service availability across multi-tenant environments
  • Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions
  • Implementing monitoring and logging strategies to support audit and compliance requirements
  • Performing routine security scans and remediating identified vulnerabilities


Requirements

What you'll need

  • 5 to 10+ years in IT operations, service delivery or platform operations with demonstrated leadership in mission-critical environments
  • Proven experience implementing and leading Incident, Problem, Change and Release governance in production
  • Hands-on experience with VMware 8 virtualisation
  • Operating Systems: Red Hat Enterprise Linux and Ubuntu
  • OS tooling: Satellite, IPA, Certificate Server
  • ITSM/collaboration tooling: Jira Service Management, Jira, Confluence
  • Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts
  • Experience gathering operational insights from monitoring/observability including SLI/SLA/SLO management and tracking
  • Hands-on experience documenting procedures and enforcing clear runbooks and playbooks
  • Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)
  • Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists
  • Fluent English and German (C1 minimum in both)

Desirable

  • Experience operating in regulated or high-availability industries (banking, telco, public sector, healthcare)
  • Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management
  • Familiarity with enterprise DevOps toolchains (GitLab, JFrog Artifactory, Backstage, Harness)
  • GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm)


Benefits

As a freelancer / contractor with us, you will enjoy flexible working hours and the freedom to choose your own projects. Our platform gives you access to exciting projects in various industries and supports you in advancing your career. You'll benefit from competitive pay and a dedicated team to help you with any questions you may have. Work independently and utilise our strong network to achieve your professional goals.
