Interval Group

Observability & Operations Engineer PID9064

Key Facts

Remote From: 
Fixed term
Mid-level (2-5 years)
English, German

Other Skills

  • Troubleshooting (Problem Solving)

Requirements

  • At least 3 years of operational experience managing Kubernetes clusters
  • Experience with logging and monitoring ecosystems like Prometheus and Grafana
  • Deep understanding of core networking concepts
  • Professional fluency in both English and German

Roles & Responsibilities

  • Validate deployment artifacts and define quality assurance measures
  • Oversee system health and service availability across environments
  • Rapidly identify and resolve platform incidents with root-cause analyses
  • Implement monitoring strategies to meet compliance audits

Job description

This is a remote position.


Contract opportunity for an Observability & Kubernetes Operations Engineer to optimize platform reliability and manage large-scale cloud infrastructure. In this role, you will take ownership of system visibility and operational readiness by scaling enterprise monitoring tools, streamlining CI/CD pipelines, and maintaining multi-tenant container stability.

  • Position Type: Contract (1 FTE)

  • Compensation: Daily rate available

  • Location: Remote (with occasional onsite visits in Germany)

  • Language Requirement: Fluent English and German


Responsibilities

  • CI/CD Support & Operational Readiness: Validate deployment artifacts from an operational standpoint, define quality assurance measures, and ensure that robust rollback strategies and observability are in place for production environments.

  • Platform Operations & Incident Management: Oversee system health, performance metrics, and service availability across multi-tenant environments, ensuring maximum platform stability and minimal service disruption.

  • Problem Resolution: Rapidly identify, analyse, and resolve platform incidents, triggering detailed root-cause analyses and rolling out long-term preventative actions.

  • Automation & SRE Implementation: Mitigate operational toil by automating recurring standard procedures and validating all code updates through testing and staging lifecycles.

  • Security & Compliance Enforcement: Implement robust monitoring and logging strategies to meet compliance audits, conduct routine security scans, and remediate platform vulnerabilities.




Requirements

  • Kubernetes Platform Operations: At least 3 years of deep operational experience managing self-managed Kubernetes clusters and running production applications within on-premise environments.

  • Observability & Tool Administration: Hands-on experience with the administration, operation, and consumption of logging and monitoring ecosystems (such as Prometheus, Grafana, Datadog, Mimir, Loki, and OpenTelemetry collectors).

  • Networking Architecture: Deep structural understanding of core networking concepts, including enterprise protocols, load balancing, and network security.

  • CI/CD & GitOps Integration: Profound knowledge of building continuous integration and delivery processes using modern tooling (such as GitLab, Jenkins, Tekton, Argo Workflows, or Argo CD) alongside relevant security checks.

  • ITSM & SRE Principles: Fundamental comprehension of core IT Service Management processes (incident, change, and problem management) combined with practical Site Reliability Engineering concepts.

  • SLO Tracking & Metrics: Proven experience extracting actionable operational insights from platform data, including defining, tracking, and managing SLIs, SLAs, and SLOs.

  • Technical Documentation: Experienced in cleanly mapping out operational topics, authoring technical documentation, and maintaining actionable team runbooks or playbooks.

  • Language Skills: Professional fluency in both spoken and written English and German (at least C1 level for both).

  • Eligibility: Residency and right to work in the EU, EEA, UK, or Switzerland.



Benefits

As a freelancer / contractor with us, you will enjoy flexible working hours and the freedom to choose your own projects. Our platform gives you access to exciting projects in various industries and supports you in advancing your career. You'll benefit from competitive pay and a dedicated team to help you with any questions you may have. Work independently and utilise our strong network to achieve your professional goals.
