This is a remote position.
This is a full-time contract position offering a daily rate. The role involves managing a high-availability, on-premises production platform that serves as the primary host for mission-critical business applications, ensuring stability and performance within a private cloud environment.
Fluent German and English (C1 level) are required. Only occasional onsite visits in Germany are expected.
Platform Stability & Hosting: Ensure the high availability of on-premises infrastructure and consult on the seamless operation of production business applications.
CI/CD Consulting: Provide expertise for CI/CD pipelines, validate deployment artifacts from an operations perspective, and ensure robust rollback strategies are in place.
Kubernetes Management: Monitor system health, performance metrics, and service availability across multi-tenant managed Kubernetes environments.
Incident & Problem Management: Identify and resolve technical incidents to minimise service disruption and lead root cause analysis for preventive actions.
Automation of Operations: Reduce operational toil by automating recurring standard processes following established software development lifecycles, including staging and validation reviews.
Security & Compliance: Implement monitoring and logging strategies to support audit requirements, perform routine security scans, and remediate vulnerabilities.
Lifecycle Maintenance: Execute routine updates, patches, and system optimisations within the local infrastructure.
Senior-level professional with a proven track record in operations management for private cloud solutions.
At least 5 years of operational experience with self-managed Kubernetes clusters and production applications in on-premises environments.
Deep understanding of networking concepts, including protocols, load balancing, and infrastructure security.
Profound knowledge of CI/CD processes and tooling (such as GitLab, Jenkins, Tekton, and ArgoCD) and associated security assurance.
Fundamental understanding of core operations processes (ITSM) and Site Reliability Engineering (SRE) concepts.
Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Loki, Mimir).
Experience in gathering operational insights and managing SLI/SLA/SLO tracking.
Strong ability to document procedures and enforce clear operational runbooks.
Eligibility: Residency in the EU, EEC, UK, or Switzerland.
As a freelancer / contractor with us, you will enjoy flexible working hours and the freedom to choose your own projects. Our platform gives you access to exciting projects in various industries and supports you in advancing your career. You'll benefit from competitive pay and a dedicated team to help you with any questions you may have. Work independently and utilise our strong network to achieve your professional goals.
