This is a remote position.
CI/CD Support & Operational Readiness: Lead the validation of deployment artifacts from an operational standpoint, define quality assurance measures, and ensure robust rollback strategies and observability frameworks are active for production deployments.
Platform Operations & Incident Management: Monitor system health, performance metrics, and service availability across multi-tenant environments to maintain maximum stability for the managed Kubernetes platform.
Problem Resolution: Direct root cause analyses and implement corrective and preventive actions to resolve incidents swiftly, minimizing service disruption.
Automation & Lifecycle Validation: Reduce operational toil by automating recurring remedial processes and validating all procedures following structured software development lifecycles, including staging and testing reviews.
Security & Compliance Enforcement: Implement comprehensive logging and monitoring strategies to meet audit requirements, perform routine security scans, and remediate platform vulnerabilities.
Senior Kubernetes Platform Experience: At least 5 years of dedicated operational experience with self-managed Kubernetes clusters, self-managed cluster-provisioning services, and production on-premises applications.
CI/CD & GitOps Mastery: Profound knowledge and implementation experience with continuous integration and delivery processes, workflows, and modern tooling (such as GitLab, Jenkins, Tekton, Argo Workflows, or Argo CD) along with security assurance practices.
Networking Architecture: Deep structural understanding of networking concepts, including protocols, load balancing, and infrastructure security.
ITSM & SRE Principles: Fundamental comprehension of core operations processes (incident, change, and problem management) alongside advanced Site Reliability Engineering concepts.
Observability & Monitoring: Hands-on experience with logging and monitoring tools (such as Prometheus, Grafana, Datadog, Mimir, and Loki) to gather operational insights and to manage and track SLIs, SLOs, and SLAs.
Technical Documentation: Proven experience documenting technical procedures cleanly and enforcing actionable runbooks or playbooks for engineering teams.
Language Skills: Professional proficiency in spoken and written English and German (at least C1 level in each).
Eligibility: Residency and right to work in the EU, EEA, UK, or Switzerland.
