Platform Engineer / SRE (Kubernetes)

Remote: Full Remote

Offer summary

Qualifications:

  • 6+ years of hands-on experience in Platform Engineering, DevOps, or SRE roles.
  • 3+ years operating large-scale on-prem or self-managed Kubernetes clusters in production.
  • Deep understanding of Kubernetes control-plane components and service mesh technologies.
  • Proficiency in Infrastructure as Code and GitOps workflows.

Key responsibilities:

  • Operate and manage self-hosted Kubernetes clusters at scale across multiple sites.
  • Serve as a subject-matter expert on Kubernetes internals and deliver proactive support.
  • Design and automate Day-2 operational workflows and lead technical engagements.
  • Collaborate with customer stakeholders and internal teams as part of a 24/7 high-availability model.

Portainer.io http://www.portainer.io
51 - 200 Employees

Job description

As part of the global Platform Engineering team at Portainer, this role is critical to ensuring the reliability, scalability, and efficiency of large-scale, self-managed Kubernetes environments across customer data centers. You’ll work directly with customer platform teams to operate and improve their Kubernetes estate, enhance observability and automation, and extend platform capabilities via Portainer and complementary tools. This is a high-impact role that blends deep infrastructure knowledge, cloud-native expertise, and a DevOps/SRE mindset to support mission-critical systems across global time zones.

We’re looking for someone who’s done the hard yards: not just operating Kubernetes, but engineering it at scale. You’ve faced real-world incidents, solved complex infrastructure problems, and carry the kind of experience that only comes from owning production systems end-to-end. This role demands practical expertise earned through building, breaking, and hardening Kubernetes platforms in demanding environments.

Responsibilities

  • Operate and manage self-hosted Kubernetes clusters at scale (5,000+ nodes per region) across multiple sites.
  • Serve as a subject-matter expert on Kubernetes internals, delivering proactive support, performance tuning, and architectural recommendations.
  • Enable and extend platform tooling using Portainer, integrating it with identity, observability, and lifecycle management systems.
  • Design and automate Day-2 operational workflows including node lifecycle, network overlays, and storage provisioning.
  • Lead technical engagements such as architecture reviews, operational readiness assessments, and incident postmortems.
  • Build and maintain IaC pipelines and GitOps patterns using tools like Terraform, ArgoCD, and Flux.
  • Troubleshoot and resolve advanced infrastructure issues related to scheduling, networking, DNS, ingress, and runtime isolation.
  • Contribute to internal reusable tooling, engineering standards, and automation frameworks.
  • Collaborate with customer stakeholders and internal technical teams across time zones as part of a 24/7 high-availability model.

Skills & Qualifications:

  • 6+ years of hands-on experience in Platform Engineering, DevOps, or SRE roles.
  • 3+ years operating large-scale on-prem or self-managed Kubernetes clusters in production.
  • Deep understanding of Kubernetes control-plane components (API server, etcd, controller-manager, scheduler).
  • Experience with Portainer or other Kubernetes platform management tools (e.g., Rancher, Lens, OpenShift).
  • Proficiency in service mesh technologies such as Istio and Envoy.
  • Demonstrable experience in Go is a strong advantage, particularly in building custom Kubernetes operators or contributing upstream (e.g., submitting PRs to Kubernetes core or CNCF projects).
  • Advanced skills in Infrastructure as Code (Terraform, Helm, Kustomize) and GitOps workflows.
  • Solid knowledge of CNI plugins (e.g., Cilium, Calico), ingress controllers, and CSI drivers.
  • Scripting and automation using Python, Ansible, Terraform, or Bash.
  • Familiarity with observability tooling (Prometheus, Grafana, Loki, VictoriaMetrics, Mimir, etc.).
  • Strong grasp of reliability engineering principles: SLOs, SLIs, chaos testing, and scaling patterns.

Benefits

Portainer is a leading tech company offering a broad benefits package, including a highly competitive salary and the ability to work anywhere in the world while being part of a dynamic team that takes on some of the most interesting challenges in the technology and infrastructure space.

Required profile

Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving