Senior Site Reliability Engineer

Requirements:

  • 5+ years of SRE or senior DevOps experience operating cloud-native services at scale
  • Deep experience with AWS (compute, networking, storage, and managed services)
  • Infrastructure as Code: Terraform, CloudFormation or equivalent
  • Kubernetes and related tooling (Helm, operators, service mesh or similar)

Roles & Responsibilities

  • Platform reliability: Design, build and operate highly available systems on AWS with defined SLIs/SLOs
  • Automation & tooling: Automate deployment, scaling, recovery and operational runbooks using IaC and CI/CD pipelines
  • Observability: Implement and maintain metrics, tracing and logging for end-to-end visibility and data-driven decisions
  • Incident management: Lead post-incident reviews, root cause analysis and remediation to prevent recurrence

Job description

About Ensek

Ensek builds the cloud‑native SaaS software that’s transforming how energy retailers operate, innovate and manage at scale. We help retailers lower operating costs, improve billing accuracy for consumers, and enhance customer experience through automation and AI‑driven insight, all underpinned by modern, cloud‑native architecture.

Ensek is at an exciting inflection point as we scale at pace towards new international horizons. If you’re driven by solving complex, real‑world problems and want to build reliable, resilient infrastructure that accelerates the global energy transition, you’ll feel right at home with us.

About the role

As we transition to a truly product‑led organisation, SRE becomes the pulse of engineering — the centre of excellence for reliability, monitoring, and observability.

You’ll shape our new foundational platform, harden its resilience, and ensure it consistently meets the expectations of our customers. You’ll automate away manual toil, streamline operations, and build the systems and tooling that allow engineering teams to move faster with confidence.

Alongside the new platform, you’ll also take ownership of our existing estate — tuning, optimising, and evolving it to modern standards. This is a hands‑on role embedded deeply in the engineering community, operating with a product mindset and delivering value just like any other high‑performing team.

Key responsibilities:
  • Platform reliability: Design, build and operate highly available systems and services on public cloud (primarily AWS), ensuring SLIs/SLOs are defined and met.

  • Automation & tooling: Automate deployment, scaling, recovery and operational runbooks using Infrastructure as Code and CI/CD pipelines.

  • Observability: Implement and maintain metrics, tracing and logging to provide end‑to‑end visibility, reduce MTTD/MTTR and enable data‑driven decisions.

  • Incident management: Lead post‑incident reviews, root cause analysis and remediation to prevent recurrence and share learnings across teams.

  • Security & compliance: Collaborate with InfoSec to embed secure configuration, secrets management and compliance controls into the platform lifecycle.

Key outcomes:
  • Stable, observable platform: Services meet agreed SLAs/SLOs with clear dashboards, playbooks and automated remediation where appropriate.

  • Reduced incident impact: Measurable reductions in MTTD/MTTR and clear evidence of prevention from RCA actions.

  • Broad adoption of SRE practices: Cross‑team improvements in reliability, testing and operational readiness guided by SRE principles.

Experience required:
  • 5+ years of SRE or senior DevOps experience operating cloud‑native services at scale.

  • Deep experience with public cloud platforms (AWS) including compute, networking, storage and managed services.

  • Infrastructure as Code: Practical knowledge of Terraform, CloudFormation or equivalent.

  • Container orchestration: Strong experience with Kubernetes and related ecosystem tools (Helm, operators, service mesh or similar).

  • Observability tooling: Experience implementing logging, metrics and tracing (Datadog or similar).

  • Automation & CI/CD: Skill with automation frameworks and pipeline tooling (GitHub, Azure DevOps or similar).

  • Programming & scripting: Comfortable writing code or scripts (Python, Go, Bash) to automate tasks and build tools.



Company benefits
  • 25 days’ holiday + bank holidays

  • Option to buy or sell 5 extra annual leave days per year

  • Vitality Health Insurance, including private healthcare, virtual GP access and mental‑health support

  • Pension with 5% matched contribution

  • Regular team‑wide and company‑wide events

  • 2 volunteering days per year

  • Remote‑first working environment with offices in London and Nottingham
