Lead Site Reliability Engineer (Terraform/Linux)

Requirements

  • Extensive hands-on experience managing hybrid on-premise and AWS environments with containerisation using Docker, Kubernetes, and Argo Workflows
  • Strong expertise in data pipelines and storage (MongoDB, Snowflake) and data ingestion layers (Airflow, S3), with experience supporting the Konnecto technology stack
  • Proficiency in automation and IaC (Python or Bash, Terraform, Ansible) and GitHub workflows, including familiarity with AI coding tools
  • Deep knowledge of security, observability (Prometheus, Grafana, Loki), incident management, and Linux system administration; ability to lead threat modelling and shape security strategy

Roles & Responsibilities

  • Design and roll out a robust containerisation strategy (Docker/Kubernetes) to enable self-service deployment and empower engineering teams
  • Define Service Level Indicators (SLIs), set Service Level Objectives (SLOs), manage error budgets, and lead incident response, blameless post-mortems, and RCA
  • Lead hybrid platform engineering including on-prem and AWS, drive integration and modernisation of Konnecto's architecture, and optimise cloud costs (FinOps)
  • Mentor and develop the SRE team, manage workload prioritisation, plan capacity, and drive security architecture, threat modelling, and CI/CD improvements

Job description

Who We Are

At Partnerize, we're on a mission to transform the way businesses grow. We've built the leading partnership automation platform that empowers brands to discover, engage, and convert their audiences at scale. From affiliate marketing to influencer collaborations, we help our clients build and manage profitable partnerships that drive real results. We're a team of passionate problem-solvers who are dedicated to helping our clients win in the ever-evolving world of digital marketing.

Why Join Us

We're looking for passionate, talented people who want to be part of a winning team. At Partnerize, you'll find a culture of collaboration, innovation, and respect. We're guided by our core values, and we're committed to creating an environment where everyone can do their best work. We also offer a competitive salary, generous benefits, and a flexible work environment that allows you to thrive both personally and professionally. If you're ready to grow your career and make a difference, we'd love to hear from you.

Job Summary:

This is an exciting time to join Partnerize. We are at a pivotal point in our tech progression, looking to significantly expand our technical estate, scale the platform, and replace existing legacy systems with modern solutions. You will play a vital role in the ongoing operationalisation and management of our entire platform portfolio across our on-prem datacentres and AWS cloud: Partnerize, BrandVerity, Ascend, and our recent acquisition, Konnecto. While there will be a key focus on integrating and supporting Konnecto's advanced data and AI layers, a critical pillar of your mission will be spearheading the development of our enterprise on-prem containerisation solution. This initiative is designed to fundamentally shift our engineering culture towards a "you build it, you own it" model. By providing robust, automated container platforms, you will empower our Engineering teams to deploy quickly and independently, significantly reducing the bottleneck created by relying solely on the TechOps department.

We are looking for a Lead SRE who is both a deep technical expert and a capable mentor. In this role, your primary responsibility is ensuring our diverse, hybrid systems remain available, scalable, and secure. You will act as an authoritative Subject Matter Expert (SME), championing developer autonomy, driving IT systems security policies, and working closely with the security compliance team to protect our platforms from threats while driving continuous integration and delivery. This role reports to the SRE and Application Manager, providing them with technical guidance and recommendations whilst acting as the technical lead for the SRE team.

 

The Team

You will be responsible for ensuring the continuous development and progression of team members. We are looking for a player/coach who can mentor, empower, and up-skill talent. We have a mix of technical generalists, specialists, and junior engineers; you will help identify their strengths and constructively develop areas of weakness, guiding their technological career paths as we transition to a DevOps-centric operating model.

 

The Operational Reality

You will operate in a fast-paced, high-velocity environment where your work directly and visibly shapes the company's architectural future. This requires a highly adaptable and pragmatic leader who can balance strategic project delivery with hybrid-estate maintenance. By applying modern incident management frameworks to troubleshooting and ticket management, you are responsible for ensuring all issues across our estate are addressed decisively and efficiently.

 

As a Lead SRE, You Will:

Strategic & Operational Management

  • Developer Empowerment & Containerisation
    • Collaborate on the design, build, and rollout of a robust containerisation strategy (Kubernetes/Docker). Your goal is to assist in delivering a platform that enables Engineering teams to take full ownership of their code from build to deployment.
  • Reliability & Error Budgets
    • Define Service Level Indicators (SLIs), set Service Level Objectives (SLOs), and manage Error Budgets to expertly balance feature velocity with platform stability.
  • Hybrid Platform Engineering & Konnecto
    • Build software and systems to manage platform infrastructure across on-prem and AWS. Take the lead technical role in integrating and modernising Konnecto's architecture, ensuring its data ingestion and AI logic layers scale securely.
  • FinOps / Cloud Cost Optimisation
    • Manage, monitor, and optimise cloud infrastructure spend across our hybrid environments, ensuring architectural decisions are both highly performant and cost-effective.
  • CI/CD Pipeline Responsibility
    • Own the continuous improvement of the continuous integration and continuous delivery (CI/CD) pipelines to facilitate rapid engineering velocity.

 

People Leadership & Talent Development

  • Mentorship
    • Deliver coaching sessions to the team and individuals, acting as a technical escalation point and fostering a culture of knowledge sharing.
  • Workload Management
    • Scope the work coming into the SRE team, prioritise hybrid-estate maintenance vs. project delivery, and delegate tasks to team members to ensure prompt resolution.

 

Security & Architecture

  • Design & Threat Modelling
    • Produce production-grade application security designs. Perform design reviews and threat modelling of our services and products.
  • Security Strategy
    • Drive improvements to Partnerize platforms' security through strategic planning, vulnerability assessments, and security testing.

 

Incident Management & Toil Reduction

  • Toil Reduction
    • Champion toil reduction through automation, continually identifying manual, repetitive operational work and engineering it out of existence.
  • Post-Mortems & Escalation
    • Act as the ultimate escalation point for complex support incidents, participate in the On-Call rotation, lead blameless post-mortems, conduct Root Cause Analysis (RCA), and aggressively track metrics like Mean Time To Recovery (MTTR).

 

General Duties

  • Consulting & Planning
    • Participate in system design consulting, platform management, and capacity planning.
  • Escalation Support
    • Act as the ultimate escalation point for complex support incidents and assignments while maintaining a high level of quality.
  • On-Call
    • Participate in the On-Call Rotation.

 

Essential Knowledge, Skills and Experience

Core Competencies

  • Technical Ability
    • Highly proficient SME capable of reliably applying technical methods, leading cultural technical shifts (e.g., DevOps adoption), and supporting the development of new skills in colleagues.
  • Problem Solving & Decision Making
    • Capable of making decisions quickly and decisively, weighing options, and approaching problems methodically and innovatively.
  • Communication & Influence
    • Effectively communicates initiatives to all stakeholders and is capable of securing buy-in for key transformational projects (like containerisation rollouts).

 

Technical Competencies

  • Cloud, Hybrid & Containerisation
    • Essential knowledge of hybrid architectures, managing both AWS and on-premise environments. Extensive hands-on experience designing and managing advanced containerisation environments using Docker, Kubernetes, and Argo Workflows to enable developer self-service.
  • Konnecto Tech Stack & Data Pipelines
    • Proven experience managing modern storage layers and databases, specifically MongoDB and Snowflake. Experience supporting complex data ingestion layers, including clickdata streams, S3 raw/parsed ingestion, and Airflow ETL.
  • Programming & Automation
    • Experience in automation languages (Python or Bash).
    • Deep understanding of GitHub and experience implementing or working alongside AI coding tools and practices.
    • Knowledge of Infrastructure as Code (Terraform, Ansible).
  • Security & Observability
    • Experience with security in a DevOps environment. Experience managing observability stacks (e.g., Prometheus, Grafana, and Loki).
  • Operations & Troubleshooting
    • Exceptional Linux system administration skills. Highly proficient in troubleshooting, diagnosing, and independently solving issues using modern incident management frameworks.

 

Desirable Knowledge, Skills and Experience

The following skills and experience are advantageous but not strictly required:

  • Innovation & Debt Management
    • A keen interest in new technologies, specifically supporting development teams in the refactoring of technical debt.
  • Legacy Databases
    • Strong experience with relational databases (MySQL, PostgreSQL) and with Redis.
  • Data Streaming
    • Experience with data streaming and queuing technologies, specifically Apache Kafka and Druid.
  • Web & Storage
    • Knowledge of Nginx (or other web server technologies) and storage technologies like Gluster.

 

UK Benefits & Perks

  • 25 days holiday in addition to bank holidays 
  • Enhanced Parental Leave: 6 months at full pay for the birth parent and 4 weeks at full pay for the non-birth parent, after one year's employment
  • 5 extra 'Partnerize Parental Days' each year
  • Private Medical Insurance through Vitality 
  • Enhanced pension contributions
  • Cycle to Work scheme 
  • Eye Care Vouchers 
  • Life Assurance
  • Enhanced Wellness Program including access to EAP, Wellness Coaching & Wellness Fridays program
  • Regular company events and activities

Our Commitment to Diversity & Inclusion

We are committed to attracting, developing, and advancing our outstanding team members, regardless of race, ethnic identity, sexual orientation, religion, age, gender, gender identity, physical abilities, or any other dimension of diversity. We strive to foster an environment where people can be their authentic selves, raise concerns and innovate, all without fear; where they are treated fairly and respectfully, have equal access to opportunities and resources and can contribute fully to the organization’s success. Every individual in our business is expected to live this commitment without exception.

Privacy and data protection. The data collected as part of this application will be used for the recruitment process and any subsequent employment. You can find further information in the Partnerize privacy policy here: https://partnerize.com/privacy-policy/

Notice to Recruiters and Staffing Agencies: To protect the interests of all parties, Partnerize will not accept unsolicited resumes from any source other than directly from a candidate or an approved vendor that has a written and signed agreement in place with Partnerize. Please do not contact or forward resumes to our company employees or locations. Any unsolicited resumes will be considered Partnerize property. Partnerize is not responsible for any charges or fees related to unsolicited resumes.
