
Lead Site Reliability Engineer

extra holidays
Remote: Full Remote
Salary: 4 - 19K yearly
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • 5+ years as a Site Reliability Engineer or similar
  • 2+ years leading an SRE squad
  • Proficient in Terraform for managing cloud infrastructure
  • Strong expertise in system architecture and Kubernetes
  • Excellent command of English, verbal and written

Key responsibilities:

  • Lead and mentor the SRE team
  • Identify initiatives to improve efficiency in SRE operations
  • Advocate for cost management and propose optimization solutions
  • Automate systems using industry tools like Terraform
  • Build and maintain SLIs and SLOs while leading incident response
Plum Financial Services Scaleup https://withplum.com/
51 - 200 Employees

Job description

At Plum, we're on a mission to maximise wealth for all. We’re making saving money effortless and turning investing into something everyone can do. Our journey began back in 2017, when we became one of the first to use artificial intelligence and automation to simplify personal finance. Fast forward to today, and we've already helped people save £2 billion across 10 European markets.

Named the UK's fastest-growing fintech in the Deloitte Technology Fast 50, we owe our success to the passion and dedication of our diverse team. Across our London, Athens and Nicosia offices, 170 talented people work together to help everyone do more with their money. And now, the team is growing!

The Role

You will join our Infrastructure squad as a Lead Site Reliability Engineer to ensure that Plum’s systems are resilient, secure, scalable, observable and fully capable of supporting our growth. You will help our Engineering function use our infrastructure as efficiently as possible. You’ll proactively identify areas for improvement and propose initiatives to streamline the SRE function and reduce its overhead.

What you will be doing:

  • Lead the SRE team in their daily work, mentor them and help them grow their skills and careers
  • Identify initiatives to improve efficiency and raise the bar of the SRE function, prioritise the team’s work, and define a strategic vision aligned with the company’s goals
  • Advocate for cost management (FinOps) and propose solutions to optimise our infrastructure
  • Be hands-on in daily work and contribute to initiatives owned by the team
  • Operate and scale our infrastructure (GCP, Kubernetes, PostgreSQL, RabbitMQ, Redis). Our data runs to terabytes and needs to be blazing fast
  • Automate aspects of our systems using the infrastructure management tools of the trade (we use Terraform), with a code-once, deploy-everywhere mindset
  • Ensure our metrics give an accurate picture of how the system is performing (we use Prometheus). Leverage observability in your day-to-day processes
  • Build and maintain SLIs and SLOs for our infrastructure, and provide a platform for squads to build their own SLIs and SLOs on top of collected metrics
  • Lead incident response and troubleshoot issues, correcting and improving systems to prevent incidents and scale with our growth. Take point in handling service degradation
  • Collaborate with our Engineering function to deliver their work onto Plum’s infrastructure
  • Collaborate with the Principal Engineer to improve the Engineering function’s DevOps posture

For this role, we'd like to see:

  • 5+ years of working experience as a Site Reliability Engineer, DevOps Engineer or in a similar position
  • 2+ years of experience successfully leading an SRE squad
  • Proficiency in managing cloud infrastructure as code (IaC) with tools like Terraform
  • Ability to maintain the IaC codebase in an optimal and efficient way (clear codebase structure, Terraform modules, etc.)
  • Strong expertise in system architecture, networking, database management and Kubernetes cluster administration
  • Strong expertise in observability (Logging, Monitoring, Tracing)
  • Strong analytical and troubleshooting skills
  • Proactive approach to problems: able to identify them and propose solutions
  • Passion for continuous improvement and challenging the status quo
  • Excellent communication skills in English (verbal and written)

Good to have

  • Familiarity with relational database (RDBMS) management and zero-downtime migration procedures
  • Experience building an SRE team from scratch with a focus on efficiency
  • Proven stakeholder management skills and the ability to negotiate priorities with internal teams
  • Experience with Python and the ability to navigate large codebases

Plum's Perks

  • We're all in this together! Own part of the company through stock options 💷
  • Annual training budget
  • Private Health & Life Insurance
  • Free Plum Premium subscription (normally £9.99 a month).
  • Free parking slots
  • 25 days holiday a year, excluding public holidays
  • Employee referral scheme up to €4000
  • Flexible approach to remote working, though we encourage at least 2-3 days a week in our beautiful office in central Athens for optimal collaboration.
  • 45 days work from anywhere
  • Team breakfast on Tuesdays and team lunch on Thursdays in the office, as well as a plentiful supply of fruit, snacks and coffee.
  • 1 day of paid leave for volunteering, so you can give back to society.
  • 2 weeks paid sabbatical after four years of service.
  • Team trip to secret destinations once a year ✈️
  • Great office location in the heart of Athens (Syntagma square), with an amazing view!
  • A vibe that’s 🦄🌈💯

If this sounds like you, don’t hesitate to get in touch!

Thanks,

Plum Team 💜

* Plum is an Equal Opportunity Employer. Plum does not discriminate on the basis of age, race, religion, sex, gender identity, sexual orientation, non-disqualifying physical or mental disability, national origin or any other basis covered by applicable law. All employment is decided on the basis of qualifications, merit and business need.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Financial Services
Spoken language(s): English

Other Skills

  • Verbal Communication Skills
  • Problem Solving
  • Team Leadership
  • Analytical Skills
  • Team Building
  • Troubleshooting (Problem Solving)
