Lead Site Reliability Engineer

extra holidays

Remote:

Full Remote

Contract:

Full time

Salary:

4 - 19K yearly

Experience:

Mid-level (2-5 years)

Work from:

Greece, Belgium

Offer summary

Qualifications:

5+ years as a Site Reliability Engineer or similar
2+ years leading an SRE squad
Proficient in Terraform for managing cloud infrastructure
Strong expertise in system architecture and Kubernetes
Excellent command of English, verbal and written

Key responsibilities:

Lead and mentor the SRE team
Identify initiatives to improve efficiency in SRE operations
Advocate for cost management and propose optimization solutions
Automate systems using industry tools like Terraform
Build and maintain SLIs and SLOs while leading incident response

Plum Financial Services Scaleup https://withplum.com/

51 - 200 Employees

Job description

At Plum, we're on a mission to maximise wealth for all. We’re making saving money effortless and turning investing into something everyone can do. Our journey began back in 2017, when we became one of the first to use artificial intelligence and automation to simplify personal finance. Fast forward to today, and we've already helped people save £2 billion across 10 European markets.

Named the UK's fastest-growing fintech in the Deloitte Technology Fast 50, our success is down to the passion and dedication of our diverse team. Based in our London, Athens and Nicosia offices, 170 talented people work together to empower people to do more with their money. And now, the team is growing!

The Role

You will be joining our Infrastructure squad as a Lead Site Reliability Engineer to ensure that Plum’s systems are resilient, secure, scalable, observable and fully capable of supporting our growth. You will support our Engineering function in using our infrastructure in the most efficient way. You’ll proactively identify areas for improvement and propose initiatives to streamline the SRE function and reduce its overhead.

What you will be doing:

Lead the SRE team in their daily work, mentor them, and grow their skills and careers
Identify initiatives to improve efficiency, raise the bar of the SRE function, prioritise the team’s work, and define a strategic vision aligned with the company’s goals
Advocate for cost management (FinOps) and propose solutions to optimise our infrastructure
Be hands-on in daily work and contribute to initiatives owned by the team
Operate and scale our infrastructure (GCP, Kubernetes, PostgreSQL, RabbitMQ, Redis). We have data on the order of terabytes, and access to it needs to be blazing fast
Automate aspects of our systems using infrastructure management tools of the trade (we use Terraform), with a code-once, deploy-everywhere mindset
Ensure our metrics give an accurate picture of how the system is performing (we use Prometheus). Leverage observability in your day-to-day processes
Build and maintain SLIs and SLOs for our infrastructure; provide a platform for squads to build their own SLIs and SLOs on top of collected metrics
Lead incident response and troubleshoot issues, correcting and improving systems to prevent incidents and grow at scale. Take point in handling service degradation
Collaborate with our Engineering function to deliver their work onto Plum's infrastructure
Collaborate with the Principal Engineer to improve the Engineering function’s DevOps posture

For this role, we'd like to see:

Working experience of 5+ years as a Site Reliability Engineer, DevOps engineer, or in a similar position
Working experience of 2+ years leading an SRE squad to success
Proficiency in managing cloud infrastructure as code (IaC) with tools like Terraform
Ability to maintain the IaC codebase in an optimal and efficient way (clear codebase structure, Terraform modules, etc.)
Strong expertise in system architecture, networking, database management, administration of Kubernetes clusters
Strong expertise in observability (Logging, Monitoring, Tracing)
Analytical skills, troubleshooting attitude
Proactive approach to problems: able to identify them and propose solutions
Passion for continuous improvement and challenging the status quo
Excellent communication skills in English (verbal and written)

Good to have

Familiarity with RDBMS management and zero-downtime migration procedures
Experience building an SRE team from scratch with a focus on efficiency
Proven stakeholder management skills and the ability to negotiate priorities with internal teams
Experience in Python, ability to navigate large codebases

Plum's Perks

We're all in this together! Own part of the company through stock options 💷
Annual training budget
Private Health & Life Insurance
Free Plum Premium subscription (normally £9.99 a month).
Free parking slots
25 days holiday a year, excluding public holidays
Employee referral scheme up to €4000
Flexible approach to remote working, though we encourage at least 2-3 days a week in our beautiful office in central Athens for optimal collaboration.
45 days work from anywhere
Team breakfast on Tuesdays and team lunch on Thursdays in the office, as well as a plentiful supply of fruit, snacks and coffee.
1 day paid leave for volunteering, supporting you in giving back to society.
2 weeks paid sabbatical after four years of service.
Team trip to secret destinations once a year ✈️
Great office location in the heart of Athens (Syntagma square), with an amazing view!
A vibe that’s 🦄🌈💯

If you think this sounds like a bit of you then don’t hesitate to get in touch!

Thanks,

Plum Team 💜

* Plum is an Equal Opportunity Employer. Plum does not discriminate on the basis of age, race, religion, sex, gender identity, sexual orientation, non-disqualifying physical or mental disability, national origin or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit and business need.