Key Facts

Remote From:

Category: Site Reliability Engineer (SRE)

Full time

English

Hard Skills

• Incident Response
• Service Level Management
• Distributed Computing
• Kubernetes
• Datadog
• Observability
• Prometheus (Software)
• Rate Limiting
• High Availability Design

Other Skills

• Communication
• Leadership
• Teamwork
• Problem Solving

Requirements

Minimum of 4 years of experience in SRE or Backend Engineering, with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
Deep understanding of distributed systems architecture and design patterns, including microservices fundamentals, event-driven architectures, and scalability.
Extensive experience with cloud providers (GCP preferred; AWS/Azure acceptable) and proficiency in running and troubleshooting production workloads on Kubernetes (GKE/EKS).
Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.

Roles & Responsibilities:

Participate in on-call rotations as the primary technical lead and act as Incident Commander during major severity incidents, coordinating war rooms and cross-functional teams with clear status updates.
Instrument code to expose high-cardinality metrics and distributed traces; define, measure, and defend SLOs and error budgets with product owners.
Write production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that reduce manual operator intervention.
Partner with Product Engineering teams during design to ensure new services include reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.

Moniepoint Group

Financial Services

About Moniepoint Group

Moniepoint Inc. is a leading financial technology company that gives businesses, their employees and their customers a seamless platform to accept payments digitally, receive credit and access business management tools that help them grow with ease. We are the parent company of TeamApt Ltd and Moniepoint MFB, and we support over 1,800,000 businesses in processing $12 billion monthly through our digital payment acceptance channels. For our work in making digital payments accessible to businesses in emerging markets, our Nigerian subsidiary was awarded the National Inclusive Payment Initiative Award by the Central Bank of Nigeria. In 2022, CB Insights recognised us as a top global fintech. We are backed by QED, British International Investment, FMO, and other leading global venture capital funds. Moniepoint Inc. is a fully remote tech company with a diverse workforce worldwide. It is headquartered in London, with offices in the US, Nairobi and Lagos. Join us as a #DreamMaker to help power the dreams of businesses globally.

Company type: Large

Industry: Financial Services

Founded: 2018

Company size: 1001 - 5000


Job description

Who We Are

Moniepoint is an all-in-one financial services platform for emerging markets and the second-fastest-growing company in Africa.

Since 2019, Moniepoint’s technology has served over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed. Moniepoint processed $182 billion in 2023 and currently processes the majority of POS transactions in Nigeria.

What We Do

At Moniepoint, we are a customer-focused community dedicated to crafting solutions that redefine our industry. Our products provide essential services for businesses, such as credit and overdrafts. We use artificial intelligence and data to drive our decisions, backed by the technology and data-driven best practices that support our businesses.

Curious about what makes Moniepoint an incredible place to work? Check out posts on how we cultivate a culture of innovation, teamwork, and growth.

Job Summary

We are seeking an experienced SRE to engineer the reliability of our highly distributed platform. You will combine deep knowledge of distributed systems with strong coding skills to define SLOs, lead incident response, and build automation and self-healing mechanisms into our systems. You will balance immediate operational stability with long-term strategic engineering to ensure our services scale linearly with our hyper-growth.

Responsibilities

Participate in on-call rotations as the primary technical lead. Act as the Incident Commander during major severity incidents: initiating war rooms, coordinating cross-functional teams, and providing clear status updates.
Instrument code to expose high-cardinality metrics and distributed traces. Collaboratively define, measure, and defend Service Level Objectives (SLOs) and Error Budgets with product owners.
Write high-quality, production-ready code (in Java, Go, or Python) to build internal tooling, automation platforms, and self-healing mechanisms that eliminate manual operator intervention.
Partner with Product Engineering teams during the design phase to ensure new services are built with reliability, scalability, and observability patterns (circuit breakers, rate limiting, backpressure, fallback strategies) from day one.
Analyze system performance and traffic patterns to model future capacity needs. Conduct load testing and chaos engineering experiments to verify system resilience under failure conditions.

Requirements

Minimum of 4 years of experience in SRE or Backend Engineering with a strong ability to write clean, performant, and tested code in Java, Go, Rust, or Python.
Deep understanding of distributed systems architecture and design patterns. You possess a strong command of microservices fundamentals, event-driven architectures, and the underlying principles required to build systems that scale.
Extensive experience with Google Cloud Platform (GCP) or similar cloud providers (AWS/Azure). You are proficient in running production workloads on Kubernetes (GKE/EKS) and troubleshooting cluster/infrastructure issues.
Experience designing observability strategies using OpenTelemetry, Prometheus, New Relic, Datadog, or SigNoz to improve system visibility.
Familiarity with operating and tuning production data stores (e.g., PostgreSQL, MySQL) and streaming platforms (e.g., Kafka, RabbitMQ) in a high-throughput environment.

What we can offer you

Culture - We put our people first and prioritize the well-being of every team member. We’ve built a company where all opinions carry weight and where all voices are heard. We value and respect each other and always look out for one another. Above all, we are human.
Learning - We have a learning and development-focused environment with an emphasis on knowledge sharing, training, and regular internal technical talks.
Compensation - You’ll receive an attractive salary, pension, health insurance, annual bonus, plus other benefits.

What to expect in the hiring process

A preliminary phone call with the recruiter
A technical interview with the Hiring Manager
A behavioural and technical interview with a member of the Executive team

Moniepoint is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and candidates.
