Offer summary

Qualifications:

Bachelor’s Degree in Computer Science, Information Technology, or related field.

At least 2 years of experience as a Site Reliability Engineer.

Proficiency with cloud platforms, especially AWS, and infrastructure-as-code tools like Terraform or CloudFormation.

Strong understanding of cloud networking, monitoring tools, and distributed platform architecture.

Key responsibilities:

Ensure high availability, scalability, and performance of the platform.

Design and implement reliable cloud infrastructure and automation solutions.

Participate in incident response to diagnose and resolve production issues.

Collaborate with cross-functional teams to promote reliability throughout the software development lifecycle.

Job description

Description

At Paymentology, we’re redefining what’s possible in the payments space. As the first truly global issuer-processor, we give banks and fintechs the technology and talent to launch and manage Mastercard, Visa, and UnionPay cards at scale across more than 60 countries.
Our advanced, multi-cloud platform delivers real-time data, unmatched scalability, and the flexibility of shared or dedicated processing instances. It’s this global reach and innovation that set us apart.
We’re looking for a Site Reliability Engineer to ensure the high availability, scalability, and performance of our platform. This role is essential to maintaining reliable systems, reducing operational overhead, and enabling continuous improvement across our global technology landscape. If you’re passionate about automation, incident response, and working at the intersection of infrastructure and software, this is your opportunity to help build resilient systems that power financial inclusion worldwide.

What you get to do:
Platform Reliability and Scalability
Build software that enhances the scalability and reliability of Paymentology services.
Ensure platform services meet required uptime and service quality levels.
Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation
Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track the platform’s health, cost-effectiveness, reliability, and scalability, and that identify potential issues to feed back to product and platform engineering in a continuous improvement loop.
Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution
Play an active role with the incident response teams, diagnosing and resolving production issues quickly to minimise downtime.
Standards Compliance
Support product teams in building services that adhere to our security and quality standards.
Cross-team Collaboration
Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle. We seek to achieve this through advocacy and developing a culture of reliability.

Requirements
What it takes to succeed:
Strong understanding of cloud networking principles.
Proficiency with leading monitoring tools, such as Datadog, Honeycomb.io, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
Programming expertise, especially in systems programming languages and databases.
Familiarity with at least one industry-leading CI/CD tool, such as Jenkins, GitHub Actions, GitLab CI, CodePipeline, CircleCI, or ArgoCD.
Proven track record of achieving platform-level and end-to-end SLIs, SLOs, and SLAs, and of fostering accountability.
Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
Knowledge of implementing solutions to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
Comprehensive understanding of large-scale distributed platform architecture.
Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and efficiency at scale.
Understanding of security best practices within cloud environments.

Education and Experience:
Bachelor’s Degree in Computer Science, Information Technology, or related field.
Professionals with a verifiable employment history in the role may also be considered.
2+ years of experience as a Site Reliability Engineer.
2+ years in software development.
Extensive cloud experience, especially with AWS.
Proven expertise in at least one infrastructure-as-code tool, such as Terraform, CloudFormation, Puppet, or Ansible.
Hands-on experience with Docker, ECS, EKS, and Kubernetes.

What you can look forward to:
At Paymentology, it’s not just about building great payment technology; it’s about building a company where people feel they belong and their work matters. You’ll be part of a diverse, global team that’s genuinely committed to making a positive impact through what we do. Whether you’re working across time zones or getting involved in initiatives that support local communities, you’ll find real purpose in your work and the freedom to grow in a supportive, forward-thinking environment.
