Site Reliability Engineer, AI Platform

Work set-up: 
Full Remote

Offer summary

Qualifications:

  • Proven experience in infrastructure and reliability engineering, including deployment automation and monitoring.
  • Solid programming skills in Python, Go, or TypeScript, with production-grade coding experience.
  • Familiarity with cloud-native development and AWS services, especially AI/ML-related services.
  • Experience with Kubernetes, CI/CD pipelines, Infrastructure-as-Code tools such as Terraform, and containerization with Docker.

Key responsibilities:

  • Design and develop platform components for machine learning and GenAI use cases.
  • Manage deployment, maintenance, monitoring, and incident response for services.
  • Collaborate with ML Engineers, SREs, and platform teams to ensure operability and scalability.
  • Participate in code reviews, documentation, and team decision-making processes.

N26 https://n26.com/
1001 - 5000 Employees

Job description

About the opportunity

We are seeking a Site Reliability Engineer to join the Platform Engineering domain in the AI Platform team.

The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build the bank the world loves to use. The AI Platform team contributes to this mission by creating scalable, secure, and compliant infrastructure solutions that support MLOps and GenAI capabilities.

As one of the first banks hosted entirely in the cloud, we hold ourselves to security, resilience, and productivity standards that require not only a modern technology stack but also teams built in line with our principles, supporting our product teams, the company, and our customers.

In this role, you will:
  • Contribute to the design and development of platform components that enable machine learning and generative AI use cases across the company
  • Take ownership of reliable deployment, maintenance, monitoring, and incident response for our services
  • Write high-quality, maintainable code and help ensure our platform solutions are well-documented and testable
  • Work alongside more senior engineers to evolve our infrastructure and build secure, compliant, and scalable solutions across cloud, networking, observability, and CI/CD domains
  • Collaborate with ML Engineers, SREs, and other Platform teams to ensure operability and maintainability of AI capabilities offered across the company
  • Participate in code reviews, RFCs, documentation, and product discovery, contributing to the team's design and decision-making processes
  • Identify technical or knowledge gaps and proactively work to address them, either independently or with the team
  • Help improve our engineering practices and team ways of working

Required profile

Spoken language(s):
English

Other Skills

  • Collaboration
  • Communication
  • Teamwork
  • Proactivity
  • Curiosity
