Site Reliability Engineer - Database and Observability (f/m/d)

Work set-up: 
Full Remote
Contract: 
Work from: 
Switzerland

Offer summary

Qualifications:

  • Deep understanding of Linux systems administration.
  • Proficiency in managing large-scale MariaDB deployments.
  • Experience with Go programming language and distributed systems principles.
  • Familiarity with observability tools like Prometheus and interest in Kafka, Cassandra, or FoundationDB.

Key responsibilities:

  • Maintain and optimize core data infrastructure including databases and messaging systems.
  • Enhance and evolve the observability stack for better system monitoring.
  • Automate operations to improve efficiency and reliability.
  • Participate in on-call rotations and contribute to system scalability and high availability.

Exoscale SME https://www.exoscale.com/
11 - 50 Employees

Job description

Exoscale is the leading Swiss/European cloud service provider.

With services covering the full cloud infrastructure spectrum, from fast-deploying virtual machines to S3-compatible object storage, Exoscale provides a simple and scalable experience so that its clients can focus on their core business.

Join a dynamic working environment with a cutting-edge distributed team based in Lausanne. Exoscale strives to create an environment with great working conditions and welcomes diverse applicants.
As part of its ongoing efforts to grow its infrastructure footprint, Exoscale is hiring a Site Reliability Engineer.

The site reliability engineer plays a critical role in ensuring constant availability of the Exoscale platform. The engineering team at Exoscale works on all aspects of its products, from design and development to operation and support.

With an expanding customer base and new products to further advance Exoscale's product portfolio, site reliability engineers build and maintain a wide range of technologies. As users of Exoscale themselves, site reliability engineers also take an active part in improving products.

This position focuses on designing, developing and maintaining Exoscale's core platform and security components.

Some of the challenges you will be working on:
  • Maintain and optimize our persistent data infrastructure, including MariaDB, Cassandra, FoundationDB, and Kafka.
  • Enhance and evolve our observability stack to improve system visibility and performance monitoring.
  • Take part in automation and orchestration efforts to streamline operations and reduce manual intervention.
  • Improve processes to ensure scalability, reliability, and high availability of our infrastructure.
  • Join the on-call rotation after completing a training period.

Ideal candidates are:
  • Experienced with Linux and have a deep understanding of systems administration.
  • Proficient in MariaDB and experienced in managing large-scale database deployments.
  • Proficient in the Go programming language, with an understanding of distributed systems principles.
  • Familiar with Prometheus and the broader observability ecosystem.
  • Experienced with (or eager to learn) Kafka, Cassandra, and/or FoundationDB.
  • Skilled in configuration management and managing large-scale infrastructure.
  • Passionate about automation, always looking for ways to optimize workflows and reduce manual effort.
  • Team players who thrive in a distributed team environment.
  • Curious, autonomous, and eager to learn new technologies every day.
  • Strong communicators in English, both written and spoken.

What we offer:
  • Flexible working hours and working from home.
  • Autonomous working conditions with a lot of freedom to create.
  • Modern working atmosphere and a centrally located office with great public transport connections.
  • Team events as well as training and further education.

Candidates who are not familiar with all the topics above but willing to learn are encouraged to apply.
We look forward to your application!

Required profile

Experience

Spoken language(s):
English

Other Skills

  • Teamwork
  • Communication
