Site Reliability Engineer (SRE)

extra holidays
Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Minimum 3 years of experience in SRE roles.
  • Proficiency with monitoring tools like Prometheus and Grafana.
  • Experience in Kubernetes and Azure environments.
  • Familiarity with cloud-native architectures.

Key responsibilities:

  • Drive ownership of SRE practices.
  • Implement SLIs and SLOs collaboratively.
Relout Startup https://relout.team/
11 - 50 Employees

Job description

Relout is a place created by ambitious people with a passion for technology. We work on international projects with clients from various industries, helping startups, software houses, and enterprises transform and scale their businesses. We’re a boutique consulting (https://relout.team) & technology (https://relout.cloud) partner that builds the foundation to scale for our clients’ success. Our mission is to connect best-in-class, passionate engineers with fast-growing digital & technology companies.

We're looking for a Site Reliability Engineer to join a long-term project with one of our clients, a well-established provider of safety solutions for the logistics sector, operating mainly in the US and Canada. The SRE will join a Cloud Engineering & Infrastructure team responsible mainly for the platform that runs the client's SaaS product on top of Azure and Kubernetes.

We're seeking a highly skilled individual with technical expertise in SRE as well as strong communication skills and the ability to drive initiatives. The candidate will play a crucial role in bridging the gap between operations and business stakeholders, managing discussions related to observability and reliability, and driving SRE initiatives. The ability to effectively implement SLOs & SLIs, build useful dashboards, work closely with development teams on implementation, and supervise escalations to L1/L2 teams will be key to fulfilling this role.

Our client operates in US time zones but remains flexible in terms of working hours, requiring only a 2-3 hour overlap with the Eastern Time Zone (EST) for business meetings & syncs.

We offer

  • Regular company events and integrations (meetups)
  • Recurring Fun budget to spend on anything that makes you happy (team activities encouraged!)
  • Educational budget to spend on certifications, training, and conference attendance
  • Ability to access & use coworking office spaces in every major city in Poland
  • Attractive referral programs
  • Unlimited legal advice & support with B2B partnership and self-employment
  • Missing anything you like? Luxmed, Multisport? Ask us about it!

Responsibilities

  • Drive the SRE discipline: Take ownership of SRE practices, serving as the primary advocate and driving force behind the company reliability strategy.
  • Implement SLIs and SLOs: Collaborate with stakeholders to define and implement service-level indicators, objectives, and error budgets.
  • Build centralized dashboards: Develop tools to enhance visibility into system performance, identify trends, and improve overall observability.
  • Oversee incident management processes: Collaborate with the 24/7 incident support team, assist in drafting and refining SOPs, ensure RCA processes are followed, and handle escalations as needed.
  • Drive cross-team collaboration: Act as the bridge between Cloud Engineering, Infrastructure Development, and other technical teams to ensure alignment on objectives and seamless execution of SRE initiatives.
  • Collaborate with senior management: Prepare and deliver clear, business-oriented presentations that justify investments, outline strategic priorities, and demonstrate measurable progress on reliability metrics.

Requirements

  • Minimum 3 years of experience in Site Reliability Engineering (SRE) or similar roles
  • Proficiency in utilizing monitoring and observability tools such as Prometheus, Grafana, Elasticsearch and OpsGenie
  • Ability to design and implement centralized dashboards for system performance monitoring and analysis.
  • Experience in drafting and executing SOPs and RCA processes
  • Proven track record in monitoring complex cloud infrastructures, including Kubernetes and Azure (PaaS and IaaS)
  • Experience in working with large-scale distributed systems, microservices, and cloud-native architectures
  • Proven ability to guide cross-functional engineering teams and collaborate with stakeholders at various organizational levels.
  • Familiarity with monitoring databases, including MongoDB, CosmosDB, MySQL, and SQL Server.
  • Exceptional verbal and written communication skills in English
  • Strong analytical thinking and problem-solving abilities in dynamic project environments.
  • Communication skills to present technical ideas and plans in a way that aligns with organizational goals

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s):
English

Other Skills

  • Analytical Thinking
  • Communication
  • Problem Solving
