
Senior Site Reliability Engineer

extra holidays - fully flexible

Remote:

Full Remote

Contract:

Full time

Experience:

Senior (5-10 years)

Work from:

Poland

Offer summary

Qualifications:

Experience in incident management
Knowledge of microservices technologies
Programming and scripting experience
Familiarity with monitoring and pipelining tools
Basic knowledge of serverless services from public cloud providers

Key responsibilities:

Define SLIs and SLOs with teams
Create systems for observability
Analyze failure scenarios and mitigations
Assist in creating runbooks for failures
Participate in incident management

Valtech https://www.valtech.com/

5001 - 10000 Employees


Job description

Hello! Cześć!

Valtech is looking for a Senior Site Reliability Engineer to join our global team of 6,000 professionals in more than 60 offices worldwide!

Are you passionate about Site Reliability Engineering? Do you have an eye for SLIs, SLOs, and automation? Do you hate toil and intend to do something about it? Does it excite you to get things done in close collaboration with people around the globe? Would you like the freedom to work from the comfort of your home while still having the opportunity to visit any of our offices close to you? Then you might be the person we’re looking for! Keep reading to find out.

Valtech and Site Reliability Engineering

Valtech is a leading global agency in the business of digital transformation. We help our clients transform their business into a true digital experience. In this mission we design, build and run large-scale global experience and commerce platforms in co-creation and co-operation with our clients. Over the last few years, experience and commerce platforms have evolved drastically into complex ecosystems that tie together multiple services from multiple vendors, an approach also known as MACH or composable architecture. As a founding member of the MACH Alliance, a group that educates enterprises on best-of-breed Microservices, APIs, Cloud, and Headless (MACH) technology, Valtech is a pioneer in how to properly build and manage those complex ecosystems. Site reliability engineering is at the core of our vision of how this modern-day distributed ecosystem should and can be managed.

The Ideal Candidate

We would love to talk to you if

You are assertive and have good communication skills, and you are capable of taking the lead and coaching a development team to make the right choices.

You have experience with incident management in the production environment of a public-facing online service with high business value and, preferably, high traffic, operated 24x7.

You have experience working in corporate environments.

You have experience programming and scripting.

You have at least basic knowledge of serverless services in one or more public cloud providers (AWS, Azure, GCP).

You have extensive knowledge of and experience with various monitoring systems, including APM and observability tools such as Datadog, New Relic, Dynatrace, Prometheus, and Grafana.

You have knowledge of and experience with various pipelining tools, such as GitHub, Azure DevOps, GitLab, and Jenkins.

You have knowledge of and experience with microservices-related technology: Docker, Kubernetes.

You have a good conceptual understanding of software architecture and systems thinking.

You have worked as an engineer in a DevOps context.

You have an excellent command of English (C1 or above).

You are familiar with the following technologies:

Datadog (or APM equivalent)
Argo CI/CD
Java / Spring Boot
Kafka
Kubernetes / EKS
AWS

You have worked within the context of publicly accessible, highly available eCommerce platforms.
You have experience working in an international context with on- and off-shore teams.

The Role

A day in the life of a Site Reliability Engineer (SRE)

As a Site Reliability Engineer (SRE), you are the bridge between software development and operations. You help us deliver reliable speed to our clients, allowing them to leverage the benefits of continuous deployment without losing their grip on customer experience. You will work with our multidisciplinary teams in an essentially DevOps way of working, where your main responsibility is to keep everyone focused on production while creating the facilities to do so.

Your responsibilities will be:

Work with teams to define SLIs and SLOs.

Create systems for observability.

Work with teams to analyze failure scenarios and possible mitigations.

Create, or assist in creating, runbooks to remediate or prevent failure scenarios.

Reduce work that does not add value.

Participate in and facilitate incident management, including on-call duty.

Benefits

Mental and physical health:

20 working days of paid vacation
National holidays covered
Sick leave (up to 20 days/year)
Unpaid leave (up to 20 days/year)
Medical insurance
Multisport card OR Multikafeteria
Maternity & paternity leave support

Personal and professional development:

Internal workshops & learning initiatives
English language classes compensation
Professional certifications reimbursement
Participation in professional local & global communities
Growth Framework to manage expectations and define the steps to move towards the selected career
Mentoring program with the ability to become a mentor or a mentee to grow to a higher position

Valtech Poland has a system of progressive benefit packages: the longer you stay with the company, the more benefits you get.

Our recruitment flow:

1) Screening - we will send you a short questionnaire with a few questions before we invite you to a first meeting with HR.

2) HR Interview - an interview of up to 1 hour via Teams with a Talent Acquisition Specialist.

3) Technical Interview - a technical discussion of up to 1 hour with our experts, held via Teams.

4) Final Interview - a soft-skills meeting of up to 1 hour with a Delivery Manager and a People Partner.

5) Collecting references - we will ask you to provide references you already have from previous projects, or the e-mail addresses of 2-3 people of your choice who can respond to our short survey.

You can not only become part of a constant evolution but also lead the change. The more we grow, the more opportunities there are to take responsibility, implement your creative ideas, and be the innovator and driver rather than the task executor.