Lead Site Reliability Engineer (SRE)

extra holidays - extra parental leave
Work set-up: 
Full Remote
Experience: 
Senior (5-10 years)

Offer summary

Qualifications:

  • Degree in Software Engineering, Computer Science, or related STEM field.
  • At least 5 years of experience in software development or operations.
  • Proficiency in at least one high-level programming language such as Python, Java, or C#.
  • Strong troubleshooting, debugging, and communication skills in English.

Key responsibilities:

  • Lead and onboard services and teams to reliability standards.
  • Establish and maintain Service Level Objectives and Agreements.
  • Design and implement scalable, reliable, and secure infrastructure.
  • Collaborate with development teams to ensure system resilience and performance.

OutSystems, Computer Software / SaaS, Large, https://www.outsystems.com/
1001 - 5000 Employees

Job description

There are NO limits to your career: come shape the future and be part of a truly unique global culture at OutSystems!

About This Role

Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals of SRE are to create scalable and highly reliable systems. Our SREs ensure our production systems' reliability, performance, and scalability while enabling rapid development and deployment of new features and services.
SREs at OutSystems work closely with development teams, acting as an extension of the team, in adopting the reliability tenets with the shared goal of meeting Service Level Objectives (SLOs) and thus delivering a smooth and frictionless Customer Experience.
 

Key Responsibilities

As an SRE at OutSystems, here are your key responsibilities and duties:

  • Lead and onboard services and teams to the reliability tenets;

  • Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs);

  • Design and implement scalable, reliable, and secure infrastructure, while ensuring cloud-native best practices;

  • Collaborate with software development teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant;

  • Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents;

  • Lead incident response efforts, ensuring quick resolution and minimal downtime, and conduct RCA/post-mortems;

  • Automate every operational task, with a special focus on fast incident detection & recovery;

  • Foster a culture of continuous improvement and knowledge sharing;

  • Communicate effectively with stakeholders, providing updates on system reliability and performance;

  • Participate in on-call rotation to provide 24/7 support for production systems.

Site Reliability Engineering Performance Indicators

The main KPIs that aid in understanding the impact and success of the SRE function at OutSystems are:

  • Service Level Agreement (SLA) and Service Level Objective (SLO) compliance;

  • SLO Coverage and Detection Ratio;

  • MTTD - Mean time to detect;

  • MTTA - Mean time to acknowledge;

  • MTTR - Mean time to resolve.
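
As a rough illustration of how the KPIs above can be derived from incident data, here is a minimal Python sketch. The `Incident` record, its field names, and the helper functions are hypothetical. Measuring MTTD and MTTR from the start of the fault and MTTA from the moment of detection is an assumption made for this example, not a definition stated by OutSystems; teams often define these intervals differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    started: datetime       # when the fault began
    detected: datetime      # when monitoring raised an alert
    acknowledged: datetime  # when a responder acknowledged the alert
    resolved: datetime      # when service was restored


def mean_minutes(deltas):
    """Average a list of timedeltas and express the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def incident_kpis(incidents):
    """Return MTTD, MTTA and MTTR in minutes (assumed definitions)."""
    return {
        # Assumption: detection time is measured from the start of the fault.
        "mttd": mean_minutes([i.detected - i.started for i in incidents]),
        # Assumption: acknowledgement time is measured from detection.
        "mtta": mean_minutes([i.acknowledged - i.detected for i in incidents]),
        # Assumption: resolution time is measured from the start of the fault.
        "mttr": mean_minutes([i.resolved - i.started for i in incidents]),
    }


def slo_compliance(good_events, total_events, target):
    """Return the achieved ratio and whether it meets the SLO target."""
    ratio = good_events / total_events
    return ratio, ratio >= target
```

Calling `incident_kpis` with a list of `Incident` records gives the three averages in minutes, and `slo_compliance(good_events, total_events, target)` compares an achieved success ratio against a target such as 0.999.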

Qualifications

The following qualifications illustrate the desired profile for a Site Reliability Engineer. Nevertheless, the selection of candidates will always depend on specific knowledge of the field and prior experience.

  • STEM degree (BSc or MSc) in Software Engineering, Computer Science, or a related field;

  • 5+ years of experience in software development and/or operations;

  • Proficiency in at least one high-level programming language (C++, Python, Java, C#, etc.);

  • Strong troubleshooting and debugging skills;

  • Fluency in English and excellent communication skills.

Soft Skills

  • Communication - able to communicate effectively (in English) both orally and in writing, showing empathy for the other person;

  • Humility - acknowledges mistakes with a humble attitude, apologizes for them, and mitigates them as soon as possible to limit their impact.

  • Accountability - takes ownership of problems and sees them through. When lacking the knowledge to proceed alone, involves the right people to reach closure.

  • Negotiation Skills - has tough and politically complex conversations with colleagues and customers, defusing disagreements and leading towards a mutual agreement and understanding of all parties involved.

  • Process Oriented - is organized and follows defined processes, while challenging inefficient ones and suggesting improvements.

  • Problem-solving - takes a top-down approach to problems, starting with a wide scope, narrowing it down as the analysis progresses, and breaking problems into smaller pieces. Thinks critically, analyzing information objectively and making reasoned judgments.

Technical Skills

Experience in any of the following is valued, but not strictly required:

  • Containerization technologies and orchestration platforms, mainly Kubernetes (CKA, CKAD, CKS certifications are valued);

  • Experience with automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, Spacelift, etc.;

  • Experience with Python, Go, Bash/Shell scripting, or other automation tools/languages;

  • Familiarity with AWS services like EC2, RDS, ELB, CloudFront, Lambda, etc.;

  • Proficiency in monitoring and troubleshooting complex distributed systems;

  • Experience with Grafana, ELK stack, Prometheus, or others;

  • Strong understanding of designing resilient and fault-tolerant systems;

  • Expertise in debugging complex distributed systems.

The Longer Story:

OutSystems is a global leader transforming how companies innovate through software, empowering IT leaders with a better way to build the software that matters most. We are looking for talented and motivated people to join us in helping companies solve some of their most strategic business challenges, from modernizing their workplace processes to transforming their employee and customer experiences. As a member of the OutSystems global team, you will help build, deliver, manage, and evolve the software that is a low-code market leader and preferred by professional developers around the world.

 

OutSystems is a truly global company, with more than 800,000 developer community members, 1,700 employees, more than 500 partners, and thousands of active customers in over 75 countries and across 21 industries. Founded in 2001, OutSystems has offices in the United States, United Kingdom, the Netherlands, Portugal, Germany, the UAE, Japan, Hong Kong, Malaysia, Australia, India, and Singapore, and of course has a thriving, worldwide community of remote employees.

Working at OutSystems

Our goal is to ensure that OutSystems is a place for bright, happy, and motivated people who share a common purpose and take pride in excellent work towards our vision. Our culture is focused on building agility at scale, which allows us to operate with a high drive in a competitive market. At OutSystems, we operate like a startup at scale, where teams act as coordinated "startups" - a true Federation of Teams Culture. Our attributes define the core behaviors that fuel our innovation and foster agility at scale. We encourage our team members to collaborate, focus on results, act quickly, understand our business and reinvent themselves.

What do we have to offer you? 

  • A company that continues to grow, change and innovate, and gives our teams the space to be proactive and creative. 

  • Real career opportunities. We care about growth and development. Vertical career progression is an obvious possibility, but we also offer the possibility for lateral moves, joining different teams, and mastering specific skills. 

  • Colleagues who are as smart, hardworking and driven as you, and a team that is global.

  • Disrupting the status quo is in our DNA. In fact, it’s why our company exists.

  • We “Ask Why” a lot. It helps us connect our individual work to the bigger picture and sometimes even uncover a better way.

Are you ready for the next step in your career? Then we’d love to hear from you!

OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best. A company that embraces the creativity and innovation that comes through diverse perspectives. We are committed to creating a team that reflects society through inclusive programs and initiatives and are proud to be an equal opportunity employer. All qualified applicants receive equal consideration regardless of race, place of origin, color, age, marital status, religion, sex, sexual orientation, gender expression or identity, protected veteran status, disability status or any other status protected by law.

Join us in disrupting the status quo of the low-code market: we give you the power to "Ask Why", and you give our customers the power to innovate through software!

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry:
Computer Software / SaaS
Spoken language(s):
English

Other Skills

  • Humility
  • Accountability
  • Communication
  • Negotiation
  • Problem Solving
