There are NO limits to your career: come shape the future and be part of a truly unique global culture at OutSystems!
About This Role
Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals of SRE are to create scalable and highly reliable systems. Our SREs ensure our production systems' reliability, performance, and scalability while enabling rapid development and deployment of new features and services.
SREs at OutSystems work closely with development teams, acting as an extension of those teams, to help them adopt the reliability tenets, with the shared goal of meeting Service Level Objectives (SLOs) and thus delivering a smooth, frictionless Customer Experience.
Site Reliability Engineer Role
As an SRE at OutSystems, your key responsibilities and duties are:
Enable services and teams and onboard them to the reliability tenets;
Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs);
Design and implement scalable, reliable, and secure infrastructure that follows cloud-native best practices;
Work closely with software development teams to enhance system observability, fault tolerance, recovery mechanisms, and scalability;
Deploy and optimize monitoring, alerting, logging, and tracing solutions to detect and respond to incidents effectively;
Participate in incident response, ensuring quick resolution and minimal downtime, and conduct root cause analyses (RCAs) and post-mortems;
Automate operational tasks with a strong focus on incident detection, mitigation, and recovery;
Continuously improve reliability processes and share knowledge with peers;
Communicate effectively with stakeholders, providing updates on system reliability and performance;
Participate in the on-call rotation to provide 24/7 support for production systems.
Site Reliability Engineering Performance Indicators
The main KPIs that aid in understanding the impact and success of the SRE function at OutSystems are:
Service Level Agreement (SLA) and Service Level Objective (SLO) compliance;
SLO coverage and detection ratio;
MTTD - mean time to detect;
MTTA - mean time to acknowledge;
MTTR - mean time to resolve.
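The posting does not say how these KPIs are calculated. As a rough illustration only, the Python sketch below uses common industry definitions, which are assumptions and not necessarily OutSystems' own. MTTD is measured from the start of a failure to its detection. MTTA runs from detection to acknowledgement. MTTR runs from the start of the failure to its resolution; some teams measure it from detection instead. SLO compliance is treated as the ratio of good events to total events, compared against a target. The `Incident` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative assumptions.
    started: datetime       # when the failure actually began
    detected: datetime      # when monitoring/alerting first flagged it
    acknowledged: datetime  # when an on-call engineer acknowledged the alert
    resolved: datetime      # when service was restored


def _mean_minutes(deltas: list[timedelta]) -> float:
    """Average a list of durations, expressed in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def incident_kpis(incidents: list[Incident]) -> dict[str, float]:
    """Compute MTTD, MTTA, and MTTR (in minutes) under the assumed definitions."""
    return {
        # MTTD: start of failure -> detection
        "mttd": _mean_minutes([i.detected - i.started for i in incidents]),
        # MTTA: detection -> acknowledgement
        "mtta": _mean_minutes([i.acknowledged - i.detected for i in incidents]),
        # MTTR: start of failure -> resolution (assumed; some teams start at detection)
        "mttr": _mean_minutes([i.resolved - i.started for i in incidents]),
    }


def slo_compliance(good_events: int, total_events: int, target: float) -> tuple[float, bool]:
    """Return the measured SLI ratio and whether it meets the SLO target (e.g. 0.999)."""
    ratio = good_events / total_events
    return ratio, ratio >= target


# Usage with two made-up incidents that both start at t0.
t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=6), t0 + timedelta(minutes=40)),
    Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=3), t0 + timedelta(minutes=20)),
]
print(incident_kpis(incidents))  # {'mttd': 3.0, 'mtta': 1.5, 'mttr': 30.0}
print(slo_compliance(99_950, 100_000, 0.999))  # (0.9995, True)
```

Lower MTTD, MTTA, and MTTR values generally mean failures are caught and fixed faster. In practice these numbers would come from an incident-management or alerting tool rather than hand-built records.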
Qualifications and Skills
The following illustrates the desired profile for a Site Reliability Engineer. However, the selection of candidates will always depend on specific knowledge of the field and prior experience.
Qualifications
STEM degree (BSc or MSc in Software Engineering, Computer Science, or a related field);
5+ years of experience in software development and/or operations;
Proficiency in at least one high-level programming language (Python, Go, Java, C#, etc.);
Strong troubleshooting and debugging skills;
Fluency in English and excellent communication skills.
Soft Skills
Communication – Effectively communicates in English, both orally and in writing, ensuring clarity and empathy in interactions.
Humility – Acknowledges mistakes, takes responsibility, and acts quickly to mitigate impact.
Accountability – Takes ownership of issues and ensures resolution, involving the right people when needed.
Collaboration – Works effectively with colleagues and stakeholders to find common ground and resolve challenges.
Process Adherence & Improvement – Follows established processes while identifying and suggesting improvements when necessary.
Problem-Solving – Approaches problems methodically, breaking them into smaller components, applying critical thinking, and narrowing down solutions efficiently.
Technical Skills
Experience in any of the following is valued but not strictly required:
Ability to establish, monitor, and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) in line with business needs;
Containerization technologies and orchestration platforms, mainly Kubernetes and EKS (CKA, CKAD, or CKS certifications are valued);
Experience with automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, Spacelift, etc.;
Experience with Python, Go, Bash/Shell scripting, or other automation tools/languages;
Familiarity with AWS services such as EC2, RDS, ELB, CloudFront, Lambda, etc.;
Proficiency in monitoring and troubleshooting complex distributed systems;
Experience with Grafana, the ELK stack, Prometheus, or similar tools;
Strong understanding of designing resilient and fault-tolerant systems;
Expertise in debugging complex distributed systems.
The Longer Story:
OutSystems is a global leader transforming how companies innovate through software, empowering IT leaders with a better way to build the software that matters most. We are looking for talented and motivated people to join us in helping companies solve some of their most strategic business challenges, from modernizing their workplace processes to transforming their employee and customer experiences. As a member of the OutSystems global team, you will help build, deliver, manage, and evolve the market-leading low-code software preferred by professional developers around the world.
OutSystems is a truly global company, with more than 800,000 developer community members, 1,700 employees, more than 500 partners, and thousands of active customers in over 75 countries and across 21 industries. Founded in 2001, OutSystems has offices in the United States, United Kingdom, the Netherlands, Portugal, Germany, the UAE, Japan, Hong Kong, Malaysia, Australia, India, and Singapore, and of course has a thriving, worldwide community of remote employees.
Working at OutSystems
Our goal is to ensure that OutSystems is a place for bright, happy, and motivated people who share a common purpose and take pride in excellent work towards our vision. Our culture is focused on building agility at scale, which allows us to operate with a high drive in a competitive market. In our federation-of-teams culture, each team operates like a startup, so we can all learn, grow, and innovate while having the space to be proactive and creative. We encourage our team members to collaborate, focus on results, act quickly, understand our business, and adopt a growth mindset.
What do we have to offer you?
A company that continues to grow, change and innovate, and gives our teams the space to be proactive and creative.
Real career opportunities. We care about growth and development. Vertical career progression is an obvious path, but we also offer lateral moves, the chance to join different teams, and the opportunity to master specific skills.
Work colleagues that are as smart, hardworking and driven as you – and a team that is global.
Disrupting the status quo is in our DNA. In fact, it’s why our company exists.
We “Ask Why” a lot. It helps us connect our individual work to the bigger picture and sometimes even uncover a better way.
Are you ready for the next step in your career? Then we’d love to hear from you!
OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best. We are a company that embraces the creativity and innovation that come from diverse perspectives. We are committed to creating a team that reflects society through inclusive programs and initiatives and are proud to be an equal opportunity employer. All qualified applicants receive equal consideration regardless of race, place of origin, color, age, marital status, religion, sex, sexual orientation, gender expression or identity, protected veteran status, disability status, or any other status protected by law.
Join us in disrupting the status quo of the low-code market: we give you the power to "Ask Why", and you give our customers the power to innovate through software!