Key Facts

Remote From:

Uruguay, Brazil, El Salvador, Guatemala, Honduras, Costa Rica, Ecuador, Colombia, Panama, Argentina, Chile, Bolivia, Peru, Mexico

Category: Site Reliability Engineer (SRE)

Full time

Senior (5-10 years)

English

Hard Skills

Google Cloud Platform (GCP), Kubernetes, Site Reliability Engineering, Containerization, Root Cause Analysis, Mathematical Optimization, Capacity Planning, Post-Mortem Care, Systems Architecture, Failover

Other Skills

• Troubleshooting (Problem Solving)
• Communication
• Teamwork
• Problem Solving

Roles & Responsibilities

Ensure high availability, reliability, and performance of applications and infrastructure
Define and monitor SLIs, SLOs, and SLAs to maintain service reliability
Implement automation to reduce manual operations and improve system efficiency
Lead incident management, root cause analysis (RCA), and post-mortem processes

Requirements:

4-6 years of experience in Site Reliability Engineering, DevOps, or related roles
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, or cloud-native tools)
Familiarity with cloud platforms (Azure, AWS, or GCP) and containerization/orchestration (Docker, Kubernetes)
Strong scripting/programming skills (Python, Go, Bash) and understanding of CI/CD pipelines

Talentus

About Talentus

At Talentus, we are an organization with over 30 years of experience in delivering talent, providing solutions, and transforming businesses. Our team has successfully delivered quality-based IT talent by helping organizations through their digital enablement initiatives, which are aimed at competing in this new digital era, driving business growth, and maintaining a healthy and sustainable business model across the globe.

Our focus is on four specific sets of services:

• We provide quality-based near-shore Smart Sourcing for individual staffing needs.
• We provide Dedicated Teams that have all the roles and skills required to accomplish an IT job.
• We deliver Project-based Solutions, from design, blueprinting, and architecture through to a full-fledged production solution, using either traditional or agile methodologies.
• We can take on the maintenance & support of your existing legacy solutions or other platforms, with SLAs and process improvements in place, including minor enhancements.

We also have several centers of excellence around AI, Quality Assurance & Engineering, Program Management, and ERP solutions including Salesforce, ServiceNow, Oracle, and SAP. Lastly, with a presence in 20 countries across the globe, we are able to deliver the services required by highly demanding organizations.

Founded: 2018

Company size: 11 - 50


Job description

At Talentus Global, we are looking for you!

We are a U.S. company with a strong presence in LATAM and across 20+ countries around the world. Some of our key near-shore BPO services include: smart-sourcing, dedicated or cluster teams, managed IT services, software outsourcing, and top ERP & CRM solutions—driven by our practices across many industries, including Higher Education.

We are currently looking for a Site Reliability Engineer (SRE) to become a valuable addition to our dynamic team!

Responsibilities:

Ensure high availability, reliability, and performance of applications and infrastructure.
Define and monitor SLIs, SLOs, and SLAs to maintain service reliability.
Implement automation to reduce manual operations and improve system efficiency.
Monitor systems, detect anomalies, and respond to incidents in a timely manner.
Lead incident management, root cause analysis (RCA), and post-mortem processes.
Collaborate with development and DevOps teams to improve system resilience and scalability.
Manage observability tools (monitoring, logging, tracing) to gain system insights.
Optimize system performance, capacity planning, and cost efficiency.
Implement reliability best practices, including redundancy, failover, and disaster recovery.
Continuously improve system reliability through proactive engineering initiatives.

Qualifications:

4 to 6 years of experience in Site Reliability Engineering, DevOps, or related roles.
Strong understanding of system reliability, scalability, and performance engineering.
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, or cloud-native tools).
Familiarity with cloud platforms such as Azure, AWS, or GCP.
Experience with scripting or programming languages (Python, Go, Bash).
Knowledge of CI/CD pipelines and DevOps practices.
Experience with containerization and orchestration tools (Docker, Kubernetes).
Strong troubleshooting and incident management skills.
Understanding of networking, distributed systems, and system architecture.
Experience working in Agile/Scrum environments.
Advanced English proficiency (C1) required.
Must have experience working for US clients.