Job description
At Talentus Global, we are looking for you!
We are a U.S. company with a strong presence in LATAM and across 20+ countries around the world. Our key nearshore BPO services include smart-sourcing, dedicated or cluster teams, managed IT services, software outsourcing, and top ERP & CRM solutions, driven by our practices across many industries, including Higher Education.
We are currently looking for a Site Reliability Engineer (SRE) to become a valuable addition to our dynamic team!
Responsibilities:
Ensure high availability, reliability, and performance of applications and infrastructure.
Define and monitor SLIs, SLOs, and SLAs to maintain service reliability.
Implement automation to reduce manual operations and improve system efficiency.
Monitor systems, detect anomalies, and respond to incidents in a timely manner.
Lead incident management, root cause analysis (RCA), and post-mortem processes.
Collaborate with development and DevOps teams to improve system resilience and scalability.
Manage observability tools (monitoring, logging, tracing) to gain system insights.
Optimize system performance, capacity planning, and cost efficiency.
Implement reliability best practices, including redundancy, failover, and disaster recovery.
Continuously improve system reliability through proactive engineering initiatives.
Qualifications:
4 to 6 years of experience in Site Reliability Engineering, DevOps, or related roles.
Strong understanding of system reliability, scalability, and performance engineering.
Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, or cloud-native tools).
Familiarity with cloud platforms such as Azure, AWS, or GCP.
Experience with scripting or programming languages (Python, Go, Bash).
Knowledge of CI/CD pipelines and DevOps practices.
Experience with containerization and orchestration tools (Docker, Kubernetes).
Strong troubleshooting and incident management skills.
Understanding of networking, distributed systems, and system architecture.
Experience working in Agile/Scrum environments.
Advanced English proficiency (C1) required.
Must have experience working for U.S. clients.
What do we offer?
· Contractor model
· Remote model
· Salary in USD
· Paid vacation
· Day off for birthdays
· Benefits for courses and/or certifications
· Opportunity to work with top-tier U.S. clients
· Entrepreneurial, multicultural team culture
Join us if you have what it takes to be part of the Talentus Global Team!