Offer summary
Qualifications:
- Bachelor’s in Computer Science or equivalent experience
- 4+ years as a Site Reliability Engineer
- Experience with AWS, Azure, or GCP
- Knowledge of observability stacks such as Prometheus and Grafana
- Familiarity with incident management tools
Key responsibilities:
- Build tools for standardized multi-datacenter deployment
- Lead Production Readiness Review and change management processes
- Define metrics and SLAs/SLOs for system monitoring
- Implement incident response protocols and facilitate communication during incidents
- Design recovery tools to expedite system restoration