Join ABBYY and be part of a team that celebrates your unique work style. With flexible work options, a supportive team, and rewards that reflect your value, you can focus on what matters most – driving your growth, while fueling ours.
Our commitment to respect, transparency, and simplicity means you can trust us to always choose to do the right thing.
As a trusted partner for purpose-built AI and intelligent automation, we solve highly complex problems for our enterprise customers and put their information to work to transform the way they do business. More than 10,000 customers trust ABBYY, including many Fortune 500 companies. You will help grow a portfolio that already includes clients such as DHL, Johnson & Johnson, FDA, DMV, PwC, KeyBank, Spotify, and H&R Block.
We are seeking a highly motivated and experienced Site Reliability Engineering (SRE) Manager to lead our distributed SRE team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our production systems while fostering a culture of operational excellence and continuous improvement.
You will collaborate closely with engineering, product, and infrastructure teams to design and implement robust systems, drive incident response, and lead initiatives that improve system reliability and developer productivity.
- Lead and mentor a team of SREs, fostering a culture of ownership, accountability, and continuous learning.
- Define and drive the SRE roadmap, including reliability goals, SLAs/SLIs/SLOs, and automation initiatives.
- Oversee incident management processes, including on-call rotations, postmortems, and root cause analysis.
- Collaborate with engineering teams to design scalable, fault-tolerant systems and improve CI/CD pipelines.
- Implement and maintain observability tools (monitoring, logging, alerting) to ensure system health and performance.
- Champion best practices in infrastructure as code, configuration management, and cloud-native architecture.
- Drive cost optimization and performance tuning across cloud infrastructure.
- Report on system reliability metrics and provide executive-level updates.
- 5+ years of experience in Site Reliability Engineering, DevOps, or related fields.
- 2+ years of experience in a leadership or managerial role.
- Proven experience managing technical teams of 10 or more individuals.
- Hands-on experience with both AWS and Azure cloud platforms.
- Demonstrated experience in defining and implementing SLIs and SLOs to measure and improve system reliability.
- Proficiency in infrastructure as code tools (Terraform, CloudFormation, etc.).
- Experience with container orchestration (Kubernetes, ECS, etc.).
- Solid understanding of monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
- Strong scripting or programming skills (Python, Go, Bash, etc.).
- Excellent communication and collaboration skills.
- Experience managing remote or distributed teams.
- Familiarity with compliance and security standards (SOC 2, HIPAA, etc.).
- Background in high-availability systems and disaster recovery planning.
Join ABBYY, and you will:
- Love how you work
- Love whom you work with
- Love what you work on
ABBYY is an Equal Employment Opportunity employer that values the strength that diversity brings to the workplace. To learn more about our commitment to Diversity and Inclusion, check out the careers section on our website.