Manager Site Reliability Operations

Key Facts

Remote From: 
Full time
$119K - $231K yearly
English

Other Skills

  • Communication
  • Problem Solving
  • Team Leadership
  • Analytical Skills

Requirements:

  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or related field
  • 7+ years of experience in IT operations, SRE, DevOps, or related roles
  • 3+ years of experience in a lead or management role overseeing technical teams in a 24x7 environment
  • Strong understanding of CI/CD pipelines and observability practices

Roles & Responsibilities

  • Lead the Site Reliability Operations team responsible for observability, real-time monitoring, incident response, and operational excellence
  • Partner with various teams to embed CI/CD and release best practices into operations
  • Oversee service reliability monitoring and incident management
  • Own and mature the Problem Management function, driving root cause analysis and ensuring corrective actions are implemented

Job description

Overview:

Join an amazing team that is consistently recognized for our achievements and culture, including our most recent Forbes award of being one of America's Best Midsize Employers for 2026!

 

Position Summary: 

The Site Reliability Operations (SRO) Manager leads the team responsible for end-to-end observability, real-time monitoring, and operational response across Mercury’s production and non-production platforms. This role centers on proactive detection of issues, live support during releases, and structured incident and problem management to minimize customer impact and drive long-term stability.  

 

The SRO Manager ensures that services are well-instrumented (metrics, logs, traces, and dashboards), that alerts are actionable and tuned, and that root cause analysis (RCA) and follow-through on corrective actions are consistently executed. The SRO Manager partners closely with application development, DevOps COE, Site Reliability Engineering (SRE), and Infrastructure teams to build release and runtime practices that are observable by design, provide real-time operational support during deployments, and use data-driven insights and automation to continuously improve system resilience, change success rates, and time to recovery.  

 

Geo-Salary Information:

An in-person interview may be required during the hiring process

 

State specific pay scales for this role are as follows:

$118,664 to $230,619 (NJ, NY, WA, HI, AK, MD, CT, RI, MA)

$107,876 to $209,653 (NV, OR, AZ, CO, WY, TX, ND, MN, MO, IL, WI, FL, GA, MI, OH, VA, PA, DE, VT, NH, ME)

$97,089 to $188,688 (UT, ID, MT, NM, SD, NE, KS, OK, IA, AR, LA, MS, AL, TN, KY, IN, SC, NC, WV)

 

In CA: Typical hiring range is $157,177.00 to $218,302.00

 

The expected base salary for this position will vary depending on a number of factors, including relevant experience, skills and location.

Responsibilities:

Essential Job Functions: 

  • Lead the Site Reliability Operations team, including the Network Operations Center (NOC), responsible for observability, real-time monitoring, incident response, and operational excellence for key enterprise services; set direction, priorities, and success metrics for the team. 
  • Partner with Product Management, Engineering, SRE, and the rest of the infrastructure team to embed CI/CD and release best practices into operations, including automated build/test/deploy, health checks, rollbacks, release monitoring via the NOC, and change-management guardrails. 
  • Oversee service reliability monitoring and incident management: ensure appropriate observability (metrics, logs, traces, dashboards), well-tuned alerting thresholds, escalation paths, and effective communications to stakeholders and leadership during incidents. 
  • Own and mature the Problem Management function for the team: drive root cause analysis (RCA) of recurring or high-severity incidents, standardize post-incident reviews, and ensure corrective actions and follow-ups are implemented and verified. 
  • Define, track, and report operational and reliability metrics (e.g., availability, MTTR, incident volume, change failure rate, deployment frequency, problem resolution time); provide regular insights and recommendations to Technology Operations leadership. 
  • Champion automation and “operations as code” (infrastructure as code, configuration as code, automated runbooks), working with engineering teams to reduce manual toil and improve consistency, speed, and safety of operations and releases. 
  • Recruit, develop, coach, and evaluate team members; provide performance feedback, make salary and promotion recommendations, and foster a high-performing, collaborative culture aligned with Mercury’s core values. 
  • Provide leadership coverage for 24x7 mission-critical support through the NOC and on-call rotations; ensure sustainable on-call practices, high-quality runbooks, and continuous improvement of tooling and processes. 
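
For candidates less familiar with the reliability metrics named above (availability, MTTR, change failure rate), the following is a minimal illustrative sketch of how they are commonly computed. It is not Mercury's tooling or method: the `Incident` record, the function names, and the sample figures are all hypothetical and made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape; real incident tools expose different fields.
    detected_at: datetime
    resolved_at: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery: average of (resolved - detected)."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta(0))
    return total / len(incidents)


def availability(period: timedelta, downtime: timedelta) -> float:
    """Fraction of the reporting period during which the service was up."""
    return 1.0 - downtime / period


def change_failure_rate(deployments: int, failed: int) -> float:
    """Share of deployments that led to a failure needing remediation."""
    return failed / deployments if deployments else 0.0


# Made-up sample data for a 30-day reporting period.
incidents = [
    Incident(datetime(2025, 1, 3, 10, 0), datetime(2025, 1, 3, 10, 30)),
    Incident(datetime(2025, 1, 17, 22, 0), datetime(2025, 1, 17, 23, 30)),
]
downtime = sum((i.resolved_at - i.detected_at for i in incidents), timedelta(0))

print(mttr(incidents))                                 # 1:00:00
print(round(availability(timedelta(days=30), downtime), 5))
print(change_failure_rate(deployments=40, failed=3))   # 0.075
```

In practice these figures would come from incident and deployment systems rather than hand-entered records, and teams differ on whether recovery time is measured from detection or from the start of customer impact.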

 

Qualifications:

Education: 

Minimum:

  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or related field, or equivalent combination of education and work experience. 

Preferred:

  • Advanced coursework, certifications, or experience in Site Reliability Engineering, DevOps, cloud platforms, or ITIL. 

Experience: 

Minimum:

  • 7+ years of experience in IT operations, SRE, DevOps, or related roles supporting mission-critical systems. 
  • 3+ years of experience in a lead or management role overseeing technical teams in a 24x7 environment.

Preferred: 

  • Experience leading teams that support services deployed via modern CI/CD pipelines and running on cloud and/or container platforms (e.g., Kubernetes/OpenShift, AWS). 
  • Experience integrating operations functions with DevOps/SRE teams, including shared ownership of reliability goals and metrics. 

Knowledge and Skills:

  • Strong understanding of CI/CD pipelines (build, test, security scanning, deployment, rollback) and how they support reliable operations. 
  • Solid knowledge of observability practices and tools (metrics, logs, traces, dashboards, alerts) and how to design actionable monitoring and alerting for production systems. 
  • Deep familiarity with incident and problem management processes, including root cause analysis methods and post-incident review facilitation. 
  • Working knowledge of DevOps/SRE concepts such as SLOs/SLIs, error budgets, resilience patterns, automation to reduce toil, and blameless culture. 
  • Demonstrated ability to lead and influence cross-functional teams, build relationships, and collaborate effectively with engineering, InfoSec, infrastructure, and business stakeholders. 
  • Excellent communication skills, both written and verbal; able to clearly communicate technical issues, risks, and recommendations to technical and non-technical audiences, including senior leadership. 
  • Strong analytical and problem-solving skills; able to analyze operational data and trends to identify risks, drive decisions, and prioritize improvements. 
  • Self-motivated, adaptable, and able to operate with minimal supervision in a fast-changing environment. 
  • Ability to work extended hours, nights, or weekends as needed to support critical releases or resolve major incidents. 

 

About the Company:

Why choose a career at Mercury?

At Mercury, we have been guided by our purpose to help people reduce risk and overcome unexpected events for more than 60 years. We are one team with a common goal to help others. Everyone needs insurance and we can’t imagine a world without it.

Our team will encourage you to grow, make time to have fun, and work together to make great things happen. We embrace the strengths and values of each team member. We believe in having diverse perspectives where everyone is included, to serve customers from all walks of life.

We care about our people, and we mean it. We reward our talented professionals with a competitive salary, bonus potential, and a variety of benefits to help our team members reach their health, retirement, and professional goals.

 

Learn more about us here: https://www.mercuryinsurance.com/about/careers

Perks and Benefits:

We offer many great benefits, including:

  • Competitive compensation
  • Flexibility to work from anywhere in the United States for most positions
  • Paid time off (vacation time, sick time, 9 paid Company holidays, volunteer hours)
  • Incentive bonus programs (potential for holiday bonus, referral bonus, and performance-based bonus)
  • Medical, dental, vision, life, and pet insurance
  • 401(k) retirement savings plan with company match
  • Engaging work environment
  • Promotional opportunities
  • Education assistance
  • Professional and personal development opportunities
  • Company recognition program
  • Health and wellbeing resources, including free mental wellbeing therapy/coaching sessions, child and eldercare resources, and more

Mercury Insurance is an equal opportunity employer.  All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by federal, state, or local law.

Pay Range: USD $118,664.00 - USD $230,619.00 /Yr.
