Expedite Technology Solutions LLC

Senior Manager SRE

Key Facts

Remote From: 
Full time
Senior (5-10 years)
English

Other Skills

  • Strategic Thinking
  • Communication
  • Team Effectiveness

Roles & Responsibilities

  • Drive adoption of the SRE operating model across application teams
  • Define and enforce SLIs, SLOs, and Error Budgets
  • Partner with application product teams and engineering operations
  • Lead adoption of centralized observability standards

Requirements:

  • 10+ years in engineering, operations, or SRE roles
  • 5+ years leading SRE, platform, or reliability-focused teams
  • Proven experience implementing SRE practices at scale (SLIs, SLOs, error budgets)
  • Strong background in cloud environments (AWS, Azure, GCP)

Job description

Key Responsibilities
SRE Activation & Operating Model
∙ Drive adoption of the SRE operating model across application teams
∙ Establish clarity in roles between:
  ◦ SRE
  ◦ Production Support Engineering (PSE)
  ◦ Application teams
∙ Ensure SRE practices are embedded into the development lifecycle, not treated as post-production activities

Reliability Standards & Governance
∙ Define and enforce:
  ◦ SLIs, SLOs, and Error Budgets
  ◦ Production readiness criteria
  ◦ Reliability best practices
∙ Lead SLO adoption and compliance reviews across the organization
∙ Establish governance frameworks to ensure consistent application of standards
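
For candidates less familiar with the terms, the relationship between an SLI, an SLO, and an error budget can be illustrated with a minimal sketch. It assumes a simple request-based availability SLI; the `SloWindow` class, its field names, and the sample numbers are hypothetical and are not taken from this posting or from any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one service over one SLO window (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed


def sli(window: SloWindow) -> float:
    """Availability SLI: fraction of requests that succeeded."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - window.slo_target) * window.total_requests
    if allowed_failures == 0:
        # A 100% target (or zero traffic) leaves no room for failures.
        return 0.0 if window.failed_requests else 1.0
    return 1 - window.failed_requests / allowed_failures


# Illustrative numbers only: 250 failures out of 1,000,000 requests
# against a 99.9% target uses about a quarter of the budget.
window = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
print(f"SLI: {sli(window):.5f}, budget remaining: {error_budget_remaining(window):.2%}")
```

In a compliance review, a remaining budget near or below zero would be the signal to slow feature releases in favor of reliability work.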

Cross-Team Coordination & Enablement
∙ Partner with:
  ◦ Application product teams
  ◦ Production Support Engineering (MG team)
  ◦ Platform / Infrastructure / Observability teams
∙ Drive alignment and reduce friction between engineering and operations
∙ Ensure clear handoffs, escalation models, and operational ownership

Observability & Monitoring Strategy
∙ Lead adoption of centralized observability standards across:
  ◦ Metrics
  ◦ Logging
  ◦ Tracing
∙ Align tooling (AppDynamics, Splunk, Prometheus, etc.) with these standards
∙ Ensure monitoring and alerting are SLO-driven and actionable, not noise-based
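
One common way to make alerting SLO-driven rather than noise-based is to page on error-budget burn rate measured over two windows at once. The sketch below is illustrative only: the function names are hypothetical, and the default threshold of 14.4 (spending 2% of a 30-day budget within one hour) is an assumed value, not a requirement of this role.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    budget = 1 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn budget at or above the threshold.

    The long window (e.g. 1 hour) shows the problem is significant;
    the short window (e.g. 5 minutes) shows it is still happening.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


# Sustained 2-3% errors against a 99.9% SLO: both windows burn fast, so page.
print(should_page(0.02, 0.03, slo_target=0.999))
# The spike has already recovered in the short window: do not page.
print(should_page(0.02, 0.0005, slo_target=0.999))
```

Requiring both windows to exceed the threshold is what keeps the alert actionable: a brief spike that has already recovered does not wake anyone up.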

Incident Management & Continuous Improvement
∙ Partner with PSE to strengthen:
  ◦ Incident management processes
  ◦ Root cause analysis (RCA) standards
∙ Drive identification of patterns and systemic issues
∙ Ensure learnings translate into engineering improvements and automation

Automation & Reliability Engineering
∙ Identify opportunities to:
  ◦ Reduce manual operational work
  ◦ Improve system resilience
  ◦ Enable self-healing capabilities
∙ Promote a culture of engineering over reaction

Reporting & Organizational Insight
∙ Define and track reliability metrics across FS&I
∙ Build reporting that provides visibility into:
  ◦ System health
  ◦ Incident trends
  ◦ SLO performance
∙ Translate technical data into actionable business insights

Required Qualifications
∙ 10+ years in engineering, operations, or SRE roles
∙ 5+ years leading SRE, platform, or reliability-focused teams
∙ Proven experience implementing SRE practices at scale (SLIs, SLOs, error budgets)
∙ Strong background in cloud environments (AWS, Azure, GCP)
∙ Hands-on experience with observability tools (Splunk, AppDynamics, Prometheus, etc.)
∙ Experience in incident management and production operations at scale
∙ Ability to operate effectively in high-pressure and complex enterprise environments

Preferred Qualifications
∙ Experience driving organizational transformation (not just technical implementation)
∙ Strong understanding of CI/CD, DevOps, and automation practices
∙ Experience working in regulated or large enterprise environments
∙ Familiarity with AIOps or advanced automation strategies

Key Success Indicators
∙ Increased adoption of SLOs and reliability standards
∙ Reduction in high-severity incidents over time
∙ Improved MTTR and operational efficiency
∙ Increased adoption of standardized observability practices
∙ Reduction in reactive, ticket-driven work across teams
∙ Clear alignment between SRE, PSE, and application teams

Core Competencies
∙ Strategic thinking with strong execution focus
∙ Ability to drive alignment across multiple teams and stakeholders
∙ Strong communication and influence skills
∙ Bias toward structure, clarity, and accountability
∙ Ability to operate with urgency and discipline in complex environments
