Site Reliability Engineering Manager

Remote: Full Remote

Offer summary

Qualifications:

  • Proficiency in at least one programming language such as Python, Go, or Java.
  • Solid understanding of operating systems, networking, and distributed systems.
  • Experience in incident management and root cause analysis.
  • Strong communication skills for collaboration with technical and non-technical stakeholders.

Key responsibilities:

  • Lead a team in setting priorities and aligning with organizational goals.
  • Develop tools for automation and improve system reliability.
  • Participate in on-call rotations for incident response and troubleshooting.
  • Collaborate with development teams to ensure service quality and reliability.

General Motors Europe TPE https://www.gm.com/
11 - 50 Employees

Job description

As an SRE Engineering Manager, you will be expected not only to lead your team in setting priorities and ensuring alignment with organizational goals but also to be deeply technical. We expect our managers to be able to contribute directly through coding, reviewing code, and mentoring engineers. While it's unlikely that you'll spend the majority of your time coding, having the capability and willingness to dive into technical details, solve problems hands-on, and support your team's technical decisions is crucial. You'll be a mentor, guide, and partner, helping engineers grow and ensuring the reliability and efficiency of the systems they are working on. We believe in setting a high bar for engineering managers who can lead by example in both technical expertise and people leadership.

Required Experience:

  • Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
  • Observability and Monitoring: Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents.
  • Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
  • Collaboration with Development Teams: Work alongside developers to ensure the quality, scalability, and reliability of our services. Practice shared ownership of services in production, fostering a "You build it, you run it" culture.
  • Service Level Management: Maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to manage reliability expectations effectively.
  • Engineering for Reliability: Strong understanding of common application reliability patterns, with hands-on experience implementing them.
  • Failure Analysis and Post-Incident Reviews: Conduct deep-dive analyses of incidents and collaborate on post-incident reviews to derive learnings and prevent recurrence. Champion a culture of continuous improvement.
  • Cost Efficiency: Evaluate system performance and advocate for optimisations that reduce infrastructure costs while maintaining service reliability.

Skills and Qualifications:

  • Programming Skills: Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
  • Systems Knowledge: Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
  • Strong Understanding of System Fundamentals: Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures. Ability to optimize or troubleshoot code by understanding its execution and the impact on system resources.
  • Incident Management: Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures.
  • Communication and Collaboration: Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders. Commitment to collaborative problem-solving and shared ownership of services.
  • Automation Focus: Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.

Additional Job Description

Preferred Experience:

  • Experience with cloud platforms (AWS, GCP, Azure).
  • Familiarity with container orchestration systems like Kubernetes.
  • A track record of managing or developing distributed systems.
  • Prior experience with Java in production.

About GM

Our vision is a world with Zero Crashes, Zero Emissions and Zero Congestion and we embrace the responsibility to lead the change that will make our world better, safer and more equitable for all.

Why Join Us 

We aspire to be the most inclusive company in the world. We believe we all must make a choice every day – individually and collectively – to drive meaningful change through our words, our deeds and our culture. Our Work Appropriately philosophy supports our foundation of inclusion and provides employees the flexibility to work where they can have the greatest impact on achieving our goals, dependent on role needs. Every day, we want every employee, no matter their background, ethnicity, preferences, or location, to feel they belong to one General Motors team.

Diversity Information

General Motors is committed to being a workplace that is not only free of discrimination, but one that genuinely fosters inclusion and belonging. We strongly believe that workforce diversity creates an environment in which our employees can thrive and develop better products for our customers. We understand and embrace the variety through which people gain experiences, whether through professional, personal, educational, or volunteer opportunities.

We encourage interested candidates to review the key responsibilities and qualifications and apply for any positions that match their skills and capabilities.

Required profile

Experience

Spoken language(s): English

Other Skills

  • Systems Thinking
  • Collaboration
  • Communication
