
Manager, Site Reliability Engineering

Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Experience as a Site Reliability Engineer in cloud environments
  • Experience managing a team of Site Reliability Engineers
  • Experience managing infrastructure in Azure
  • Experience with Kubernetes infrastructure in the cloud
  • Familiarity with monitoring and observability practices

Key responsibilities:

  • Manage a 24x7 Azure cloud infrastructure team
  • Ensure security, monitoring, alerting for infrastructure
  • Troubleshoot and resolve escalated issues
  • Develop processes, tools, and documentation
  • Participate in on-call rotation for escalation capacity
Flexera Computer Software / SaaS Large https://www.flexera.com/
1001 - 5000 Employees

Job description

Flexera saves customers billions of dollars in wasted technology spend. A pioneer in Hybrid ITAM and FinOps, Flexera provides award-winning, data-oriented SaaS solutions for technology value optimization (TVO), enabling IT, finance, procurement and cloud teams to gain deep insights into cost optimization, compliance and risks for each business service. Flexera One solutions are built on a set of definitive customer, supplier and industry data, powered by our Technology Intelligence Platform, that enables organizations to visualize their Enterprise Technology Blueprint™ in hybrid environments—from on-premises to SaaS to containers to cloud.

We’re transforming the software industry. We’re Flexera. With more than 50,000 customers across the world, we’re achieving that goal. But we know we can’t do any of that without our team. Ready to help us re-imagine the industry during a time of substantial growth and ambitious plans? Come and see why we’re consistently recognized by Gartner, Forrester and IDC as a category leader in the marketplace. Learn more at flexera.com

Build, grow and lead a team that is responsible for implementing the Site Reliability Engineering practices and tools that continually improve the operational readiness, instrumentation, reliability, performance and scalability of Flexera’s Snow Atlas global cloud infrastructure, platform and products. The team is central to the success of Flexera’s SaaS solutions and stakeholders will rely on your knowledge and expertise of SRE and DevOps practices.

Adopting DevOps principles of delivery, the manager is responsible for the deliverables of the central team and works with stakeholders to enable Site Reliability Engineers. The manager will engage with stakeholders to identify and deliver the highest-value, highest-priority work that improves SRE capabilities, tools and services. The manager also generates actionable insights from qualitative and quantitative metrics to continually improve the operational reliability of Snow’s systems.

What you will be doing:

  • Lead, manage and coach a team of Site Reliability Engineers (SREs) responsible for building and maintaining Flexera’s Snow Atlas platform infrastructure and tooling. Manage the day-to-day execution of high-quality, prioritized deliverables that apply SRE best practices, ensuring the reliability, scalability, instrumentation, automation and performance of Snow’s cloud SaaS products.
  • As a passionate advocate of the SRE discipline and DevOps principles, engage with, influence, and seek feedback from development, operational and support teams, and evangelize best practices so that stakeholders can support self-service and “you build it, you run it”.
  • Manage the operational reliability, fault-tolerance, performance, scalability, observability and efficiency of Flexera’s cloud platforms and products across environments.
  • Work on incidents in conjunction with team members and coordinate with wider stakeholders to resolve customer-impacting service issues promptly.
  • Partner with security and other “shared services” teams to align, automate, integrate and orchestrate specialist tooling into a common set of SRE best practices that supports the wider Software Delivery Lifecycle and Product Lifecycle.
  • Plan and execute projects in support of the SRE objectives, and ensure projects are delivered with high quality, on time, and within budget
  • Hire, develop and retain a highly skilled SRE team
  • Evaluate hardware and software technologies to improve efficiency and performance

Responsibilities:

  • Manage a team responsible for supporting an international, 24x7 Azure cloud infrastructure powering Flexera’s customer-facing service offerings
  • Participate in the design, implementation, and operation of a scalable and reliable systems infrastructure supporting a fast-growth SaaS offering
  • Ensure proper security, monitoring, alerting, and reporting for the infrastructure
  • Troubleshoot and resolve escalated issues
  • Plan capacity for all aspects of the infrastructure
  • Develop and maintain processes, tools, and documentation in support of the production environment
  • Participate in evaluation of new software, hardware and infrastructure solutions
  • Participate in an on-call rotation and be available 24x7 in an escalation capacity

Required skills and knowledge:

  • Experience as a Site Reliability Engineer in cloud environments
  • Experience managing a team of Site Reliability Engineers
  • Experience managing infrastructure in Azure
  • Experience managing Kubernetes infrastructure in the cloud
  • Experience in Monitoring & Observability practices in the cloud including tooling, logging, metrics, tracing, and alerting
  • Experience with IaC and Containers to achieve scalable, reliable, performant and secure SaaS platform infrastructure
  • Experience with CI/CD tooling to automate, orchestrate and integrate continuous delivery pipelines

Flexera is proud to be an equal opportunity employer.  Qualified applicants will be considered for open roles regardless of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by local/national laws, policies and/or regulations. 

Flexera understands the value that results from employing a diverse, equitable, and inclusive workforce. We recognize that equity necessitates acknowledging past exclusion and that inclusion requires intentional effort. Our DEI (Diversity, Equity, and Inclusion) council is the driving force behind our commitment to championing policies and practices that foster a welcoming environment for all.

We encourage candidates requiring accommodations to please let us know by emailing careers@flexera.com.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Industry: Computer Software / SaaS
Spoken language(s): English

Other Skills

  • Troubleshooting (Problem Solving)
  • Team Management
