Bachelor’s or master’s degree in Computer Science, Information Systems, or a related field.
14+ years of experience in IT operations, with at least 7 years in cloud infrastructure and leadership.
Deep expertise in cloud operations, incident response, and service reliability.
Proficiency with operational analytics, observability tools, and ITIL, SRE, and DevOps practices.
Key responsibilities:
Oversee daily operations of cloud platforms (AWS, Azure, GCP) ensuring high availability.
Lead incident management lifecycle and coordinate a global incident response team.
Build and lead a data-driven team of cloud engineers, SREs, and analysts.
Drive AI and ML adoption to improve uptime and automate tasks.
Granicus connects governments with the people they serve by providing the first and only civic engagement platform for the public sector. Over 5,500 federal, state, and local government agencies and more than 300 million citizen subscribers power an unmatched Subscriber Network that turns government missions into quantifiable results. With comprehensive cloud-based solutions for communications, government website design, meeting and agenda management software, records management, and digital services, Granicus empowers stronger relationships between government and residents across the U.S., U.K., Australia, New Zealand, and Canada. By simplifying interactions with residents, while disseminating critical information, Granicus brings governments closer to the people they serve—driving meaningful change for communities around the globe.
The Senior Director of Cloud Operations is responsible for the operational integrity, performance, and reliability of enterprise cloud environments. This role leads a global, data-driven operations team with a strong emphasis on incident management, service continuity, and continuous improvement. This role reports directly to the Vice President of Cloud.
This position leads a global team of cloud engineers and oversees the SRE practice, service management tools, and operations using a metrics-first approach.
What your impact will look like here:
Cloud Infrastructure Operations
Oversee the daily operations of cloud platforms (AWS, Azure, GCP), ensuring high availability and performance across global regions.
Lead the development and execution of operational runbooks, SOPs, and escalation paths.
Incident Management & Response
Own the end-to-end incident management lifecycle: detection, triage, escalation, resolution, and post-incident review.
Lead a global incident response team with 24/7 coverage, ensuring seamless handoffs across time zones.
Implement real-time monitoring, alerting, and automated remediation to reduce MTTD and MTTR.
Use data analytics to identify incident trends, recurring issues, and systemic risks.
Conduct blameless postmortems and ensure corrective actions are prioritized and tracked to closure.
Data-Driven Operational Leadership
Build and lead a global team of cloud engineers, SREs, and operations analysts using a metrics-first approach.
Define and track operational KPIs (e.g., uptime, incident frequency, resolution time, change success rate) to drive accountability and performance.
Leverage dashboards and analytics platforms (e.g., Datadog, Grafana, Splunk, ServiceNow) to provide real-time visibility into system health and team performance.
Use data to inform staffing models, on-call rotations, and workload balancing across regions.
Foster a culture of continuous improvement through data-backed retrospectives and operational reviews.
AI-Enabled Focus
Drive AI and ML adoption in operational workflows (e.g., predictive monitoring and incident pattern analysis) to improve uptime and automate repetitive tasks.
Define and execute an AI-driven observability strategy using tools such as AIOps platforms for intelligent alerting and root cause analysis.
Collaborate with Engineering, Security, and Product teams to embed AI-enabled automation in areas such as deployment pipelines and change management.
Establish and maintain SLOs/SLAs, leveraging AI-generated insights to prioritize engineering work that improves reliability and customer experience.
Oversee incident management, postmortems, and continuous improvement, incorporating AI tools for impact analysis and knowledge retention.
Operational Governance
Define and enforce SLAs, SLOs, and operational KPIs.
Ensure compliance with security, regulatory, and audit requirements.
Manage change control, configuration management, and release processes to minimize operational risk.
Cost & Vendor Management
Monitor and optimize cloud spend through cost governance and usage analysis.
Manage vendor relationships, contracts, and service-level agreements.
Collaboration & Communication
Partner with engineering, security, and business teams to align operations with product and service goals.
Provide regular reporting and updates to executive leadership on operational health, risks, and incident trends.
Education
Bachelor’s or master’s degree in Computer Science, Information Systems, or a related field.
Experience
14+ years in IT operations, with 7+ years in cloud infrastructure and operations leadership.
Proven experience leading global teams and managing high-severity incidents in large-scale environments.
Skills
Deep expertise in cloud operations, incident response, and service reliability.
Strong knowledge of ITIL, SRE, and DevOps practices.
Proficiency in operational analytics and observability tools.
Excellent leadership, communication, and cross-functional collaboration skills.
Strong presentation skills, including experience presenting to large global audiences.
Certifications (Preferred)
AWS Certified DevOps Engineer – Professional
Azure Administrator Associate
ITIL Foundation or Practitioner
The Team
We are a globally distributed workforce across the United States, Canada, United Kingdom, India, Armenia, Australia, and New Zealand.
The Culture
At Granicus, we are building a transparent, inclusive, and safe space for everyone who wants to be a part of our journey. A few culture highlights include:
Employee Resource Groups to encourage diverse voices
Coffee with Mark sessions: our employees get to interact with our CEO on very important and sometimes difficult issues ranging from mental health to work-life balance and current affairs.
Embracing diversity & fostering a culture of ideation, collaboration & meritocracy
We bring in special guests from time to time to discuss issues that impact our employee population
The Company
Serving the People Who Serve the People
Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the GovTech industry by bringing governments and their constituents together. We are on a mission to support our customers in meeting the needs of their communities and implementing our technology in ways that are equitable and inclusive. Granicus has consistently appeared on the GovTech 100 list over the past 5 years and has been recognized as one of the best companies to work for on BuiltIn.
Over the last 25 years, we have served more than 5,500 federal, state, and local government agencies, and more than 300 million citizen subscribers power an unmatched Subscriber Network that uses our digital solutions to make the world a better place. With comprehensive cloud-based solutions for communications, government website design, meeting and agenda management software, records management, and digital services, Granicus empowers stronger relationships between government and residents across the U.S., U.K., Australia, New Zealand, and Canada. By simplifying interactions with residents, while disseminating critical information, Granicus brings governments closer to the people they serve, driving meaningful change for communities around the globe.