Director, Site Reliability & Operations

Requirements:

  • 7+ years in SRE, DevOps, cloud infrastructure, or security engineering, with 4+ years leading technical teams
  • Deep AWS experience across compute, networking, storage, and managed services in a production enterprise environment
  • Hands-on familiarity with the security and compliance discipline: experience operating in PCI, SOC 2, or equivalent regulated environments and understanding what compliance actually requires
  • Ability to operate as a business stakeholder within SAFe or similar delivery frameworks to integrate security, reliability, and compliance work into the roadmap alongside feature development

Roles & Responsibilities

  • Own site reliability and availability for our cloud-hosted platform: 24x7 uptime, monitoring, alerting, anomaly detection, and incident response programs
  • Drive security outcomes across the platform, tracking findings from SonarQube, penetration tests, and vulnerability tools, and prioritizing remediation work in the engineering delivery pipeline
  • Own PCI Level 1 compliance currency: when standards evolve, you understand the requirement, translate it into engineering terms, and drive adoption; you do not just surface the factoid
  • Participate as a business stakeholder in SAFe planning, bringing security, compliance, and reliability work into the engineering delivery pipeline alongside feature development

Job description

Switchfly is hiring a Director of Site Reliability & Operations to own the operational backbone of a platform that processes travel and loyalty transactions for some of the world’s largest airlines and financial institutions. This is not a role for someone who manages by dashboard and delegates by email. We need a technically engaged leader with a full tool belt — someone who understands our platform deeply enough to participate in the hard conversations, and who owns outcomes rather than activities. 

This role enables 50+ developers to ship secure, PCI-compliant releases at least weekly — supporting the DevOps culture that makes that pace sustainable. We need a leader who supports delivery velocity, spends the reliability and change budget strategically, and drives toward faster, safer delivery. Security and compliance set the floor; velocity is the ambition. And as AI reshapes how software is built and operated, we expect this leader to help us embrace it thoughtfully — not lock it out. You will lead a team of SRE, DevOps, DBA, network, security, and corporate IT professionals, working in close partnership with engineering leadership across a PCI Level 1-compliant, 24x7 enterprise platform. 



Responsibilities 

  • Own site reliability and availability for our cloud-hosted platform — 24x7 uptime, monitoring, alerting, anomaly detection, and incident response programs 
  • Drive security outcomes across the platform — tracking findings from SonarQube, penetration tests, and vulnerability tools, and acting as a business stakeholder to get remediation work scoped, prioritized, and into the engineering delivery pipeline 
  • Own PCI Level 1 compliance currency — when standards evolve, you understand the requirement, translate it into engineering terms, and drive adoption; you don’t just surface the factoid 
  • Participate as a business stakeholder in SAFe planning — bringing security, compliance, and reliability work into the engineering delivery pipeline alongside feature development          
  • Lead infrastructure patching and maintenance — OS, database, and system-level currency within our AWS environment, coordinating monthly maintenance windows and CI-driven image refresh cadence 
  • Manage and develop an internationally distributed team across SRE, DevOps, DBA, network, security, and corporate IT functions 
  • Own the AWS cost and capacity budget: monitoring spend, optimizing resource utilization, and making strategic tradeoff decisions in partnership with engineering leadership
  • Partner with engineering directors to define the boundary between infrastructure and application-layer security, and ensure nothing falls between the cracks 
  • Own vendor outcomes across our cloud and tooling ecosystem — holding partners accountable and ensuring contracts reflect our operational needs 
  • Guide personal and career development of your people 
  • Foster a culture where reliability and security are shared team values, not external mandates 



About You 

  • You have a full tool belt — technically engaged, platform-curious, and willing to log into systems, participate in firefights, and develop genuine understanding of what you’re operating 
  • 7+ years in SRE, DevOps, cloud infrastructure, or security engineering, with 4+ years leading technical teams 
  • Deep AWS experience across compute, networking, storage, and managed services in a production enterprise environment 
  • Hands-on familiarity with the security and compliance discipline — you’ve operated in PCI, SOC 2, or equivalent regulated environments and understand what compliance actually requires versus what it looks like on paper 
  • You operate as a business stakeholder, not just a technical function — comfortable working within SAFe or similar delivery frameworks to get security and reliability work into the roadmap alongside feature development 
  • You manage through credibility and technical engagement, not just title — your team respects you because you understand their work 
  • You are inquisitive, direct, and outcome-oriented — you form opinions, communicate them clearly, and own what happens next 
  • Your colleagues are inspired to follow your lead 



About the Environment 

Our infrastructure is AWS-native; we have no on-premises footprint. The ecosystem includes Cloudflare, Splunk, AppDynamics, Grafana, Opsgenie, Snowflake, AWS CloudHSM, GitLab, and Jenkins hosted on EC2, alongside standard AWS managed services. PostgreSQL is our primary database. The platform runs Java/Spring Boot and Python backends serving Ember.js and React frontends. Familiarity with any part of this stack accelerates your effectiveness; deep expertise across all of it is not expected.

Corporate IT supports a primarily remote, technically self-sufficient workforce with a small number of call center, sales, and executive users requiring additional support. The function runs on Okta and Microsoft 365. 



Company Perks & Benefits

At Switchfly, we believe in giving people the flexibility, support, and benefits they need to do their best work.

  • Discretionary Time Off (DTO) – Take time off when you need it. We trust our employees to manage their time responsibly while meeting business needs.
  • 15 Company-Paid Holidays – Including a company-wide break from Christmas Eve through New Year’s Day.
  • Comprehensive Benefits Package – Switchfly offers a full suite of health benefits, with the company covering an average of 87% of employee premiums.
  • 401(k) with Company Match – We support long-term financial wellness with a competitive retirement plan.

 

Switchfly Core Values to consider for this position:  

- Adaptability & Calculated Risk

Our culture of learning is powered by an iterative mindset, a shared desire for high performance, and a willingness to take risks and push through limits. Our data-driven approach fosters new ideas and continuous learning, and it lets us stay flexible and adjust as we learn from both success and failure.
 

- Ownership and Accountability

Taking professional responsibility reflects an approach that understands the stakes, and how success requires everyone to be truly accountable. We’re visionaries, not mercenaries, and from the CEO to our newest colleague, we own what we do, because the buck stops with each of us. 


At Switchfly, we don’t just accept difference — we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products, and our community. At this time, we're unable to provide visa sponsorship for this role. 


Compensation: $165,000-$185,000
