Key qualifications:
Strong technical expertise in Site Reliability Engineering (SRE) and cloud infrastructure.
Experience designing, building, and maintaining scalable, secure systems in AWS.
Proven ability to define and evolve SRE strategies, SLIs/SLOs, and error budgets.
Excellent incident response, troubleshooting, and automation skills.
Key responsibilities:
Own and evolve the Production Operations strategy and roadmap.
Lead incident response efforts and conduct postmortems.
Design and scale cloud infrastructure focusing on high availability and security.
Mentor engineers and promote best practices in reliability and scalability.
Greenlight is a debit card and money app for families. Our mission is to shine a light on the world of money for families and empower parents to raise financially-smart kids.
Millions of parents and kids use Greenlight to earn, save, spend wisely, give, and invest. Parents can set flexible spend controls, manage chores, automate allowances, and invest for their kids’ futures.
The Greenlight team calls Atlanta home, but we have team members across the country. We’re pet enthusiasts, PTA presidents, fantasy football champs, kickball-mates, and volunteer dance teachers. We’re backed by Drive Capital, JP Morgan Chase, Wells Fargo, TTV Capital, Relay Ventures, NEA, Amazon, Ally Financial, SunTrust Bank, and Synchrony Financial. We were picked for CB Insights’ Fintech 250.
Inspired? Apply to join our team: https://greenlight.com/careers/
Greenlight is the leading family fintech company on a mission to help parents raise financially smart kids. We proudly serve more than 6 million parents and kids with our award-winning banking app for families. With Greenlight, parents can automate allowance, manage chores, set flexible spend controls, and invest for their family’s future. Kids and teens learn to earn, save, spend wisely, and invest.
At Greenlight, we believe every child should have the opportunity to become financially healthy and happy. It’s no small task, and that’s why we leap out of bed every morning to come to work. Because creating a better, brighter future for the next generation depends on it.
Greenlight is looking for a Principal Engineer, Production Operations to join our growing team!
In this role, you will be a technical leader and individual contributor within our production operations function. You will be responsible for designing, building, and maintaining highly reliable, scalable, and performant cloud infrastructure and systems. You will play a critical role in driving technical excellence, mentoring junior engineers, and solving our most complex scalability and reliability challenges.
What you’ll bring to the team:
Deep technical expertise in Site Reliability Engineering (SRE) and cloud infrastructure, with a strong track record of driving operational excellence at scale.
Proven ability to define and evolve SRE strategy, SLIs/SLOs, and error budgets in alignment with business and product goals.
Extensive experience architecting, building, and maintaining highly available, secure, and scalable systems in AWS.
Strong incident response and triage skills, with experience leading critical outages and conducting blameless postmortems to drive systemic improvements.
A systems-thinking mindset focused on long-term reliability, root cause analysis, and continuous improvement.
Passion for automation and infrastructure-as-code, with a history of reducing manual toil and improving system resilience through tooling and process innovation.
Curiosity and technical depth to assess, prototype, and scale emerging SRE and cloud technologies.
Ability to influence and collaborate across engineering, product, and security teams to embed reliability and scalability into product architecture and development workflows.
Thought leadership in the SRE and cloud space, including the ability to mentor engineers informally and drive best practices across teams.
Your day-to-day:
Own and evolve the Production Operations strategy and roadmap in partnership with engineering and product leadership.
Define and implement reliability standards across services, including SLIs/SLOs and error budgets.
Design, implement, and scale cloud infrastructure with a focus on high availability, security, and performance (primarily on AWS).
Lead high-impact incident response efforts and drive follow-ups to ensure long-term resolutions and knowledge sharing.
Identify and eliminate sources of operational toil through automation and tooling.
Continuously improve monitoring, alerting, and observability across systems and services.
Evaluate and introduce new tools, platforms, and practices to enhance system reliability and engineering velocity.
Collaborate with cross-functional teams to embed SRE principles throughout the development lifecycle.
Act as a technical expert and advocate for reliability engineering across the organization.