Key Facts

Remote From: Michigan (USA), Texas (USA)

Category: Site Reliability Engineer (SRE)

Full time

Senior (5-10 years)

English

Hard Skills

Cloud Computing, Terraform, Kubernetes, Site Reliability Engineering, Datadog, Linux, Prometheus (Software), Observability, Containerization, PkMS (+24 more)

Other Skills

• Collaboration
• Adaptability
• Leadership
• Teamwork
• Lateral Communication
• Mentorship
• Lifelong Learning
• Problem Solving

Requirements

5+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering
Cloud experience supporting production SaaS systems on Azure (preferred), AWS, or GCP
Strong Linux, networking, and distributed systems troubleshooting skills
Experience with containers and orchestration (Kubernetes/EKS/AKS) and Infrastructure-as-Code (Terraform)

Roles & Responsibilities

Own and improve reliability, availability, and performance of production systems, defining and operationalizing SLIs/SLOs and error budgets
Design and implement autonomous AI agents for monitoring distributed systems and applications, consuming multi-source observability data (metrics, logs, traces, etc.)
Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems
Build automated workflows and maintain Infrastructure-as-Code with Terraform to eliminate manual work

Job description

Cybercrime is rising, reaching record highs in 2024. According to the FBI's IC3 report, total losses exceeded $16 billion. With investment fraud and BEC scams at the forefront, the message is clear: the real estate sector remains a lucrative target for cybercriminals. At CertifID, we take this threat seriously and provide a secure platform that verifies the identities of parties involved in transactions, authenticates wire transfer instructions, and detects potential fraud attempts. Our technology is designed to mitigate risks and ensure that every transaction is conducted with confidence and peace of mind.

We know we couldn’t take on this challenge without our incredible team. We have been recognized as one of the Best Startups to Work for in Austin, made the Inc. 5000 list, and won Best Culture by Purpose Jobs three years in a row. We are guided by our core values and our vision of a world without wire fraud. We offer a dynamic work environment where you can contribute to meaningful impact and be part of a team dedicated to enhancing security and fighting fraud.

We are seeking a Senior Site Reliability Engineer (Senior SRE) to drive reliability improvements across our production SaaS environment. You’ll play a critical role in building scalable infrastructure patterns, advancing observability, improving incident response, and partnering with engineering teams to embed reliability into system design and delivery.

This role is ideal for an experienced Sr. SRE who enjoys solving complex operational problems, building automation, and mentoring others.

What You’ll Do

Reliability & Platform Operations: Own and improve the reliability, availability, and performance of production systems while defining and operationalizing SLIs/SLOs and error budgets.

AI Agent Enablement: Design and implement autonomous and semi-autonomous AI agents for monitoring distributed systems and applications. Build agents capable of consuming multi-source observability data (metrics, logs, traces, etc.).

Incident Response: Participate in and help lead an on-call rotation, serving as an escalation point for major incidents and facilitating blameless postmortems.

Automation & Infrastructure: Build automated workflows to eliminate manual work and design/maintain Infrastructure-as-Code with Terraform.

Observability: Improve metrics, logs, traces, and alerting using tools like Datadog or Prometheus to reduce noise and increase signal.

Collaboration & Mentorship: Partner with application teams to implement reliability best practices and mentor junior engineers to foster a culture of knowledge sharing.

Who You Are

Strategic Architect: You look beyond the "what" to understand the "why," providing insights that influence our GTM and technical direction.

Startup Veteran: You are comfortable moving fast and staying proactive in an environment where the playbook is still being written.

Relatable & Adaptable: You can navigate different personalities across the organization, from high-energy sales teams to analytical engineering partners.

Lifelong Learner: You have a thirst for learning, keeping up with emerging technologies and industry trends.

What We're Looking For

Experience: 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.

Cloud Expertise: Proven experience supporting production SaaS systems in Azure (preferred), AWS, or GCP.

Technical Stack: Strong Linux, networking, and distributed systems troubleshooting skills.

Containers: Strong experience with containers and orchestration (Kubernetes/EKS/AKS).

IaC & Tooling: Expertise with Infrastructure-as-Code (Terraform strongly preferred).

Programming: Strong scripting/programming skills in Python, Go, Bash, or C#/.NET.

Observability: Hands-on experience with Datadog, Prometheus/Grafana, or OpenTelemetry.

What We Offer

Flexible vacation

12 company-paid holidays

10 paid sick days

No work on your birthday

Health, dental, and vision insurance (including a $0 option)

401(k) with matching, and no waiting period

Equity

Life insurance

Generous paid parental leave

Wellness reimbursement of $300/year

Remote worker reimbursement of $300/year

Professional development reimbursement

Competitive pay

An award-winning culture

Not sure if you check all the boxes? Apply anyway!

We know that great talent comes in many forms, and we value potential just as much as experience. If you're excited about this role and believe you can grow into it, we’d love to hear from you. We’re looking for people who are eager to learn, adapt, and solve challenges—so if that sounds like you, don’t let a checklist hold you back!

Change doesn't happen overnight, and the same goes for us here at CertifID. We evolve collectively and individually by leaning into the core values that define us. As we grow, we embody GRIT to raise the bar and influence outcomes in everything we do: Guard the Customer, Raise the Bar, Influence Outcomes, Teamwork Wins.
