Tech Ops - Production Support & Reliability (AWS)

Requirements

  • 3+ years of production support, SRE, NOC, or ops engineering experience
  • Hands-on AWS experience with EC2/ECS, VPC networking, and IAM
  • Operational PostgreSQL / RDS experience (slow query analysis, basic tuning, vacuum awareness)
  • Incident triage across infrastructure and application layers with structured incident response and SLA management in a ticketed environment (Jira or similar)

Roles & Responsibilities

  • Provide front-line production support for Braviant's AWS multi-account stack, including monitoring systems, triaging alerts, executing runbooks, and escalating cleanly to developers
  • Conduct incident triage across infrastructure and application layers and participate in structured incident response (ITIL, NIST, or equivalent)
  • Manage SLAs in a ticketed environment (Jira or similar) and document post-incident learnings
  • Maintain a defensive ownership mindset, owning reliability issues without performing development work

Job description

We are looking for a Tech Ops - Production Support & Reliability Lead.

Front-line production support for Braviant's AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, escalate cleanly to developers. Defensive ownership role - not a developer role despite "Lead" in title.

Stack:

  • AWS - VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
  • PostgreSQL on Amazon RDS (~15 instances)
  • Datadog + CloudWatch (APM, logs, alerting)
  • Java microservices / API-heavy app stacks
  • Jira (ITSM) + Slack (ops channels)
  • Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane


Requirements

Must-have:

  • 3+ years production support / SRE / NOC / ops engineering
  • Hands-on AWS - EC2/ECS, VPC networking, IAM
  • Operational PostgreSQL / RDS - slow query reading, basic tuning, vacuum awareness
  • Incident triage across infra + app layers
  • Structured incident response (ITIL, NIST, or equivalent)
  • SLA management in a ticketed environment (Jira or similar)
  • Strong written English for escalation + post-incident write-ups

Nice-to-have:

  • Datadog / CloudWatch fluency
  • AWS data services (Glue, S3, Athena, EventBridge)
  • Basic IaC (CloudFormation, SAM, Terraform)
  • Financial services or other regulated-environment background
  • AWS SysOps Administrator or Solutions Architect cert
  • Scripting / automation

