Tech Ops - Production Support & Reliability (AWS)

Requirements

  • 3+ years of production support / SRE / NOC / ops engineering
  • Hands-on AWS experience (EC2, ECS, VPC networking, IAM)
  • Operational PostgreSQL / RDS experience (slow query analysis, basic tuning, vacuum awareness)
  • Structured incident response (ITIL, NIST, or equivalent) and SLA management in a ticketed environment; strong written English

Roles & Responsibilities

  • Provide front-line production support for Braviant's AWS multi-account stack, including monitoring, alert triage, runbook execution, and clean escalation to developers
  • Manage incidents across infrastructure and application layers with a structured response process (ITIL/NIST); perform post-incident reviews and maintain SLA metrics in Jira
  • Operate and tune monitoring in Datadog and CloudWatch; coordinate with on-call and engineering teams via Slack and Jira
  • Maintain operational AWS services (EC2/ECS, VPC, RDS) with a focus on security, IAM, and reliability; this is a defensive ownership role, not a developer role

Job description

We are looking for a Tech Ops - Production Support & Reliability Lead.

Front-line production support for Braviant's AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, and escalate cleanly to developers. This is a defensive ownership role, not a developer role, despite "Lead" in the title.

Stack:

  • AWS - VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
  • PostgreSQL on Amazon RDS (~15 instances)
  • Datadog + CloudWatch (APM, logs, alerting)
  • Java microservices / API-heavy app stacks
  • Jira (ITSM) + Slack (ops channels)
  • Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane


Requirements

Must-have:

  • 3+ years production support / SRE / NOC / ops engineering
  • Hands-on AWS - EC2/ECS, VPC networking, IAM
  • Operational PostgreSQL / RDS - slow query reading, basic tuning, vacuum awareness
  • Incident triage across infra + app layers
  • Structured incident response (ITIL, NIST, or equivalent)
  • SLA management in a ticketed environment (Jira or similar)
  • Strong written English for escalation + post-incident write-ups

Nice-to-have:

  • Datadog / CloudWatch fluency
  • AWS data services (Glue, S3, Athena, EventBridge)
  • Basic IaC (CloudFormation, SAM, Terraform)
  • Financial services or other regulated-environment background
  • AWS SysOps Administrator or Solutions Architect cert
  • Scripting / automation

