Key Facts

Remote From:

Full time

Mid-level (2-5 years)

English

Hard Skills

Amazon Web Services, Incident Response, AWS Cloud Services, Amazon Relational Database Services, Database Tuning, Slack (Software), Query Performance, Log Monitoring, Microservices, Application Programming Interface (API), +13 more

Other Skills

• Collaboration

Requirements

3+ years of production support / SRE / NOC / ops engineering
Hands-on AWS experience (EC2, ECS, VPC networking, IAM)
Operational PostgreSQL / RDS experience (slow query analysis, basic tuning, vacuum awareness)
Structured incident response and SLA management in a ticketed environment (ITIL/NIST or equivalent) with strong written English

Roles & Responsibilities

Provide front-line production support for Braviant's AWS multi-account stack, including monitoring, alert triage, runbook execution, and clean escalation to developers
Manage incidents across infrastructure and application layers with structured response (ITIL/NIST); perform post-incident reviews and maintain SLA metrics (Jira)
Operate and tune monitoring in Datadog and CloudWatch; coordinate with on-call and engineering teams via Slack/Jira
Maintain operational AWS services (EC2/ECS, VPC, RDS) with a focus on security, IAM, and reliability; this is a defensive ownership role, not a developer role

Commit

About Commit

Commit is a global tech services company with offices in New York, Israel, and Europe. The company was founded in 2005 and has over 700 multi-disciplinary innovation experts who serve a broad range of companies, from small startups to large enterprises in multiple business sectors. Commit specializes in advanced technologies and applications, with dedicated practices in Software, IoT, Big Data, Cloud, Cyber, Collaboration, Data center migration projects, and more. Commit offers innovative, end-to-end technology solutions by developing custom software and IoT platforms for clients looking to build their next-gen products within the modern ICT world. Commit’s comprehensive engineering resources and proprietary Flexible R&D methodology help transform its clients’ technology visions into high-quality products while reducing costs and improving time-to-market.

Company type: SME

Founded: 2018

Company size: 501 - 1000


Job description

We are looking for a Tech Ops - Production Support & Reliability Lead.

Front-line production support for Braviant's AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, escalate cleanly to developers. Defensive ownership role - not a developer role despite "Lead" in title.

Stack:

AWS - VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
PostgreSQL on Amazon RDS (~15 instances)
Datadog + CloudWatch (APM, logs, alerting)
Java microservices / API-heavy app stacks
Jira (ITSM) + Slack (ops channels)
Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane

Requirements

Must-have:

3+ years production support / SRE / NOC / ops engineering
Hands-on AWS - EC2/ECS, VPC networking, IAM
Operational PostgreSQL / RDS - slow query reading, basic tuning, vacuum awareness
Incident triage across infra + app layers
Structured incident response (ITIL, NIST, or equivalent)
SLA management in a ticketed environment (Jira or similar)
Strong written English for escalation + post-incident write-ups

Nice-to-have: