Key Facts

Remote From: Anywhere

Freelance

Senior (5-10 years)

English

Other Skills

• Communication
• Leadership
• Time Management
• Teamwork
• Problem Solving

Requirements

7+ years of SRE/DevOps/Platform Engineering with production coding experience (not just scripting)
Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools and automation
Deep AWS expertise including ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, and CloudWatch
Strong IaC experience (Terraform, CloudFormation, or Pulumi) with module design, state management, and drift detection

Roles & Responsibilities

Implement SLI/SLO frameworks and drive reliability decisions based on data
Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
Lead incident response, author post-mortems, and build automated runbooks to reduce MTTR
Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency

Job description

Senior SRE DevOps Engineer

Department: Engineering / Platform Infrastructure

Reports To: VP of Engineering / CTO

Location: Remote (with availability for on-call rotations)

Type: Full-Time (40 hours/week)

Timezone: Aligned with Eastern or Central Time business hours (EST, CST)

WebRTC.ventures is one of the few software development agencies in the world dedicated exclusively to real-time applications. Originally founded as AgilityFeat in 2010, we began specializing in WebRTC in 2015. We are headquartered in Charlottesville, VA, with a QA/testing center in Panama City, Panama, and a remote office in Bogotá, Colombia. With primary operations in North and South America, we serve clients around the globe. Our team has always been remote, which fuels our passion for real-time communications.

About the Role

For this project, we're building a satellite communication platform that enables voice calls and messaging through satellite networks when terrestrial connectivity is unavailable. Our system bridges mobile devices with custom satellite hardware, operating across dual modes (SAT/TER) with stringent bandwidth, latency, and reliability constraints.

We're looking for a Senior SRE DevOps Engineer who combines strong software development skills with deep operational expertise. This isn't a pure ops role: you'll write production-grade code for infrastructure tooling, automation frameworks, reliability services, and internal platforms while owning the full lifecycle of our cloud infrastructure.

What You'll Do

Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with focus on ECS, EKS/Kubernetes, and microservices orchestration
Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations including Elastic IPs for SIP/RTP traffic
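
To make the SLI/SLO item above concrete, here is a minimal sketch of the error-budget arithmetic for a request-based availability SLO. It is illustrative only and not taken from the actual platform: the `ErrorBudget` class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for a request-based availability SLO (hypothetical helper)."""

    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests observed in the SLO window
    failed_requests: int   # requests that violated the SLI in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Share of the budget already spent; above 1.0 means the SLO is breached.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def remaining_failures(self) -> float:
        # Failures that can still occur in this window before the SLO is missed.
        return self.allowed_failures - self.failed_requests


# Sample (made-up) numbers: a 99.9% target over one million requests.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget consumed:  {budget.consumed_fraction:.0%}")
```

A team would typically use the consumed fraction to decide whether to keep shipping features or to pause releases and spend time on reliability work.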

Required Qualifications

7+ years in SRE/DevOps/Platform Engineering with a strong software development background — you write code daily, not just scripts
Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
Proven CI/CD pipeline design on both on-prem and cloud (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
Solid security background: network security, secrets management, compliance, incident response
Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
Production PostgreSQL, Redis, and message queue operations (SQS, Redis Streams)
Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation
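
As an illustration of the distributed systems patterns listed above, the following is a minimal, single-threaded sketch of a circuit breaker. It is an assumption-laden example, not code from the platform: the `CircuitBreaker` and `CircuitOpenError` names, the default thresholds, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Fails fast after repeated errors, then retries after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(lambda: fetch_status())` with a hypothetical `fetch_status`, and treat `CircuitOpenError` as a signal to degrade gracefully. A production version would also need locking for concurrent callers.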

Preferred (Non-Exclusive but Important)

Experience operating real-time communication servers: SIP proxies (Kamailio), media servers (FreeSWITCH), RTPEngine, and WebRTC infrastructure at scale
Familiarity with telecom protocols (SIP, RTP/SRTP, DTLS) and VoIP service provisioning
Exposure to satellite communication systems or ultra-low-bandwidth network optimization
Experience building developer platforms or internal PaaS/tooling
IoT device management and firmware deployment pipelines
Kubernetes backup/migration strategies (Velero) and data warehouse pipelines (Athena, Redshift)

Tech Stack

AWS (ECS, EKS, RDS, ElastiCache, SQS, S3, NLB/ALB, CloudWatch) · Terraform · Docker · GitHub Actions · CodeBuild · OpenTelemetry · SigNoz · PostgreSQL · Redis · Kamailio · FreeSWITCH · RTPEngine · WebRTC · TypeScript/Node.js · Python · Bash

What We Offer

Build critical communication infrastructure connecting people in the most remote areas of the world
A role where engineering and operations merge: you'll ship code that keeps the platform running
Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
Full ownership of the infrastructure stack with direct impact on reliability and scale
Competitive compensation, flexible remote work and a great work environment

Compensation: $5,000 - $7,000 USD/month
