Senior SRE DevOps Engineer
Department: Engineering / Platform Infrastructure
Reports To: VP of Engineering / CTO
Location: Remote (with availability for on-call rotations)
Type: Full-Time (40 hrs/week)
Timezone: Aligned with Eastern or Central Time business hours (EST, CST)
WebRTC.ventures is one of the few software development agencies in the world dedicated exclusively to real-time applications. Originally founded as AgilityFeat in 2010, we began specializing in WebRTC in 2015. We are headquartered in Charlottesville, VA, with a QA/testing center in Panama City, Panama, and a remote office in Bogotá, Colombia. With primary operations in North and South America, we serve clients around the globe. Our team has always been remote, which fuels our passion for real-time communications.
About the Role
For this project, we're building a satellite communication platform that enables voice calls and messaging through satellite networks when terrestrial connectivity is unavailable. Our system bridges mobile devices with custom satellite hardware, operating across dual modes (SAT/TER) with stringent bandwidth, latency, and reliability constraints.
We're looking for a Senior SRE DevOps Engineer who combines strong software development skills with deep operational expertise. This isn't a pure ops role: you'll write production-grade code for infrastructure tooling, automation frameworks, reliability services, and internal platforms while owning the full lifecycle of our cloud infrastructure.
What You'll Do
- Implement SLI/SLO frameworks with error budgets, driving data-informed reliability decisions across the platform
- Design release strategies including blue/green deployments, canary releases, automatic rollback, and version tracking
- Lead incident response, author post-mortems, and build automated runbooks that reduce MTTR
- Develop internal tooling, automation frameworks, and self-service platforms in TypeScript/Python to improve developer productivity and operational efficiency
- Write reliability-focused services: health checkers, auto-remediation controllers, capacity managers, deployment orchestrators, and chaos testing frameworks
- Build and maintain production AWS infrastructure using IaC (Terraform/CloudFormation), with focus on ECS, EKS/Kubernetes, and microservices orchestration
- Build and maintain end-to-end CI/CD pipelines for backend services, mobile apps (iOS/Android), and IoT firmware across on-prem and AWS cloud environments
- Define and enforce security policies: network segmentation, IAM, secrets management, encryption, compliance auditing, vulnerability management, and incident response
- Build comprehensive observability with OpenTelemetry, distributed tracing, custom metrics exporters, and alerting across WebSocket connections, message delivery pipelines, and real-time communication services
- Manage PostgreSQL (RDS), Redis/ElastiCache, SQS, S3, and NLB/ALB configurations including Elastic IPs for SIP/RTP traffic
Required Qualifications
- 7+ years in SRE/DevOps/Platform Engineering with a strong software development background — you write code daily, not just scripts
- Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for building internal tools, CLIs, operators, and automation services
- Deep AWS expertise: ECS, EKS, RDS, ElastiCache, SQS, VPC networking, IAM, CloudWatch
- Strong IaC proficiency (Terraform, CloudFormation, or Pulumi) including module design, state management, and drift detection
- Proven CI/CD pipeline design on both on-prem and cloud (GitHub Actions, CodeBuild/CodePipeline, self-hosted runners)
- Container orchestration at scale: Docker, ECS task definitions, Kubernetes, Helm, with experience writing custom controllers or operators
- Solid security background: network security, secrets management, compliance, incident response
- Experience implementing SLI/SLO frameworks, error budgets, and toil reduction strategies
- Production PostgreSQL, Redis, and message queue operations (SQS, Redis Streams)
- Strong understanding of distributed systems patterns: circuit breakers, retries, backpressure, graceful degradation
Preferred (Not Required but Important)
- Experience operating real-time communication servers: SIP proxies (Kamailio), media servers (FreeSWITCH), RTPEngine, and WebRTC infrastructure at scale
- Familiarity with telecom protocols (SIP, RTP/SRTP, DTLS) and VoIP service provisioning
- Exposure to satellite communication systems or ultra-low-bandwidth network optimization
- Experience building developer platforms or internal PaaS/tooling
- IoT device management and firmware deployment pipelines
- Kubernetes backup/migration strategies (Velero) and data warehouse pipelines (Athena, Redshift)
Tech Stack
AWS (ECS, EKS, RDS, ElastiCache, SQS, S3, NLB/ALB, CloudWatch) · Terraform · Docker · GitHub Actions · CodeBuild · OpenTelemetry · SigNoz · PostgreSQL · Redis · Kamailio · FreeSWITCH · RTPEngine · WebRTC · TypeScript/Node.js · Python · Bash
What We Offer
- Build critical communication infrastructure connecting people in the most remote areas of the world
- A role where engineering and operations merge: you'll ship code that keeps the platform running
- Technically challenging environment spanning cloud, IoT, telecom, and satellite systems
- Full ownership of the infrastructure stack with direct impact on reliability and scale
- Competitive compensation, flexible remote work, and a great work environment
Compensation: $5,000 - $7,000 USD/month