Site Reliability Engineer

extra parental leave - fully flexible
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Minimum of 3 years of experience in Site Reliability Engineering or System Engineering roles.
  • Hands-on experience managing production Kubernetes environments, such as EKS.
  • Proficiency with scripting languages like Python, Ruby, or Bash.
  • Strong understanding of DevOps principles and AWS technologies.

Key responsibilities:

  • Maintain and improve the reliability, scalability, and performance of the platform.
  • Support and operate Kubernetes clusters and other systems like Elasticsearch and RDS.
  • Contribute to deploying and tuning observability tools such as Prometheus and Grafana.
  • Participate in incident response and on-call rotations.

Roadie Information Technology & Services Scaleup https://www.roadie.com/
201 - 500 Employees

Job description

Roadie, a UPS company, is a leading logistics and delivery platform that helps businesses tackle the complexities of modern retail with unmatched delivery coverage, flexibility and visibility. Reaching 97% of U.S. households across more than 30,000 zip codes — from urban hubs to rural communities — Roadie provides seamless, scalable solutions that meet a variety of delivery needs. 

With a network of more than 310,000 independent drivers nationwide, Roadie offers flexible delivery solutions that make complex logistics challenges easy, including solutions for local same-day delivery, delivery of big and bulky items, ship-from-store and DC-to-door. 

Roadie is seeking a Site Reliability Engineer to join our growing Technical Operations Team. We're looking for someone with a solid understanding of site reliability practices and hands-on experience working with production Kubernetes environments. The ideal candidate is a skilled problem solver with in-depth knowledge of standard DevOps principles, AWS, scripting languages, and Kubernetes.

What You'll Do

  • Support the reliability, scalability, and performance of our platform through hands-on work with our infrastructure and deployment pipelines
  • Assist in maintaining and operating Kubernetes clusters (EKS), as well as other systems including Elasticsearch, MSK, RDS, and Redis
  • Contribute to the deployment, tuning, and upkeep of observability tools like Prometheus, Loki, Grafana, OpenTelemetry, and New Relic
  • Partner with more senior engineers to identify and remediate system bottlenecks and improve resource utilization
  • Participate in the monitoring and tracking of service level indicators (SLIs) and service level objectives (SLOs)
  • Write scripts and build automation to streamline operations and reduce manual work
  • Help troubleshoot production and non-production issues as part of the incident response process
  • Participate in an on-call rotation 

Technology We're Using Now

  • Python, Ruby on Rails, Golang
  • React/Redux, Objective-C and Swift, Android
  • Postgres, Redshift, Redis, Kafka
  • AWS/GCP
  • Docker/Kubernetes
  • OpenTelemetry/Prometheus/Thanos/Loki/Grafana/New Relic/Sentry
  • Git/CircleCI
  • ArgoCD

What You Bring

  • 3+ years in various SRE roles
  • 3+ years in various DevOps/System Engineering roles
  • 3+ years of experience building and managing production Kubernetes infrastructure
  • 3+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.)
  • Experience with infrastructure as code tools such as Terraform or Crossplane
  • Experience with CI/CD tools (CircleCI, etc.)
  • Experience with GitOps tools (ArgoCD)
  • Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.)
  • Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.)
  • Must be able to work independently, be self-motivated and handle multiple priorities
  • Comfortable working in a fast-paced agile environment

Finally, you bring a willingness to admit what you don’t know and to quickly learn what you need to.

Why Roadie? 

  • Competitive compensation packages 
  • 100% covered health insurance premiums for yourself
  • 401k with company match
  • Tuition and student loan repayment assistance (that’s right - Roadie will contribute directly to your existing student loans!) 
  • Flexible work schedule with unlimited PTO 
  • Monthly 3-day weekends
  • Monthly WFH stipend 
  • Paid sabbatical leave: tenured team members are given time to rest, relax, and explore
  • The technology you need to get the job done

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Problem Solving
  • Adaptability
  • Time Management
  • Self-Motivation
