Key Facts

Remote From:

Full time

Senior (5-10 years)

English

Hard Skills

AWS Cloud Services, Microsoft Azure, Infrastructure as Code (IaC), Systems Engineering, Observability, Prometheus (Software), Site Reliability Engineering, Containerization, Storage, Capacity Planning

Other Skills

• Collaboration
• Communication
• Analytical Thinking
• Troubleshooting (Problem Solving)

Requirements

5+ years of progressive experience in IT, Software Engineering, Technology Operations, or Business Continuity.
2+ years of hands-on experience in a Site Reliability, DevOps, or IT Observability role.
Proficiency with production monitoring and alerting tools (DataDog is a major plus).
Basic proficiency in an AWS containerized environment with infrastructure as code (e.g., Terraform).

Roles & Responsibilities

Define and monitor Service Level Objectives (SLOs) and ensure high availability, reliability, and scalability of user-facing services and production systems.
Develop and maintain automation for deployments, configuration management, and day-to-day operational tasks.
Implement, manage, and tune monitoring and alerting systems to detect issues quickly; respond to incidents and perform post-mortems.
Conduct capacity planning and collaborate with stakeholders to mitigate operational risks and optimize performance and system design.

SafeRide Health

About SafeRide Health

SafeRide Health is a technology and services company dedicated to reducing barriers to care by improving the delivery of non-emergency medical transportation (NEMT) to people nationwide. SafeRide Health leverages proprietary technology and a nationwide network of vetted transportation providers to elevate human dimensions of care and close the gap between need and access for the nation’s most vulnerable populations. SafeRide’s scalable and intuitive platform gives payers and health systems a more intelligent way to deliver cost-effective, on-demand transportation that connects health plan members to critical healthcare services. SafeRide serves the country’s largest Medicare Advantage, Medicaid, and provider programs.

Company type: Scaleup

Founded: 2018

Company size: 501 - 1000


Job description

About the Role

SafeRide Health is seeking a Site Reliability Engineer to develop and implement new processes that support software delivery excellence and operational discipline, so that user-facing services and production systems remain highly available, reliable, and scalable. Key responsibilities include defining and monitoring Service Level Objectives (SLOs), responding to and resolving incidents, developing automation for operational tasks, performing capacity planning, and collaborating with development teams to mitigate operational risks and improve system design.

Core Responsibilities

Reliability and Availability: Keeping systems and services running smoothly with minimal downtime by focusing on availability, reliability, and scalability.
Automation: Developing and maintaining tools and scripts to automate repetitive tasks such as deployments, configuration management, and monitoring.
Monitoring and Alerting: Implementing and managing monitoring and alerting systems to provide visibility into system performance and quickly detect potential issues.
Incident Management: Responding to, diagnosing, and resolving system incidents, including conducting post-mortems to prevent future occurrences.
Capacity Planning: Monitoring system resource usage to forecast future needs and scale systems accordingly to handle increasing user load.
Risk Mitigation: Collaborating with stakeholders to identify operational risks and implementing strategies to reduce their likelihood and impact.
Performance Optimization: Analyzing metrics from operating systems and applications to identify areas for performance improvement.

Key Skills

Cloud Technologies: Expertise in major cloud platforms such as AWS and Azure.
Systems Engineering: Deep knowledge of operating systems, networking, storage, and distributed systems.
Tools and Technologies: Experience with tools for infrastructure as code (e.g., Terraform), containerization (e.g., Docker), and APM/monitoring (e.g., Prometheus, DataDog, New Relic, Grafana, Splunk).
Programming and Scripting: Proficiency in programming languages such as Python, Ruby, and JavaScript for developing automation and managing infrastructure.
Collaboration: Strong communication and collaboration skills to work effectively with development, operations, and other cross-functional teams.

Minimum Requirements

Minimum of 5 years of progressive experience in an IT, Software Engineering, Technology Operations, or Business Continuity role.
Minimum of 2 years of hands-on experience in a Site Reliability, DevOps, or IT Observability role.
Demonstrated proficiency with production monitoring and alerting tools (DataDog is a major plus!).
Basic proficiency in an AWS containerized environment running infrastructure as code.

Benefits

SafeRide Health offers a comprehensive benefits package including:

Competitive compensation and performance-based bonus potential
Full medical, dental, and vision coverage
Generous PTO and paid company holidays
401(k) with employer match
Paid parental leave and family support benefits

About SafeRide Health

SafeRide Health is a technology and services company dedicated to reducing barriers to care by improving the delivery of non-emergency medical transportation to people across America. SafeRide employs proprietary technology, paired with a nationwide network of vetted transportation providers. This enables payers and health systems to deliver cost-effective, on-demand transportation intelligently, enhancing the patient experience in the process. SafeRide serves the largest Medicare Advantage, Medicaid, and provider programs in the country. Learn more at www.saferidehealth.com.
