Senior Site Reliability Engineer

Requirements

  • 5+ years in a Site Reliability Engineering, DevOps, or Production Support role at a software or e-commerce company
  • Hands-on experience with AWS (EC2, CloudWatch, or equivalent) for day-to-day operational tasks
  • Experience with Datadog, New Relic, PagerDuty, or equivalent platforms for monitoring, alerting, and incident detection
  • Working knowledge of MySQL/relational databases and ability to read/analyze complex SQL queries to diagnose production data issues

Roles & Responsibilities

  • Production triage and incident ownership: serve as primary on-call responder, triage, investigate, and drive issues to resolution with clear communication throughout the incident lifecycle
  • Root cause analysis and hands-on remediation: lead RCA for production failures and execute infrastructure-level remediation including EC2 restarts, Gearman worker pool resets, Rundeck recovery, and queue restoration
  • Regression identification and deployment risk management: identify deployment-related regressions and coordinate revert requests with development teams when causal links are established
  • Incident coordination and cross-functional leadership: direct cross-functional teams during active incidents, assign investigation tasks, track affected orders/customers, and keep stakeholders informed via Slack and JIRA

Job description

Who We Are

We’re Dealer Tire, a family-owned, international distributor of tires and parts established in 1918 in Cleveland, OH. We’re laser focused on helping the world’s largest and most trusted auto manufacturers grow their tire business—in fact, we’ve sold more than 60 million tires to date. We’re a thriving company, and we’re looking for driven individuals to join our team. That’s where you come in!

Base Pay Range:

$110,000 - $125,000

As a Senior Site Reliability Engineer, you will be a hands-on technical individual contributor embedded within the Core Systems team, responsible for the daily health, stability, and performance of our production environment. You will serve as a primary responder for production incidents, owning triage through resolution — including root cause analysis, infrastructure remediation, and order automation recovery. You will work directly alongside the Manager, Consumer Technology Site Reliability, and Helpdesk to handle day-to-day triage and fix responsibilities, enabling leadership to focus on strategic decisions and team direction. You will also partner with development teams to evaluate production risk before deployment.

As Senior Site Reliability Engineer - Core Systems, your essential job responsibilities will include the following:

  • Production Triage: Triage all incidents surfaced via the #triage Slack channels, Datadog alerts, Rundeck failures, contact center reports, and proactive monitoring across all business units.

  • Incident Ownership: Serve as the primary on-call responder for production incidents. Acknowledge, investigate, and drive issues to resolution with clear communication throughout the incident lifecycle.

  • Root Cause Analysis: Lead RCA for production failures, including order automation breakdowns, Gearman/worker queue degradation, API integration outages, batch job timeouts, and database performance events. Document findings with sufficient detail to support post-mortem review.

  • Hands-On Remediation: Execute infrastructure-level remediation, including EC2 instance restarts, Gearman worker pool resets, Rundeck job recovery, order status resets, and inventory and pricing queue restoration.

  • Regression Identification: Identify deployment-related regressions by correlating incident timelines to recent deployments. Initiate and coordinate revert requests with development teams when causal links are established.

  • Incident Coordination: Direct cross-functional teams during active incidents — assigning investigation tasks, managing parallel workstreams, tracking affected order or customer counts, and keeping all stakeholders informed via Slack threads and JIRA ticket updates.

Focus Areas

  • Monitor the entire Consumer Enterprise Group (CEG) Platform processing environment and proactively surface anomalies, enhancement opportunities, and risk areas to leadership.

  • Assist with data cleanup and order recovery operations following production incidents.

  • Support testing and validation of infrastructure changes prior to production deployment.

  • Ensure accurate and timely entry of incident details, findings, and resolutions into JIRA tracking systems.

  • Continue to develop expertise in the CEG codebase, third-party integrations, and operational tooling through working sessions and self-directed learning.

  • Pursue training, personal growth, and certification opportunities that will enhance effectiveness in the role.

Other duties as assigned.

Position Requirements

  • 5+ years in a Site Reliability Engineering, DevOps, or Production Support role at a software or e-commerce company.

  • Demonstrated ability to independently diagnose and resolve production incidents, including infrastructure-level failures (servers, queues, batch jobs, APIs).

  • Hands-on experience with AWS (EC2, CloudWatch, or equivalent) for day-to-day operational tasks.

  • Experience with Datadog, New Relic, PagerDuty, or equivalent platforms for monitoring, alerting, and incident detection.

  • Working knowledge of MySQL/relational databases for investigative queries and data validation. Ability to read and analyze complex SQL queries to diagnose production data issues.

  • Familiarity with PHP, Python, Bash, or similar languages sufficient to read, debug, and modify production scripts and automation jobs.

  • Experience with Rundeck, cron, or equivalent batch job management and monitoring tools.

Competencies Required

  • Problem-Solving

  • Composure

  • Accountability

  • Detail-Oriented

  • Adaptability

  • Collaborative

  • Proactive

  • Communication

  • Results Orientation

Physical Job Requirements

  • Continuous viewing from and inputting data to a computer screen

  • Talking through the computer for many meetings and one-to-one conversations

  • Sitting for long periods of time

  • Travel required (<10%)

Drug Policy

Dealer Tire is a drug-free environment. All applicants being considered for employment must pass a pre-employment drug screening before beginning work.

Why Dealer Tire: An amazing opportunity to join a growing organization, built on the efforts of hardworking, innovative, and team-oriented people. The compensation offered for this position will depend on qualifications, experience, and geographic location. The total compensation package may also include commission, bonus, or profit sharing. We offer a competitive and comprehensive benefits package including paid time off, medical, dental, vision, and 401(k) match (50% on the dollar up to 7% of employee contribution). For more information on our benefit offerings, please visit our Dealer Tire Family of Companies Benefits Highlights Booklet.

EOE Statement: Dealer Tire is an Equal Employment Opportunity (EEO) employer and does not discriminate on the basis of race, color, national origin, religion, gender, age, veteran status, political affiliation, sexual orientation, marital status or disability (in compliance with the Americans with Disabilities Act*), or any other legally protected status, with respect to employment opportunities. 

*ADA Disclosure: Any candidate who feels that they may need an accommodation to complete this application, or any portion of it, based on the impact of a disability should contact Dealer Tire’s Human Resources Department to discuss their specific needs. Please feel free to contact us at 1-800-933-2537 x6550.
