Job description
Site Reliability Engineer (SRE) – Dynatrace
Location: Remote (New York, USA)
Experience Required: 6–10 Years
Job Summary
We are seeking an experienced Site Reliability Engineer (SRE) with strong expertise in Dynatrace and modern observability practices. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of enterprise applications and infrastructure across cloud and hybrid environments. This role requires hands-on experience with monitoring, automation, cloud platforms, CI/CD pipelines, and containerized environments.
Key Responsibilities
Design, implement, and manage end-to-end monitoring solutions using Dynatrace.
Configure alerting, dashboards, problem detection, and performance optimization strategies.
Monitor application health, infrastructure performance, and user experience across distributed systems.
Troubleshoot production incidents and perform root cause analysis for system and application issues.
Collaborate with DevOps, Cloud, and Engineering teams to improve system reliability and operational efficiency.
Automate operational tasks and monitoring workflows using scripting languages such as Python, Bash, or Shell.
Support and optimize cloud-based environments on AWS, Azure, or GCP.
Manage and troubleshoot Linux/Unix-based systems.
Work with containerization and orchestration technologies including Docker and Kubernetes.
Build and maintain CI/CD pipelines using tools such as Jenkins, GitLab CI/CD, or Azure DevOps.
Ensure observability best practices across microservices and distributed architectures.
Participate in on-call support and incident response activities as needed.
Required Skills & Qualifications
6–10 years of experience in Site Reliability Engineering, DevOps, or Production Support roles.
Strong hands-on expertise in Dynatrace including monitoring, alerting, dashboards, and problem analysis.
Solid understanding of observability, logging, and monitoring frameworks.
Experience with cloud platforms such as AWS, Azure, or GCP.
Strong knowledge of Linux/Unix systems administration and troubleshooting.
Experience with Docker and Kubernetes in enterprise environments.
Proficiency with CI/CD tools including Jenkins, GitLab, or Azure DevOps.
Strong scripting and automation skills using Python, Bash, or Shell scripting.
Understanding of microservices architecture and distributed systems.
Excellent troubleshooting, analytical, and communication skills.
Preferred Qualifications
Experience implementing SRE best practices and reliability engineering principles.
Knowledge of Infrastructure as Code (Terraform, Ansible, etc.) is a plus.
Exposure to enterprise-scale monitoring and cloud-native technologies.
Relevant cloud or Dynatrace certifications are an advantage.