8-10+ years of Site Reliability / DevOps Engineering experience
Proficiency in Python and Shell scripting
Experience with Infrastructure as Code using Terraform
Experience with Monitoring and Observability, specifically Datadog, on Azure or AWS
Responsibilities:
Participate in the on-call rotation for operations support
Lead reliability initiatives and exercise decision-making to improve systems
Implement and maintain monitoring and observability using Datadog and related tools
Collaborate on cloud infrastructure using Terraform on Azure or AWS
Job description
SRE with Datadog expertise
Location: Remote
Duration: 6+ months
Rate: DOE
U.S. citizens and those authorized to work in the U.S. are encouraged to apply. We are unable to provide sponsorship at this time. Unfortunately, this role is not open to third-party C2C arrangements.
• 8-10+ years of Site Reliability / DevOps Engineering experience
• Proficiency in Python and Shell scripting
• Extensive experience with Azure or AWS (Azure preferred)
• Experience with monitoring and observability, specifically Datadog
• Experience with Infrastructure as Code, specifically Terraform
• Strong leadership, initiative, and decision-making ability
• Expert knowledge of any or all of the following is a huge plus: Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing (and yes, we use ALL of these!)
• Participate in the on-call rotation for Operations support
• Bachelor's degree in CS or a related STEM engineering field strongly preferred