8+ years of Site Reliability / DevOps engineering experience, with Azure expertise
Proficiency in PowerShell scripting; extensive experience with Azure PaaS (e.g., Azure Key Vault, Cosmos DB)
Experience with Monitoring and Observability and Infrastructure as Code (Terraform); on-call rotation experience
Strong leadership and decision-making abilities; Bachelor's degree in CS or related STEM field strongly preferred; knowledge of Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing is a plus
Requirements:
Operate and optimize Azure-based infrastructure and PaaS resources (Key Vault, Cosmos DB, etc.) with focus on reliability and observability
Develop and maintain automation and infrastructure as code using Terraform and PowerShell scripts
Participate in the on-call rotation for production support and lead incident response and post-incident reviews
Provide technical leadership in SRE/DevOps practices and collaborate with teams to drive architecture decisions and mentor engineers
Job description
Azure DevOps / SRE
Location: Remote (must be willing to work PST hours)
Duration: 12 months
Rate: DOE
U.S. Citizens and those authorized to work in the U.S. are encouraged to apply. We are unable to sponsor at this time. Unfortunately, this is not open for third-party C2C.
8-10+ years of Site Reliability / DevOps Engineering experience
Expertise in Azure
Should be experienced with PowerShell Scripting
Should have extensive experience with Azure PaaS (Azure Key Vault, Cosmos DB, etc.)
Experience with Monitoring and Observability
Experience with Infrastructure as Code, specifically Terraform
Strong leadership, initiative, and decision-making ability
Expert knowledge in any or all of these is a huge plus: Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing (and yes, we use ALL of these!)
Participate in the on-call rotation for Operations support
Bachelor's degree in CS or a related STEM engineering field strongly preferred