Site Reliability Engineer – Azure & Microsoft 365 Automation (Remote Opportunity)

Remote: Full Remote

Offer summary

Qualifications:

  • 12+ years of experience in cloud platform engineering, DevOps, or site reliability engineering with a focus on automation.
  • Proficiency in PowerShell scripting and Infrastructure as Code using Bicep.
  • Strong understanding of CI/CD processes and experience with YAML pipelines in Azure DevOps.
  • In-depth knowledge of the Microsoft 365 platform and Azure-native services.

Key responsibilities:

  • Lead investigation and resolution of critical incidents in Azure and Microsoft 365 automation workflows.
  • Debug and optimize PowerShell, Bicep, and .NET components within automated provisioning workflows.
  • Collaborate with product owners to introduce new automation use cases and conduct post-incident reviews.
  • Mentor L1 and L2 engineers and stay updated with changes in Azure and Microsoft 365 APIs.

Zealogics Inc (Information Technology & Services, SME): https://www.zealogics.com/
501 - 1000 Employees

Job description

Key Responsibilities: 

  • Lead investigation and resolution of critical, recurring, or high-impact incidents across Azure and Microsoft 365 automation workflows. 

  • Deep-dive into PowerShell, Bicep, and YAML scripts to identify logic errors, misconfigurations, or scalability limitations within automated provisioning workflows. 

  • Debug and optimize .NET (C#) components within Azure Functions or related application layers used in workflow orchestration. 

  • Analyze usage patterns and telemetry data from Azure Monitor, Application Insights, and Log Analytics to identify systemic issues or opportunities for automation enhancement. 

  • Implement fixes and design improvements to automation logic that reduce manual intervention and improve workflow reliability (e.g., auto-remediation scripts, retry logic). 

  • Own and evolve the automation framework for Teams and SharePoint Online (SPO) lifecycle operations, including create/delete, external sharing restrictions, and role/ownership changes.

  • Collaborate with product owners and architects to introduce new automation use cases or extend existing workflows. 

  • Conduct post-incident reviews (PIRs) for high-severity incidents, drive root cause analysis (RCA), and implement corrective actions. 

  • Mentor L1 and L2 engineers, conduct knowledge-sharing sessions, and support onboarding of new team members. 

  • Stay up to date with changes in Azure, Microsoft 365 APIs, and automation tooling (PowerShell modules, Bicep schema updates, etc.).

  • Provide guidance on architecture and best practices for automation reliability.

Required Skills & Experience: 

  • 12+ years of experience in cloud platform engineering, DevOps, or site reliability engineering (SRE) roles with a focus on automation and operational excellence. 

  • Proficiency in PowerShell scripting, including writing reusable modules, automation logic, and error handling for production workloads. 

  • Extensive experience with Infrastructure as Code using Bicep, including authoring, debugging, and deploying templates for complex Azure resources. 

  • Strong understanding of CI/CD processes and YAML pipelines, with hands-on experience in automating build/release workflows in Azure DevOps. 

  • Proficiency in .NET (C#), especially for debugging Azure Functions or working on backend components integrated into M365 automation flows.

  • In-depth knowledge of the Microsoft 365 platform, including API usage, Teams & SharePoint Online provisioning, governance, and permissions management.

  • Proven ability to troubleshoot and optimize Azure-native services such as API Management, Azure Functions, Storage, Service Bus, Key Vault, and Container Apps. 

  • Skill in telemetry and observability, using Azure Monitor, Log Analytics, Kusto queries, and custom logging to proactively identify issues.

  • Experience conducting root cause analysis, post-incident reviews, and implementing system-wide improvements to reduce incident frequency and MTTR. 

  • Experience in mentoring support engineers, contributing to runbook creation, and improving team capability over time. 

  • Strong analytical, documentation, collaboration, and stakeholder communication skills.

Required profile

Experience

Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Analytical Thinking
  • Collaboration
  • Communication
