
Azure CloudAI Ops Engineer

Requirements

  • Strong Python coding for operations, plus automation using PowerShell/Bash
  • Experience with Azure AIOps, including Azure Monitor/Log Analytics (KQL), Application Insights, and Issues & Investigations
  • Kubernetes/AKS expertise with rollout strategies, autoscaling (HPA/KEDA), and health checks
  • Infrastructure as Code and CI/CD pipelines using Terraform, Bicep/ARM, and Azure DevOps, GitHub Actions, or Jenkins

Roles & Responsibilities

  • Operate with AIOps to detect anomalies using Azure Monitor/Log Analytics and Application Insights, triage issues, and guide RCA
  • Enhance PagerDuty operations by configuring Event Intelligence for deduplication/correlation and automating safe remediation with Event Orchestration/Automation Actions (e.g., AKS rollout restarts, pod autoscaling, Kafka queue drain) with approvals/audit
  • Run SaaS on AKS by owning Kubernetes health, rollouts, scaling, and config hygiene across regions/tenants; lead on-call incidents and postmortems
  • Automate and govern reliability: define SLIs/SLOs, embed reliability gates in CI/CD, manage infrastructure as code, and maintain runbooks/SOPs; ensure security/compliance for healthcare SaaS

Job description

Join a team dedicated to supporting the crucial mission of improving health outcomes.

At Merative, you can apply your skills – and grow new ones – with colleagues who have deep expertise in health and technology. Merative provides data, analytics and software for the health industry. Our clients include providers, health plans, employers, life sciences companies and governments around the world. With industry-leading products and focused innovation, we help customers improve decision-making and performance so that together, we drive real progress in health. Learn more at merative.com.

Merge medical imaging solutions, offered by Merative, combine intelligent, scalable imaging workflow tools with deep and broad expertise to help healthcare organizations improve their confidence in patient outcomes and optimize care delivery.
We’re evolving our CloudOps function into AIOps‑driven SRE. You’ll keep our multi‑tenant SaaS services reliable on Azure/AKS, reduce alert noise, and automate safe fixes using PagerDuty AIOps and Azure AIOps. This role blends coding for operations with hands‑on incident management, and it’s ideal for candidates who enjoy partnering with dev/product while improving patient‑impacting healthcare technology.

What You’ll Do

  • Operate with AIOps: Use Azure Monitor/Log Analytics and Application Insights to detect anomalies, triage issues, and speed up RCA; apply Issues & Investigations (preview) to guide troubleshooting.
  • Make PagerDuty smarter: Configure Event Intelligence for dedup/correlation and set up Event Orchestration/Automation Actions to trigger safe auto-remediation (AKS rollout restarts, pod autoscaling, Kafka queue drain) with approvals/audit.
  • Run SaaS on AKS: Own Kubernetes health, rollouts, scaling, and config hygiene across regions/tenants.
  • Automate everything: Build tools and runbooks in Python (primary) and PowerShell/Bash; integrate APIs and ChatOps for one-click remediation.
  • Ship reliably: Define SLIs/SLOs and error budgets; embed reliability gates in CI/CD (Azure DevOps/GitHub Actions/Jenkins), support canary/blue-green, and enable fast rollback.
  • Provision via code: Manage infra with Terraform, Bicep, and ARM for repeatable, auditable changes.
  • Lead incidents: Take point on on-call response, clear stakeholder communication, and blameless postmortems; convert learnings into durable SOPs/runbooks.
  • Protect data: Apply Entra ID/RBAC, secrets hygiene, policy-as-code, and privacy/security practices appropriate for healthcare SaaS.

Must-Have Skills

  • Coding for ops: Strong Python; PowerShell/Bash for automation and tooling.
  • Azure AIOps: Azure Monitor/Log Analytics (KQL), Application Insights (Smart Detection, Application Map), and Issues & Investigations (observability agent).
  • PagerDuty AIOps: Event Intelligence (dedup/correlation) and Automation Actions/Event Orchestration for safe remediation; confident on-call operations.
  • AKS/Kubernetes: Rollout strategies, HPA/KEDA autoscaling, health checks.
  • IaC & CI/CD: Terraform/Bicep/ARM and pipelines in Azure DevOps/GitHub Actions/Jenkins.
  • SRE fundamentals: SLIs/SLOs, error budgets, RCA, blameless postmortems → runbooks/SOPs.
  • Security & compliance: Entra ID/RBAC, secrets hygiene, policy-as-code; familiarity with healthcare privacy/security expectations.

Nice-to-Have

  • Kafka ops (lag monitoring, safe catch-up)
  • Teams/ChatOps remediation and status broadcasting
  • Cost & capacity automation (tagging, idle cleanup, forecasting)
  • Resilient routing & security (service mesh, App Gateway/WAF)
  • Multi-region DR, chaos/game days with documented improvements

How We Work

  • On-call: Rotating 24×7 coverage; you’ll lead response and keep comms clear.
  • Collaboration: Partner closely with dev/product/security; automation over tickets.
  • Growth: No deep ML required; willingness to learn AI-assisted ops (e.g., Copilot-generated KQL, AIOps triage findings) is valued.

Compensation


The salary range provided in this job posting is intended to reflect the general market value for the position. The actual salary offered may vary based on factors such as the candidate’s experience, qualifications, skills, and the specific requirements of the role. This range may also be subject to change as market conditions evolve. We encourage open communication throughout the interview process to discuss compensation expectations. For base-salary + commission sales roles, the range represents On-Target Earnings.

Min – Max: $85,276.80 - $127,915.20 (CAD)

Benefits

The benefits described represent the current offerings at our organization; however, benefits are subject to change and may vary by location and employment status. We strive to provide a comprehensive benefits package that supports our employees’ health, wellness, and financial goals. Please note that benefits may be discussed in more detail during the hiring process.

  • Vacation to help you rest, recharge, and connect with loved ones

  • Paid leave benefits

  • Extended health, paramedical, dental, and vision benefits

  • Registered retirement and tax-free savings plans

  • Tuition reimbursement, life insurance, EAP – and more!

 

 

It is the policy of Merative to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, HIV status, or any other characteristic protected by federal, state or local law. In addition, Merative will provide reasonable accommodations for qualified individuals with disabilities.
