Offer summary

Qualifications:

Strong expertise in cloud operations and communications technologies.

Experience with monitoring, alerting, and anomaly detection tools like Prometheus and Grafana.

Proficiency in automation and infrastructure provisioning using Terraform and Azure DevOps.

Background in system performance tuning, security, and disaster recovery planning.

Key responsibilities:

Maintain real-time monitoring and alerting systems to ensure system health.

Identify and resolve system anomalies proactively to minimize downtime.

Automate infrastructure provisioning, scaling, and patching across various environments.

Optimize system performance and implement security best practices.

Job description

Founded in 2010 in the Netherlands, AnywhereNow is a global leader in Enterprise Dialogue Management, with a vision to ensure every employee and customer feels heard, understood, and valued. With around 240 employees working from 22 countries, we partner with over 2,000 leading enterprises, including Mazda, the UN International Organization for Migration, Adecco Group, and the University of Cape Town, to deliver exceptional customer experiences through the power of Microsoft Teams and AI-driven insights. Our commitment to innovation, customer focus, and accountability drives our success.

The opportunity

We are looking for a highly skilled and driven Cloud Operations Engineer to join our team, with a strong emphasis on communications technologies, cloud operations, and system performance. This role requires expertise in monitoring, alerting, anomaly detection, automation, security, and performance tuning across our critical communications platforms. You will be responsible for the reliability, availability, and performance of services such as SIP, Skype for Business, and Azure Communication Services (ACS). Your role will also focus on optimizing resource utilization, managing costs, and ensuring business continuity and disaster recovery (BCP/DR).

What you’ll be doing

Develop and maintain real-time monitoring and alerting systems using tools like Prometheus, Grafana, and the ELK stack to ensure system health and performance.
Identify and resolve anomalies and bottlenecks proactively, reducing downtime through automated detection and alert mechanisms.
Automate infrastructure provisioning, scaling, and patching using tools like Terraform and Azure DevOps across Kubernetes, Windows, and Linux environments.
Build self-healing systems and leverage Kubernetes operators, CI/CD pipelines, and event-driven automation to improve reliability.
Analyze and optimize system performance for latency-sensitive services, including VoIP, video, and messaging.
Implement cloud cost optimization strategies, such as using Reserved Instances, right-sizing virtual machines, and leveraging Azure Cost Management tools.
Strengthen system security by enforcing best practices for hardening, vulnerability patching, and incident management in collaboration with security teams.
Design and execute robust disaster recovery plans, ensuring fault-tolerant architectures and reliable backup and restore strategies.