Job description

Faptic Technology is a leading provider of IT consulting and managed services, specializing in Azure cloud solutions, software development, and site reliability engineering (SRE). We partner with enterprises to optimize their IT operations, ensuring scalability, reliability, and innovation in the cloud.

As part of our growth in managed cloud services, we are looking for an Engagement Manager to oversee Azure cloud services and software development services. You will be responsible for managing service delivery, coordinating technical teams, and ensuring a seamless experience for customers—both directly and via strategic partners.

Who We Are

At Faptic, we thrive on solving complex challenges. By blending design, engineering, and analytics, we help organizations harness the power of their data to create innovative digital experiences and shape their future.

We are looking for an Azure Site Reliability Engineer (SRE) with proven expertise in Azure cloud platforms and cloud native software development to join our dynamic team.

Your Role at Faptic

We are seeking a highly skilled Azure Site Reliability Engineer (SRE) to join our team. The ideal candidate will have expertise in managing cloud infrastructure, monitoring services, and ensuring system reliability using Microsoft Azure. This role requires a deep understanding of Terraform, Azure DevOps (CI/CD), and Azure monitoring services to optimize performance and maintain system availability. A strong focus on service operations, incident response, proactive monitoring, disaster recovery, and service reporting is essential.

Requirements

Key Responsibilities:

Apply Site Reliability Engineering (SRE) principles to ensure system reliability and performance.

Ensure the reliable operation and monitoring of Azure-based services.

Utilize Terraform to automate infrastructure provisioning and management.

Develop, configure, and manage CI/CD pipelines using Azure DevOps.

Implement log management and diagnostics using Azure Monitor, Application Insights, and Log Analytics.

Set up and manage alerts to proactively detect and mitigate system issues.

Conduct incident response, root cause analysis, and implement long-term fixes to enhance system stability.

Collaborate with teams to improve observability and system health metrics.

Establish and maintain incident management processes to minimize downtime.

Track and report service uptime, performance, and quality metrics.

Ensure compliance with security and governance policies within Azure environments.

Optimize cost management strategies across Azure resources.

Prepare, test, and execute disaster recovery plans to ensure business continuity.

Required Skills & Qualifications:

Microsoft certification at an intermediate level or above (e.g., Microsoft Certified: Azure Administrator Associate, Azure DevOps Engineer Expert, or Azure Solutions Architect Expert).