We are looking for a skilled Site Reliability Engineering (SRE) Lead Engineer with a minimum of 8 years of experience to join a dynamic team within a leading organization. The role requires deep expertise in Application Performance Monitoring (APM), Infrastructure as Code (IaC), automation, and distributed tracing using OpenTelemetry.
As an SRE lead, you will guide the design, implementation, and continuous improvement of observability solutions, ensuring system reliability, performance, and scalability while fostering best practices in SRE and DevOps.
Key Responsibilities:
· -Lead the strategic development and management of observability and reliability frameworks across the organization, ensuring alignment with business goals and technical requirements.
· -Design and implement monitoring and observability solutions, collaborating with engineering teams to define standards and best practices.
· -Manage Infrastructure as Code (IaC) initiatives using Terraform, coordinating with cloud and infrastructure teams to ensure scalable and secure deployments.
· -Drive automation strategies for monitoring, alerting, and logging pipelines, focusing on process improvements and operational efficiency.
· -Develop and maintain comprehensive observability roadmaps, including distributed tracing, logging, and metrics collection strategies.
· -Collaborate with product management, sales, and pre-sales teams to provide technical expertise and support during solution design and customer engagements.
· -Lead cross-functional teams to enhance CI/CD pipelines and deployment reliability, ensuring smooth integration of observability tools and practices.
· -Engage with vendors and strategic partners to evaluate, select, and integrate observability and monitoring solutions, ensuring alignment with organizational needs and fostering strong collaborative relationships.
· -Mentor and develop junior engineers and analysts, fostering a culture of reliability, observability, and operational excellence.
Technical Skills Required:
· -8-10+ years of experience in SRE, Observability, or DevOps roles, with leadership responsibilities.
· -Hands-on experience with OpenTelemetry for distributed tracing and observability instrumentation.
· -Proven expertise with Application Performance Monitoring (APM) tools such as New Relic, Datadog, AppDynamics, or Dynatrace.
· -Strong proficiency in Infrastructure as Code (IaC) using Terraform.
· -Solid understanding of cloud platforms including AWS, GCP, or Azure.
· -Experience with automation/configuration management tools like Ansible, Chef, or Puppet.
· -Deep knowledge of CI/CD pipelines and tools such as GitHub Actions, Jenkins, or Azure DevOps.
· -Experience managing Kubernetes and containerized environments (Docker, Helm).
· -Familiarity with log aggregation and analysis platforms like ELK Stack or Splunk.
· -Excellent leadership, communication, and collaboration skills.
Candidates must state their compensation expectations in their applications and submit their resumes in English.
