Lead Site Reliability Engineer

Key Facts

Remote From: 
Full time
Senior (5-10 years)
English

Other Skills

  • Collaboration
  • Problem Solving
  • Troubleshooting (Problem Solving)

Requirements

  • 7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
  • Hands-on experience with GCP and AWS
  • Proficiency in Python, Bash, or Shell scripting
  • Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools

Roles & Responsibilities

  • Develop and improve observability using monitoring, logging, tracing, and alerting tools
  • Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues
  • Collaborate with developers to enhance application reliability, scalability, and performance
  • Drive cost optimization efforts in cloud environments

Job description

About HighLevel:
HighLevel is an AI-powered business operating system that gives agencies, entrepreneurs and SMBs the infrastructure to build, automate and scale. Today, HighLevel supports SMBs across 150+ countries, fueling community-driven growth rooted in real customer outcomes.

To date, businesses operating on HighLevel have generated over $7 billion in ecosystem value, demonstrating the impact of shared infrastructure at scale. By centralizing conversations, automation and intelligence into one system, we help businesses move faster, reduce complexity and execute efficiently.

Behind the platform, HighLevel powers more than 4 billion API hits and 2.5 billion message events daily. With 250 terabytes of distributed data, 250+ microservices and over 1 million domain names supported, our architecture is built for performance, resilience and long-term scalability.

Our People
With over 2,000 team members across 10+ countries, HighLevel operates as a global, remote-first organization built for speed and ownership. We value initiative, clarity and execution, creating space for ambitious people to build systems that support millions of businesses worldwide. Here, innovation thrives, ideas are celebrated and people come first, no matter where they call home.

Our Impact
Every month, HighLevel enables more than 1.5 billion messages, 200 million leads and 20 million conversations for the more than 1 million businesses we support. Behind those numbers are real people building independence, expanding opportunity and creating measurable impact. We're proud to be a part of that.

Learn more about us on our YouTube Channel or Blog Posts.

About the Role:

We are looking for a Site Reliability Engineer (SRE) to join our team and help ensure the availability, performance, and scalability of our critical systems. You will work closely with development and operations teams to automate processes, enhance system reliability, and improve observability.


Responsibilities:
  • Develop and improve observability using monitoring, logging, tracing, and alerting tools (Prometheus, Grafana, ELK, OpenTelemetry, etc.).
  • Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues.
  • Collaborate with developers to enhance application reliability, scalability, and performance.
  • Drive cost optimization efforts in cloud environments.
  • Work with multiple data stores: MongoDB, Redis, Elasticsearch (ES), queue-based systems, etc.

Requirements:
  • Experience: 7+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
  • Cloud Expertise: Hands-on experience with GCP and AWS.
  • Infrastructure as Code (IaC): Terraform, Helm, or equivalent tools.
  • Containerization & Orchestration: Docker, Kubernetes (GKE).
  • Observability: Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools.
  • Programming/Scripting: Proficiency in Python, Bash, or Shell scripting. Basic understanding of API parsing and JSON manipulation.
  • CI/CD Pipelines: Hands-on experience with Jenkins, GitHub Actions, ArgoCD, or similar tools.
  • Incident Management: Experience with on-call rotations, SLOs, SLIs, SLAs, Escalation Policies, and incident resolution.
  • Databases: Experience monitoring MongoDB, Redis, Elasticsearch (ES), queue-based systems, etc.

EEO Statement:
The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.

We encourage you to review our Privacy Policy before submitting your application.
