Role overview

Qualifications

5–8 years in SRE, infrastructure, or platform/backend roles operating production systems at scale
Deep, practical Kubernetes experience
Comfortable in code (Go preferred)
Strong communication skills

Responsibilities

Own the reliability, scalability, and operability of the SigNoz cloud platform
Work on scaling the ingest path and maintaining data freshness
Operate and tune ClickHouse and the data layer for performance and cost
Handle Kubernetes infrastructure operations and automation

Key facts

Remote from: India
Full time
Senior (5-10 years)
Site Reliability Engineer (SRE)
English

Hard skills

Kubernetes, Go (Programming Language), Telemetry, Distributed Computing, Performance Profiling, Capacity Planning, Infrastructure as Code (IaC), CI/CD, Data Systems, Observability, Continuous Monitoring, Logging, Tracing

Other skills

Communication

About the company

SigNoz

Open-source observability platform

Company details

Company type: Startup

Industry

Company size: 11–50

Job description

About SigNoz

SigNoz is an open-source observability platform that helps modern engineering teams monitor, debug, and optimize their applications with deep visibility into metrics, traces, and logs — all in one place. We're built natively on OpenTelemetry and offer both self-hosted and cloud options, so teams can run observability the way they want, without vendor lock-in.

We're growing fast and building core developer infrastructure products, and we're not fooling around:

27,000+ GitHub stars
800+ customers
7,000+ members in our Slack community

Role: Sr Site Reliability Engineer (SRE)

We're looking for an SRE to own the reliability, scalability, and operability of the SigNoz cloud platform. You'll keep a petabyte-scale observability system fast and dependable, making sure the people who trust us to watch their systems can always trust ours. The platform team handles infrastructure, SaaS scalability, ingest pipelines, staging environments, automation, and the operational backbone of the product.

This is a deeply hands-on role for someone who understands what actually breaks in production at scale — and enjoys fixing it for good.

What we're looking for

Kubernetes at scale — not just "I've deployed to k8s," but real fluency with the nuances and gotchas: resource tuning, autoscaling behavior, networking, stateful workloads, upgrades, and the failure modes that only show up under load
Working knowledge of ClickHouse — operating it, tuning queries, and understanding its behavior at scale — is a strong plus
Knowledge of Golang is a plus (most of our stack and tooling is in Go)
Familiarity with OpenTelemetry and running large-scale data ingest pipelines is a plus

What you'll work on

You'll work with a high-caliber team across areas like:

Reliability of the SigNoz cloud platform: SLOs/SLIs, error budgets, incident response, and on-call practices that don't burn people out
Scaling the ingest path — making it robust to bursts while maintaining data freshness
SaaS auto-scalability and capacity planning across a petabyte-scale system
Operating and tuning ClickHouse and the data layer for performance and cost
Kubernetes infrastructure: cluster operations, upgrades, multi-tenancy, and the automation that keeps it boring
Observability of SigNoz itself — we dogfood our own product, so you'll help make it world-class
Infrastructure-as-code, CI/CD, and the tooling that lets a small team operate big systems

What will make you successful

5–8 years in SRE, infrastructure, or platform/backend roles operating production systems at scale
Deep, practical Kubernetes experience — you know where the bodies are buried
Strong grasp of distributed systems failure modes, performance debugging, and capacity planning
Comfortable in code (Go preferred) — you automate and fix things, not just configure them
A love of open source, ideally with prior contributions to OSS projects (of any size)
Comfortable in a high-ownership, fast-moving, remote-first environment
Strong communication — can write clear runbooks and tech docs and explain trade-offs

Nice-to-haves

Past experience on platform/infra/SRE teams at Series B+ startups
Hands-on experience operating ClickHouse, Kafka, or similar high-throughput data systems
Experience in observability (monitoring / logging / tracing) and with OpenTelemetry

Why you'll love working at SigNoz

Work on a globally used open-source project that engineers actually love
Huge scope and ownership — your work directly shapes how teams adopt SigNoz
Collaborate with a high-caliber team who just can't stop shipping
Remote-first, async-friendly culture
Opportunity to help define the future of open-source observability
