Who we are
Kentik is the network intelligence platform for modern infrastructure teams. Unlike traditional monitoring and observability tools, we demystify complex network operations, enabling organizations to deliver applications and innovation at scale. Built by network experts to make critical insight accessible to every engineer, Kentik is the real-time source of truth that understands every network in context, from data center to cloud to the internet. This single platform unifies and correlates cloud, device, flow, and synthetic data to turn telemetry into action. Market leaders like Akamai, Booking.com, Dropbox, and Zoom rely on Kentik to run, manage, and optimize their networks.
As the Sr. Manager, Production SRE, you will lead the team responsible for the operational reliability of the bare metal infrastructure, networking, and system configuration that power our product offerings. This is a hands-on "player/coach" role requiring both technical depth and leadership maturity.
This role represents a transformative opportunity: evolving what was traditionally a Network Operations Center (NOC) into a modern, proactive SRE function that leverages automation, data science, and reliability engineering principles. It is your chance to help shape a critical function in a growing company.
- Lead and expand a high-performing, distributed Production SRE team
- Contribute directly to service provider network automation and infrastructure reliability efforts that help us scale safely
- Drive execution with ownership and accountability for bare metal and network reliability
- Explore and lead the implementation of AI approaches to enhance situational awareness and predictive capabilities
- Champion a customer-first mindset while positioning your team as the central source of situational awareness
- Implement network automation to replace manual configurations and reduce operational toil
- Support incident management, including on-call rotations and postmortem processes
- Partner with other engineering leaders on collaborative reliability initiatives that ensure seamless operations
- Foster data-driven decision making and measurable outcomes for infrastructure components
- Contribute to maturing our SRE practices and governance as we scale
- Lead the cultural and technical transformation from traditional operations to modern SRE practices
What you'll bring
Studies have shown that some candidates tend to apply to jobs only if they meet 100% of the qualifications. We encourage you to apply if you meet most of the criteria. Even if you don’t match all of the qualifications, your skills and experience could be valuable in this role!
- 6+ years of SRE/DevOps/Infrastructure experience with 3+ years managing engineering teams
- Strong hands-on technical expertise with infrastructure technologies, including:
  - Network fundamentals and protocols (e.g., TCP/IP, BGP, OSPF)
  - Network hardware configuration and management (e.g., Juniper, Arista)
  - Internet transit and peering, and data center relationship management
  - DDoS detection and mitigation services (e.g., CDN, Akamai Prolexic)
  - System administration and configuration management (e.g., Puppet)
  - Network automation tools and frameworks (e.g., NAPALM, Ansible, SaltStack)
  - Infrastructure as Code approaches for network configuration
  - Bare metal infrastructure deployment and management
  - CI/CD pipelines for infrastructure automation (e.g., Jenkins, GitHub Actions)
- Experience implementing or working with AIOps tooling for anomaly detection and predictive analytics
- Proven ability to implement network automation and contribute to IaC efforts
- Demonstrated ownership and bias for execution in critical infrastructure environments
- Strong customer orientation and stakeholder engagement
- Experience implementing SLO frameworks and reliability practices
- Proven incident management leadership and best-practice implementation
- Excellent communication and influence skills
Our tech stack
- Our core data engine and platform are primarily written in Go
- We use Node.js + Express for application serving, and React as our primary UI framework
- We also use some JS and Python for tooling/scripting
- In addition to our own database, we use Postgres, Kafka, MySQL, and Redis
- Internal and public APIs expose both REST/JSON and gRPC endpoints
- HAProxy and Envoy for API traffic routing and balancing
- GitHub for source control, PRs, issues
- Jenkins for automated builds
What we offer
Kentik is a fully remote company that operates globally. We seek professionals who will help us thrive as an organization, and in turn, we aim to broaden and enhance your career. We’re very thorough in the interview process so that we understand your skills and how they relate to your successful growth here at Kentik. Our compensation philosophy encompasses a fair program for all in order to attract, engage, and retain talented individuals who will drive our business and wow our customers.
The compensation range for this position is $221,000 to $299,000. This range reflects the low and high end of the U.S. compensation range Kentik reasonably and generally expects to pay the hired candidate in this role. The actual compensation offered may be lower or higher than the stated range depending on various factors, including but not limited to:
- Experience with the skill sets required for success
- Demonstrated competencies and potential
- A geographic market-based approach
In addition to a great career opportunity, Kentik offers stellar benefits for our employees, which include:
- 100% of premiums paid by the company for health, vision, and dental coverage for you and your dependents
- An annual Health Reimbursement Account (HRA) of $3,000 for an individual or $4,500 for a family
- Paid family & medical leave
- Open PTO, a quarterly Wellness Day, and a minimum of 10 paid holidays
- 401(k) retirement account
- Home office reimbursement
- Stock options
Note: Benefits are as listed for all US full-time employees. For compensation, international applicants will be treated equitably in relation to the laws applicable within the countries in which we operate.
Come work with us
The true meaning of Kentik is visibility. We’re committed to making sure everyone feels empowered to use their voice, has a sense of belonging, and is represented at Kentik.
We don’t look for individuals who fit the culture, but for those who will continue to add to it.
We encourage everyone to apply, especially individuals who are underrepresented in the industry: people of color, members of the LGBTQI+ community, women, individuals with disabilities (both seen and unseen), veterans, and people of any age or family status.
Kentik is committed to creating an inclusive interview process. If you require a reasonable accommodation during the application or interview process, please reach out to recruiting@kentik.com.
Come as you are!
You will be working at a fast-growing, well-funded startup alongside industry thought leaders and network aficionados as we build the future of observability and set a high bar for how network operations and digital businesses should run. With a competitive salary and amazing benefits on top of the meaningful and challenging projects you’ll take on, we’re sure you’ll enjoy joining the Kentik team.