Job description

About Us

Chorus One is one of the leading operators of infrastructure for Proof-of-Stake networks and decentralized protocols. Tens of thousands of retail customers and institutions stake billions in assets through our infrastructure, helping to secure protocols and earn rewards. Our mission is to increase freedom and speed of innovation through decentralized technologies.

We are a diverse team of 70+ people distributed all over the globe. We value radical transparency, striving for excellence and improvement while treating each other with kindness and generosity. If this sounds like you, we'd love to hear from you.

Position Overview:
As a senior software engineer, you will join one of our engineering teams to assist in building and maintaining tools and automation to support our validator operations. We take the upstream node software from projects like Ethereum, Solana, Cosmos, or Avalanche; compile it; run it on one of our servers; and then make sure it is reliable and secure, monitor it, and keep it up to date. We do this for more than 60 blockchain networks, which means that it is not feasible to do all of this by hand. Instead, we build automation. Some of the things we do:

Contribute to upstream software to improve observability, and build monitoring tools from scratch where none exist. The teams that build the node software are not the teams that operate this software at scale, and as such, observability is often not a first priority. We develop our own tools for on-chain and off-chain monitoring, both for short-term metrics (to alert on) and long-term metrics to measure our performance, and to support optimisation decisions.
Build tools to track and manage our fleet of servers. We work mostly with bare-metal servers across multiple providers. This means that no vendor-specific portal is going to give us a complete overview of our infrastructure. Instead, we have an in-house tool that integrates with vendor APIs and gives us a central overview.
Automate machine provisioning. Instead of working with each of our 10+ cloud and bare-metal providers' own flavor of installing Ubuntu, we build our own installer that is uniform across our infrastructure.
Track and automate builds. Each of the 60+ networks we operate regularly releases updates. It would be tedious to manually git pull && make for every release. Instead, we have automation that watches for new releases, builds them automatically, and registers them in our package registry.
Automate updates and failover. When we have a new package, we still need to roll it out to our fleet and restart any nodes, in a controlled manner and without downtime. For validating nodes, we also need to fail over before we restart them, and confirm the new node is healthy. To automate this, we need to have 100% confidence in our tooling, because a mistake here can lead to double-signing, which incurs a financial penalty.
Automate snapshot creation and storage. Blockchain node software is stateful in nature: chain data is often terabytes in size. While it is possible for new nodes to sync from the p2p network, this can take days to weeks, which means it is not a suitable method when we move workloads between machines. We automate taking snapshots of this data, so we can be more flexible about what runs where, without compromising on security.

Our internal tooling:

It is written in a mix of Rust, Python, Go, and a bit of TypeScript. We use Postgres as our database of choice. We deploy our code directly onto Ubuntu hosts, where it runs under systemd or in Docker containers, and we also have a Kubernetes cluster running various stateless applications. Due to the diverse nature of the software we run, we also occasionally have to dive into codebases written in C, C++, OCaml, or TypeScript.

You can learn more about our approach to operating nodes in our Network Handbook.

Key Responsibilities

Design and develop new features. Discuss with internal stakeholders to clarify how our next feature should look, discuss with fellow engineers how it should be implemented, and then drive implementation to completion.
Support and collaborate. Review and discuss engineering designs, review code, help fellow engineers, and mentor them on a technical level.
Innovation and continuous improvement. Seek to simplify, optimize, and secure our staking services and systems.
Take part in the on-call rotation, approximately 2 days per month. You will be responsible for the automation that manages our validators and infrastructure, including its uptime and incident response.

Required profile