Obol Labs Inc is an R&D development team focused on proof-of-stake infrastructure for public blockchain networks. Specific topics of focus are Internet Bonds, Distributed Validator Technology and Multi-Operator Validation.
DV Labs is building the next generation of distributed validators to make Ethereum staking more resilient, decentralized, and secure.
Our software allows groups of operators—large and small—to collaboratively run a single validator while minimizing single‑point‑of‑failure risk and maximizing client diversity. We are a venture‑backed, remote‑first team that values open‑source ethos, long‑term thinking, and empirical decision‑making.
We are searching for a dedicated Data Engineer to design, build, and own the data platform that powers product decisions, validator‑performance analytics, on‑chain research, and community transparency.
Responsibilities
Ingest & model Beacon‑chain data (blocks, attestations, sync‑committee aggregates, deposits, and slashings) into ClickHouse and MongoDB at multi‑TB scale.
Develop scalable ETL/ELT pipelines in Apache Spark (PySpark/Scala) orchestrated via GitHub Actions workflows and containerized CI/CD.
Implement columnar schemas & partition strategies to achieve sub‑second analytical queries and reduce storage footprint.
Expose clean, version‑controlled datasets & metrics to internal stakeholders through APIs, dashboards, and notebooks.
Collaborate with Protocol & DevOps teams to surface validator health, slash‑risk events, and protocol‑level anomalies in real time.
Own data quality, lineage, testing, and documentation across the stack; champion best practices and continuous improvement.
Contribute to open‑source tooling around consensus‑layer data, distributed‑validator monitoring, and Ethereum research.
Requirements
2+ years of professional experience in data engineering or high‑performance backend roles.
Production expertise with ClickHouse and Apache Spark on multi‑terabyte datasets.
Hands‑on experience operating MongoDB for semi‑structured/operational workloads.
Proficiency in Python (pandas/PySpark) and/or Scala; solid Git and CI/CD habits (GitHub Actions/Workflows or similar).
Deep understanding of the Ethereum consensus layer (Beacon chain architecture, validator lifecycle, slashing conditions, client diversity—Lighthouse, Prysm, Teku, etc.).
Comfortable working in a remote, asynchronous startup environment with high ownership and autonomy.
Nice to have
Familiarity with Ethereum execution‑layer JSON‑RPC, MEV‑Boost, and block‑building economics.
Experience operating distributed systems on Kubernetes, Nomad, or similar orchestrators.
Exposure to data‑observability stacks (dbt, Great Expectations, Dagster) and time‑series monitoring (Prometheus/Grafana).
Prior contributions to web3 or other open‑source projects.