Role
At Sword, data powers our mission to build a pain-free world. The Data Platform Team is building the foundation that makes data usable across clinical, operations, AI, and product teams: a modern streaming lakehouse, distributed processing for low-latency data movement, durable workflows coordinating complex operations, and an API-first surface that lets producers and consumers serve themselves — without tickets, without infra context, and increasingly through agentic interfaces.
This role is for an engineer who thinks about data platforms as products. You’ve built or operated data systems at scale, you reason fluently from storage format up to query engine, and you want to shape the self-service and agentic experiences layered on top of that foundation.
To learn more about our Tech Stack, check here.
AI fluency is a core expectation at Sword Health. Every candidate is assessed against our three-level framework — be ready to share real examples of how AI is already part of how you work.
Explorer (Level 1) — Uses AI daily to boost personal productivity
Builder (Level 2) — Creates workflows and tools that elevate the whole team
Integrator (Level 3) — Embeds AI into products and processes at scale
Every hire must demonstrate at least Level 1. The expected level will vary depending on the seniority of the role.
Responsibilities
Design and evolve Sword’s streaming lakehouse — the foundation that every data consumer in the company depends on.
Build and operate distributed streaming pipelines that move data at low latency and high reliability.
Own the durable workflows that coordinate complex data movement across systems.
Shape the platform’s API surface — the interface producers and consumers use so they never need to touch infrastructure.
Drive evaluations and integrations with vendor data platforms, sitting inside the architectural trade-offs rather than just consuming the output.
Contribute to the self-service and agentic layer: interfaces designed to be consumed by humans, systems, and AI agents alike.
Partner with data engineers and analysts on contracts, governance, and lineage.
Build and maintain AI-ready data infrastructure that powers ML and AI-driven products across Sword.
Leverage AI coding assistants and LLMs to accelerate development, automate documentation, and raise code quality.
Work in a regulated environment where audit, compliance, and governance are part of every design.
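The "durable workflows" responsibility above can be illustrated with a minimal sketch of the core idea behind durable-execution engines such as Temporal: journal each completed step's result so a restarted run replays finished work instead of re-executing side effects. This is illustrative Python only, under assumed names, and not any engine's real API.

```python
class DurableWorkflow:
    """Toy illustration of durable execution (not a real engine's API):
    each completed step's result is journaled, so a re-run after a crash
    replays logged results instead of re-executing side effects."""

    def __init__(self, journal=None):
        self.journal = journal if journal is not None else {}
        self.executions = 0  # counts real (non-replayed) step runs

    def step(self, name, fn):
        if name in self.journal:      # already completed: replay the result
            return self.journal[name]
        result = fn()                 # first run: execute for real
        self.executions += 1
        self.journal[name] = result   # persist before moving on
        return result

# First run: both steps execute.
wf = DurableWorkflow()
wf.step("extract", lambda: [1, 2, 3])
wf.step("load", lambda: "loaded 3 rows")

# Simulated crash/restart with the same journal: steps replay, no re-runs.
wf2 = DurableWorkflow(journal=wf.journal)
assert wf2.step("extract", lambda: [1, 2, 3]) == [1, 2, 3]
assert wf2.executions == 0
```

Real engines achieve this with event histories and deterministic replay rather than a shared dictionary, but the contract is the same: a workflow that dies mid-run resumes where it left off.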
Requirements
Proven experience designing and operating data platforms at scale — warehouse, data lake, or lakehouse architectures in production.
Hands-on experience with a modern lakehouse table format — Iceberg strongly preferred; Delta Lake or Hudi also welcome. You understand how the format works under the hood: metadata layout, snapshots, manifests, compaction, copy-on-write vs. merge-on-read.
Clear mental model of catalogs (REST, Polaris, Glue, Unity, Hive) — their trade-offs, and how compute stays decoupled from storage.
Exposure to at least one vendor lakehouse or query platform — Snowflake, Starburst, or Databricks — at the level where you can reason about its architecture, not just use its UI.
Strong experience with a distributed processing engine — Flink strongly preferred; Spark also fine. You can reason about its internals, fine-tune a running job, and debug a pipeline that’s silently degrading.
Familiarity with durable execution — Temporal, Restate, or similar — or at minimum a solid mental model of what durable execution means and why it matters for data workflows.
Production experience building and operating APIs (REST or gRPC) at scale — good instincts about contracts, versioning, retries, rate limiting, and observability.
Solid understanding of Kafka and event-driven architectures (producers/consumers, partitioning, delivery semantics).
Comfortable in regulated environments (healthcare, fintech, gov) where audit, compliance, and data governance are part of every design.
Platform mindset: you design for self-service and API-first access, treating systems and agents — not only humans — as legitimate consumers.
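The retry and rate-limiting instincts called for in the API requirement above can be sketched as exponential backoff with full jitter, a common client-side policy for rate-limited APIs. The function name and parameter defaults below are illustrative assumptions, not any particular service's policy.

```python
import random

def backoff_schedule(base=0.1, cap=10.0, attempts=5, rng=random.random):
    """Exponential backoff with full jitter (illustrative sketch).
    The delay ceiling doubles each attempt, capped at `cap`; jitter
    spreads retries out to avoid thundering-herd spikes."""
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)  # full jitter: uniform in [0, ceiling)
    return delays

# With jitter disabled (rng always returns 1.0) the raw ceilings are visible:
assert backoff_schedule(rng=lambda: 1.0) == [0.1, 0.2, 0.4, 0.8, 1.6]
```

In a real client this schedule would drive `sleep` calls between retries of idempotent requests, typically only for retryable failures such as timeouts or HTTP 429/503 responses.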
Bonus
Deeper familiarity with open/REST catalogs (Polaris, Nessie, Unity) beyond basic use.
Observability stack fluency (Prometheus, Grafana, OpenTelemetry).
Prior work on agentic or AI-facing API surfaces, or MCP-style interfaces.
Experience in HIPAA, FedRAMP, or SOC 2 environments.
Exposure to dbt, DataHub, or data contract tooling.
Mindset and Collaboration
Service orientation: you build APIs (and increasingly agent-facing tools) that others love to use.
Reliability-first: failure modes, retries, and observability are part of day-one design.
Cross-functional: you enjoy working with data engineers, analysts, and ML engineers and understanding their problems.
Documentation mindset: good APIs come with great docs — and good docs now means machine-readable too.
Iterative: you ship incrementally and improve based on feedback.
