Restate (restate.dev) is a lightweight runtime that turns AI agents, workflows, and backend services into durable processes - so teams can focus on their logic, not failure mechanics.
The role: We're looking for a Senior to Staff-level cloud infrastructure engineer to work across all product pillars: OSS, on-prem deployments, multi-tenant SaaS, and BYOC (bring your own cloud). This means deep work in our Rust-based infrastructure layer, integrating with cloud provider APIs, building infrastructure-as-code tooling, and ensuring reliability and security at scale. You'll have significant ownership over major parts of our cloud infrastructure.
Front-row seat to the biggest infra shift in decades
Durable runtimes like Restate are becoming the next foundational infrastructure component - and increasingly a critical piece for AI applications. As systems become more agentic, long-running, integration-heavy, and failure-prone, durable execution turns reliability from a bespoke engineering tax into a default property. In this role, you’re not watching that shift from the sidelines - you help build the platform that enables it.
State-of-the-art tech, built from first principles
Restate re-imagines durable execution as a lightweight self-contained stack - no database required - and ships as a single Rust binary with an optimized custom storage layer, low latency orchestration, and an analytics engine for observability.
Enterprise Traction
Restate is already used by Fortune 500 companies, including Tier 1 banks running critical financial workflows, and also by cutting-edge AI and infra startups pushing the boundary of what “production-grade agents” means. You’ll work on problems where reliability, correctness, and operational simplicity are existential.
Work with world-class engineers
You’ll partner directly with engineers who’ve built and operated foundational systems at scale - creators of Apache Flink, and leaders from Meta’s messaging infrastructure. You’ll have the chance to work with incredibly talented individuals who care deeply about their craft.
This is a Cloud Infrastructure Engineering role spanning Restate’s product offering: OSS, on-prem deployments, multi-tenant SaaS, and BYOC. The scope of the role includes but is not limited to:
Build and operate Restate Cloud: extend our managed multi-tenant offering, working across the infrastructure, control plane, networking, storage, and observability of Restate workloads.
Evolve our BYOC product and work with customers on operating on-prem installations: design and build the infrastructure that runs inside customer cloud accounts.
Reliability and observability across the fleet: SLOs, metrics, traces, logs, alerting, and runbooks. Build automation so we can scale our product offering across deployment methods.
On-call: participate in the cloud on-call rotation. A US-based hire materially improves our timezone coverage.
We’re targeting Senior-to-Staff: you’ve operated production SaaS or platform infrastructure before, you’ve seen real failure modes, and you have (strong) opinions about how to run multi-tenant systems. You have an appreciation for operating in a compliance-sensitive environment.
Strong cloud infrastructure background with deep understanding of major cloud provider architectures.
Experience with infrastructure-as-code and cloud orchestration, particularly Kubernetes-based stateful workloads; balancing continuous delivery with safety while maintaining large-scale production systems.
Software engineering skills in a systems language (Rust, Go, C++); willingness and ability to learn Rust on the job.
You should be comfortable taking ownership end-to-end, from design through production operations, and thrive in early-stage startup ambiguity.
Prior experience with Restate or durable execution specifically.
Deep enterprise procurement/compliance navigation.
Kubernetes operator development, experience with IaC systems like Cluster API, Crossplane or Terraform.
You want to work primarily on the runtime core rather than cloud, BYOC, and customer-facing infra.
You’ve mostly architected and reviewed, and aren’t excited to be hands-on.
You are averse to multi-cloud, Kubernetes, or operating infrastructure as a shared responsibility with customers.
We use Restate extensively: the Restate Cloud control plane is built on Restate and TypeScript.
Rust infrastructure services and Kubernetes operators.
US-based, fully remote. East Coast is a plus as it would materially improve our on-call coverage given the team’s existing geography.
Travel: minimal - occasional team offsites, little required customer travel.
