Key Facts

Remote from: United Kingdom

Employment type: Full time

Experience level: Mid-level (2-5 years)

Language: English

Other Skills

• Team Effectiveness
• Communication
• Virtual Collaboration
• Mentorship

About Runware

Runware provides lightning-fast text-to-image and image-to-image generation: 0.3 seconds for SD 1.5 and 1.9 seconds for SDXL, all through one powerful API.

By combining proprietary hardware with accelerated software and orchestration, Runware sets new industry benchmarks with 10x inference efficiency and significant cost savings compared to other providers. No image degradation, just supercharged Stable Diffusion that supports the entire SD ecosystem, including technologies like ControlNet, IP-Adapters, InstantID, and more.

Get complete flexibility in model selection, with more than 150k open-source SD models included and the option to bring any of your own privately trained models or fine-tunes.

Runware is backed by some of the world’s leading investors, including a16z Speedrun, Lunar Ventures, Zero Prime, Begin Capital, and more.

Founded: 2018

Company size: 2 - 10

Job description

We’re looking for a Staff Engineer to take technical ownership of latency, throughput, and reliability across Runware’s AI inference platform.

This is a senior technical leadership role for someone who obsesses over performance at scale, from request ingress through GPU execution to result delivery, and who can consistently turn ambitious targets such as sub-one-second inference into production reality.

As a Staff Engineer, you will define and drive the architecture, standards, and execution needed to make Runware one of the fastest and most reliable inference platforms in the market. You will work deeply across backend services, distributed systems, GPU workloads, and infrastructure, partnering closely with product, ML, and platform teams.

This role is ideal for someone who enjoys operating at the intersection of systems design, performance engineering, and real-world scale, and who wants clear ownership over outcomes that matter directly to customers.

What you’ll do

Own end-to-end inference performance across the platform, with clear responsibility for latency, throughput, and reliability targets
Lead the architecture and design of core inference systems, including request routing, async execution, queuing, GPU scheduling, and result delivery
Drive the platform toward sub-one-second inference where feasible, identifying bottlenecks across networking, services, storage, and GPU execution
Make high-impact architectural decisions with performance, scalability, and operational simplicity as first-class concerns
Partner with ML and model teams to ensure models are production-ready from a performance perspective (cold starts, batching, memory usage, concurrency)
Define performance budgets, SLAs, and success metrics, and ensure they are measured, visible, and actively improved
Lead deep-dive investigations into latency spikes, throughput degradation, and system-level performance issues
Influence and mentor engineers across teams on performance engineering, distributed systems thinking, and operational excellence
Improve tooling, observability, and profiling capabilities to make performance issues easier to detect and reason about
Advocate for pragmatic engineering best practices around testing, benchmarking, rollouts, and documentation

Requirements

What we’re looking for

Extensive experience in software engineering, with a strong focus on backend and systems development (PHP, Python, Go, Rust, or similar)
Proven experience building and operating high-performance, low-latency distributed systems in production
Deep understanding of asynchronous processing, queues, concurrency models, and back pressure
Strong intuition for performance trade-offs across CPU, GPU, networking, storage, and application layers
Experience making and defending critical architectural decisions in complex systems
Hands-on experience troubleshooting real production issues under load (latency, saturation, cascading failures)
Familiarity with modern cloud infrastructure, CI/CD, and observability stacks (metrics, tracing, profiling)
Ability to communicate clearly and influence across teams in a remote-first environment
Strong mentorship mindset and a desire to raise the technical bar across the organisation

Nice-to-haves

Experience working on AI/ML inference platforms, GPU-backed workloads, or performance-critical compute systems
Knowledge of model optimisation techniques (batching, quantisation, warm starts, memory management)
Experience with infrastructure-as-code and DevOps practices
Background in startups or fast-paced environments where speed, ownership, and pragmatism matter
Prior ownership of latency or throughput SLOs at scale

Benefits

We’re a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for collaborative work and calls, but outside of those your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.

Our release cycles are fast and intense, but they’re followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back refreshed and ready for the next leap.