Core & ML Ops Team Lead Remote

unlimited holidays - extra holidays - extra parental leave - fully flexible
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Over 5 years of experience in building distributed systems.
  • At least 3 years of experience in MLOps or ML platform engineering.
  • Proficiency in developing high-performance services using Java, Rust, Go, or C++, with strong Python skills.
  • Knowledge of Linux internals, networking, concurrency, and performance profiling.

Key responsibilities:

  • Design and evolve the core platform infrastructure including Kubernetes and distributed compute.
  • Own and maintain the model platform, including registry, experiment tracking, and serving.
  • Lead the team in roadmap planning, delivery, and mentoring to ensure high engineering standards.
  • Collaborate with cross-functional teams to adopt and roll out platform solutions.

Zyte Information Technology & Services SME https://www.zyte.com/
201 - 500 Employees

Job description

About Us

At Zyte, we eat data for breakfast and you can eat your breakfast anywhere and work for Zyte. Founded in 2010, we are a globally distributed team of over 250 Zytans working from over 28 countries who are on a mission to enable our customers to extract the data they need to continue to innovate and grow their businesses. We believe that all businesses deserve a smooth pathway to data.

For more than a decade, Zyte has led the way in building powerful, easy-to-use tools to collect, format, and deliver web data quickly, dependably, and at scale. The data we extract helps thousands of organizations make smarter business decisions, secure competitive advantage, and drive sustainable growth. Today, over 3,000 companies and 1 million developers rely on our tools and services to get the data they need from the web.

Zyte is seeking an experienced Team Lead to manage our Core & MLOps Squad, which is responsible for building the bedrock infrastructure that powers Zyte at scale. This hands-on technical leadership role requires expertise across MLOps, systems programming, and orchestration to lead a cross-functional team in designing and maintaining the scalable foundation that enables all Zyte teams to build and run their services with confidence.

Requirements

What you’ll do
Technical Leadership
  • Design and evolve the core platform (Kubernetes, Mesos, GPU scheduling/autoscaling, distributed compute).
  • Own the model platform: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
  • Build the Golden Path: reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), high-performance clients, circuit breakers, and other production‑ready defaults.
MLOps Excellence
  • Operate a secure, multi‑tenant model registry and training platform with standardized experiment/evaluation harnesses.
  • Provide turnkey serving patterns (online + batch), drift/quality monitoring, and rollback playbooks.
  • Integrate public/open‑source AI capabilities as managed platform services with cost and data‑governance guardrails.
Team Management
  • Run the squad: roadmap/prioritization, delivery, mentoring, and high engineering standards.
  • Partner with product engineering (Zyte API, Scrapy Cloud), Prod Ops, and Security on adoption and rollout plans.
  • Mentor the team and foster a platform-thinking mindset.
Ownership Areas
  • Container orchestration (Kubernetes/Knative), GPU provisioning & autoscaling, environment & secret management.
  • Operators, sidecars, and internal SDKs/libraries (Go/Rust/Python/Java) that enforce the golden path contract.
  • Model platform: registry, experiment tracking, training orchestration, evaluation framework, serving infra, model monitoring.
  • Observability: logging/metrics/tracing pipelines.
  • Billing pipeline: metering/events/cost tracking abstractions.
  • Golden Path: Java, Python, and ML templates + CI/CD blueprints + docs + scaffold CLI.
  • Reliability enablement (SRE practices), cost governance, supply‑chain security (SBOM, image signing).
Qualifications
Required
  • 5+ years of experience building distributed systems; 3+ years in MLOps/ML platform engineering (or equivalent impact).
  • Knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
  • Deep understanding of Kubernetes (bonus: Mesos).
  • Proficiency developing high-performance services in Java, Rust, Go, or C++ (bonus: familiarity with the Vert.x and Netty frameworks); strong Python skills.
  • Experience with GPU infrastructure (scheduling, containerization, optimization).
  • Track record of designing and operating model platforms (registry, training, serving, monitoring) in production.
  • Demonstrated success leading technical teams and implementing organization-wide platform solutions.
Preferred
  • Streaming & workflows: Kafka plus Argo/Temporal/Airflow or equivalents.
  • eBPF‑based observability, perf tooling, or io_uring experience.
  • Cost optimization for ML/AI; multi‑tenant quotas and fairness.
  • Hands‑on experience authoring Golden Paths (service chassis/templates, CI/CD blueprints, CLI scaffolds).
  • SRE practices (SLIs/SLOs, incident management).

Benefits

  • We love fostering and nourishing new ideas and bringing them to market.
  • Become part of a self-motivated, progressive, multicultural team.
  • Have the freedom and flexibility to work from where you do your best work, as we are a completely remote company.
  • Get the chance to work with cutting-edge open-source technologies and tools.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Mentorship
  • Team Management
