Location: Remote (Global)
Type: Full-time
Company: Yotta Labs
Apply: careers@yottalabs.ai
🧠 About Yotta Labs
Yotta Labs is pioneering the development of a Decentralized Operating System (DeOS) for AI workload orchestration at planetary scale. Our mission is to democratize access to AI resources by aggregating geo-distributed GPUs, enabling high-performance computing for AI training and inference on a wide spectrum of hardware, from commodity to high-end GPUs. Our platform supports major large language models (LLMs) and offers customizable solutions for new models, facilitating elastic and efficient AI development.
🛠️ Role Overview
We are seeking a GPU Cloud Platform Engineer to join our core infrastructure team and help build the next-generation AI compute cloud. In this role, you will design, deploy, and operate large-scale, multi-cluster GPU infrastructure across data centers and cloud environments. You will be responsible for ensuring the high availability, performance, and efficiency of containerized AI workloads, ranging from LLMs to generative models, deployed in Kubernetes-based GPU clusters. If you're passionate about high-performance systems, distributed orchestration, and scaling real-world AI infrastructure, this role offers a unique opportunity to shape the backbone of our AI cloud platform.
🎯 Responsibilities
Build and operate large-scale, high-performance GPU clusters; ensure stable operation of compute, network, and storage systems; monitor and troubleshoot production issues.
Conduct performance testing and evaluation of multi-node GPU clusters using standard benchmarking tools to identify and resolve performance bottlenecks.
Deploy and orchestrate large models (e.g., LLMs, video generation models) across multi-cluster environments using Kubernetes; implement elastic scaling and cross-cluster load balancing to ensure efficient service response under high concurrency for global users.
Participate in the design, development, and iteration of GPU cluster scheduling and optimization systems; define and lead Kubernetes multi-cluster configuration standards; optimize scheduling strategies (e.g., node affinity, taints and tolerations) to improve GPU resource utilization.
Build a unified multi-cluster management and monitoring system to support cross-region resource monitoring, traffic scheduling, and fault failover. Collect key metrics such as GPU memory usage, QPS, and response latency in real time; configure alerting mechanisms.
Coordinate with IDC providers on planning and deploying large-scale GPU clusters, networks, and storage infrastructure to support internal cloud platforms and external customer needs.
✅ Qualifications
Bachelor's degree or higher in Computer Science, Software Engineering, Electronic Engineering, or a related field; 3+ years of experience in systems engineering or DevOps.
5+ years of experience in cloud-native development or AI engineering, with at least 2 years of hands-on experience in Kubernetes multi-cluster management and orchestration.
Familiarity with the Kubernetes ecosystem; hands-on experience with tools such as kubectl and Helm, and expertise in multi-cluster deployment, upgrades, scaling, and disaster recovery.
Proficient in Docker and containerization technologies; knowledge of image management and cross-cluster distribution.
Experience with monitoring tools such as Prometheus and Grafana; practical experience in GPU fault monitoring and alerting.
Hands-on experience with cloud platforms such as AWS, GCP, or Azure; understanding of cloud-native multi-cluster architecture.
Experience with cluster management tools such as Ray, Slurm, KubeSphere, Rancher, or Karmada is a plus.
Familiarity with distributed file systems such as NFS, JuiceFS, CephFS, or Lustre; ability to diagnose and resolve performance bottlenecks.
Understanding of high-performance interconnects and communication protocols such as InfiniBand, RoCE, NVLink, and PCIe.
Strong communication, self-motivation, and team collaboration skills.
🌟 Preferred Experience
Experience in developing and operating MaaS (Model-as-a-Service) platforms or large-scale model inference clusters. Proven track record of leading multi-cluster system development or performance optimization projects.
Proficiency in CUDA programming and the NCCL communication library; understanding of high-performance GPUs such as the H100.
Ability to develop standardized inference APIs (RESTful/gRPC) and automation tools using Golang or Python.
Hands-on experience with optimization techniques such as model quantization, static compilation, and multi-GPU parallelism; capable of profiling inference processes in multi-cluster setups and identifying bottlenecks such as memory fragmentation and low compute efficiency.
Active engagement with open-source communities such as Hugging Face and GitHub; deep understanding of the design principles of inference frameworks such as Triton, vLLM, and SGLang; ability to perform secondary development and optimization based on open-source projects and quickly translate cutting-edge techniques into production-ready multi-cluster solutions.
🌐 Why Join Yotta Labs?
Be part of a visionary team aiming to redefine AI infrastructure.
Work on cutting-edge technologies that bridge AI and decentralized computing.
Collaborate with experts from leading institutions and tech companies.
Enjoy a flexible, remote work environment that values innovation and autonomy.
📩 How to Apply
Interested candidates should apply directly or send their resume and a brief cover letter to careers@yottalabs.ai. Please include links to any relevant projects or contributions.