Job description
Description
NeuReality is seeking a Lead System Architect to join our system architecture team and help define NR-Nexus, our next-generation AI inference platform.
Responsibilities
Lead the software architecture and technical roadmap for NeuReality’s NR-Nexus
Write system specifications for the NR-Nexus product
Research AI infrastructure, SaaS platforms, model serving, and inference trends
Work with engineering to translate technical capabilities into product value
Work closely with engineering teams to optimize performance, scalability, and feature delivery
Define performance goals and lead profiling, benchmarking, and optimization efforts for GenAI and distributed AI workloads
Collaborate with customers, partners, and open-source communities to ensure ecosystem compatibility and adoption
Mentor software engineers and provide technical leadership
Requirements
7+ years of software engineering experience, including 3+ years in software architecture or technical leadership.
Strong experience with Kubernetes-based platforms and cloud-native architecture.
Deep understanding of GenAI/LLM infrastructure and distributed workloads.
Experience designing management software or SaaS platforms for production systems.
Strong background in distributed systems, microservices, APIs, and automation.
Hands-on experience with observability stacks, monitoring, logging, alerting, and SLA/SLO tracking.
Experience with CI/CD, deployment automation, upgrades, and rollback mechanisms.
Good understanding of security, authentication, authorization, and integration with customer data center environments.
Nice to have
Deep understanding of GenAI/LLM inference infrastructure, including model serving, scaling, batching, latency, throughput, and resource utilization.
Experience with production AI inference clusters using GPUs, AI accelerators, or other specialized compute infrastructure.
Understanding of how distributed inference systems operate, including scheduling, load balancing, autoscaling, failover, and cluster-level observability.
Experience with LLM serving frameworks such as vLLM, Triton Inference Server, TensorRT-LLM, or similar.
Familiarity with GPU/accelerator orchestration, device plugins, resource scheduling, and cluster capacity planning.
Familiarity with GPU communication technologies such as GPUDirect RDMA, NCCL, NVLink, or UALink.
Experience optimizing communication for distributed AI/ML workloads.
Knowledge of Prometheus, Grafana, OpenTelemetry, Helm, Argo CD, Istio, KServe, Kubeflow, or similar tools.
Experience deploying software in on-prem, edge, private cloud, or hybrid environments.