AI Runtime & Inference Engineer - LLM Platforms | Specialist (Remote)

Requirements:

  • Expertise in operating language models in production, with a focus on performance and availability
  • Proficiency with LLM serving frameworks at scale: vLLM, TGI (Text Generation Inference), Triton Inference Server or equivalents
  • Advanced experience with Kubernetes and with managing inference workloads on accelerators
  • Expertise in observability for complex environments: Prometheus, Grafana, OpenTelemetry and signal correlation

Roles & Responsibilities

  • Operate, optimize and evolve the runtime of agents and LLM inference infrastructure in production
  • Define and implement model endpoint architecture with a focus on latency and availability SLOs
  • Design and maintain end-to-end observability pipelines: metrics, structured logs, distributed traces and intelligent alerts
  • Lead incident response and root cause analysis for failures in the inference environment

Job description

RESPONSIBILITIES AND ASSIGNMENTS


  • Operate, optimize and evolve the agent runtime and the LLM inference infrastructure in production;
  • Define and implement model endpoint architecture with a focus on latency and availability SLOs;
  • Design and maintain end-to-end observability pipelines: metrics, structured logs, distributed traces and intelligent alerts;
  • Drive advanced performance optimizations: dynamic batching, semantic caching, quantization and context management;
  • Lead incident response and root cause analysis for failures in the inference environment;
  • Define resilience standards and failover strategies for LLM workloads in production;
  • Produce runbooks, playbooks and reference operational documentation for the environment.

REQUIREMENTS AND QUALIFICATIONS


Required skills:


  • Expertise in operating language models in production, with a focus on performance and availability;
  • Proficiency with LLM serving frameworks at scale: vLLM, TGI (Text Generation Inference), Triton Inference Server or equivalents;
  • Advanced experience with Kubernetes and with managing inference workloads on accelerators;
  • Expertise in observability for complex environments: Prometheus, Grafana, OpenTelemetry and signal correlation;
  • In-depth knowledge of AWS and its ML services (SageMaker Endpoints, Bedrock, EKS).

 

Desired skills:


  • Experience with advanced model optimization: quantization (GPTQ, AWQ), distillation and compilation for inference;
  • Hands-on knowledge of GPUs and accelerators (NVIDIA A100/H100) in production settings;
  • Experience with semantic caching and advanced context-management strategies for LLMs;
  • Track record in SRE or platform engineering in mission-critical environments;
  • Experience with multi-region architectures and disaster recovery strategies for AI workloads.

 



Become a Compasser, be part of AI/R.


Compass UOL is a global firm and part of the AI Revolution Company. Together, they transform organizations using Artificial Intelligence, Generative AI, and other advanced technologies.


We equip our team with proprietary and external AI-driven tools to design and build digital-native platforms, integrating cutting-edge technologies and enabling companies to innovate, transform their businesses, and drive success in their markets.

To achieve this, we attract and develop the best talent, creating opportunities that enhance people’s lives and highlight the positive impact of disruptive technologies.

We empower borderless talent and promote knowledge and opportunities in the latest market trends, driving significant personal and professional growth.

Join us and be part of the AI-driven revolution.

