Key Facts

Remote From:

Brazil

Category: Platform Engineer

Full time

Senior (5-10 years)

Portuguese

Hard Skills

AWS, SageMaker, Endpoint Engineering, Inference Engine, Batch Processing, Data Caching, Quantization, Incident Response, Operational Resilience, Model Distillation

Roles & Responsibilities

Operate, optimize and evolve the agent runtime and LLM inference infrastructure in production
Define and implement the model endpoint architecture with a focus on latency and availability SLOs
Design and maintain end-to-end observability pipelines: metrics, structured logs, distributed traces and intelligent alerts
Lead incident response and root cause analysis for failures in the inference environment

Requirements

Expertise in operating production language models with a focus on performance and availability
Proficiency with LLM serving frameworks at scale: vLLM, TGI (Text Generation Inference), Triton Inference Server or equivalents
Advanced experience with Kubernetes and managing inference workloads on accelerators
Observability expertise in complex environments: Prometheus, Grafana, OpenTelemetry and signal correlation

Compass.uol

About Compass.uol

Compasso UOL is a Brazilian technology company owned by the UOL Group that offers technology services and state-of-the-art solutions, supporting its customers' digital transformation so they can become leaders in their sectors. Compasso UOL cultivates people's talent and uses state-of-the-art technologies and practices such as Agile Development, Multicloud, Data & Analytics, Cyber Security, Artificial Intelligence/Machine Learning, APIs/Microservices and IoT, providing the technology and knowledge that help customers build digital solutions to transform and evolve their businesses.

Company type: XLarge

Founded: 2018

Company size: 5001 - 10000


Job description

RESPONSIBILITIES AND DUTIES

Operate, optimize and evolve the agent runtime and LLM inference infrastructure in production;
Define and implement the model endpoint architecture with a focus on latency and availability SLOs;
Design and maintain end-to-end observability pipelines: metrics, structured logs, distributed traces and intelligent alerts;
Drive advanced performance optimizations: dynamic batching, semantic caching, quantization and context management;
Lead incident response and root cause analysis for failures in the inference environment;
Define resilience standards and failover strategies for production LLM workloads;
Produce runbooks, playbooks and reference operational documentation for the environment;

REQUIREMENTS AND QUALIFICATIONS

Required skills:

Expertise in operating language models in production with a focus on performance and availability;
Command of LLM serving frameworks at scale: vLLM, TGI (Text Generation Inference), Triton Inference Server or equivalents;
Advanced experience with Kubernetes and managing inference workloads on accelerators;
Observability expertise in complex environments: Prometheus, Grafana, OpenTelemetry and signal correlation;
Deep knowledge of AWS and its ML services (SageMaker Endpoints, Bedrock, EKS);

Nice-to-have skills:

Experience with advanced model optimization: quantization (GPTQ, AWQ), distillation and compilation for inference;
Hands-on knowledge of GPUs and accelerators (NVIDIA A100/H100) in production settings;
Experience with semantic caching and advanced context-management strategies for LLMs;
Track record in SRE or platform engineering in mission-critical environments;
Experience with multi-region architectures and disaster recovery strategies for AI workloads;

Become a Compasser, be part of AI/R.

Compass UOL is a global firm and part of AI/R, the AI Revolution Company, together transforming organizations with Artificial Intelligence, Generative AI and other leading technologies of today.

We equip our team with proprietary and external AI-driven tools to design and build digital-native platforms, integrating cutting-edge technologies and enabling companies to innovate, transform their businesses, and drive success in their markets.

To achieve this, we attract and develop the best talent, creating opportunities that enhance people’s lives and highlight the positive impact of disruptive technologies.

We empower borderless talent and promote knowledge and opportunities in the latest market trends, driving significant personal and professional growth.

Join us and be part of the AI-driven revolution.