Offer summary
Qualifications:
- Proven experience with NVIDIA Triton Inference Server
- Strong understanding of LLM techniques
- Familiarity with Retrieval-Augmented Generation (RAG) frameworks
- Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow)
- Strong problem-solving skills
Key responsibilities:
- Develop and maintain APIs using NVIDIA Triton for LLMs
- Implement and optimize processing pipelines for LLMs
- Work with RAG frameworks for model enhancement
- Collaborate to deploy ML solutions into production
- Troubleshoot issues related to model performance and scalability