Offer summary
Qualifications:
- Bachelor's degree in Computer Science or a related field
- 9+ years of experience in distributed computing and ML systems
- 6+ years developing AI/ML algorithms
- 3+ years working with AI/ML frameworks in public cloud environments
- Expertise in HPC and large-scale ML systems
Key responsibilities:
- Design resilient infrastructure for training tasks
- Develop serving infrastructure for ML models
- Deploy a thousand-node training cluster with optimal resource utilization
- Create benchmarks to evaluate AI system performance
- Develop innovative applications using large language models