Offer summary
Qualifications:
- 5+ years of experience in cloud infrastructure
- Proficient in Python, Bash, Terraform, and Kubernetes
- Experience with distributed ML training jobs is ideal
- Ability to manage compute clusters running 1,000+ GPUs
- Hands-on experience with datacenter management is a plus
Key responsibilities:
- Lead the multi-cloud compute infrastructure team
- Develop monitoring, resource allocation, and deployment automation
- Improve the internal job scheduling system
- Increase job execution throughput and the reliability of compute utilization
- Support computational chemistry and drug discovery efforts