The Chan Zuckerberg Initiative was founded by Priscilla Chan and Mark Zuckerberg in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education to addressing the needs of our local communities. Our mission is to build a more inclusive, just, and healthy future for everyone.
The Team
Our Central Tech team provides technology and security support for CZI and our grantees. We believe that Engineering, IT and Security are most effective when in sync and learning from each other on a daily basis. Across our three pillars of Infrastructure, Security, and Grantee & Partner Support, we enable our teams to achieve their goals faster and more securely. We leverage technology to automate manual processes, constantly innovate to optimize operations, provide first-class support, and build solutions to enable the scale and execution of our business partners' strategies and initiatives.
The Opportunity
We are seeking a highly skilled and strategic High Performance Computing (HPC) Cluster and Cloud Infrastructure Technical Program Manager (TPM) to oversee the management of both on-premises and cloud-based HPC clusters across multiple entities within the company. The ideal candidate will be responsible for ensuring optimal utilization of compute resources, planning for future capacity needs, and programmatically managing the allocation of resources across diverse projects. This role requires deep technical expertise, strong communication skills, and the ability to manage complex stakeholder relationships, including the capacity to influence senior leadership. This new role will report directly to the Vice President of Central Technology.
What You'll Do
- Manage and maintain HPC clusters across multiple platforms, ensuring peak operational efficiency.
- Capacity planning: Assess current and future compute needs, creating detailed plans and reports for resource allocation and scaling.
- Stakeholder management: Act as a strategic partner, facilitating the onboarding of workloads and aligning engineering, leadership, and customer teams in a dynamic environment.
- Project prioritization: Collaborate with a review committee to evaluate and prioritize capacity requests based on organizational impact.
- Metrics and reporting: Establish operational and capacity metrics, and report on them on a regular cadence in the relevant forums.
- Process automation: Develop and optimize workflows to programmatically allocate resources, ensuring transparency and fairness.
- Leadership and communication: Communicate effectively with cross-functional teams and senior leadership, driving alignment and managing resource decisions.
What You'll Bring
- Proven experience managing both cloud and on-prem HPC clusters
- Strong knowledge of compute infrastructure across on-premises and cloud-based systems
- Expertise in capacity planning and resource management for high-demand computing environments
- Excellent communication skills, with experience in collaborating across multiple departments and levels of an organization
- Ability to handle difficult conversations and manage expectations with senior leadership
- Familiarity with automation tools and processes for resource allocation in HPC environments
- Strong problem-solving skills and attention to detail
Preferred Qualifications:
- Experience working with GPU clusters
- Familiarity with AI/ML workloads and the computational needs associated with them
- Demonstrated ability to develop scalable processes in a multi-tenant environment
Compensation
The Redwood City, CA base pay range for this role is $200,000 - $300,000. New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. Actual placement in range is based on job-related skills and experience, as evaluated throughout the interview process. Pay ranges outside Redwood City are adjusted based on cost of labor in each respective geographical market. Your recruiter can share more about the specific pay range for your location during the hiring process.
Benefits for the Whole You
We’re thankful to have an incredible team behind our work. To honor their commitment, we offer a wide range of benefits to support the people who make all we do possible.
- CZI provides a generous employer match on employee 401(k) contributions to support planning for the future.
- An annual benefit that employees can use in the way most meaningful to them and their families, such as housing, student loan repayment, childcare, commuter costs, or other life needs.
- CZI Life of Service Gifts are awarded to employees to “live the mission” and support the causes closest to them.
- Paid time off to volunteer at an organization of your choice.
- Funding for select family-forming benefits.
- Relocation support for employees who need assistance moving to the Bay Area.
- And more!
Commitment to Diversity
We believe that the strongest teams and best thinking are defined by the diversity of voices at the table. We are committed to fair treatment and equal access to opportunity for all CZI team members and to maintaining a workplace where everyone feels welcomed, respected, supported, and valued. Learn about our diversity, equity, and inclusion efforts.
If you’re interested in a role but your previous experience doesn’t perfectly align with each qualification in the job description, we still encourage you to apply as you may be the perfect fit for this or another role.
Explore our work modes, benefits, and interview process at www.chanzuckerberg.com/careers.