Overview:
This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.
"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC
“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments.” – Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA
DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.
Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. For a driven professional, this role is a chance to make a significant, lasting impact at a company that is shaping the future of AI and data management.
Job Description:
We are seeking an experienced and accomplished Sr. Manager, Data Engineering to lead our team. In this role, you will oversee the design, deployment, and optimization of large-scale AI/ML training pipelines that use Infinia as the foundational data storage, as well as the development of connectors to the main open-source frameworks for data analysis at scale, such as Apache Spark and Trino. You will guide a talented organization of engineers focused on an advanced end-to-end storage platform for data ingestion and transformation in high-performance AI applications. Collaborating closely with software developers, product teams, and partners, you will lead experiments with state-of-the-art models using open-source tools and cloud platforms.
Key Responsibilities:
Leadership & Management:
- Lead, mentor, and grow a team of senior ML and data engineers, fostering a culture of innovation and excellence.
- Track, report, and manage the team’s performance against project milestones, ensuring on-time delivery of high-quality solutions.
- Partner with architects, engineers, and cross-functional teams to ensure the delivery of innovative, high-quality technical designs.
- Implement and refine engineering best practices, driving continuous improvements in quality, performance, and operational efficiency.
Technical Oversight:
- Oversee the design and deployment of large-scale AI/ML training pipelines utilizing tools like Apache Spark and Apache Airflow.
- Guide the integration of MLflow with DDN’s Infinia product for comprehensive experiment tracking, model versioning, and deployment.
- Lead the integration of data ingestion and streaming pipelines with open-source tools such as Delta Lake and Apache Iceberg.
- Stay abreast of the latest developments in MLOps, AI/ML frameworks, and tooling.
- Identify and implement solutions to optimize pipeline performance, runtime, and resource utilization on Infinia.
Required Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
- 8+ years of experience in machine learning engineering, with at least 4 years in a leadership role.
- Proven track record of building and scaling AI/ML pipelines and managing high-performing engineering teams.
- Extensive experience with Apache Spark, Apache Airflow, and MLflow or equivalent tools.
- Deep understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo).
- Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code (Terraform, Ansible).
- Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
- Excellent problem-solving and troubleshooting abilities with a keen eye for performance optimization.
- Strong leadership, communication, and interpersonal skills.
- Ability to drive strategic initiatives and manage multiple projects simultaneously.
Preferred Skills:
- Experience with large-scale data processing and storage solutions (Hadoop, Hive, HDFS, Trino).
- Knowledge of NLP techniques and tools for model deployment.
- Experience with scaling RAG pipelines and integrating them with generative AI models.
- Experience in operationalizing AI/ML models in production environments.
This role offers an exceptional opportunity to lead a high-impact engineering organization at the core of DDN’s cutting-edge storage solutions. If you are passionate about solving complex technical challenges and driving innovation in high-performance systems, we encourage you to apply.
DDN:
Our team is highly motivated and focused on engineering excellence. We look for individuals who appreciate challenging themselves and thrive on curiosity. Engineers are encouraged to work across multiple areas of the company. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers and researchers are expected to have strong communication skills.
Interview Process:
After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:
- Coding assessment: Often in a language of your choice.
- Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
- Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
- Meet and greet with the wider team.
Our goal is to finish the main process in 2-3 weeks at most.
DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.
#LI-Remote