ML Data Engineer (Feature Pipeline & ETL)

Remote: Full Remote

Offer summary

Qualifications:

  • Minimum 4 years in data engineering
  • Expertise in Databricks and medallion architecture
  • Familiarity with MLflow for model training
  • Advanced skills in Apache Spark

Key responsibilities:

  • Develop end-to-end ML feature engineering pipelines
  • Support ML model training and experiment tracking
McAfee Computer Hardware & Networking XLarge https://www.mcafee.com/
5001 - 10000 Employees

Job description

Role Overview:

McAfee is seeking a skilled ML Data Engineer to join our Consumer ML team, specializing in creating robust feature engineering ETL pipelines tailored for machine learning applications. This role requires hands-on experience with Databricks, a solid understanding of the medallion architecture, and expertise in developing, deploying, and managing scalable data pipelines for low-latency model serving.

The ideal candidate will also have experience supporting the end-to-end ML lifecycle, including model training and experiment tracking, with MLflow experience as a strong asset. As part of our AI and Machine Learning team, you will be instrumental in enabling advanced analytics and delivering personalized user experiences.

This is a remote position based in Canada. We will only consider candidates in Canada and are not offering relocation assistance at this time.

About the role:

  • Feature Engineering & Data Integration: Develop and maintain end-to-end ML feature engineering pipelines using Databricks, ensuring data is consistently structured to support ML models effectively.
  • Pipeline Development & Management: Integrate diverse data sources (clickstreams, user behaviour, demographic data, etc.) and tailor data integration processes to optimize data quality and performance.
  • Medallion Architecture Expertise: Build ETL/ELT pipelines that follow the bronze, silver, and gold layers of the medallion architecture, ensuring efficient data structuring for ML workflows.
  • Model Training & Experiment Tracking: Support ML model training and calibration through optimized data pipelines, using MLflow for experiment tracking, model versioning, and performance monitoring.
  • Query Optimization & Low Latency Pipelines: Design and implement optimized queries and low-latency data pipelines to support real-time and batch model inference in production.
  • CI/CD & Deployment: Apply CI/CD best practices to ensure smooth and efficient pipeline deployments, with automated testing for consistent performance.
  • Data Governance & Compliance: Ensure pipelines meet security and compliance standards, particularly for PII, and manage metadata and master data across the data catalogue.
  • Collaboration: Work closely with data scientists, data stewards, and other teams to align data ingestion and transformation efforts with business requirements.

About you:

  • Experience: Minimum 4 years in data engineering, focusing on ML feature engineering, ETL pipeline development, and data preparation for machine learning.
  • Databricks & Medallion Architecture: Proven expertise in managing ETL/ELT pipelines on Databricks, with a solid understanding of the medallion architecture.
  • ML Lifecycle & MLflow: Familiarity with the ML lifecycle and experience using MLflow for model training, calibration, and experiment tracking are highly desirable.
  • Spark & Big Data Technologies: Advanced skills in Apache Spark for big data processing and analytics.
  • Programming & Querying: Strong skills in Python for data manipulation and in SQL for query optimization and performance tuning.
  • Low Latency Data Pipelines: Experience in building and optimizing pipelines for low-latency model inference and serving in production environments.
  • CI/CD & System Integration: Familiarity with continuous integration and deployment practices for ETL/ELT pipeline development.
  • Data Pipeline Management: Expertise in managing data pipelines, ensuring adherence to security, compliance, and best practices.
  • Metadata & Master Data Management: Competency in managing metadata and master data within a technical data catalogue.
  • You are a detail-oriented ML Data Engineer passionate about building scalable, efficient data pipelines tailored for machine learning.
  • You thrive in a collaborative environment, working effectively with cross-functional teams to drive data-driven insights and personalized solutions.
  • You are proactive in troubleshooting, monitoring, and optimizing data pipelines to support high-performance ML models in production.


Company Overview

McAfee is a leader in personal security for consumers. Focused on protecting people, not just devices, McAfee consumer solutions adapt to users’ needs in an always-online world, empowering them to live securely through integrated, intuitive solutions that protect their families and communities with the right security at the right moment.

Company Benefits and Perks:

We work hard to embrace diversity and inclusion and encourage everyone at McAfee to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Bonus Program
  • Pension and Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to diversity, which is why McAfee prohibits discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

Required profile

Experience

Industry: Computer Hardware & Networking
Spoken language(s): English

Other Skills

  • Collaboration
  • Troubleshooting (Problem Solving)
