Data Engineer, Gen AI

Remote: Full Remote
Contract:
Experience: Mid-level (2-5 years)
Work from: New York (USA), United States

Offer summary

Qualifications:

  • Bachelor's or Master's degree in Computer Science or related field
  • 3+ years of experience in data engineering roles
  • Strong programming skills in Python and SQL
  • Experience with Databricks, Apache Spark, and Delta Lake
  • Familiarity with cloud platforms and data services

Key responsibilities:

  • Design and build scalable data pipelines
  • Implement data preprocessing and feature engineering workflows
  • Develop and maintain data quality checks
  • Collaborate with ML engineers and data scientists
  • Optimize data infrastructure for model serving
Inizio Partners (small startup, 2 - 10 employees)

Job description

Role: Data Engineer

We are seeking a skilled Data Engineer to support our organization's generative AI initiatives. In this role, you will be responsible for designing, implementing, and maintaining the data infrastructure and pipelines necessary to enable large-scale generative AI model deployment and operation.

Key Responsibilities:

  • Design and build scalable data pipelines to ingest, process, and store large volumes of training data for generative AI models.
  • Implement data preprocessing and feature engineering workflows using Spark and Delta Lake to prepare data for model training and inference.
  • Develop and maintain data quality checks and monitoring systems to ensure data integrity.
  • Design and implement efficient data storage solutions using Delta Lake and Databricks SQL warehouses to support AI workloads.
  • Collaborate with ML engineers and data scientists to optimize data flows for model training and serving.
  • Implement data governance and security best practices for sensitive training data.
  • Optimize data infrastructure for high-performance AI model serving in production environments.
  • Troubleshoot data-related issues in AI pipelines and implement solutions.


Requirements:

  • Bachelor's or Master's degree in Computer Science, Data Science, or related field
  • 3+ years of experience in data engineering roles
  • Strong programming skills in Python and SQL
  • Experience with Databricks, Apache Spark, and Delta Lake
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and their data services
  • Knowledge of data modeling, ETL processes, and data pipeline architectures
  • Experience with version control systems (e.g. Git) and CI/CD practices
  • Understanding of data privacy and security considerations


Preferred Qualifications:

  • Experience supporting machine learning or AI projects in production environments
  • Familiarity with containerization and orchestration tools like Docker and Kubernetes
  • Knowledge of streaming data technologies like Kafka or Kinesis
  • Experience with MLOps practices and tools
  • Understanding of large language models and generative AI architectures


Skills:

  • Experience with Delta Live Tables for building reliable, maintainable data pipelines
  • Familiarity with Databricks SQL for querying and analyzing large datasets
  • Knowledge of Unity Catalog for data governance and access control
  • Strong problem-solving and analytical skills
  • Excellent communication and collaboration abilities
  • Ability to work in a fast-paced, dynamic environment
  • Passion for staying up to date with the latest developments in AI and data technologies

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English

Other Skills

  • Adaptability
  • Problem Solving
  • Analytical Skills
  • Verbal Communication Skills
  • Organizational Skills
