Senior Research Data Engineer

Work set-up: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from:
Offer summary

Qualifications:

  • Proficiency in Python programming.
  • Experience with distributed data frameworks such as Ray or Spark.
  • Track record of designing and managing large-scale data pipelines.
  • Knowledge of synthetic data pipelines and lakehouse paradigms.

Key responsibilities:

  • Design and implement data sourcing, synthetic generation, and curation pipelines.
  • Build high-throughput data pipelines for ingesting, generating, and filtering multi-modal data.
  • Collaborate with ML researchers to develop foundation models.
  • Ensure data quality, relevance, and integrity at petabyte scale.

Mirumee Software SME http://mirumee.com/
51 - 200 Employees

Job description

Kaiko’s Multimodal Large Language Model (MLLM) is trained on domain-specific, high-complexity medical data. To reach clinical-grade performance, we’ll need to ramp up our data efforts to manage massive scale, ensure consistent quality, and tightly control data relevance and integrity.

As a Senior Research Data Engineer, you will design and implement our data‑sourcing, synthetic‑generation, and curation pipelines. High‑quality datasets are the fuel for frontier‑scale language models, and you will play a pivotal role in producing them.

You will build high‑throughput data pipelines that:

  • Ingest multi‑modal data at petabyte scale.
  • Generate large volumes of synthetic data.
  • Filter and rate content by topic, quality, and policy compliance.

You will work closely with ML researchers and help steer the development of our state-of-the-art foundation models. You will be based in Zurich or Amsterdam, with the expectation of spending half of your time at the office.
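The filter-and-rate stage above can be illustrated with a minimal, single-process sketch in pure Python. All names here (`Record`, `filter_records`, the score threshold, and the topic list) are hypothetical; a production pipeline at this scale would run distributed on a framework such as Ray or Spark.

```python
# Minimal sketch of a filter-and-rate pipeline stage (hypothetical names).
from dataclasses import dataclass
from typing import Iterable, Iterator, FrozenSet

@dataclass
class Record:
    text: str
    quality: float  # e.g. a model-assigned quality score in [0, 1]
    topic: str

def filter_records(
    records: Iterable[Record],
    min_quality: float = 0.5,
    allowed_topics: FrozenSet[str] = frozenset({"radiology", "pathology"}),
) -> Iterator[Record]:
    """Keep only records that pass both the quality and topic gates."""
    for rec in records:
        if rec.quality >= min_quality and rec.topic in allowed_topics:
            yield rec

batch = [
    Record("CT scan report ...", 0.9, "radiology"),
    Record("lorem ipsum", 0.2, "radiology"),    # fails the quality gate
    Record("cooking blog post", 0.8, "food"),   # fails the topic gate
]
kept = list(filter_records(batch))
# only the first record survives both gates
```

The generator form lets stages be composed lazily, so records stream through ingestion, generation, and filtering without materializing intermediate datasets.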

Profile

  • Excellent programming skills in Python and deep experience with distributed frameworks such as Ray or Spark.
  • Proven track record designing and operating large-scale data pipelines and running data-quality experiments.
  • Experience building or integrating synthetic-data pipelines for LLMs.
  • Deep familiarity with lakehouse paradigms (Delta, Iceberg) and columnar formats (Parquet, ORC).
  • Experience with core data-processing primitives (hashing, deduplication, chunking, etc.) and the associated scalability/performance trade-offs.
  • Strong communication skills and the ability to present experimental results and technical concepts clearly and concisely.

Nice to have:

  • Hands-on production experience orchestrating complex DAGs in Dagster (preferred) or similar workflow engines.
  • Expertise in data-quality and validation frameworks and monitoring/observability tooling.
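Two of the data-processing primitives named above, exact deduplication via content hashing and fixed-size chunking, can be sketched with the standard library alone. This is illustrative only; at petabyte scale both operations would be sharded across workers, and deduplication would typically extend to near-duplicate detection (e.g. MinHash).

```python
# Sketch of two core data-processing primitives: exact deduplication
# via content hashing, and fixed-size chunking (illustrative only).
import hashlib
from typing import Iterable, Iterator, List

def dedupe(docs: Iterable[str]) -> Iterator[str]:
    """Drop exact duplicates by SHA-256 digest of the normalized text."""
    seen: set = set()
    for doc in docs:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc

def chunk(text: str, size: int = 16) -> List[str]:
    """Split text into fixed-size character chunks (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = [
    "Patient presents with ...",
    "patient presents with ...",  # exact duplicate after normalization
    "Normal chest X-ray.",
]
unique = list(dedupe(docs))
pieces = chunk(unique[0], size=10)
```

Keeping only the digest (rather than the full text) in the `seen` set bounds memory per record, which is the kind of scalability/performance trade-off the role calls out.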

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Communication
