Kaiko’s Multimodal Large Language Model (MLLM) is trained on domain‑specific, high‑complexity medical data. To reach clinical‑grade performance, we’ll need to ramp up our data efforts to manage massive scale, ensure consistent quality, and tightly control data relevance and integrity.
As a Senior Research Data Engineer, you will design and implement our data‑sourcing, synthetic‑generation, and curation pipelines. High‑quality datasets are the fuel for frontier‑scale language models, and you will play a pivotal role in producing them.
You will build high‑throughput data pipelines that:
You will work closely with ML researchers and help steer the development of our state‑of‑the‑art foundation models. You will be based in Zurich or Amsterdam, with the expectation of spending half of your time at the office.