Data quality and diversity are the foundation for training the best agents in any domain. As a member of the Data Team at Reflection, you will play a pivotal role in shaping how we collect and analyze human, synthetic, and internet data. This is an interdisciplinary role that primarily requires engineering, research, and communication skills, along with sharp attention to detail and a willingness to “roll up your sleeves” and look at the data.
Key Responsibilities
Experiment and Benchmark Design
Develop techniques for collecting, augmenting, filtering, or synthesizing training and evaluation data using creativity and analytical thinking
Design experiments, in collaboration with machine learning researchers, to assess the impact of different datasets on model performance
When required, manage human annotators working on data collection efforts – this could include tracking payments and hours, training annotators, and providing technical support, feedback, and quality control
Qualitative and Quantitative Data Analysis
Analyze collected data (e.g., coding tasks) both qualitatively and quantitatively
Evaluate model behavior to identify its strengths and weaknesses
Clearly communicate findings to machine learning research and product teams
Data Engineering
Design, implement, and optimize scalable data pipelines to support reinforcement learning and supervised fine-tuning
Leverage LLMs to perform data filtering, cleaning, and augmentation
Qualifications
Software engineering background with experience building data processing pipelines at scale, particularly with LLM integration
Proficiency in Python or other programming languages (Go, TypeScript, etc.)
Detail-oriented and analytical, with the ability to conduct careful qualitative and quantitative data analysis
Excellent organizational and communication skills to collaborate closely with cross-functional teams and manage human data operations
Experience with machine learning, reinforcement learning, and LLMs is a plus, but not strictly required
What We Offer
The opportunity to work at the forefront of AI research and data collection for training cutting-edge models
Collaboration with a team of world-class researchers and engineers from top AI labs and companies
Competitive compensation and benefits, with opportunities for professional growth
Company: ReflectionAI (https://www.reflection.ai/). Location: San Francisco, US. Employment type: full-time. Posted: 2025-05-25. Valid through: 2026-05-20.
Reflection was founded by former DeepMind and OpenAI researchers to build superintelligent coding agents. We previously built the most powerful LLM (ChatGPT, Gemini) and agent (AlphaGo, AlphaZero) systems in the world.
Reflection’s mission is to build superhuman coding agents. Today’s language models are powerful, but they fall short on tasks that require acting over many steps. The reason is simple: these models were never trained for autonomy. Our goal is to create the most capable and reliable coding agents in the world. Our product is a Coding Agent API that helps automate rote engineering work.