
Data Scientist, AI Data Foundations



About the Role

Reporting into the Data Engineering organization, the Data Scientist is responsible for designing and building the curated data structures that AI and ML applications consume across MeridianLink. You will own the vector stores behind our RAG systems, the feature store that powers model training and inference, and the graph databases that capture relationships across applicants, products, and decisions. You will also lead targeted data discovery work, surfacing hidden trends in our lending and account-opening data that inform both AI use cases and the broader business.

This is a hands-on, build-oriented role. You will not be primarily training large models — you will make sure the people training and serving models have high-quality, well-governed, well-engineered data to work with, and you will use your data science skills to validate that the data is fit for purpose.

What You Will Do

• Build and maintain vector stores for RAG: Design embedding pipelines, chunking strategies, indexing approaches, and refresh patterns for the vector stores powering retrieval-augmented generation across MeridianLink products.

• Own the feature store: Design, build, and operate feature store assets used for model training and online/offline inference, including feature definitions, freshness SLAs, lineage, point-in-time correctness, and reuse across teams.

• Design graph data structures: Build graph databases that model relationships between applicants, applications, products, lenders, decisions, and outcomes — and make them queryable for both AI use cases and analytical investigations.

• Lead data discovery: Profile our lending, deposit, and behavioral datasets to identify hidden trends, segments, anomalies, and potential model drivers; turn findings into actionable hypotheses for product, risk, and growth teams.

• Engineer for AI consumption: Build the curated, AI-ready datasets that downstream model builders, application engineers, and analysts rely on — with appropriate quality, documentation, and governance baked in.

• Evaluate retrieval and feature quality: Define and run evaluation frameworks for RAG retrieval quality, feature drift, embedding quality, and graph completeness; iterate based on what the metrics tell you.

• Partner with model builders: Work closely with ML engineers and applied scientists to make sure the data structures you build accelerate their work rather than slow it down.

• Champion responsible data use: Partner with governance, security, and compliance to ensure that AI-facing data assets respect data classification, customer consent, and regulatory boundaries from day one.

• Communicate findings: Translate discovery work into clear narratives — write-ups, notebooks, dashboards, and short presentations — that help non-technical stakeholders act on what the data is showing.
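To make the chunking and embedding work described above concrete, here is a minimal sketch of an embedding pipeline for a vector store. All names and parameters are illustrative, and the embedding function is a deterministic toy stand-in for a real encoder (for example, a Hugging Face or Azure OpenAI model), not a production implementation:

```python
from hashlib import sha256

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows. Overlap keeps
    sentences that straddle a boundary retrievable from at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(chunk: str, dim: int = 8) -> list[float]:
    """Toy stand-in for a real embedding model: hash words into buckets
    and L2-normalize, so cosine similarity is a dot product."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

# Index: one (chunk, vector) pair per chunk, ready to load into a vector store.
doc = "Lending decision documents are chunked, embedded, and indexed for RAG. " * 10
index = [(c, toy_embed(c)) for c in chunk_text(doc)]
```

In practice, the choices being interviewed for here are exactly the knobs this sketch exposes: chunk size, overlap, and which encoder produces the vectors.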

Required Qualifications

• 4–7 years of experience in a data science, ML engineering, or applied data role, with a meaningful portion of that time spent building data assets that other teams' models or applications consume.

• Hands-on experience designing and operating vector stores for RAG or semantic search, including embedding generation, chunking, indexing, and retrieval evaluation.

• Experience building or operating a feature store (e.g., Databricks Feature Store, Feast, or a custom internal platform), including offline training and online serving patterns and point-in-time correctness.

• Experience modeling and building graph data structures using Neo4j, TigerGraph, Azure Cosmos DB Gremlin, or similar graph databases — and writing graph queries to answer real questions.

• Strong proficiency in Python (pandas, NumPy, scikit-learn, PySpark) and SQL; comfortable working day-to-day in Databricks notebooks and jobs.

• Practical experience with embedding models and LLM tooling (e.g., Hugging Face transformers, OpenAI / Azure OpenAI APIs, LangChain or similar) in a production or near-production context.

• Demonstrated data discovery skills: profiling messy real-world datasets, surfacing non-obvious patterns, validating findings statistically, and explaining them clearly.

• Solid grounding in classical ML concepts — supervised vs. unsupervised learning, train/test discipline, leakage, evaluation metrics — even though you will not own model training day-to-day.

• Strong written and verbal communication skills; able to write up findings for both technical and business audiences.
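Point-in-time correctness, which appears in both the responsibilities and the qualifications above, can be illustrated with a pandas `merge_asof` join: each training label sees only the latest feature value that was known at the label's timestamp, which prevents leakage. The column names and data below are hypothetical:

```python
import pandas as pd

# Feature values as they became available over time (hypothetical data).
features = pd.DataFrame({
    "applicant_id": [1, 1, 2],
    "feature_ts": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-01-15"]),
    "credit_utilization": [0.40, 0.55, 0.30],
})

# Training labels with their observation timestamps.
labels = pd.DataFrame({
    "applicant_id": [1, 2],
    "label_ts": pd.to_datetime(["2024-02-01", "2024-02-01"]),
    "defaulted": [0, 1],
})

# Per label, take the latest feature row at or before label_ts, so the
# 2024-03-01 value for applicant 1 can never leak into a 2024-02-01
# training example.
train = pd.merge_asof(
    labels.sort_values("label_ts"),
    features.sort_values("feature_ts"),
    left_on="label_ts",
    right_on="feature_ts",
    by="applicant_id",
)
```

A feature store such as Databricks Feature Store or Feast performs this same as-of join for you; the sketch only shows what "point-in-time correct" means mechanically.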

Preferred Qualifications

• Experience working in a SaaS or FinTech environment, particularly with lending, deposit, credit, fraud, or KYC/AML data.

• Experience with Databricks-native AI/ML tooling: Databricks Vector Search, Databricks Feature Store, MLflow, and Unity Catalog.

• Familiarity with vector databases and similarity-search libraries such as pgvector, Pinecone, Weaviate, Chroma, or FAISS, and a clear point of view on when to use which.

• Experience with Microsoft Azure data and AI services (Azure OpenAI, Azure AI Search, ADLS Gen2).

• Experience evaluating RAG systems end-to-end (recall@k, faithfulness, answer quality, hallucination measurement).

• Exposure to graph algorithms (community detection, link prediction, centrality) applied to real business problems.

• Bachelor's or Master's degree in Computer Science, Statistics, Mathematics, Engineering, or a related quantitative field, or equivalent professional experience.
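The retrieval metrics named above, such as recall@k, are simple to compute once you have, per query, the ranked list the retriever returned and the set of ground-truth relevant document ids. A minimal sketch with hypothetical data:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Hypothetical evaluation set: query -> (ranked retrieval, relevant ids).
eval_set = {
    "q1": (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    "q2": (["d5", "d9", "d4"], {"d4", "d8"}),
}

mean_recall_at_3 = sum(
    recall_at_k(ranked, relevant, k=3) for ranked, relevant in eval_set.values()
) / len(eval_set)
```

Faithfulness and hallucination measurement need labeled or LLM-judged answer data rather than a one-line formula, but they plug into the same per-query evaluation loop.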

Our Data & AI Stack

• Lakehouse: Azure Databricks, Delta Lake, Unity Catalog, PySpark, SQL

• AI Data Foundations: Databricks Vector Search, Databricks Feature Store, MLflow

• Vector & Graph (current and exploratory): pgvector, Pinecone, Weaviate, FAISS; Neo4j, TigerGraph, Azure Cosmos DB (Gremlin)

• Cloud: Microsoft Azure (ADLS Gen2, Azure OpenAI, Azure AI Search, Event Hubs)

• AI Models and Agents: Databricks, Amazon Bedrock, Azure ML

• Integration & Governance: Informatica Data Management Cloud (IDMC), Unity Catalog
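To make the graph-modeling responsibility concrete, here is a small plain-Python sketch of typed edges between applicants, applications, products, and lenders. Every node and edge name is illustrative; a production version would live in one of the graph databases listed above (Neo4j, TigerGraph, or Cosmos DB with Gremlin) and be queried in Cypher or Gremlin rather than Python:

```python
from collections import defaultdict

# Toy lending graph as (source, edge_type, target) triples; all names
# here are made up for illustration.
edges = [
    ("applicant:alice", "submitted", "application:A1"),
    ("application:A1", "for_product", "product:auto_loan"),
    ("application:A1", "routed_to", "lender:first_cu"),
    ("lender:first_cu", "decisioned", "application:A1"),
    ("applicant:bob", "submitted", "application:A2"),
    ("application:A2", "for_product", "product:auto_loan"),
]

out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, kind, dst in edges:
    out_edges[src].append((kind, dst))
    in_edges[dst].append((kind, src))

def neighbors(node: str, kind: str, reverse: bool = False) -> list[str]:
    """Follow edges of one type from a node, in either direction."""
    table = in_edges if reverse else out_edges
    return [other for k, other in table[node] if k == kind]

# Query: which applications were submitted for the auto-loan product,
# and which applicants submitted them?
apps = neighbors("product:auto_loan", "for_product", reverse=True)
applicants = [a for app in apps for a in neighbors(app, "submitted", reverse=True)]
```

The point of the role's graph work is exactly this kind of multi-hop traversal (applicant to application to product to decision) made fast and queryable at production scale.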
