At Epiq, your work contributes to complex, global legal outcomes. You’ll join a values‑driven community where integrity guides decisions, relentless service sets the bar, and we thrive on big challenges together. We invest in your growth with enterprise‑wide learning and mobility. We celebrate who you are, and we respect life beyond work with flexibility that’s recognized externally. Enabled by modern platforms and AI, you’ll do the most meaningful work of your career and see your impact at scale.
Job Description:
Job Summary:
Prepares and manages the data that the Copilot relies on. The Data Quality Engineer’s mission is to ensure the AI always has access to accurate and up-to-date information. They build pipelines to collect and update the knowledge base (documents, FAQs, databases) that the AI uses, and enforce data quality standards so the AI’s answers are based on solid data. This role closely collaborates with the LLM Strategist to provide training/evaluation datasets, and with the Solutions Engineer to integrate these data pipelines into the overall system.
Key Responsibilities:
- Data Pipeline Development: Create and maintain ETL (Extract-Transform-Load) processes that gather data from various enterprise sources into the Copilot’s knowledge repository. For example, develop a pipeline to extract policy documents from SharePoint or a document management system, transform or index them (perhaps splitting into chunks, encoding as vectors or populating a search index), and load them into a format the AI can use for retrieval. Use tools like Azure Data Factory, Logic Apps, or custom Python scripts to schedule regular updates (e.g., sync new or edited documents nightly).
- Data Integration & Indexing: Implement the data storage/indexing solutions that the AI will query at runtime. This could involve setting up an Azure Cognitive Search index or a vector database (for semantic search of text) and feeding it with processed data. Ensure that for each type of data (policies, past Q&As, regulations), the relevant fields (metadata, embeddings, etc.) are properly stored for efficient retrieval. Work with the Solutions Engineer to connect these stores to the AI application (e.g., via APIs or SDKs).
- Quality Assurance & Cleansing: Establish data quality checks at each step of the pipeline. Deduplicate records, ensure consistent formatting (e.g., all dates in a standard format, text is cleaned of strange characters), and filter out irrelevant content. If integrating data from multiple sources, resolve conflicts or overlaps (e.g., if two sources have a definition for a term, determine which one is authoritative or how to consolidate them). Use techniques like sampling and validation scripts to verify that the data loaded is correct and complete (for instance, compare record counts or hash sums to make sure nothing was missed).
- Data Update & Monitoring: Monitor the freshness of data. Set up alerts or reports for data pipeline failures so they can be fixed before users notice stale info. For example, if a nightly update fails and some new documents weren’t indexed, have a way to catch that (via logs or a monitoring dashboard) and rerun the job. Also, design pipelines to be idempotent and recoverable – e.g., if a run is interrupted, it can pick up or safely restart. Coordinate with content owners in the company for any major data changes (if a new data source is added or an old one decommissioned, adjust pipelines accordingly).
- Support Model Training Data Needs: When the LLM Strategist needs curated datasets for fine-tuning or testing the AI, assist in assembling those. For instance, extract historical customer questions and answers from a database to create a training file, or gather a set of paragraphs labeled as relevant/irrelevant to train a classifier. Ensure any data used for model training is cleaned and formatted to the requirements of the ML process. Keep versioned copies of these datasets as they may be needed for future reference or re-training.
- Data Governance & Security: Handle all data in accordance with company policies and regulatory requirements. Ensure sensitive data is protected (e.g., if certain documents are confidential, ensure access controls are in place and that the AI either doesn’t index them or is restricted from exposing them in answers). Work with IT/security on any data handling reviews. Maintain documentation of data sources, data flow diagrams, and data dictionaries so it’s clear where information is coming from and how it’s transformed – this transparency aids compliance checks and team understanding.
- Collaboration:
- With LLM Strategist: Share insights about data coverage and limitations. For example, inform them if certain topics have very few documents, which might affect the AI’s knowledge. Get requirements for what training data is needed and deliver accordingly. Regularly discuss how the AI is performing and whether adjusting data inputs could help (e.g., adding a new data source to cover questions the AI couldn’t answer).
- With Azure AI Solutions Engineer: Work together to integrate the data layer with the application. Ensure APIs or services are in place so the app can query the search index or database you maintain. If the Solutions Engineer finds performance issues (such as slow search queries), collaborate on optimizing the data store (perhaps adding indexes or denormalizing data). Also, coordinate on deployment – your data pipeline jobs might run in Azure as well (Databricks, Functions, etc.), so ensure the ops pipeline includes those components with help from the AI Ops Engineer.
- With Domain Experts: Sometimes, understanding data quality requires context. You might work with a legal librarian or knowledge manager to verify that the most important sources are included. If automated quality checks flag anomalies (say an extremely large document or an outlier value), you might consult a domain expert to decide if it’s an error or an acceptable exception.
- With UX/Change Designer: Indirectly, the UX person might relay user feedback that the AI gave an outdated answer. Investigate if that was due to stale data, then update the pipeline or schedule as needed. Communicate back once data is refreshed to validate that the issue is resolved.
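To illustrate the "splitting into chunks" step described under Data Pipeline Development, here is a minimal, self-contained Python sketch (function and field names are hypothetical, not an existing Epiq implementation): it splits a document into overlapping word windows and gives each chunk a stable content-hash ID, so a nightly re-run can upsert rather than duplicate records. A real pipeline would then embed these chunks and push them to a search index.

```python
import hashlib

def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[dict]:
    """Split a document into overlapping word-window chunks for indexing.

    Hypothetical sketch: a production pipeline would likely chunk on
    sentence or token boundaries and attach source metadata.
    """
    words = text.split()
    step = max_words - overlap  # assumes overlap < max_words
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        body = " ".join(window)
        chunks.append({
            # Stable ID derived from content lets re-runs upsert, not duplicate.
            "id": hashlib.sha1(body.encode()).hexdigest(),
            "content": body,
            "offset": start,
        })
        if start + max_words >= len(words):
            break  # last window already covered the tail of the document
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, a common trade-off between index size and retrieval recall.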
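The Quality Assurance & Cleansing duties (deduplication, record-count comparison) can be sketched in a few lines of Python; the helpers below are illustrative, not part of any existing toolchain. Deduplication hashes normalized content so that whitespace or casing differences do not produce duplicate knowledge-base entries, and the load check fails loudly when source and loaded counts drift.

```python
import hashlib

def dedupe_records(records: list[dict], key: str = "content") -> list[dict]:
    """Drop records whose normalized content hash has already been seen."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        # Normalize whitespace and case before hashing, so trivially
        # different copies of the same text collapse to one record.
        normalized = " ".join(str(rec.get(key, "")).split()).lower()
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique

def validate_load(source_count: int, loaded_count: int, tolerance: int = 0) -> None:
    """Raise if the number of loaded records drifts from the source count."""
    if abs(source_count - loaded_count) > tolerance:
        raise ValueError(
            f"Load mismatch: source={source_count}, loaded={loaded_count}"
        )
```

In practice such checks would run at the end of each pipeline stage, with failures routed to the same alerting described under Data Update & Monitoring.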
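The idempotent-and-recoverable requirement under Data Update & Monitoring can be sketched with a simple checkpoint file (again a hypothetical illustration, with invented names): completed document IDs are persisted after each step, so an interrupted run restarts safely and skips work already done instead of reprocessing or duplicating it.

```python
import json
from pathlib import Path

def run_pipeline(doc_ids: list[str], process, state_path: Path) -> list[str]:
    """Process documents idempotently using a checkpoint file.

    `process` is any callable taking a document ID. The checkpoint records
    completed IDs so a crashed or interrupted run can be rerun safely.
    """
    done: set[str] = set()
    if state_path.exists():
        done = set(json.loads(state_path.read_text()))
    processed_now = []
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # already handled in a previous (possibly interrupted) run
        process(doc_id)
        done.add(doc_id)
        # Checkpoint after every document so a failure loses at most one step.
        state_path.write_text(json.dumps(sorted(done)))
        processed_now.append(doc_id)
    return processed_now
```

A production orchestrator (Data Factory, Airflow) would track this state for you; the point of the sketch is the contract, that rerunning the same job is always safe.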
Qualifications:
- Education & Experience: Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related field. 3+ years of experience in data engineering or business intelligence roles, specifically focused on building data pipelines, ETL processes, or data integrations.
- Data Pipeline Skills: Proficient in SQL and experienced in handling relational databases. Strong ability to write scripts or use ETL tools for moving and transforming data (experience with Python for data processing is highly desirable – libraries like pandas, PySpark, or SQLAlchemy could be in use). Familiar with workflow scheduling/orchestration (Azure Data Factory, Apache Airflow, or similar).
- Cloud & Tools: Experience with Azure data services such as Azure Data Factory, Azure Databricks/Synapse, Azure Storage (Blob/ADLS), and Azure SQL/NoSQL offerings. Familiarity with search technologies (Azure Cognitive Search, Elasticsearch) or vector search tools (such as Pinecone or FAISS), as these may be part of the solution for enabling the AI to retrieve information. Comfortable working in a cloud environment, including deploying data processing jobs and monitoring them.
- Data Quality & Management: Keen attention to detail for data correctness. Knowledge of data profiling techniques to understand data distribution and detect issues. Experience implementing data validation rules and handling exceptions (for example, using checksums or referential integrity checks). Understanding of master data management concepts and data governance is a plus.
- Problem Solving: Able to debug why a pipeline failed or why data looks incorrect. This might involve tracing through logs, isolating a problematic data record, or diagnosing a performance problem. Solid troubleshooting skills both for code and for data issues (like figuring out if an apparent discrepancy is due to source data changes vs. a bug in transformation).
- Collaboration & Communication: Ability to document and explain data processes clearly. Comfortable collaborating with technical peers (engineers, ML scientists) as well as explaining data setups to less technical stakeholders in simple terms. Should be proactive in raising concerns if data issues could impact the AI’s performance (for instance, warning the team if a source system is unreliable or if data is too outdated).
- Adaptability: Open to using new tools and adapting to new data sources. The types of data the Copilot uses may evolve (today text documents, tomorrow maybe database records or emails), so a willingness to quickly learn the necessary technology or format is important.
- Security/Compliance Mindset: Prior experience or training in handling sensitive data is preferred, given the likely confidential nature of some content (legal documents, etc.). Aware of practices like data anonymization or encryption where appropriate.
- Preferred: Experience in enterprise search or knowledge management projects is a plus, since this role essentially curates a knowledge base for the AI. Familiarity with Azure AI Foundry is not essential, but it helps to understand how the data you prepare might be used in an AI platform context; for example, Foundry's connections to Azure Cognitive Search and its dataset management features could intersect with your duties.
The compensation range for this role is $150,000 to $160,000 USD annually, and the role may be eligible for an annual bonus.
Your specific salary will be determined based on several factors:
- Location-based market rate for the role
- Your abilities in relation to the job specification
- Performance during screening and interview
- Pay parity with the wider team in the considered location
Further details about the package will be provided during the initial screening call with the Talent Acquisition Team.
Epiq Leadership Compass
Fosters Relationships & Collaboration
Builds trust and alignment through open communication, shared goals, and strong partnerships to drive collective success.
Build trust-based partnerships
Nurture long-term relationships
Remove collaboration barriers
Celebrate cross-team success
Engages & Influences
Inspires action and alignment through clear communication, purposeful influence, and a compelling vision.
Use storytelling to build buy-in
Align communication with organizational goals
Build alignment through strong engagement
Maximizes Performance
Sets and reinforces performance standards that drive results, ensure accountability, and align with Epiq’s goals.
Use data to identify improvement opportunities
Make informed decisions
Align team goals with broader strategy
Empower teams to manage their own goals
Translate vision into clear priorities
Prepare for disruptions with strong change management
Achieves Operational Success
Drives continuous improvement and operational excellence through smart processes, data insights, and quality execution.
Improve workflows for team efficiency
Use clear documentation and expectations
Resolve issues quickly using data and feedback
It is Epiq’s policy to comply with all applicable equal employment opportunity laws by making all employment decisions without unlawful regard or consideration of any individual’s race, religion, ethnicity, color, sex, sexual orientation, gender identity or expressions, transgender status, sexual and other reproductive health decisions, marital status, age, national origin, genetic information, ancestry, citizenship, physical or mental disability, veteran or family status or any other basis protected by applicable national, federal, state, provincial or local law. Epiq’s policy prohibits unlawful discrimination based on any of these impermissible bases, as well as any bases or grounds protected by applicable law in each jurisdiction. In addition Epiq will take affirmative action for minorities, women, covered veterans and individuals with disabilities. If you need assistance or an accommodation during the application process because of a disability, it is available upon request. Epiq is pleased to provide such assistance and no applicant will be penalized as a result of such a request. Pursuant to relevant law, where applicable, Epiq will consider for employment qualified applicants with arrest and conviction records.