Overview:
Vibrent is a high-performance organization committed to driving innovation and delivering exceptional value to clinical research organizations, biotech companies, and academic medical centers. We thrive on a culture of resilience, continuous improvement, and collaboration. Our mission is to grow the business while delighting our customers, and we embrace a 'one team' mindset to achieve shared success.
As a Research Data Engineer at Vibrent Health, you will bridge the gap between research methodology and engineering implementation. You will ensure that the data collected via apps, wearable devices, EHR integrations, and other digital health sources are transformed into research-ready datasets. Your work will empower researchers and data scientists to derive meaningful insights that improve health outcomes at scale.
This role involves working with large, complex datasets, ensuring data quality and compliance with regulatory standards, and collaborating with cross-functional teams to derive actionable insights from healthcare data. You will be responsible for designing, implementing, and maintaining data pipelines and infrastructure to support our customers’ clinical and medical research initiatives. You will ensure smooth end-to-end processes for data collection and ingestion from all data collection sources, delivering output to a data lake that is fit for use by downstream end users.
To be successful in this role, you must have a strong understanding of end-to-end clinical data collection and extraction processes, as well as strong project management and technical experience. You must be an expert in end-to-end data extraction and transformation and be able to construct data pipelines that conform to the common data model that ensures data ingestion for all research data capture technologies (e.g., EDC, IRT, ePRO, eCOA), as well as other data models that may be required by our customers.
We will not consider any candidates who do not have specific experience with clinical or medical research data engineering.
Vibrent is a remote-first organization. You will be equipped to work from your home office or other working location.
Responsibilities for this role will include, but not be limited to:
Data Management & Engineering:
- Design, maintain, and optimize data models to create robust, research-ready datasets.
- Build the infrastructure required for optimal extraction, transformation, and loading of data using cloud technologies such as AWS and Azure.
- Design, develop, and optimize data pipelines to ingest, process, and store clinical and medical research data from various sources (e.g., EHRs, clinical trials, wearable devices, genomic data).
- Ensure efficient ETL/ELT processes for transforming raw data into structured, analysis-ready datasets.
- Build and refine ETL processes using SQL and Python to transform raw health data into structured formats suitable for analysis.
- Develop and maintain data warehouses, databases, and data lakes tailored to research needs.
- Collaborate with engineering teams to ensure data pipelines are reliable, scalable, and performant.
- Provide technical leadership on various aspects of clinical data flow, including assisting with the definition, build, and validation of application program interfaces (APIs), data streams, and data staging to various systems for data extraction and integration.
Collaboration & Support:
- Collaborate with clinical researchers, data scientists, and IT teams to understand data requirements and develop solutions that support their research goals.
- Coordinate with downstream users to ensure that outputs meet end-user requirements.
- Translate research questions into optimized queries, aggregations, and summaries that facilitate quick, accurate analysis.
- Provide technical support to research teams by enabling efficient data access and analysis.
- Work with regulatory and compliance teams to ensure adherence to industry regulations (e.g., HIPAA, GDPR, FDA 21 CFR Part 11).
- Work closely with engineers, product managers, and researchers to integrate research needs into the product roadmap.
- Participate in code reviews, agile sprints, and continuous improvement initiatives.
Data Visualization & Insights:
- Develop custom dashboards, reports, and interactive visualizations (e.g., Tableau, Looker, or Python-based libraries) to empower stakeholders with real-time access to quality metrics and research findings.
- Develop and maintain data visualization best practices and standards to ensure consistency and quality across all reporting.
- Conduct regular evaluations of existing visualizations to ensure continued accuracy and identify areas for improvement.
Data Quality & Governance:
- Implement data validation, cleaning, and monitoring processes to ensure data integrity and accuracy.
- Manage and maintain pipelines and troubleshoot data in the data lake or warehouse.
- Establish and enforce data governance policies, including metadata management and data lineage tracking.
- Ensure proper documentation of data workflows, schemas, and transformations.
- Create and maintain comprehensive data dictionaries, metadata standards, and codebooks to enhance data transparency and reproducibility.
- Conduct periodic data quality checks and audits to ensure compliance with research standards and regulatory requirements.
Security & Compliance:
- Implement security best practices to safeguard sensitive patient data in compliance with industry regulations.
- Conduct periodic audits and risk assessments to identify potential vulnerabilities and data integrity issues.
Technology & Tools:
- Utilize cloud-based platforms (AWS, Azure, GCP) to build scalable and reliable data infrastructure.
- Leverage programming languages and frameworks (Python, SQL, R, Spark) to develop efficient data solutions.
- Employ machine learning pipelines and AI-driven techniques to support advanced research initiatives.
Required Education and Experience:
- Bachelor's or Master's degree in Computer Science, Data Engineering, Biomedical Informatics, or a related field.
- 5+ years of experience in data engineering, preferably in healthcare, clinical research, or life sciences. Relevant experience will include:
  - Building data pipelines to manage heterogeneous data ingestion, or similar data integration work across multiple sources, including collected data.
  - Proven track record of handling health/clinical datasets and supporting research analysis.
  - Experience creating ELT and ETL processes to ingest data into data warehouses and data lakes.
  - Experience visualizing large datasets with BI tools and other data visualization methods.
  - Experience working with genomic data, imaging data, and wearable device data.
  - Experience with data modeling, database design, and data governance.
  - Experience deploying data pipelines in the cloud.
  - Experience with unstructured data processing and transformation.
  - Experience developing and maintaining efficient data pipelines for large volumes of data.
Required Skills and Knowledge:
- Knowledge of research processes and terminology in biological or medical fields, and the ability to communicate effectively with researchers in these domains and support them with technological and methodological expertise.
- Strong understanding of end-to-end processes for data collection and extraction, and of the analysis needs of end users in research.
- Strong ability to develop technical specifications based on communication from stakeholders.
- Knowledge of statistical analysis techniques and tools used in medical research.
- Expert-level proficiency with Python/R; experienced in creating custom functions.
- Strong SQL and database design skills (PostgreSQL, MySQL, SQL Server, NoSQL databases).
- Experience using version control platforms (GitLab, GitHub).
- Proficiency in data processing frameworks such as Apache Spark (Databricks), Hadoop, or cloud-based equivalents.
- Strong proficiency in utilizing cloud platforms (AWS, Azure, or GCP) and relevant services (Redshift, BigQuery, Snowflake), including for setting up and working with data warehouses and data lakes.
- Understanding of database concepts; knowledge of XML, JSON, and APIs.
- Knowledge of healthcare data standards (FHIR, HL7, CDISC, OMOP) and clinical terminologies (LOINC, SNOMED, ICD).
- Familiarity with compliance frameworks such as HIPAA, GDPR, or GxP.
What you bring to the role:
- Excellent problem-solving skills and ability to work in a fast-paced research environment.
- Ability to work independently, take initiative, and complete tasks to deadlines.
- Strong attention to detail and organizational skills.
- Quick learner, comfortable asking questions and learning new technologies and systems.
- Excellent communication and teamwork skills, with the ability to collaborate effectively in an interdisciplinary environment.
- Passion for advancing digital health research.
Preferred Qualifications:
- Ph.D. in a relevant field (e.g., Epidemiology, Public Health, Health Informatics, Biostatistics, Data Science).
- Certification in cloud platforms or healthcare data management (e.g., AWS Certified Data Analytics, Certified Health Data Analyst (CHDA)).
- Familiarity with FAIR data principles (Findability, Accessibility, Interoperability, and Reusability).
When you work at Vibrent, you are surrounded by a diverse group of people who share a passion for achieving excellence and making a lasting impact. Passion, excellence, and impact should be rewarded, and we do that by offering a competitive compensation package that includes an above-average 401(k) match, the benefits you need to prioritize self and family care, and support for your further education and career development. One of the greatest benefits Vibrent offers is opportunity. At Vibrent, we work with the latest tools and technologies, including enterprise mobile apps, fitness sensors, medical devices, cloud computing, machine learning, big data, and analytics. We partner with national leaders in healthcare, technology, and research. We create tools that will change and improve healthcare for ourselves and future generations.
Vibrent is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, national origin, age, sexual orientation, gender identity, disability, or veteran status.