Data Engineer

UNLIMITED HOLIDAYS - EXTRA HOLIDAYS - EXTRA PARENTAL LEAVE - LONG REMOTE PERIOD ALLOWED
Remote: 
Full Remote
Experience: 
Mid-level (2-5 years)
Offer summary

Qualifications:

3-5 years of data engineering experience; proficiency with the Python data engineering stack and dbt.

Key responsibilities:

  • Design, develop, maintain data pipelines
  • Monitor, optimize performance of workflows
  • Use CI/CD practices, IaC tools for infrastructure management
MagicPace Human Resources, Staffing & Recruiting Startup
2 - 10 Employees

Job description

Your missions

LOOKING FOR FILIPINO CITIZENS CURRENTLY RESIDING IN THE PHILIPPINES ONLY!

Position: Data Engineer

Work Hours: 4-hour overlap with CST (40 hours/week, M-F)

Type of Contract: Independent Contractor

We bring deep knowledge of the clinical trial landscape together with cutting-edge AI, offering unprecedented agility to research.

Scope of the Role:

As a Data Engineer, you will be responsible for designing, developing, and maintaining robust data ingestion pipelines to support our AI-driven clinical trial solutions. You will work with modern data engineering tools such as dbt, dlt, Apache Airflow, Docker, and Kubernetes, ensuring scalable and reliable data workflows. Additionally, you will collaborate with cross-functional teams and integrate data engineering solutions with machine learning models, contributing to the enhancement of patient recruiting efficiency and precision.

Duties and Responsibilities

  • Design, develop, and maintain robust data ingestion pipelines.
  • Ensure data pipelines are scalable, reliable, and efficient.
  • Monitor and optimize the performance of data workflows.
  • Work with dbt for data transformation and modeling.
  • Use Apache Airflow or Cloud Composer for orchestrating workflows.
  • Implement containerization with Docker and orchestration with Kubernetes.
  • Manage code via GitHub and deploy solutions on Google Cloud Platform (GCP).
  • Implement Continuous Integration/Continuous Deployment (CI/CD) practices.
  • Utilize Infrastructure-as-Code (IaC) tools to manage and provision infrastructure.
  • Collaborate with cross-functional teams including data scientists, software engineers, and clinical experts.
  • Integrate data engineering solutions with machine learning models, LLMs, and NLP frameworks.
  • Stay updated with the latest trends and best practices in data engineering.
  • Propose and implement improvements to existing data pipelines and infrastructure.

Requirements

  • 3-5 years of production experience in data engineering roles (open to more junior or senior candidates for the right talent).
  • Proficiency with Python Data Engineering stack.
  • Hands-on experience with dbt for data transformation.
  • Experience with Apache Airflow or Cloud Composer for workflow orchestration.
  • Proficiency with Docker for containerization and Kubernetes for orchestration.
  • Experience utilizing CI/CD and Infrastructure-as-Code (IaC) tools in a production environment.
  • Interest or experience in machine learning, LLMs, and Natural Language Processing (NLP) is highly desirable.
  • Strong understanding of data architecture and data modeling principles.
  • Familiarity with GitHub for version control and GCP for cloud deployments.

Required profile

Experience

Level of experience: Mid-level (2-5 years)

Soft Skills

  • Detail Oriented
  • Open Mindset
  • Verbal Communication Skills
  • Organizational Skills
  • Team Effectiveness