
Data Engineer (Data Pipelines & RAG)


Job description

Our client is a fast-growing property-tech AI company.

About the role

They are seeking a versatile Data & AI Engineer to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations and build scalable ETL/ELT workflows, learning fast while working in the AI agent space.

Key Responsibilities

Data Modeling & Pipeline development

  • Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in varied formats, containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
  • Own the chunking, embedding, and indexing strategy for all unstructured and structured data, enabling efficient retrieval by downstream RAG/agent systems
  • Build, test, and maintain robust ETL/ELT workflows using Spark (batch & streaming)
  • Define and implement logical/physical data models and schemas. Develop schema mapping and data dictionary artifacts for cross-system consistency
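
To give a flavour of the chunking work described above, here is a minimal sketch of fixed-size chunking with overlap. The chunk size and overlap values are arbitrary illustrative assumptions, not the client's actual strategy (which would likely be format-aware for tables, drawings, and schedules):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so context spanning a chunk boundary survives into retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

In practice each chunk would then be embedded and written to a vector index; the overlap is what keeps sentences split at a boundary retrievable from either side.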

Gen AI Integration

  • Instrument data pipelines to surface real-time context into LLM prompts
  • Implement prompt engineering and RAG for varied workflows within the RE/Construction industry vertical
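
Surfacing retrieved context into an LLM prompt often reduces to a templating step like the sketch below. The template wording and the `[Source n]` labelling are hypothetical examples, not a prescribed format:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that grounds the LLM's answer in retrieved context."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```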

Observability & Governance

  • Implement monitoring, alerting, and logging (data quality, latency, errors)
  • Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM)
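
A data quality check of the kind mentioned above can start as simply as validating required fields and types on each row before it enters the pipeline. The rules here are hypothetical, for illustration only:

```python
def check_data_quality(rows: list[dict], required: dict[str, type]) -> list[str]:
    """Return human-readable issues: missing required fields and type
    mismatches. In production these would feed alerting/logging."""
    issues = []
    for i, row in enumerate(rows):
        for field, expected in required.items():
            if field not in row or row[field] is None:
                issues.append(f"row {i}: missing '{field}'")
            elif not isinstance(row[field], expected):
                issues.append(f"row {i}: '{field}' is not {expected.__name__}")
    return issues
```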

CI/CD & Automation

  • Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow)
  • Maintain reproducible environments with infrastructure as code (Terraform, ARM templates)

Required Skills & Experience

  • 5+ years in Data Engineering or a similar role, including at least 12-24 months building pipelines for unstructured data extraction (document processing with OCR, cloud-native solutions, chunking, indexing, etc.) for downstream consumption by RAG/Gen AI applications.
  • Proficiency in Python; experience with dlt for ETL/ELT pipelines, DuckDB or equivalent tools for in-process analytics, and DVC for managing large files efficiently.
  • Solid SQL skills and experience designing and scaling relational databases. Familiarity with non-relational, column-oriented databases is preferred.
  • Familiarity with Prefect (preferred) or a similar orchestrator (e.g., Azure Data Factory)
  • Proficiency with the Azure ecosystem, including hands-on experience running Azure services in production.
  • Familiarity with RAG indexing, chunking and storage across file types for efficient retrieval.
  • Strong DevOps/Git workflows and CI/CD experience (CircleCI / Azure DevOps)
  • Experience deploying ML artifacts using MLflow, Docker, or Kubernetes is a plus.

Bonus skillsets:

  • Experience with computer-vision-based extraction, or with building ML models for production
  • Knowledge of agentic AI system design - memory, tools, context, orchestration
  • Knowledge of data governance, privacy laws (GDPR) and enterprise security patterns

They are an early-stage startup, so you are expected to wear many hats and work on things outside your comfort zone, with real and direct impact in production.

Why our client?

  • Fast-growing, revenue-generating proptech startup
  • Flat, no-BS environment with high autonomy for the right talent
  • Steep learning curve on real-world enterprise production use cases
  • Remote work with quarterly meet-ups
  • Multi-market, multi-cultural client exposure
