
Data Engineer (Data Pipelines & RAG)


Job description

Our client is a fast-growing property-tech AI company.

About the role

They are seeking a versatile Data & AI Engineer to build, deploy, and maintain end-to-end data pipelines for downstream Gen AI applications. You'll design data models and transformations and build scalable ETL/ELT workflows, learning fast while working in the AI agent space.

Key Responsibilities

Data Modeling & Pipeline development

  • Automate data ingestion from diverse sources (databases, APIs, files, SharePoint/document management tools, URLs). Most files are expected to be unstructured documents in varied formats, containing tables, charts, process flows, schedules, construction layouts/drawings, etc.
  • Own the chunking, embedding, and indexing strategy for all unstructured and structured data, enabling efficient retrieval by downstream RAG/agent systems
  • Build, test, and maintain robust ETL/ELT workflows using Spark (batch & streaming)
  • Define and implement logical/physical data models and schemas. Develop schema mapping and data dictionary artifacts for cross-system consistency
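
To give a flavour of the chunking work described above, here is a minimal sketch of fixed-size chunking with overlap. The chunk size and overlap values are arbitrary illustrative assumptions, not the client's actual strategy (which would likely be format-aware for tables, drawings, and schedules):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap,
    so context spanning a chunk boundary survives into retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

In practice each chunk would then be embedded and written to a vector index; the overlap is what keeps sentences split at a boundary retrievable from either side.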

Gen AI Integration

  • Instrument data pipelines to surface real-time context into LLM prompts
  • Implement prompt engineering and RAG for varied workflows within the RE/Construction industry vertical
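
Surfacing retrieved context into an LLM prompt often reduces to a templating step like the sketch below. The template wording and the `[Source n]` labelling are hypothetical examples, not a prescribed format:

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that grounds the LLM's answer in retrieved context."""
    context = "\n\n".join(
        f"[Source {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```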

Observability & Governance

  • Implement monitoring, alerting, and logging (data quality, latency, errors)
  • Apply access controls and data privacy safeguards (e.g., Unity Catalog, IAM)
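
A data quality check of the kind mentioned above can start as simply as validating required fields and types on each row before it enters the pipeline. The rules here are hypothetical, for illustration only:

```python
def check_data_quality(rows: list[dict], required: dict[str, type]) -> list[str]:
    """Return human-readable issues: missing required fields and type
    mismatches. In production these would feed alerting/logging."""
    issues = []
    for i, row in enumerate(rows):
        for field, expected in required.items():
            if field not in row or row[field] is None:
                issues.append(f"row {i}: missing '{field}'")
            elif not isinstance(row[field], expected):
                issues.append(f"row {i}: '{field}' is not {expected.__name__}")
    return issues
```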

CI/CD & Automation

  • Develop automated testing, versioning, and deployment (Azure DevOps, GitHub Actions, Prefect/Airflow)
  • Maintain reproducible environments with infrastructure as code (Terraform, ARM templates)

Required Skills & Experience

  • 5+ years in Data Engineering or a similar role, including at least 12-24 months building pipelines for unstructured data extraction (document processing with OCR, cloud-native solutions, chunking, indexing, etc.) for downstream consumption by RAG/Gen AI applications.
  • Proficiency in Python; experience with dlt for ETL/ELT pipelines, DuckDB or equivalent tools for in-process analytics, and DVC for managing large files efficiently.
  • Solid SQL skills and experience designing and scaling relational databases. Familiarity with non-relational, column-oriented databases is preferred.
  • Familiarity with Prefect (preferred) or a similar orchestrator (e.g., Azure Data Factory)
  • Proficiency with the Azure ecosystem, including hands-on experience running Azure services in production.
  • Familiarity with RAG indexing, chunking and storage across file types for efficient retrieval.
  • Strong DevOps/Git workflows and CI/CD experience (CircleCI / Azure DevOps)
  • Experience deploying ML artifacts using MLflow, Docker, or Kubernetes is a plus.

Bonus skillsets:

  • Experience with computer-vision-based extraction, or with building ML models for production
  • Knowledge of agentic AI system design - memory, tools, context, orchestration
  • Knowledge of data governance, privacy laws (GDPR) and enterprise security patterns

They are an early-stage startup, so you are expected to wear many hats and work on things outside your comfort zone, with real and direct impact in production.

Why our client?

  • Fast-growing, revenue-generating proptech startup
  • Flat, no-BS environment with high autonomy for the right talent
  • Steep learning curve on real-world enterprise production use cases
  • Remote work with quarterly meet-ups
  • Multi-market, multi-cultural client exposure
