NIR-YU

Data Engineer – Web Scraping, LLM Pipelines and Scalable Data Infrastructure

Key Facts

Remote From: 
Category: Data Engineer
Full time
Mid-level (2-5 years)
English

Other Skills

  • Communication

Job description

The Role:

You’ll help build the data foundation of our product: high‑volume web scraping systems, structured datasets and LLM‑driven processing pipelines. The role combines hands‑on engineering with architectural thinking and suits someone who enjoys turning messy web data into reliable, scalable outputs.

Key Responsibilities:

  • Build new structured datasets by scraping sources such as accelerators, Form D filings and dynamic web sources.

  • Develop automated ETL pipelines that parse, clean and transform content using LLMs.

  • Define and maintain database schemas in Supabase or PostgreSQL.

  • Create evaluation frameworks to measure and compare LLM performance across pipeline components.

  • Contribute to the design of scalable data architectures using GCP services.

  • Improve reliability, observability and deployment workflows for scraping and data processing systems.

Requirements:

  • 4+ years of experience building data pipelines, backend services and automated data processing systems.

  • Strong background in web scraping with tools like Scrapy, Playwright or similar.

  • Experience deploying pipelines on cloud platforms such as GCP or AWS.

  • Solid knowledge of ETL frameworks, workflow orchestration (Airflow) and modern data stores (BigQuery, PostgreSQL).

  • Comfortable working with Docker and API frameworks like FastAPI.

  • Clear, fluent communication in English.
