Job description
This is a remote position.
We are seeking a data scraping specialist to help collect, organize, and normalize data from public and government sources into a consistent, structured format. This role focuses on solving complex data acquisition challenges, researching unfamiliar sources, extracting information from websites and feeds, and transforming it into predefined formats that can be consumed by downstream systems. The ideal candidate enjoys working with messy datasets, investigating how websites and data sources are structured, and creating reusable solutions that can be executed repeatedly with consistent results. This position requires strong problem-solving skills, attention to detail, and the ability to work independently while documenting findings and processes clearly.
Responsibilities:
Research and identify public and government data sources.
Extract and normalize data from websites, APIs, feeds, and online repositories.
Build reusable, maintainable, and re-runnable scripts and scraping workflows.
Deliver structured outputs in predefined formats.
Provide sample outputs for review before processing larger datasets.
Document data sources, extraction methodologies, challenges encountered, and re-run procedures.
Capture and report any relevant information discovered during extraction, including inconsistencies, amendments, effective dates, repeal notes, or related metadata.
Troubleshoot data acquisition issues and propose alternative approaches when needed.
Collaborate with stakeholders through regular check-ins and written communication.
Maintain version-controlled code repositories and follow standard development practices.
Requirements:
Strong experience with web scraping and data extraction.
Practical programming experience using Python or similar scripting languages.
Experience working with HTML parsing, APIs, HTTP requests, FTP sources, and structured or unstructured data.
Ability to evaluate, debug, and improve scraping solutions.
Strong analytical and problem-solving skills.
Experience building reusable automation workflows rather than one-off scripts.
Familiarity with relational databases (PostgreSQL preferred) and a normal Git workflow.
Strong documentation and communication skills.
Ability to work independently and take ownership of technical challenges.
High attention to detail and commitment to data accuracy.
Nice to Have:
Experience working with government, regulatory, compliance, or public-sector datasets.
Experience with Playwright, Selenium, Puppeteer, Scrapy, or similar scraping frameworks.
Experience with data versioning, change detection, or document lineage.
Familiarity with AI-assisted development tools and workflows.