LeadStack Inc. is an award-winning, certified minority-owned (MBE) staffing services provider and one of the nation's fastest-growing suppliers of contingent workforce. As a recognized industry leader in contingent workforce solutions, certified as a Great Place to Work, we're proud to partner with some of the most admired Fortune 500 brands in the world.
Job Title: Software Engineer, Data Processing & Privacy
Duration: 6 months with possible extension
Location: Remote (PST)
Pay Range: $150/hr - $200/hr on W2
Job Description
About the role
Our client is seeking a detail-oriented Software Engineer on a contract basis to build and run data processing pipelines for datasets used in our research. You'll take raw, heterogeneous inputs (text, code, documents, structured exports) and turn them into clean, well-structured, privacy-safe outputs ready for downstream use.
The work spans ingestion, format normalization, data quality, privacy handling (including PII de-identification), and the supporting tooling that makes the pipeline reliable and self-serve. You'll iterate closely with internal teams on QA findings and harden the pipeline, so each new dataset is cheaper than the last.
Responsibilities
Build and extend per-source processing for new data types as they arrive
Ingest and normalize raw exports across many formats into consistent, well-structured outputs
Handle privacy requirements — for example, PII detection and de-identification — to meet our internal compliance bar
Run data quality QA: automated checks plus LLM-assisted review to flag gaps, malformed inputs, and incompleteness
Iterate on internal feedback: root-cause issues, fix, re-run, re-deliver
Build supporting tools: auditing, data exploration, monitoring, simple search over processed data
Land cleaned data with the right storage layout and access controls
Document and harden the pipeline so each new dataset is cheaper than the last
You may be a good fit if you
Have 4+ years of software engineering experience, with substantial time on data pipelines
Are a proficient user of Claude / Claude Code for day-to-day engineering and know when to verify its output
Are genuinely detail-oriented
Have high integrity and take handling real people's personal data seriously
Are comfortable with sustained, careful data work and find satisfaction in getting it right
Can work independently, ship reliably, and communicate clearly about progress and edge cases
Are proficient in Python and comfortable working across many heterogeneous, semi-structured formats (JSON, NDJSON, code, HTML/XML dumps, archives)
Strong candidates may also have experience with
PII detection and anonymization techniques
Working with large, messy, semi-structured text and code corpora
Data quality monitoring and validation
Cloud storage and access-control patterns (S3/GCS, IAM)
Building internal tools or self-serve data platforms for researchers
Information retrieval, search, or RAG systems
To learn more about current opportunities at LeadStack, please visit us at https://leadstackinc.com/careers/
Should you have any questions, feel free to call me at (415) 549-3167 or send an email to deepak.kumar@leadstackinc.com