
Sr/Lead Data Engineer (Python/Spark/Jupyter Notebooks/Delta Lake/Data Vault 2.0)

Remote: Full Remote
Contract: W2
Experience: Senior (5-10 years)
Work from: New Jersey (USA), United States

Offer summary

Qualifications:

10-15+ years in data engineering; strong Python and PySpark skills; expertise with Delta Lake architecture; experience with Jupyter Notebooks; familiarity with AWS, Azure, or GCP.

Key responsibilities:

  • Design and maintain data pipelines using Python and Spark.
  • Implement Delta Lake architecture for data integrity.
  • Collaborate on creating reusable Jupyter Notebooks.
  • Optimize data storage processes for high performance.
  • Monitor and improve data processing performance.
Yoh, A Day & Zimmermann Company
Human Resources, Staffing & Recruiting | 1001-5000 employees | https://www.yoh.com/

Job description

Sr/Lead Data Engineer (Python/Spark/Jupyter Notebooks/Delta Lake/Data Vault 2.0)
Location: Remote (MST/CST/EST preferred)
Pay rate: $77-$110/HR W2 
Duration: 6-month increments; will extend yearly if the engagement goes well

Notes:
  • Client has built a modern data platform and needs senior data engineers to work on various projects supporting Client’s business
    • Storage initiatives
    • Data Applications
    • Code reviews
    • Azure Synapse Analytics
    • Delta Lake initiatives
  • Data Group
    • 65-70 resources, including Data Engineers, Data Analysts, BAs, QA, and Scrum Masters, broken out into 6 teams
    • Do NOT use ETL tools; the group utilizes Data Vault 2.0 methods for data transfer
  • Looking for VERY senior resources, up to hands-on lead level
    • Experienced with assertion-based architecture
    • Engineers vs “coders”
    • Coding is done in Jupyter Notebooks on Delta Lake (see the sketch after this list)
    • Need resources who can articulate design and build highly scalable solutions before jumping into coding
    • Do NOT want resources who need to be told what to do
    • Need critical thinkers who can troubleshoot and debug
    • Independent, self-starting workers who speak up, raise impediments, and offer solutions
  • Required skills:
    • Python
    • Jupyter Notebooks
    • Delta Lake
    • Spark, PySpark, Spark SQL
    • Serverless data infrastructure
    • Data Vault 2.0 methodology experience
    • Great Expectations data quality validation
    • Automated Testing
  • Bonus skills:
    • Kafka streaming – HUGE plus for candidates with a solid background here
    • Scala
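
For a concrete taste of the required stack, here is a minimal sketch of PySpark plus Delta Lake as it might appear in a Jupyter Notebook cell. It assumes the delta-spark package and its Spark jars are available; the table path and column names are hypothetical illustrations, not details from this posting.

```python
from pyspark.sql import SparkSession

# These two settings are the standard way to enable Delta Lake in a Spark session.
spark = (
    SparkSession.builder
    .appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog",
    )
    .getOrCreate()
)

# Hypothetical source data standing in for a real pipeline input.
df = spark.createDataFrame(
    [(1, "2024-01-01", 42.0), (2, "2024-01-02", 17.5)],
    ["order_id", "order_date", "amount"],
)

# Delta format layers ACID transactions and time travel over Parquet files.
df.write.format("delta").mode("overwrite").save("/tmp/orders_delta")

# Reading back with Spark SQL, another skill on the required list.
spark.read.format("delta").load("/tmp/orders_delta").createOrReplaceTempView("orders")
spark.sql("SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date").show()
```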
Key Responsibilities:
  • Design, develop, and maintain data pipelines using Python, PySpark, and Spark SQL to process and transform large-scale datasets.
  • Implement Delta Lake architecture to ensure data reliability, consistency, and integrity for large, distributed datasets (a minimal upsert sketch follows this list).
  • Utilize serverless data infrastructure (e.g., AWS Lambda, Azure Functions, Databricks) to build scalable and cost-efficient data solutions.
  • Collaborate with Data Scientists and Analysts by creating reusable Jupyter Notebooks for data exploration, analysis, and visualization.
  • Optimize and manage data storage and retrieval processes, ensuring high performance and low latency.
  • Implement best practices for data security, governance, and compliance within the data infrastructure.
  • Work closely with cross-functional teams to understand data requirements and deliver solutions aligned with business objectives.
  • Continuously monitor, troubleshoot, and improve the performance of data processing pipelines and infrastructure.
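
As a sketch of the Delta Lake responsibility above, the following shows an idempotent upsert (MERGE) into a Delta table via the delta-spark API. The path and join key are hypothetical and reuse the table from the earlier sketch; this illustrates the pattern, not the client's actual pipeline code.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Assumes a session already configured for Delta Lake, as in the earlier sketch.
spark = SparkSession.builder.getOrCreate()

# Hypothetical batch of new and corrected records.
updates = spark.createDataFrame(
    [(2, "2024-01-02", 19.0), (3, "2024-01-03", 8.25)],
    ["order_id", "order_date", "amount"],
)

target = DeltaTable.forPath(spark, "/tmp/orders_delta")  # hypothetical path

# MERGE gives transactional upserts: matched rows are updated, unmatched rows
# are inserted, so re-running the same batch cannot create duplicates.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```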
Qualifications:
  • 10-15+ years of experience in data engineering or related fields.
  • Strong programming skills in Python with experience in data processing frameworks like PySpark.
  • Extensive hands-on experience with Apache Spark and Spark SQL for processing and querying large datasets.
  • Expertise with Delta Lake for implementing scalable data lakehouse architectures.
  • Experience with Jupyter Notebooks for prototyping and collaboration with data teams.
  • Familiarity with serverless data technologies such as AWS Lambda, Azure Functions, or similar platforms.
  • Proficient in working with cloud platforms such as AWS, Azure, or Google Cloud.
  • Experience with data pipeline orchestration tools (e.g., Apache Airflow, Prefect, or similar); a minimal Airflow sketch follows this list.
  • Solid understanding of data warehousing, ETL/ELT pipelines, and modern data architectures.
  • Strong problem-solving skills and ability to work in a collaborative environment.
  • Experience with CI/CD pipelines and DevOps practices is a plus.
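
Since Apache Airflow is one of the orchestration tools named above, here is a minimal DAG sketch assuming Airflow 2.4+ (earlier versions use schedule_interval instead of schedule). The DAG id, schedule, and task bodies are hypothetical placeholders, not the client's actual workflow.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data")  # placeholder for a real extract step


def transform():
    print("run the PySpark job")  # placeholder for a spark-submit or notebook run


with DAG(
    dag_id="orders_pipeline",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # extract runs before transform
```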
Preferred Qualifications:
  • Experience with Databricks for data engineering workflows.
  • Familiarity with modern data governance practices and tools like Apache Atlas or AWS Glue.
  • Knowledge of machine learning workflows and how data engineering supports AI/ML models.

Note: Any pay ranges displayed are estimations.  Actual pay is determined by an applicant's experience, technical expertise, and other qualifications as listed in the job description.  All qualified applicants are welcome to apply.

Yoh, a Day & Zimmermann company, is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Visit https://www.yoh.com/applicants-with-disabilities to contact us if you are an individual with a disability and require accommodation in the application process.

For California applicants, qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.  All of the material job duties described in this posting are job duties for which a criminal history may have a direct, adverse, and negative relationship potentially resulting in the withdrawal of a conditional offer of employment.

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Human Resources, Staffing & Recruiting
Spoken language(s): English

Other Skills

  • Collaboration
  • Problem Solving
