Job description
This is a remote position.
Responsibilities:
Design, build, and maintain data lake and data warehouse solutions across raw, curated, and presentation layers
Develop and orchestrate data pipelines using AWS data services (Lambda, Step Functions, Glue, S3)
Ingest data from various sources into the data lake and data warehouse while ensuring data quality and governance
Collaborate across multiple teams using Jira SCRUM boards and sprints to deliver data engineering initiatives
Requirements:
Good understanding of AWS data engineering services, particularly Lambda, Step Functions, Glue jobs, and S3
Working knowledge of Medallion Architecture (raw, curated, and presentation layers)
Experience implementing data lake and data warehouse systems
Expertise in PySpark for handling data from diverse sources and structures
Knowledge of data and integration best practices
Efficiency in handling and ingesting data from source systems into the data lake and data warehouse
Proficiency working with multiple file formats (CSV, Parquet, etc.)
Familiarity with the life sciences domain
Experience working across multiple Jira SCRUM boards and sprints
Additional desirable skillsets:
Knowledge of PostgreSQL (creating and maintaining stored procedures)
Knowledge of Snowflake SQL, including working with its various table types and queries
Knowledge of other integration platforms such as HVR and Fivetran (some of these tools see only light use; a willingness to learn and work with them as needed is sufficient)
Ability to collaborate across multiple teams
Proactive in identifying and automating processes
Experience integrating sources such as Workday, Salesforce, and Maximo