Good understanding of AWS data engineering services (particularly Lambda, Step Functions, Glue jobs, S3, etc.)
Working knowledge of medallion-style architecture (raw, curated, and presentation layers)
Experience implementing data lakes and data warehouse systems.
Expertise in PySpark for handling data from a variety of source types and structures.
Knowledge of data and integration best practices.
Efficient at handling and ingesting data from source systems into the data lake and on to the data warehouse.
Proficiency working with multiple file formats (CSV, Parquet, etc.)
Familiarity with the life sciences domain.
Experience working across multiple Jira Scrum boards and sprints.
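The ingestion pattern described above (source CSV files landing in a raw layer, then promoted to a curated Parquet layer) can be sketched in PySpark roughly as follows. This is a minimal illustration only; the bucket name, dataset name, and layer prefixes are hypothetical:

```python
# Minimal sketch of a raw -> curated ingestion step with PySpark.
# Bucket names, dataset names, and S3 prefixes below are illustrative only.

def raw_path(bucket: str, dataset: str) -> str:
    """Build the S3 prefix for a dataset in the raw layer."""
    return f"s3://{bucket}/raw/{dataset}/"

def curated_path(bucket: str, dataset: str) -> str:
    """Build the S3 prefix for a dataset in the curated layer."""
    return f"s3://{bucket}/curated/{dataset}/"

def ingest(spark, bucket: str, dataset: str) -> None:
    """Read CSV files from the raw layer, drop exact duplicate rows,
    and write the result to the curated layer as Parquet."""
    df = (
        spark.read
        .option("header", "true")
        .csv(raw_path(bucket, dataset))
    )
    (
        df.dropDuplicates()
        .write.mode("overwrite")
        .parquet(curated_path(bucket, dataset))
    )

if __name__ == "__main__":
    # Hypothetical invocation; requires a Spark runtime with S3 access.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()
    ingest(spark, "example-datalake-bucket", "orders")
    spark.stop()
```

In a Glue job, the same `ingest` function could be driven by the Glue-provided Spark session, with Step Functions orchestrating the raw-to-curated-to-presentation promotion.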
Additional desirable skills:
Knowledge of PostgreSQL (creating and maintaining stored procedures)
Snowflake SQL knowledge, including handling various table types and queries.
Familiarity with other integration platforms such as HVR, Fivetran, etc. (even where a tool is only lightly used, candidates should be willing to learn it and work with it as needed)
Ability to collaborate across multiple teams.
Proactive in identifying and automating processes.
Experience integrating sources such as Workday, Salesforce, Maximo, etc.