This is a remote position.
Location: Remote, anywhere in the US
Key Responsibilities:
Data Ingestion and Integration:
· Develop and maintain data ingestion processes to collect data from various sources.
· Integrate data from different platforms and databases into a unified data lake.
Data Processing:
· Create data processing jobs using Hive and PySpark for large-scale data transformation.
· Optimize data processing workflows to ensure efficiency and performance.
Data Pipeline Development:
· Design and implement ETL pipelines to move data from raw to processed formats.
· Monitor and troubleshoot data pipelines, ensuring data quality and reliability.
Data Modeling and Optimization:
· Develop data models for efficient querying and reporting using Hive.
· Implement performance tuning and optimization strategies for Hadoop and Spark.
Data Governance:
· Implement data security and access controls to protect sensitive information.
· Ensure compliance with data governance policies and best practices.
Collaboration:
· Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and provide ongoing data support.
Qualifications:
Preferred Qualifications: