This is a remote position.
Location: Remote, anywhere in the US
As a Big Data Engineer, you will be responsible for designing, developing, and maintaining our big data infrastructure. You will work with large datasets and support various business functions by building data pipelines, data processing jobs, and data integration solutions. You will do this in a dynamic, collaborative environment, applying your expertise in Hive, Hadoop, and PySpark to unlock valuable insights from our data.
Key Responsibilities:
Data Ingestion and Integration:
· Develop and maintain data ingestion processes to collect data from various sources.
· Integrate data from different platforms and databases into a unified data lake (see the sketch after this list).
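To make the ingestion work concrete, here is a minimal PySpark sketch of pulling a relational table and raw files into a data lake. The JDBC connection details, table names, and s3a:// paths are illustrative assumptions, not details of our actual environment.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest_to_lake").getOrCreate()

# Pull a relational table over JDBC (connection details are placeholders).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Land raw CSV event files from object storage alongside it.
events = spark.read.option("header", "true").csv("s3a://lake/landing/events/")

# Write both into the unified raw zone of the lake in a columnar format.
orders.write.mode("append").parquet("s3a://lake/raw/orders/")
events.write.mode("append").parquet("s3a://lake/raw/events/")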
Data Processing:
· Create data processing jobs using Hive and PySpark for large-scale data transformation (a minimal sketch follows this list).
· Optimize data processing workflows to ensure efficiency and performance.
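As an illustration of the kind of processing job involved, here is a minimal PySpark sketch that reads a Hive table, cleans it, and aggregates it per day. The raw.orders table and its columns are assumptions made for the example.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("daily_revenue")
    .enableHiveSupport()  # read/write tables registered in the Hive metastore
    .getOrCreate()
)

orders = spark.table("raw.orders")  # hypothetical Hive table

# Transformation: keep completed orders, derive a date, aggregate per day.
daily = (
    orders.filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)

# Persist the result as a Hive table for downstream querying.
spark.sql("CREATE DATABASE IF NOT EXISTS processed")
daily.write.mode("overwrite").saveAsTable("processed.daily_revenue")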
Data Pipeline Development:
· Design and implement ETL pipelines to move data from raw to processed formats.
· Monitor and troubleshoot data pipelines, ensuring data quality and reliability (a quality-gate sketch follows this list).
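One common way to guard reliability inside an ETL step is a fail-fast quality gate, sketched below. The paths, the order_id key column, and the 1% null threshold are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_quality_gate").getOrCreate()

df = spark.read.parquet("s3a://lake/raw/orders/")  # hypothetical raw-zone path

total = df.count()
null_keys = df.filter(F.col("order_id").isNull()).count()

# Fail fast so the orchestrator marks the run as failed instead of
# silently promoting bad data to the processed zone.
if total == 0 or null_keys / total > 0.01:
    raise ValueError(f"Quality gate failed: {null_keys}/{total} null order_id rows")

df.dropDuplicates(["order_id"]).write.mode("overwrite").parquet(
    "s3a://lake/processed/orders/"
)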
Data Modeling and Optimization:
· Develop data models for efficient querying and reporting using Hive (see the sketch after this list).
· Implement performance tuning and optimization strategies for Hadoop and Spark.
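For example, partitioning a Hive table by date lets queries skip everything outside the requested range, which is one of the most effective optimizations for this kind of model. The sketch below assumes a processed database and an invented orders schema, plus one common Spark tuning knob.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("model_orders")
    .config("spark.sql.shuffle.partitions", "200")  # tune shuffle parallelism
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS processed")

# Partitioning by date lets Hive/Spark prune irrelevant partitions at query time.
spark.sql("""
    CREATE TABLE IF NOT EXISTS processed.orders_by_day (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(12, 2)
    )
    PARTITIONED BY (order_date DATE)
    STORED AS PARQUET
""")

# A filter on the partition column scans only the matching partition.
spark.sql(
    "SELECT COUNT(*) FROM processed.orders_by_day WHERE order_date = DATE '2024-01-01'"
).show()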
Data Governance:
· Implement data security and access controls to protect sensitive information.
· Ensure compliance with data governance policies and best practices.
Collaboration:
· Collaborate with data scientists, analysts, and other stakeholders to understand their data requirements and deliver the datasets and support they need.