Job Title: Data Engineer
Job Description:
• Assist in building and maintaining ETL/data pipelines using Python and PySpark
• Ingest, transform, and validate data from multiple sources
• Support data modeling and schema design for structured datasets
• Use Git for version control and collaborate with engineering teams
• Perform unit testing, code reviews, and performance optimization
• Contribute to technical documentation of data workflows and pipelines
• Support feature testing and controlled releases in QA/dev environments
• Perform exploratory analysis using Jupyter/Amazon SageMaker notebooks
• Work in a Scrum/Agile environment with clear communication and collaboration
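For candidates unfamiliar with the day-to-day work, the ingest/transform/validate duties above can be sketched in plain Python. The record fields (`user_id`, `amount`) and validation rules below are illustrative assumptions, not part of this posting; a production pipeline in this role would operate on PySpark DataFrames rather than Python dicts.

```python
# Minimal pure-Python sketch of the ingest -> transform -> validate
# pattern from the duties above. Field names and rules are hypothetical.

def transform(record):
    """Normalize a raw record: strip whitespace, cast amount to float."""
    return {
        "user_id": record["user_id"].strip(),
        "amount": float(record["amount"]),
    }

def validate(record):
    """Reject records with an empty ID or a non-positive amount."""
    return bool(record["user_id"]) and record["amount"] > 0

def run_pipeline(raw_records):
    """Transform every record, then keep only the valid ones."""
    transformed = [transform(r) for r in raw_records]
    return [r for r in transformed if validate(r)]

raw = [
    {"user_id": " alice ", "amount": "10.5"},
    {"user_id": "", "amount": "3.0"},    # dropped: empty id
    {"user_id": "bob", "amount": "-1"},  # dropped: non-positive amount
]
clean = run_pipeline(raw)
print(clean)  # [{'user_id': 'alice', 'amount': 10.5}]
```

The same shape (a transform step followed by a filter on a validity predicate) maps directly onto PySpark's `withColumn`/`filter` operations once the data lives in a DataFrame.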
Required Skills:
• Bachelor’s degree in Computer Science, Data Engineering, Data Science, or Statistics
• 0–2 years of experience with Python and PySpark
• Basic knowledge of Airflow, AWS S3, and AWS Glue
• Familiarity with Git and Jupyter notebooks
• Understanding of Docker/Kubernetes concepts
Tools & Environment:
• JupyterLab / Python IDEs
• GitHub
• Microsoft Teams & Outlook
• EMR Studio
• Jira & Confluence
Additional:
• Strong attention to detail, willingness to learn, and ability to work in a collaborative team environment