Solution Architecture: Design scalable and secure data pipelines using AWS Glue, PySpark, and related AWS services (EMR, S3, Lambda, etc.).
Leadership & Mentorship: Guide junior engineers, conduct code reviews, and enforce best practices in development and deployment.
ETL Development: Lead the design and implementation of end-to-end ETL processes for structured and semi-structured data.
Framework Building: Develop and evolve data frameworks, reusable components, and automation tools to improve engineering productivity.
Performance Optimization: Optimize large-scale data workflows for performance, cost, and reliability.
Data Governance: Implement data quality, lineage, and governance strategies in compliance with enterprise standards.
Collaboration: Work closely with product, analytics, compliance, and DevOps teams to deliver high-quality solutions aligned with business goals.
CI/CD Automation: Set up and manage continuous integration and deployment pipelines using AWS CodePipeline, Jenkins, or GitLab.
Documentation & Presentations: Prepare technical documentation and present architectural solutions to stakeholders across levels.
7–12 years of experience in data engineering or related fields.
Strong expertise in Python programming with a focus on data processing.
Extensive experience with AWS Glue (both Glue Jobs and Glue Studio/Notebooks).
Deep hands-on experience with PySpark for distributed data processing.
Solid AWS knowledge: EMR, S3, Lambda, IAM, Athena, CloudWatch, Redshift, etc.
Proven experience architecting and managing complex ETL workflows.
Proficiency with Apache Airflow or similar orchestration tools.
Hands-on experience with CI/CD pipelines and DevOps best practices.
Familiarity with data quality, data lineage, and metadata management.
Strong experience working in agile/scrum teams.
Excellent communication and stakeholder engagement skills.
Experience in financial services, capital markets, or compliance systems.
Knowledge of data modeling, data lakes, and data warehouse architecture.
Familiarity with SQL (Athena/Presto/Redshift Spectrum).
Exposure to ML pipeline integration or event-driven architecture is a plus.
Flexible work culture and remote options.
Opportunity to lead cutting-edge cloud data engineering projects.
Skill-building in large-scale, regulated environments.