Offer summary

Qualifications:

7–12 years of experience in data engineering or related fields.

Strong expertise in Python programming for data processing.

Extensive experience with AWS Glue and PySpark for distributed data processing.

Knowledge of AWS services such as EMR, S3, Lambda, and data governance practices.

Key responsibilities:

Design and develop scalable, secure data pipelines using AWS Glue and related services.

Lead and mentor junior engineers, conducting code reviews and enforcing best practices.

Develop end-to-end ETL processes for structured and semi-structured data.

Collaborate with cross-functional teams to deliver high-quality data solutions.

Job description

We are seeking a highly experienced, hands-on Lead/Senior Data Engineer to architect, develop, and optimize data solutions in a cloud-native environment. The ideal candidate will have 7–12 years of experience and strong technical expertise in AWS Glue, PySpark, and Python, along with a track record of designing robust data pipelines and frameworks for large-scale enterprise systems. Prior exposure to the financial domain or regulated environments is a strong advantage.

Key Responsibilities:

Solution Architecture: Design scalable and secure data pipelines using AWS Glue, PySpark, and related AWS services (EMR, S3, Lambda, etc.).
Leadership & Mentorship: Guide junior engineers, conduct code reviews, and enforce best practices in development and deployment.
ETL Development: Lead the design and implementation of end-to-end ETL processes for structured and semi-structured data.
Framework Building: Develop and evolve data frameworks, reusable components, and automation tools to improve engineering productivity.
Performance Optimization: Optimize large-scale data workflows for performance, cost, and reliability.
Data Governance: Implement data quality, lineage, and governance strategies in compliance with enterprise standards.
Collaboration: Work closely with product, analytics, compliance, and DevOps teams to deliver high-quality solutions aligned with business goals.
CI/CD Automation: Set up and manage continuous integration and deployment pipelines using AWS CodePipeline, Jenkins, or GitLab.
Documentation & Presentations: Prepare technical documentation and present architectural solutions to stakeholders across levels.

Requirements

Required Qualifications:

7–12 years of experience in data engineering or related fields.
Strong expertise in Python programming with a focus on data processing.
Extensive experience with AWS Glue (both Glue Jobs and Glue Studio/Notebooks).
Deep hands-on experience with PySpark for distributed data processing.
Solid AWS knowledge: EMR, S3, Lambda, IAM, Athena, CloudWatch, Redshift, etc.
Proven experience architecting and managing complex ETL workflows.
Proficiency with Apache Airflow or similar orchestration tools.
Hands-on experience with CI/CD pipelines and DevOps best practices.
Familiarity with data quality, data lineage, and metadata management.
Strong experience working in agile/scrum teams.
Excellent communication and stakeholder engagement skills.

Preferred/Good to Have:

Experience in financial services, capital markets, or compliance systems.
Knowledge of data modeling, data lakes, and data warehouse architecture.
Familiarity with SQL (Athena/Presto/Redshift Spectrum).
Exposure to ML pipeline integration or event-driven architecture.