
Senior Data Engineer

Job description

We are building a greenfield analytics platform supporting both batch and real-time data processing. We are looking for a Senior Data Engineer who can design, implement, and evolve scalable data systems in AWS.

This role combines hands-on development, architectural decision-making, and platform ownership.
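
Since the requirements below call for Infrastructure as Code, here is a minimal, hypothetical AWS CDK (v2, Python) sketch of the kind of foundation such a platform sits on: a versioned, encrypted S3 bucket plus a Glue catalog database so Athena and EMR can query it. The construct IDs and database name are illustrative assumptions, not this team's actual stack.

```python
# Hypothetical sketch of data lake foundations with AWS CDK v2 (Python).
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_glue as glue
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Versioned, encrypted bucket as the lake's storage layer.
        s3.Bucket(
            self,
            "LakeBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            removal_policy=RemovalPolicy.RETAIN,  # keep data if the stack is deleted
        )

        # Glue catalog database so Athena/EMR/Glue jobs can query lake tables.
        glue.CfnDatabase(
            self,
            "RawDatabase",
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(
                name="analytics_raw",  # hypothetical database name
            ),
        )


app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```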

Core Responsibilities:

  • Design and implement batch and streaming data pipelines using Apache Spark (see the sketch after this list).
  • Build and evolve a scalable AWS-based data lake architecture.
  • Develop and maintain real-time data processing systems (event-driven pipelines).
  • Own performance tuning and cost optimization of Spark workloads.
  • Define best practices for data modeling, partitioning, and schema evolution.
  • Implement monitoring, observability, and data quality controls.
  • Contribute to infrastructure automation and CI/CD for data workflows.
  • Participate in architectural decisions and mentor other engineers.
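
To make the first responsibility concrete, below is a minimal Structured Streaming sketch of an event-driven pipeline: events are read from Kafka, parsed, and landed as date-partitioned Parquet in S3. The broker address, topic, schema, and bucket paths are all assumptions for illustration, not this team's actual pipeline.

```python
# Hypothetical sketch: Kafka -> Spark Structured Streaming -> S3 data lake.
# Requires the spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, to_date
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

# Assumed event payload schema.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
    .option("subscribe", "events")                     # hypothetical topic name
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withColumn("event_date", to_date(col("occurred_at")))  # partition column
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-lake/raw/events/")             # hypothetical bucket
    .option("checkpointLocation", "s3a://example-lake/_chk/events/")
    .partitionBy("event_date")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```

The same transformation logic can be reused for backfills by swapping readStream/writeStream for batch read/write, which is one common way to keep batch and streaming pipelines aligned.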


Requirements:

  • 5+ years of experience in Data Engineering.
  • Strong hands-on experience with Apache Spark (including Structured Streaming).
  • Experience building both batch and streaming pipelines in production environments.
  • Proven experience designing AWS-based data lake architectures: S3, EMR, Glue, Athena.
  • Experience with event streaming platforms such as Apache Kafka or Amazon Kinesis.
  • Experience implementing lakehouse formats such as Delta Lake.
  • Strong understanding of partitioning strategies and schema evolution.
  • Experience using the Spark UI and AWS CloudWatch for profiling and optimization.
  • Strong understanding of Spark performance tuning (shuffle, skew, memory, partitioning); see the sketch after this list.
  • Proven track record of cost optimization in AWS environments.
  • Experience with Docker and CI/CD pipelines.
  • Experience with Infrastructure as Code: Terraform, AWS CDK.
  • Familiarity with monitoring and observability practices.
  • Experience in the Financial domain.
  • Experience running Spark workloads on Kubernetes.
  • Experience implementing data quality frameworks or metadata/lineage systems.
  • English - B2, Ukrainian - Native.
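
As a hedged illustration of the tuning topics above (shuffle skew in particular), the sketch below shows two standard mitigations: enabling Spark 3.x Adaptive Query Execution, and manually salting a hot join key when AQE is not enough. The paths, column names, and salt factor are all assumptions.

```python
# Hypothetical sketch: mitigating a skewed join.
# 1) Let Adaptive Query Execution split skewed partitions automatically.
# 2) Fall back to manual key salting for cases AQE does not cover.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, explode, floor, lit, rand, sequence

spark = (
    SparkSession.builder.appName("skew-tuning")
    .config("spark.sql.adaptive.enabled", "true")           # AQE on (default in 3.2+)
    .config("spark.sql.adaptive.skewJoin.enabled", "true")  # auto-split skewed partitions
    .getOrCreate()
)

SALT_BUCKETS = 16  # sized to the observed skew; an assumption here

facts = spark.read.parquet("s3a://example-lake/curated/facts/")  # hypothetical path
dims = spark.read.parquet("s3a://example-lake/curated/dims/")    # hypothetical path

# Spread the hot key on the large side across SALT_BUCKETS synthetic keys.
salted_facts = facts.withColumn(
    "salted_key",
    concat_ws("#", col("customer_id").cast("string"),
              floor(rand() * SALT_BUCKETS).cast("string")),
)

# Replicate each small-side row once per salt bucket so every salted key matches.
salted_dims = (
    dims.withColumn("salt", explode(sequence(lit(0), lit(SALT_BUCKETS - 1))))
    .withColumn("salted_key",
                concat_ws("#", col("customer_id").cast("string"),
                          col("salt").cast("string")))
    .drop("customer_id", "salt")
)

joined = salted_facts.join(salted_dims, "salted_key")
```

Before reaching for salting, the skew is usually confirmed in the Spark UI (a handful of long-running tasks in a shuffle stage), and on Spark 3.x AQE alone often resolves it.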

