Location: remote
Project start: ASAP
Project end: December 31, 2025
Workload: Full-time
Tasks and qualifications:
• Advise on architectural discussions and lead workshops with customers; understand their business and technical requirements in order to create the desired technical architectures and solutions in the cloud around data engineering, data lakes, data lakehouses, BI and ML/AI.
• Perform requirements scoping exercises with project and use case stakeholders.
• Translate requirements into the desired technical solution design.
• Build PoCs, prototypes and MVPs for new, innovative solutions, and carry out technology scouting on cloud (Azure) and Big Data technologies.
• Participate in hands-on technical project work, including actual project implementation tasks and some production support tasks (e.g. monitoring) as required.
• Evaluate and implement platform cost optimizations.
• Create and maintain technical documentation for the use cases, solutions and data platform.
• Analyze best practices and emerging concepts in cloud-based technologies, with a special focus on the Data and Analytics cloud ecosystem.
Absolute Musts:
• In-depth knowledge of Apache Spark and experience in optimizing and performance-tuning Apache Spark data processing jobs
• In-depth knowledge of Delta Lake
• In-depth knowledge of Data Engineering on (Azure) Databricks
• Strong hands-on experience in Python programming and in writing complex SQL queries
• Strong hands-on experience in building complex data pipelines in Azure Data Factory
• Strong hands-on experience in the following Azure Data Services: ADLS Gen2, Synapse Serverless
• Experience in architecting and building enterprise-grade data platforms in the cloud and in developing Big Data solution architectures, preferably on Azure, including:
o Gathering requirements and mapping those to technical architectures
o Awareness of best practices for selecting a component mix of cloud services (e.g. ADF, Databricks, Synapse, etc.)
o Ability to assess the pros and cons of architecture variations (e.g. Databricks vs. Snowflake vs. MS Fabric, Synapse vs. MS Fabric Lakehouse, Databricks vs. open-source Spark, …).
Additionally required:
• More than 6 years of experience in a data engineering role, including:
• Expertise in designing, building and maintaining large-scale data pipelines as well as processing (transforming, aggregating, wrangling) data.
• Proficiency with streaming technologies such as Kafka, Spark Structured Streaming or equivalent cloud services
• Good knowledge of cloud computing concepts and of the Azure cloud platform, including networking, security and monitoring aspects
• Hands-on experience in using and applying IaC, CI/CD and DevOps practices in real data analytics projects, preferably using Azure DevOps and Terraform.
• Knowledge of Microsoft Fabric is an added advantage.