Senior Data Engineer (Databricks)

Remote: Full Remote
Contract:
Work from: Fully flexible

Offer summary

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • Proficiency in Big Data technologies such as Databricks, Apache Airflow, and Apache Spark/PySpark.
  • Experience in designing and optimizing data processing pipelines for both streaming and batch workloads.
  • Strong understanding of data security, compliance, and governance best practices.

Key responsibilities:

  • Design and optimize scalable data processing pipelines for streaming and batch workloads.
  • Architect and implement end-to-end data platforms ensuring high availability and performance.
  • Lead the development of CI/CD and MLOps processes for automated deployments and monitoring.
  • Collaborate with Data Science teams on Machine Learning projects and manage complex data transformations.

Addepto (Startup): http://www.addepto.com
51 - 200 Employees

Job description

Addepto is a leading consulting and technology company specializing in AI and Big Data, helping clients deliver innovative data projects. We partner with top-tier global enterprises and pioneering startups, including Rolls Royce, Continental, Porsche, ABB, and WGU. Our exclusive focus on AI and Big Data has earned us recognition by Forbes as one of the top 10 AI companies.


As a Senior Data Engineer, you will have the exciting opportunity to work with a team of technology experts on challenging projects across various industries, leveraging cutting-edge technologies. Here are some of the projects for which we are seeking talented individuals:

  • Design and development of a universal data platform for global aerospace companies. This Azure- and Databricks-powered initiative combines diverse enterprise and public data sources. The platform is at an early stage of development, covering the design of its architecture and processes while leaving freedom for technology selection.

  • Data platform transformation for an energy management association. This project addresses critical data management challenges, boosting user adoption, performance, and data integrity. The team is implementing a comprehensive data catalog, leveraging Databricks and Apache Spark/PySpark, for simplified data access and governance. Secure integration solutions and enhanced data quality monitoring, using Delta Live Tables tests, have established trust in the platform (see the first sketch below). The intermediate result is a user-friendly, secure, and data-driven platform that serves as a basis for further development of ML components.

  • Design of data transformation and downstream DataOps pipelines for a global car manufacturer. This project aims to build a data processing system for both real-time streaming and batch data. We’ll handle data for business uses like process monitoring, analysis, and reporting, while also exploring LLMs for chatbots and data analysis. Key tasks include data cleaning, normalization, and optimizing the data model for performance and accuracy (see the streaming sketch below).

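For orientation, here is a minimal sketch of the kind of Delta Live Tables expectation used for data quality monitoring in the second project above. It assumes execution inside a Databricks DLT pipeline (which provides the `dlt` module); the table and column names are hypothetical illustrations, not details of the actual platform:

```python
import dlt  # provided by the Databricks Delta Live Tables runtime
from pyspark.sql.functions import col

@dlt.table(comment="Readings that passed basic quality gates.")
@dlt.expect("has_timestamp", "event_ts IS NOT NULL")             # log violations, keep rows
@dlt.expect_or_drop("non_negative_reading", "reading_kwh >= 0")  # drop failing rows
def clean_meter_readings():
    # 'raw_meter_readings' is a hypothetical upstream (bronze) dataset.
    return (
        dlt.read_stream("raw_meter_readings")
           .withColumn("reading_kwh", col("reading_kwh").cast("double"))
    )
```

Expectations like these surface quality metrics in the pipeline UI, which is one way such a platform can build the trust in data described above.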

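Likewise, a minimal sketch of the streaming half of the third project, assuming Spark Structured Streaming with the Kafka connector; the broker address, topic, schema, and storage paths are illustrative assumptions only:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("telemetry-ingest").getOrCreate()

# Hypothetical telemetry schema for plant-floor events.
schema = StructType([
    StructField("machine_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read from a hypothetical Kafka topic and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "plant-telemetry")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Append the parsed stream to a Delta table for monitoring and reporting.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/telemetry")  # hypothetical path
    .outputMode("append")
    .start("/delta/plant_telemetry")                         # hypothetical path
)
query.awaitTermination()
```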
🚀 Your main responsibilities:

  • Design and optimize scalable data processing pipelines for both streaming and batch workloads using Big Data technologies such as Databricks, Apache Airflow, and Dagster.
  • Architect and implement end-to-end data platforms, ensuring high availability, performance, and reliability.
  • Lead the development of CI/CD and MLOps processes to automate deployments, monitoring, and model lifecycle management.
  • Develop and maintain applications for aggregating, processing, and analyzing data from diverse sources, ensuring efficiency and scalability.
  • Collaborate with Data Science teams on Machine Learning projects, including text/image analysis, feature engineering, and predictive model deployment.
  • Design and manage complex data transformations using Databricks, DBT, and Apache Airflow, ensuring data integrity and consistency (see the orchestration sketch after this list).
  • Translate business requirements into scalable and efficient technical solutions while ensuring optimal performance and data quality.
  • Ensure data security, compliance, and governance best practices are followed across all data pipelines.
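
To make the Databricks/Airflow combination above concrete, here is a minimal orchestration sketch, assuming Airflow 2.4+ with the apache-airflow-providers-databricks package installed; the DAG id, cluster spec, and notebook path are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

# Hypothetical daily DAG that submits one Databricks notebook run.
with DAG(
    dag_id="daily_transformations",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_transformations = DatabricksSubmitRunOperator(
        task_id="run_transformation_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={  # hypothetical ephemeral job cluster
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Repos/pipelines/transform"},
    )
```

In practice DBT models can be scheduled the same way, with Airflow providing retries, scheduling, and dependency management across the whole pipeline.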

Required profile

Experience

Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
