This is a remote contract position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).
We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, along with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products in the aviation industry, including a migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.
Responsibilities
Architect, design, develop, test, and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems, APIs, and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration, and infrastructure, ensuring the scalability, reliability, and performance of data systems, with a focus on Databricks and Azure.
Contribute to detailed design, architectural discussions, and customer requirements sessions.
Actively participate in the design, development, and testing of big data products.
Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform (a minimal PySpark sketch follows this list).
Migrate workloads from Azure Synapse to Azure Data Lake or other technologies (see the migration sketch after this list).
Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive).
Design and implement data models and schemas that support efficient data processing and analytics.
Design and develop clear, maintainable code with automated testing, including unit tests with pytest or unittest, plus integration, performance, and regression tests (a pytest sketch follows this list).
Collaborate with cross-functional teams, including Product, Engineering, Data Science, and Analytics, to understand data requirements and develop data solutions, including reusable components that meet product deliverables.
Evaluate and implement new technologies and tools to improve data integration, processing, storage, and analysis.
Evaluate, design, implement, and maintain data governance solutions (cataloging, lineage, data quality, and governance frameworks) suitable for a modern analytics solution, following industry-standard best practices and patterns.
Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.
Provide guidance and mentorship to junior team members, sharing knowledge and best practices.
Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented.
Promote and enforce best practices in data engineering, data governance, and data quality.
Ensure data quality and accuracy.
Design, implement, and maintain data security and privacy measures.
Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, and be able to work both independently and collaboratively.
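
As referenced above, here is a minimal PySpark sketch of the kind of batch job this role builds and tunes on Databricks. The table paths, column names, and the aggregation itself are hypothetical examples, not part of any existing product.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flight-events-daily").getOrCreate()

# Read raw events from a hypothetical Delta table in the lake.
events = spark.read.format("delta").load("/mnt/datalake/raw/flight_events")

# Aggregate to a daily summary per airline; repartitioning by the
# partition column before the write keeps output file counts manageable.
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "airline_code")
    .agg(F.count("*").alias("event_count"))
    .repartition("event_date")
)

daily.write.format("delta").mode("overwrite").partitionBy("event_date") \
    .save("/mnt/datalake/curated/flight_events_daily")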
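
Next, a rough sketch of one step of a Synapse-to-lake migration, assuming the Azure Synapse connector available on Databricks; the server, storage account, and table names are placeholders, and a real job would also need credentials configured on the cluster.

# Assumes a Databricks environment where `spark` is predefined and
# storage credentials are supplied via the cluster configuration.
synapse_df = (
    spark.read
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net;database=<dw>")
    .option("tempDir", "abfss://temp@<storage>.dfs.core.windows.net/stage")
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.flight_events")
    .load()
)

# Land the table as Delta in the data lake, the migration target.
(
    synapse_df.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@<storage>.dfs.core.windows.net/flight_events")
)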
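
Finally, a minimal pytest sketch for unit-testing a PySpark transformation; the function under test (add_event_date) is a hypothetical example, not an existing helper.

import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_event_date(df):
    # Derive an event_date column from an event_ts timestamp column.
    return df.withColumn("event_date", F.to_date("event_ts"))


@pytest.fixture(scope="session")
def spark():
    # Small local session so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def test_add_event_date(spark):
    df = spark.createDataFrame(
        [("2024-05-01 10:15:00",)], ["event_ts"]
    ).withColumn("event_ts", F.to_timestamp("event_ts"))

    result = add_event_date(df)

    assert "event_date" in result.columns
    assert result.first()["event_date"].isoformat() == "2024-05-01"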