
Data Engineer

Remote: Full Remote
Experience: Senior (5-10 years)

Datavail http://www.datavail.com
1001 - 5000 Employees

Job Description

The Data Engineer is responsible for designing, building, and maintaining the infrastructure and systems required to collect, store, and process large datasets efficiently.

Education: Bachelor's degree in computer science with 8+ years of experience

Experience:

  • Technical Skills
    • Programming Languages: Proficiency in Python, SQL, Java, or Scala for data manipulation and pipeline development.
    • Data Processing Frameworks: Experience with tools like Apache Spark, Hadoop, or Apache Kafka for large-scale data processing.
  • Data Systems and Platforms
    • Databases: Knowledge of both relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).
    • Data Warehousing: Experience with platforms like Snowflake, Amazon Redshift and Azure Synapse.
    • Cloud Platforms: Familiarity with AWS and Azure for deploying and managing data pipelines. Good experience with Microsoft Fabric is advantageous.
    • Experience working with distributed computing systems such as Hadoop HDFS, Hive, or Spark.
    • Managing and optimizing data lakes and delta lakes for structured and unstructured data.
  • Data Modeling and Architecture
    • Expertise in designing efficient data models (e.g., star schema, snowflake schema) and maintaining data integrity.
    • Understanding of modern data architectures like Data Mesh or Lambda Architecture.
  • Data Pipeline Development
    • Building and automating ETL/ELT pipelines for extracting data from diverse sources, transforming it, and loading it into target systems.
    • Monitoring and troubleshooting pipeline performance and failures.
  • Workflow Orchestration
    • Hands-on experience with orchestration tools such as Azure Data Factory, AWS Glue, AWS DMS, or Prefect to schedule and manage workflows.
  • Version Control and CI/CD
    • Utilizing Git for version control and implementing CI/CD practices for data pipeline deployments.
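The ETL/ELT pipeline pattern described above can be sketched in plain Python. This is a minimal illustration only, not Datavail's implementation: the source rows, the `sales` table, and its columns are all hypothetical, and an in-memory SQLite database stands in for a real warehouse target.

```python
import sqlite3

# Minimal ETL sketch: extract rows from a stand-in source, transform
# (trim names, drop rows that fail validation), load into SQLite.
# All table and column names here are hypothetical illustrations.

def extract():
    # Stand-in for reading from an external source (API, CSV, Kafka topic).
    return [
        {"id": 1, "name": "  Alice ", "amount": "120.50"},
        {"id": 2, "name": "Bob", "amount": "not-a-number"},
        {"id": 3, "name": "Carol", "amount": "88.00"},
    ]

def transform(rows):
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail type validation
        clean.append((row["id"], row["name"].strip(), amount))
    return clean

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
print(total)  # (2, 208.5) — the invalid row was dropped in transform
```

In a production pipeline the same extract/transform/load stages would typically be scheduled and monitored by an orchestrator such as Azure Data Factory or Prefect, as the requirements above note.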
Key Skills:

  • Proficiency in programming languages such as Python, SQL, and optionally Scala or Java.
  • Proficiency in data processing frameworks like Apache Spark and Hadoop is crucial for handling large-scale and real-time data.
  • Expertise in ETL/ELT tools such as Azure Data Factory (ADF) and Fabric Data Pipelines is important for creating efficient and scalable data pipelines.
  • A solid understanding of database systems, including relational databases like MySQL and PostgreSQL, as well as NoSQL solutions such as MongoDB and Cassandra, is fundamental.
  • Experience with cloud platforms such as AWS and Azure, and with data-specific services like S3, BigQuery, and Azure Data Factory, is highly valuable.
  • Data modeling skills, including designing star or snowflake schema, and knowledge of modern architectures like Lambda and Data Mesh, are critical for building scalable solutions.
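The star-schema modeling called out above can be sketched with a toy example. The fact and dimension tables (`fact_sales`, `dim_date`, `dim_product`) and all of their data are hypothetical, and SQLite stands in for a warehouse such as Snowflake or Synapse.

```python
import sqlite3

# Hypothetical star schema: one central fact table keyed to two
# dimension tables, the pattern named in the requirements above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue REAL
);
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (10, 'widgets'), (11, 'gadgets');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 25.0);
""")

# Typical star-schema query: aggregate the fact table grouped by
# attributes pulled from the dimension tables.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)  # [(1, 'gadgets', 50.0), (1, 'widgets', 100.0), (2, 'widgets', 25.0)]
```

A snowflake schema would further normalize the dimension tables (e.g., splitting product category into its own table); the star form shown here trades some redundancy for simpler joins.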

Role and Responsibilities:

  • Responsible for designing, developing, and maintaining data pipelines and infrastructure to support our data-driven decision-making processes.
  • Design, build, and maintain data pipelines to extract, transform, and load data from various sources into our data warehouse and data lake.
  • Proficient in Databricks: creating notebooks, working with catalogs and native SQL, creating clusters, parameterizing notebooks, and administering Databricks workspaces. Define security models and assign roles as required.
  • Create data flows in Synapse Analytics: integrate external source systems, create external tables and data models, and schedule pipelines using jobs and triggers.
  • Design and develop data pipelines using Fabric pipelines and Spark notebooks that access multiple data sources; develop Databricks notebooks and optimize data processing.
  • Develop and implement data models to ensure data integrity and consistency. Manage and optimize data storage solutions, including databases and data warehouses.
  • Develop and implement data quality checks and validation procedures to ensure data accuracy and reliability.
  • Design and implement data infrastructure components, including data pipelines, data lakes, and data warehouses.
  • Collaborate with data scientists, analysts, and other stakeholders to understand business requirements and translate them into technical solutions.
  • Monitor Azure and Fabric data pipelines and Spark jobs, and implement fixes based on request priority.
  • Responsible for data monitoring activities; good knowledge of reporting tools such as Power BI and Tableau is required.
  • Understand client requirements and architect solutions on both the Azure and AWS cloud platforms.
  • Monitor and optimize data pipeline performance and scalability to ensure efficient data processing.
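The data-quality checks mentioned in the responsibilities above can be sketched as a simple pre-load validation pass. The column names (`customer_id`, `age`) and the rules are hypothetical illustrations of null, uniqueness, and range checks.

```python
# Hedged sketch of pre-load data-quality checks: null check,
# uniqueness check, and range check over a batch of records.
# All field names and thresholds are hypothetical.

def run_quality_checks(records):
    failures = []
    seen_ids = set()
    for i, rec in enumerate(records):
        if rec.get("customer_id") is None:
            failures.append((i, "customer_id is null"))
        elif rec["customer_id"] in seen_ids:
            failures.append((i, "duplicate customer_id"))
        else:
            seen_ids.add(rec["customer_id"])
        if not (0 <= rec.get("age", -1) <= 130):
            failures.append((i, "age out of range"))
    return failures

batch = [
    {"customer_id": 1, "age": 34},
    {"customer_id": None, "age": 40},
    {"customer_id": 1, "age": 200},
]
failures = run_quality_checks(batch)
print(failures)
# [(1, 'customer_id is null'), (2, 'duplicate customer_id'), (2, 'age out of range')]
```

In practice a pipeline would route failing rows to a quarantine table or fail the run, and surface the failure counts to the monitoring dashboards described above.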

About Us

Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes, and is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

About The Team

Datavail’s Data Management Services:

Datavail’s Data Management and Analytics practice is made up of experts who provide a variety of data services, including initial consulting and development, designing and building complete data systems, and ongoing support and management of database, data warehouse, data lake, data integration, and virtualization and reporting environments. Datavail’s team is composed not just of excellent BI & analytics consultants, but of great people as well. Datavail’s data intelligence consultants are experienced, knowledgeable, and certified in best-of-breed BI and analytics software applications and technologies. We ascertain your business objectives, goals, and requirements, assess your environment, and recommend the tools that best fit your unique situation. Our proven methodology can help your project succeed, regardless of stage. The combination of a proven delivery model and top-notch experience ensures that Datavail will remain the on-demand data management expert you desire. Datavail’s flexible and client-focused services always add value to your organization.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Troubleshooting
  • Collaboration
  • Problem Solving
