We are seeking a skilled Data Engineer to design, build, and optimize data pipelines and analytics platforms. This role focuses on developing scalable, high-performance data solutions that support advanced analytics, business intelligence, and machine learning applications. The ideal candidate is passionate about data engineering, cloud-based data architectures, and modern data processing frameworks.
Data Architecture & Engineering:
Design and implement scalable data architectures leveraging BigQuery, Iceberg, Starburst, and Trino.
Develop robust, high-performance ETL/ELT pipelines to process structured and unstructured data.
Optimize SQL queries and data processing workflows for efficient analytics and reporting.
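As a hedged illustration of the ETL/ELT work described above, the core pattern is an extract-transform-load sequence with type normalization and malformed-record handling. This is a minimal sketch in plain Python; the record shape and field names (user_id, event, ts) are hypothetical, not taken from this posting:

```python
from datetime import datetime, timezone

def extract(raw_rows):
    """Parse raw CSV-like rows (hypothetical source format) into dicts."""
    return [dict(zip(("user_id", "event", "ts"), row.split(","))) for row in raw_rows]

def transform(records):
    """Normalize types and drop malformed records instead of failing the batch."""
    clean = []
    for r in records:
        try:
            clean.append({
                "user_id": int(r["user_id"]),
                "event": r["event"].strip().lower(),
                "ts": datetime.fromtimestamp(int(r["ts"]), tz=timezone.utc),
            })
        except (KeyError, ValueError):
            continue  # skip rows that cannot be parsed
    return clean

def load(records, sink):
    """Append cleaned records to an in-memory sink standing in for a warehouse table."""
    sink.extend(records)
    return len(records)

sink = []
loaded = load(transform(extract(["42,Click,1700000000", "bad-row"])), sink)
```

In a production pipeline the sink would be a warehouse table and the skip branch would route bad rows to a dead-letter location rather than silently dropping them.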
Cloud & Big Data Infrastructure:
Build and maintain data pipelines and storage solutions using Google Cloud Platform (GCP) and BigQuery.
Implement best practices for data governance, security, and compliance within cloud-based environments.
Optimize data ingestion, storage, and query performance for high-volume and high-velocity datasets.
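One common pattern behind high-volume ingestion is micro-batching: grouping a stream of records into bounded batches so each batch becomes one load job or streaming insert. A minimal sketch, where the batch size and inputs are illustrative assumptions:

```python
from itertools import islice

def batches(records, batch_size=500):
    """Yield successive lists of at most batch_size records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

# Each batch would map to one load job in a real ingestion pipeline.
load_sizes = [len(b) for b in batches(range(1250), batch_size=500)]
```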
Data Processing & Analytics:
Leverage Apache Iceberg for large-scale data lake management and transactional processing.
Utilize Starburst and Trino for distributed query processing and federated data access.
Develop strategies for data partitioning, indexing, and caching to enhance performance.
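The partitioning strategies above generally come down to choosing a partition key that matches common query filters, so the engine can prune partitions instead of scanning everything. As a hedged sketch of the idea behind date-based partitioning (the same principle BigQuery partitioned tables and Iceberg partition specs rely on), in plain Python:

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_key(ts):
    """Derive a daily partition key (YYYY-MM-DD) from a UNIX timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

def partition(records):
    """Group records by day so a date-filtered query touches only matching groups."""
    parts = defaultdict(list)
    for r in records:
        parts[partition_key(r["ts"])].append(r)
    return dict(parts)

events = [
    {"ts": 1700000000, "v": 1},  # 2023-11-14 UTC
    {"ts": 1700000100, "v": 2},  # same day
    {"ts": 1700100000, "v": 3},  # a later day
]
parts = partition(events)
```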
Collaboration & Integration:
Work closely with data scientists, analysts, and business stakeholders to understand data needs and requirements.
Collaborate with DevOps and platform engineering teams to implement CI/CD pipelines and infrastructure-as-code for data workflows.
Integrate data from multiple sources, ensuring data integrity and accuracy across systems.
Performance Optimization & Monitoring:
Monitor, troubleshoot, and optimize data pipelines for efficiency, scalability, and reliability.
Implement data quality frameworks and automated validation checks to ensure consistency.
Utilize monitoring tools and performance metrics to proactively identify bottlenecks and optimize queries.
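A data quality framework of the kind described often starts as a set of named, declarative checks run against each batch, with the results fed into monitoring. A minimal sketch; the check names, column names, and sample rows are assumptions for illustration:

```python
def check_not_null(rows, column):
    """Pass only if every row has a value in the given column."""
    return all(r.get(column) is not None for r in rows)

def check_unique(rows, column):
    """Pass only if the column contains no duplicate values."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

def run_checks(rows, checks):
    """Return {check_name: passed} for a batch of rows."""
    return {name: fn(rows) for name, fn in checks.items()}

rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},  # violates the not-null check
    {"id": 2, "amount": 5.0},   # violates the uniqueness check
]
report = run_checks(rows, {
    "id_unique": lambda r: check_unique(r, "id"),
    "amount_not_null": lambda r: check_not_null(r, "amount"),
})
```

In practice the report would be emitted as metrics so failing checks can page an on-call engineer or block a downstream load.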
Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
Experience:
4+ years of experience in data engineering, with expertise in SQL, BigQuery, and GCP.
Strong experience with Apache Iceberg, Starburst, and Trino for large-scale data processing.
Proven track record of designing and optimizing ETL/ELT pipelines and cloud-based data workflows.
Technical Skills:
Proficiency in SQL, including query optimization and performance tuning.
Experience working with BigQuery, Google Cloud Storage (GCS), and GCP data services.
Knowledge of data lakehouse architectures, data warehousing, and distributed query engines.
Hands-on experience with Apache Iceberg for managing large-scale transactional datasets.
Expertise in Starburst and Trino for federated queries and cross-platform data access.
Familiarity with Python, Java, or Scala for data pipeline development.
Experience with Terraform, Kubernetes, or Airflow for data pipeline automation and orchestration.
Understanding of machine learning data pipelines and real-time data processing.
Experience with data governance, security, and compliance best practices.
Exposure to Kafka, Pub/Sub, or other streaming data technologies.
Familiarity with CI/CD pipelines for data workflows and infrastructure-as-code.
We are an Equal Opportunity Employer, including disability/vets.