Requirements:
5+ years of data engineering experience with expert-level proficiency in Databricks (Unity Catalog, Delta Live Tables, Workflows, and SQL Warehouses) on AWS.
Production experience with DBT (models, tests, documentation) and strong Python skills for data engineering (PySpark, pandas).
CI/CD expertise and Git workflows (Jenkins, GitLab CI), with experience implementing data governance and SOC 2 security controls.
Knowledge of Snowflake architecture and migration patterns; familiarity with orchestration tools (Airflow, Databricks Workflows) and multi-cloud environments (AWS/Azure/GCP).
Responsibilities:
Design and implement enterprise-scale data pipelines on Databricks on AWS, leveraging the Bronze/Silver/Gold medallion architecture and Delta Lake tables for ACID transactions.
Build real-time and batch data processing workflows, and develop modular DBT transformations with optimized SQL and performance tuning.
Develop Python applications for data ingestion, transformation, and orchestration; implement data quality checks, testing frameworks, and monitoring.
Design, configure, and maintain CI/CD pipelines, cluster configurations, and job scheduling; optimize costs through autoscaling, right-sizing, and cost allocation practices.
Job description
Position Overview
At Hypersonix, we are building the leading Generative AI Platform for Commerce. Our flagship GenAI product – Competitor + Pricing AI – scrapes the product catalogues of our Enterprise customers and their competitors, and uses RAG to identify the nearest competitive match for each of our customers' products, enabling intelligent pricing strategies that were previously impossible to achieve.
We are seeing strong growth in our Enterprise product and are now building an end-to-end product on Databricks for Shopify store owners, centered on agentic workflows that automate critical business processes (pricing and promotion strategies, inventory management, and competitive intelligence). We are seeking an experienced Senior Data Engineer to design, build, and optimize scalable data pipelines and infrastructure. The ideal candidate will have deep expertise in Databricks and modern data engineering practices, with a strong focus on building robust, production-grade data solutions that drive business value while maintaining cost efficiency.
Key Responsibilities:
Data Platform Development
Design and implement enterprise-scale data pipelines using Databricks on AWS, leveraging both cluster-based and serverless compute paradigms
Architect and maintain medallion architecture (Bronze/Silver/Gold) data lakes and lakehouses
Develop and optimize Delta Lake tables for ACID transactions and efficient data management
Build and maintain real-time and batch data processing workflows
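To illustrate the medallion pattern named above, here is a minimal, framework-free Python sketch of a Bronze-to-Silver promotion step. In production this would be PySpark writing Delta tables; the record shape and cleaning rules here are illustrative assumptions, not the actual pipeline.

```python
from datetime import datetime

def bronze_to_silver(bronze_records):
    """Promote raw (Bronze) records to a cleaned, deduplicated Silver set.

    Illustrative rules (assumptions, not the actual pipeline):
    - drop records missing the primary key ("sku")
    - cast "price" to float, discarding unparseable values
    - deduplicate on "sku", keeping the record with the latest "ingested_at"
    """
    latest = {}
    for rec in bronze_records:
        sku = rec.get("sku")
        if not sku:
            continue  # in a real pipeline, route to a quarantine table
        try:
            price = float(rec["price"])
        except (KeyError, TypeError, ValueError):
            continue
        ts = rec.get("ingested_at", datetime.min)
        if sku not in latest or ts > latest[sku]["ingested_at"]:
            latest[sku] = {"sku": sku, "price": price, "ingested_at": ts}
    return sorted(latest.values(), key=lambda r: r["sku"])
```

The same keep-latest-per-key logic maps directly onto a Delta Lake `MERGE INTO` in a Spark job, which is where the ACID guarantees mentioned above come in.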
Engineering Excellence
Create reusable, modular data transformation logic using DBT to ensure data quality and consistency across the organization
Develop complex Python applications for data ingestion, transformation, and orchestration
Write optimized SQL queries and implement performance tuning strategies for large-scale datasets
Implement comprehensive data quality checks, testing frameworks, and monitoring solutions
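As a sketch of the data quality checks described above, the two functions below mirror DBT's built-in `not_null` and `unique` generic tests in plain Python (the function names and row shape are illustrative assumptions):

```python
def check_not_null(rows, column):
    """DBT-style not_null test: return the rows where `column` is missing."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """DBT-style unique test: return values of `column` appearing more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes, key=repr)
```

In a DBT project the equivalent checks would be declared in a model's schema YAML; a monitoring job can run checks like these against landed tables and alert when either list is non-empty.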
Cost Management & Optimization
Monitor and analyze Databricks DBU (Databricks Unit) consumption and cloud infrastructure costs
Implement cost optimization strategies including cluster right-sizing, autoscaling configurations, and spot instance usage
Optimize job scheduling to leverage off-peak pricing and minimize idle cluster time
Establish cost allocation tags and chargeback models for different teams and projects
Conduct regular cost reviews and provide recommendations for efficiency improvements
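The cost arithmetic behind the right-sizing and autoscaling points above can be sketched in a few lines. All rates below are illustrative assumptions, not published Databricks or AWS pricing:

```python
def job_cost(hours, workers, dbu_per_node_hour, dbu_rate, vm_rate):
    """Estimate the cost of one job run on a fixed-size cluster.

    Illustrative parameters (assumptions, not published pricing):
    - dbu_per_node_hour: DBUs each node consumes per hour (varies by instance type)
    - dbu_rate: dollars per DBU (varies by workload tier and plan)
    - vm_rate: dollars per node-hour for the underlying cloud VMs
    """
    node_hours = hours * workers
    return node_hours * (dbu_per_node_hour * dbu_rate + vm_rate)

# Right-sizing comparison for a 2-hour job: a fixed 8-worker cluster vs. one
# that autoscales down to 4 workers for the second hour.
fixed = job_cost(2.0, 8, 2.0, 0.30, 0.40)                                   # 16.0
autoscaled = job_cost(1.0, 8, 2.0, 0.30, 0.40) + job_cost(1.0, 4, 2.0, 0.30, 0.40)  # 12.0
```

Even with these toy numbers, scaling down for half the run cuts the bill by 25%, which is the kind of result a regular cost review surfaces.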
DevOps & Infrastructure
Design and implement CI/CD pipelines for automated testing, deployment, and rollback of data artifacts
Configure and optimize Databricks clusters, job scheduling, and workspace management
Implement version control best practices using Git and collaborative development workflows
Collaboration & Leadership
Partner with data analysts, data scientists, and business stakeholders to understand requirements and deliver solutions
Mentor junior engineers and promote best practices in data engineering
Document technical designs, data lineage, and operational procedures
Participate in code reviews and contribute to team knowledge sharing
Required Qualifications:
Technical Skills
5+ years of experience in data engineering roles
Expert-level proficiency in Databricks (Unity Catalog, Delta Live Tables, Workflows, SQL Warehouses)
Strong understanding of cluster configuration, optimization, and serverless SQL compute
Advanced SQL skills including query optimization, indexing strategies, and performance tuning
Production experience with DBT (models, tests, documentation, macros, packages)
Proficient in Python for data engineering (PySpark, pandas, data validation libraries)