Senior Data Engineer

Benefits: extra holidays, extra parental leave
Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • Over 2 years of experience as a Data Engineer.
  • Proficiency in Databricks, Spark, or Scala for data processing.
  • Experience with building scalable data pipelines in cloud environments.
  • Knowledge of CI/CD practices and system monitoring.

Key responsibilities:

  • Design, build, and maintain ETL/ELT data pipelines in Databricks.
  • Collaborate with AI/ML teams to support data modeling and retrieval.
  • Develop serverless APIs to expose data to frontend applications.
  • Monitor data quality, lineage, and reliability using best practices.

CME SME (https://www.gotocme.com/), 201-500 employees

Job description

This is a remote position.

We are seeking a self-motivated, intellectually curious Data Engineer to join our Data Science and Solutions team. This engineer will be responsible for building robust, scalable data pipelines using Databricks on AWS, integrating a wide range of data sources and structures into our AI and analytics platform. We have built our ‘minimum viable product’ and are now scaling up to support multi-tenancy in a highly secure environment.

The ideal candidate has more than 2 years’ experience in Databricks, preferably building scalable, high-quality data pipelines in a distributed, serverless cloud environment. You will be well-versed in CI/CD best practices, system monitoring, and the Databricks control surface, as you will be building infrastructure-as-code to deploy secure, isolated, and monitored environments and data pipelines for our end users and AI agents. Most of all, you will be an expert collaborator in a distributed, remote environment, a team player, and always learning.


Data Pipeline Development

  • Design, build, and maintain ETL/ELT pipelines in Databricks to ingest, clean, and transform data from diverse product sources.
  • Construct gold-layer tables in the Lakehouse architecture that serve both machine learning model training and real-time APIs.
  • Monitor data quality, lineage, and reliability using Databricks best practices.
AI-Driven Data Access Enablement

  • Collaborate with AI/ML teams to ensure data is modeled and structured to support natural language prompts and semantic retrieval using first- and third-party data sources, vector search, and Unity Catalog metadata.
  • Help build data interfaces and agent tools that interact with structured data, and AI agents that retrieve and analyze customer data with role-based permissions.

API & Serverless Backend Integration

  • Work with backend engineers to design and implement serverless APIs (e.g., via AWS Lambda with TypeScript) that expose gold tables to frontend applications.
  • Ensure APIs are performant, scalable, and designed with data security and compliance in mind.
  • Utilize Databricks and other APIs to implement provisioning, deployment, security, and monitoring frameworks for scaling up data pipelines, AI endpoints, and security models for multi-tenancy.


Requirements

  • 3+ years of experience as a Data Engineer or in a related role on an agile, distributed team, with a quantifiable impact on business or technology outcomes.
  • Proven expertise with Databricks, including job and workflow orchestration, change data capture, and medallion architecture.
  • Proficiency in Spark or Scala for data wrangling and transformation across a wide variety of data sources and structures.
  • Practitioner of CI/CD best practices and test-driven development, with familiarity with the MLOps/AIOps lifecycles.
  • Proven ability to work in an agile environment with product managers, frontend engineers, and data scientists.

Preferred Skills

  • Familiarity with AWS Lambda (Node.js/TypeScript preferred) and API Gateway or equivalent serverless platforms; knowledge of API design principles and experience working with RESTful or GraphQL endpoints.
  • Exposure to React-based frontend architecture and the implications of backend data delivery on UI/UX performance, including end-to-end telemetry to measure performance and accuracy for the end-user experience.
  • Experience with A/B testing, experiment and inference logging, and analytics.


Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s):
English

Other Skills

  • Teamwork
  • Collaboration
  • Problem Solving
