Senior Data Engineer

Work set-up: Full Remote

CME http://www.gotocme.com
201 - 500 Employees

Job description

This is a remote position.

We are seeking a self-motivated, intellectually curious Data Engineer to join our Data Science and Solutions team. This engineer will be responsible for building robust, scalable data pipelines using Databricks on AWS, integrating a wide range of data sources and structures into our AI and analytics platform.  We have built our ‘minimum viable product’ and are now scaling up to support multi-tenancy in a highly secure environment. 

The ideal candidate has more than two years' experience with Databricks, preferably building scalable, high-quality data pipelines in a distributed, serverless cloud environment. You will be well-versed in CI/CD best practices, system monitoring, and the Databricks control plane, as you will be building infrastructure-as-code to deploy secure, isolated, and monitored environments and data pipelines for our end users and AI agents. Most of all, you will be an expert collaborator in a distributed, remote environment, a team player, and always learning.


Data Pipeline Development

  • Design, build, and maintain ETL/ELT pipelines in Databricks to ingest, clean, and transform data from diverse product sources.
  • Construct gold layer tables in the Lakehouse architecture that serve both machine learning model training and real-time APIs (a sketch of this kind of build follows this list).
  • Monitor data quality, lineage, and reliability using Databricks best practices.
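
To make the gold-layer work above concrete, here is a minimal sketch of such a build, assuming a Databricks job with Delta tables and Unity Catalog three-part names. The silver_orders and gold_daily_revenue tables and their columns are hypothetical illustration names, not part of this role's actual schema.

    # Gold-layer aggregation sketch (PySpark + Delta on Databricks).
    # "spark" is the session provided by the Databricks runtime.
    from pyspark.sql import functions as F

    silver = spark.read.table("main.sales.silver_orders")

    gold = (
        silver
        .where(F.col("status") == "completed")  # keep only settled orders
        .groupBy(F.to_date("order_ts").alias("order_date"), "tenant_id")
        .agg(
            F.sum("amount").alias("daily_revenue"),
            F.countDistinct("customer_id").alias("active_customers"),
        )
    )

    # Overwrite so model training and real-time APIs read one consistent snapshot.
    (
        gold.write.format("delta")
        .mode("overwrite")
        .saveAsTable("main.sales.gold_daily_revenue")
    )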

AI-Driven Data Access Enablement

  • Collaborate with AI/ML teams to ensure data is modeled and structured to support natural language prompts and semantic retrieval across first- and third-party data sources, using vector search and Unity Catalog metadata.
  • Help build data interfaces and agent tools that let AI agents retrieve and analyze customer data in structured sources under role-based permissions (see the sketch after this list).
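
A minimal sketch of tenant-scoped semantic retrieval, assuming the databricks-vectorsearch Python client; the endpoint name, index name, and tenant_id filter column are hypothetical:

    # Semantic retrieval sketch against a Databricks Vector Search index.
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()
    index = client.get_index(
        endpoint_name="docs-endpoint",                # hypothetical endpoint
        index_name="main.sales.customer_docs_index",  # hypothetical index
    )

    # Filter on tenant_id so agents retrieve only rows the caller may see,
    # complementing role-based permissions enforced in Unity Catalog.
    results = index.similarity_search(
        query_text="unpaid invoices from last quarter",
        columns=["doc_id", "chunk_text"],
        filters={"tenant_id": "acme-co"},
        num_results=5,
    )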

API & Serverless Backend Integration

  • Work with backend engineers to design and implement serverless APIs (e.g., via AWS Lambda with TypeScript) that expose gold tables to frontend applications (a sketch follows this list).
  • Ensure APIs are performant, scalable, and designed with data security and compliance in mind.
  • Use the Databricks APIs and other platform APIs to implement provisioning, deployment, security, and monitoring frameworks that scale up data pipelines, AI endpoints, and multi-tenant security models.
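
As a sketch of the API side, here is a minimal AWS Lambda handler that exposes a gold table, using the databricks-sql-connector package. The posting names TypeScript for this layer; Python appears here only to keep all sketches in one language, and the environment variables, table, and columns are hypothetical:

    # Serverless read API sketch: API Gateway -> Lambda -> Databricks SQL warehouse.
    import json
    import os

    from databricks import sql

    def handler(event, context):
        tenant_id = event["pathParameters"]["tenant_id"]
        with sql.connect(
            server_hostname=os.environ["DATABRICKS_HOST"],
            http_path=os.environ["DATABRICKS_HTTP_PATH"],
            access_token=os.environ["DATABRICKS_TOKEN"],
        ) as conn:
            with conn.cursor() as cursor:
                # Parameterized query keeps tenant isolation at the data layer.
                cursor.execute(
                    "SELECT order_date, daily_revenue "
                    "FROM main.sales.gold_daily_revenue "
                    "WHERE tenant_id = %(tenant)s",
                    {"tenant": tenant_id},
                )
                rows = [row.asDict() for row in cursor.fetchall()]
        return {"statusCode": 200, "body": json.dumps(rows, default=str)}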


Requirements

  • 3+ years of experience as a Data Engineer or in a related role within an agile, distributed team environment, with quantifiable impact on business or technology outcomes.
  • Proven expertise with Databricks, including job and workflow orchestration, change data capture, and the medallion architecture.
  • Proficiency with Spark (in Python or Scala) for data wrangling and transformation across a wide variety of data sources and structures.
  • A practitioner of CI/CD best practices and test-driven development, familiar with the MLOps/AIOps lifecycle.
  • Proven ability to work in an agile environment with product managers, front-end engineers, and data scientists.

Preferred Skills

  • Familiarity with AWS Lambda (Node.js/TypeScript preferred) and API Gateway or equivalent serverless platforms; knowledge of API design principles and experience working with RESTful or GraphQL endpoints.
  • Exposure to React-based frontend architecture and to the implications of backend data delivery for UI/UX performance, including end-to-end telemetry to measure performance and accuracy of the end-user experience.
  • Experience with A/B testing, experiment and inference logging, and analytics.


Required profile

Spoken language(s): English

Other Skills

  • Teamwork
  • Collaboration
  • Adaptability
  • Problem Solving
