Key Facts

Remote From: Tennessee (USA)

Category: Data Engineer

Full time

Senior (5-10 years)

English

Hard Skills

Azure Databricks, Health Data Management, Data Lakes, Apache Spark, Azure Data Factory, HL7 v2, Mload, MLflow

Other Skills

• Communication
• Teamwork
• Problem Solving

Requirements:

5+ years of experience in modern data engineering roles
Expert-level proficiency in PySpark and Spark SQL
Strong experience working with healthcare data formats and standards
Deep understanding of distributed systems, data partitioning strategies, concurrency, and cluster resource tuning

Roles & Responsibilities

Architect and implement scalable data processing pipelines using Databricks Runtime and Delta Lake
Develop and operate data pipelines leveraging Azure Data Lake Storage, Azure Data Factory or Synapse Pipelines
Design and implement secure PHI pipelines compliant with HIPAA and other regulations
Build and maintain high-volume batch ETL pipelines and low-latency streaming pipelines

Surgery Partners, Inc

About Surgery Partners, Inc

Surgery Partners is a leading operator of surgical facilities and ancillary services with more than 180 locations nationwide. We provide exceptional integrated healthcare experiences between our providers and patients. Our diverse company operates multiple types of healthcare services dedicated to improving the quality of care in a convenient and cost-effective manner.

Our integrated approach to advancing markets allows for flexibility to provide care on an individualized, local market basis. Whether entering a new market with surgical facilities, ancillary services or joint ventures with health systems, or furthering an existing market’s growth potential by focusing on base business, in-market development and new service lines, our experience has shown us that no two markets are alike. We see value in individuality.

At Surgery Partners, our mission is to enhance patient quality of life through partnership. Surgery Partners is an organization deeply committed to providing quality, compassionate and personalized care to meet the needs of our diverse patients, employees and physician partners in the communities we serve. Our colleagues are critical in achieving that mission. As it truly brings out the best in all of us, Surgery Partners is committed to diversity and inclusion.

Our Surgery Partners team comprises more than 7,000 employees and 4,600 affiliated physicians, serving more than 600,000 patients annually. Want to work with us? Check out our website for current employment opportunities.

Company type: XLarge

Founded: 2018

Company size: 10001


Job description

Data Engineer - Hybrid / Remote Opportunity

Hybrid for candidates in Nashville and surrounding areas.
Remote option available for candidates outside of surrounding areas.

This role requires a highly technical Data Engineer with expert-level proficiency in Azure Databricks, distributed data pipelines, and large-scale healthcare data processing. It focuses on designing and implementing high-throughput ingestion pipelines, transactional lakehouse layers, and secure PHI data flows using Azure-native services and Databricks runtime optimizations.

You will build and operate production-grade data pipelines that meet rigorous requirements for security, lineage, compliance (HIPAA), observability, and operational SLAs, supporting analytics, AI, and clinical insights across the organization.

Core Responsibilities

Platform & Architecture

Architect and implement scalable data processing pipelines using:

Databricks Runtime (Apache Spark, Spark SQL, MLflow, Delta Lake)
Delta Lake ACID transactions, Z-Ordering, OPTIMIZE, and Change Data Feed (CDF)
Unity Catalog for governance, lineage, RBAC, and audit controls

Design and enforce a medallion (Bronze/Silver/Gold) architecture with schema evolution, Delta Live Tables (DLT), and robust error-handling patterns
Build high-performance ingestion frameworks for:

FHIR and HL7 message streams
X12 837/835 healthcare claims data
EHR/EMR source systems
Batch, real-time, and event-driven data sources

Azure Cloud Engineering

Develop and operate data pipelines leveraging:

Azure Data Lake Storage Gen2 (hierarchical namespace, ACLs, POSIX permissions)
Azure Data Factory or Synapse Pipelines (parameterization, dynamic pipelines, triggers)
Azure Event Hubs and/or Service Bus for streaming ingestion
Azure SQL Database and Azure Synapse (Dedicated and Serverless pools)
Azure Functions for lightweight orchestration and automation
Azure Monitor, Log Analytics, and Application Insights for observability

Implement enterprise-grade security including:

VNet integration and private endpoints
Secrets and key management using Azure Key Vault
Managed identities and least-privilege access controls

Distributed Data Engineering

Develop optimized PySpark and/or Scala pipelines using advanced Spark techniques:

Catalyst optimizer tuning
Cluster sizing and autoscaling strategies
Adaptive Query Execution (AQE)
Efficient join strategies (broadcast vs. shuffle)

Build and maintain:

High-volume batch ETL pipelines (100M+ records)
Low-latency streaming pipelines using Spark Structured Streaming

Implement CI/CD for Databricks environments, including:

Git-integrated DEV/QA/PROD workspaces
Automated job and workflow deployments
Unit testing using pytest and Databricks testing frameworks

Healthcare Data & Compliance

Design and implement secure PHI pipelines compliant with:

HIPAA Privacy and Security Rules
SOC 2 and HITRUST-aligned controls

Build pipelines supporting healthcare data standards including:

FHIR R4 resources (Patient, Encounter, Observation, Claim, etc.)
HL7 v2.x messages (ADT, ORU, ORM)
X12 EDI transactions (837, 835, 270/271)

Ensure end-to-end lineage tracking, auditability, and data retention across all lakehouse layers

Required Qualifications

5+ years of experience in modern data engineering roles
Expert-level proficiency in:

PySpark and Spark SQL
Databricks (Jobs, Workflows, Repos, Delta Live Tables)
Delta Lake architecture and transactional design patterns
Azure Data Factory or Azure Synapse Pipelines
Cloud-native data security (RBAC, ABAC, privilege boundary enforcement)

Strong experience working with healthcare data formats and standards:

FHIR (JSON)
HL7 v2/v3
X12 EDI claims data

Deep understanding of distributed systems, data partitioning strategies, concurrency, and cluster resource tuning

Preferred Qualifications

Experience implementing Unity Catalog at enterprise scale
Familiarity with MLOps workflows and Databricks MLflow
Experience using dbt with Databricks SQL
Relevant certifications, including:

Databricks Data Engineer Professional
Microsoft Azure DP-203
HL7 or FHIR certification (nice to have)

Benefits:

Comprehensive health, dental, and vision insurance
Health Savings Account with an employer contribution
Life Insurance
PTO
401(k) retirement plan with a company match
And more!

ENVIRONMENTAL/WORKING CONDITIONS: Normal busy office environment with much telephone work. Possible long hours as needed. The description is intended to provide only basic guidelines for meeting job requirements. Responsibilities, knowledge, skills, abilities and working conditions may change as needs evolve.

*If you are viewing this role on a job board such as Indeed.com or LinkedIn, please know that pay bands are auto-assigned and may not reflect the true pay band within the organization.

*No Recruiters Please
