Job description
Requirements:
Strong experience with Databricks (Workflows, MLflow, Delta Lake), Apache Spark (batch and streaming), and advanced Python (production-quality code).
Hands-on experience with streaming and real-time data systems.
Proven experience designing and implementing CI/CD pipelines.
Strong understanding of the ML lifecycle (training, deployment, monitoring, and retraining) and experience building scalable, distributed data and ML pipelines.
Experience with Snowflake, Kubernetes, and Docker.
Experience with Terraform or other Infrastructure as Code (IaC) tools.
Experience with feature stores (e.g., Snowflake Feature Store, Databricks Feature Store) and event-driven architectures (e.g., Kafka).
Experience with model serving frameworks, low-latency API development, and LLM deployment/serving.
Experience with monitoring and observability tools (e.g., the ELK stack).
Familiarity with A/B testing and experimentation frameworks.
Strong knowledge of RBAC, security, and governance in data/ML platforms.
Experience with cloud environments (Azure preferred).
Responsibilities:
Design, build, and maintain production-grade ML pipelines on Databricks.
Operationalize ML models, including deployment, monitoring, and full lifecycle management.
Build and maintain CI/CD pipelines for ML workflows.
Develop and manage real-time and streaming data pipelines.
Collaborate closely with Data Scientists to efficiently productionize models.
Implement model versioning and experiment tracking, and ensure reproducibility.
Define and enforce ML best practices, governance, and quality standards.
Monitor model performance and data drift, and implement automated retraining strategies.
Optimize performance, scalability, and cost of distributed workloads.
Contribute to platform design for low-latency inference and scalable model serving.