Job description

Responsibilities:
Develop Python/PySpark/SQL scripts and automations to modernize legacy processes (data management, rules, and the alert engine), reducing manual effort and increasing traceability
Develop end-to-end machine learning models: scoping, exploration, feature creation, model selection/evaluation/interpretation, production deployment, monitoring, and retraining
Work with large volumes of data in distributed environments (Spark), with a focus on performance and data quality

Requirements:
Strong proficiency in Python, PySpark, and SQL for data processing and automation
Experience building end-to-end machine learning pipelines, including feature engineering, model evaluation, deployment, monitoring, and retraining
Hands-on experience with distributed computing (e.g., Apache Spark) and large-scale datasets, with a focus on performance and data quality
Advanced degree in Computer Science, Statistics, Data Science, or a related field, with strong analytical and communication skills