Data Engineer

Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Hands-on experience with ML technologies
  • Building complex data pipelines (ETL)
  • Work experience with cloud platforms (GCP)
  • Understanding of code management tools (Git/SVN)
  • Experience with data engineering tools and practices

Key responsibilities:

  • Implement and design scalable data pipelines
  • Develop and maintain data models using guidelines
  • Document business glossary in data catalog
  • Evaluate and support business data models
  • Guide project teams in data model mapping
Sales Consulting
51 - 200 Employees

Job description

Responsibilities:

  • Implement and design scalable, optimized data pipelines for (pre-)processing and ETL for machine learning models;
  • Develop and maintain conceptual and logical data models using data modeling guidelines from the clients;
  • Document and maintain business glossary in the enterprise data catalog solution;
  • Evaluate business data models and physical data models for variances and discrepancies;
  • Support project team in adopting business data models;
  • Guide project team to map physical data models to business glossary.
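The pipeline work described above can be sketched as a minimal extract-transform-load flow. This is an illustration only; the CSV layout, table name, and field names are hypothetical stand-ins for a real upstream source:

```python
import csv
import io
import sqlite3

# Hypothetical raw feature data, standing in for an upstream source.
RAW = """user_id,age,score
1,34,0.82
2,,0.91
3,29,
"""

def extract(text):
    """Read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop incomplete rows and cast types, a typical ML pre-processing step."""
    clean = []
    for row in rows:
        if row["age"] and row["score"]:
            clean.append((int(row["user_id"]), int(row["age"]), float(row["score"])))
    return clean

def load(rows, conn):
    """Write the cleaned rows into a feature table and report the row count."""
    conn.execute("CREATE TABLE features (user_id INTEGER, age INTEGER, score REAL)")
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]

conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW)), conn)
```

Keeping extract, transform, and load as separate functions lets each stage be tested in isolation, which matters once pipelines like these are orchestrated at scale.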


Knowledge/Experience:

     For senior experience:

  • Hands-on experience with technologies and frameworks used in ML, like scikit-learn, MLflow, TensorFlow;
  • Building complex data pipelines e.g. ETL;
  • Experience working in cloud environment, data cloud platforms (e.g. GCP);
  • Understanding of code management repositories like Git/SVN;
  • Familiar with software engineering practices like versioning, testing, documentation, code review;
  • Experience with Apache Airflow;
  • Experience setting up and troubleshooting both SQL and NoSQL databases;
  • Experience with monitoring and observability (ELK stack);
  • Deployment and provisioning with automation tools, e.g. Docker, Kubernetes, OpenShift, CI/CD;
  • Knowledge of MLOps architecture and practices;
  • Relevant work experience in ML projects;
  • Knowledge of data manipulation and transformation, e.g. SQL.

     For mid-level experience:

  • Design and Develop Data Pipelines: Create efficient and scalable data pipelines using GCP services such as Dataflow (Apache Beam), Dataproc (Apache Spark), and Pub/Sub;
  • Data Storage Solutions: Implement and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL;
  • Data Analysis and Reporting: Optimize SQL queries for data analysis and reporting in BigQuery.
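One concrete instance of the query-optimization point: name the columns you need and filter on the partition column, rather than scanning the whole table with SELECT *. BigQuery itself is not available in a plain Python environment, so this sketch uses the built-in sqlite3 module just to show the query shape; the table and column names are hypothetical, with event_date standing in for a BigQuery partition column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", 1, 10.0, "x"),
        ("2024-01-01", 2, 5.0, "y"),
        ("2024-01-02", 1, 7.5, "z"),
    ],
)

# Name only the needed columns and restrict to one partition's worth of data;
# in BigQuery this avoids billing for a full-table scan of wide columns
# like payload.
query = """
    SELECT user_id, SUM(amount) AS total
    FROM events
    WHERE event_date = '2024-01-01'
    GROUP BY user_id
    ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
```

The same SELECT-list and partition-filter discipline carries over directly to BigQuery, where it is the main lever for both cost and query latency.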


Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s):
English
