Data Engineer

Work set-up: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • Bachelor's degree in Data Engineering, Computer Science, or a related field plus at least 2 years of experience, or 5 years of relevant experience.
  • Deep expertise with Apache Spark (PySpark), Hadoop, and Apache Hive.
  • Strong programming skills in Python and an understanding of database concepts, including NoSQL and vector databases.

Key responsibilities:

  • Design, develop, and maintain scalable batch and streaming data pipelines.
  • Transform, process, and integrate data using Python.
  • Handle structured and unstructured data, including NoSQL and vector databases.
  • Optimize performance of big data workflows and tune Spark and Hive jobs.

Tooploox SME http://tooploox.com/
51 - 200 Employees

Job description

Hi there!

We are Tooploox, an AI software development company offering custom AI solutions and services. We help innovative companies and startups design and build digital products with generative AI, mobile, and web technologies.

Our team of nearly 200 experts, including an R&D team of over 40 engineers, many with PhDs, has pioneered AI solutions across industries such as healthcare, fashion, and e-commerce. We’ve published over 15 research papers at top conferences such as NeurIPS and ICML.

We're looking for a Data Engineer to take on a pivotal role in our team. You'll be at the center of our data work, building scalable batch and streaming data pipelines. If you love combining traditional software development with innovative AI technologies, this role is tailor-made for you.

We'd love to hear from you!

What you will do:
  • Design, develop, and maintain scalable batch and streaming data pipelines.
  • Work with Python to transform, process, and integrate data.
  • Handle a mix of structured and unstructured data, including work with NoSQL and vector databases.
  • Optimize performance across big data workflows, including tuning Hive and Spark jobs.
Experience and skills you need to join us:
  • BS/BA in Data Engineering, Computer Science, or a related field plus 2 years of experience, or 5 years of relevant experience.
  • Deep experience with Apache Spark (especially PySpark), Hadoop, and Apache Hive.
  • Strong programming skills in Python.
  • Solid understanding of database concepts, including experience with NoSQL databases (e.g., MongoDB, Redis) and ideally vector databases.
  • Hands-on experience with stream processing, preferably using Apache Flink.
  • Familiarity with distributed computing, data warehousing, and performance optimization techniques.
  • Strong problem-solving and communication skills.
  • Fluency in Polish and English.
It would be great if you also have:
  • Experience with LLMs, prompt engineering, or machine learning workflows (we use this in conjunction with vector DBs).
  • Proficiency in Java or Scala, which is useful for deeper Spark optimization or for contributing to broader engineering projects.
  • Familiarity with Spring Boot for building and deploying data applications.
How we work:

At Tooploox, you have the flexibility to choose your working hours and location. While we value remote work, we also believe in building relationships and invite you to join us in our Warsaw and Wrocław offices. Enjoy a relaxed atmosphere and try some “home-made” pizza from our office pizza oven. We love having pets in the office, so feel free to bring yours along.

Join us and shape the future of AI while working the way you like!

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): Polish, English

Other Skills

  • Communication
  • Problem Solving
