Proven experience in processing unstructured data from various formats such as PDF, DOCX, and XLSX.
Strong proficiency in PySpark for distributed data processing.
Experience with Gen AI/LLM tools like LangChain and OpenAI.
Ability to interface with clients and deliver tailored solutions.
Key responsibilities:
Design and maintain ingestion pipelines for various document formats.
Leverage PySpark for processing large-scale datasets in distributed environments.
Utilize Gen AI tools for document chunking, embedding, and indexing.
Manage Databricks workflows and serve as a client-facing engineer to ensure pipeline reliability.
Lean Tech is a rapidly expanding organization situated in Medellín, Colombia. We pride ourselves on possessing one of the most influential networks within software development and IT services for the entertainment, financial, and logistics sectors. Our corporate projections offer many opportunities for professionals to elevate their careers and experience substantial growth. Joining our team means engaging with expansive engineering teams across Latin America and the United States, contributing to cutting-edge developments in multiple industries.
As a Gen AI Engineer / Data Engineer, you will play a vital role in building and managing data ingestion pipelines and Gen AI infrastructure that powers innovative, intelligent systems.
Position Title: Senior AI Data Engineer
Location: Remote - Colombia
What you will be doing:
As the Gen AI Engineer / Data Engineer, you will be responsible for designing and owning robust ingestion pipelines and supporting large-scale Gen AI workflows. You will be deeply involved in the preparation, parsing, and processing of unstructured data, ensuring that information is ready for downstream use in LLM-powered applications. Your responsibilities will include:
Data Ingestion Ownership: Own and maintain all ingestion pipelines from various document formats including PDF, PowerPoint (PPTX), Word (DOCX), Excel (XLSX), TXT, and Markdown (MD).
Distributed Processing: Leverage PySpark to efficiently process large-scale datasets in distributed environments.
LLM Frameworks: Utilize Gen AI tools such as LangChain, LlamaParser, docling, OpenAI, and Hugging Face for tasks like document chunking, embedding, and indexing.
Vector Database Management: Work with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search to store and retrieve high-dimensional data for LLM use.
Workflow Management: Design and manage Databricks workflows and scheduled jobs for automated pipeline execution.
Client Interaction: Serve as a client-facing engineer, ensuring ingestion pipelines meet business needs and perform reliably in production environments.
Requirements & Qualifications
To excel in this role, you should possess:
Proven experience in processing unstructured data from formats like PDF, DOCX, PPTX, XLSX, TXT, and MD
Strong proficiency in PySpark for distributed data processing
Experience with Gen AI/LLM tools including LangChain, LlamaParser, docling, OpenAI, and Hugging Face
Solid understanding of chunking, embedding, and indexing techniques
Experience with vector databases such as Databricks Vector Search, Pinecone, and Azure AI Search
Hands-on experience managing Databricks workflows and scheduled jobs
Ability to interface with clients, understand requirements, and deliver tailored solutions
Desired Skills:
Experience in deploying Gen AI pipelines in production environments
Familiarity with optimization of vector search for LLM applications
Understanding of retrieval-augmented generation (RAG) architectures
Knowledge of data governance and access control within enterprise data platforms
Why you will love Lean Tech:
Join a powerful tech workforce and help us change the world through technology
Professional development opportunities with international customers
Collaborative work environment
Career path and mentorship programs that will take you to new levels
Join Lean Tech and contribute to shaping the data landscape within a dynamic and growing organization. Your skills will be honed, and your contributions will be vital to our continued success. Lean Tech is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Required profile
Experience
Industry: Information Technology & Services
Spoken language(s): English