Key Facts

Remote from: Ukraine

Category: Physicist

Full time

Mid-level (2-5 years)

40 - 40K yearly

English, Ukrainian

Hard Skills

Prompt Engineering, Large Language Modeling, Quantization, Hugging Face (NLP Framework), Pandas (Python Package), Data Synthesis, Python (Programming Language), Text Manipulation, Text Processing

Other Skills

• Mentorship
• Problem Solving

Requirements

3+ years of commercial experience in Machine Learning, with a specific focus on the NLP or LLM domain
Strong knowledge of Python3, NumPy, pandas, PyTorch and Hugging Face (Transformers, PEFT, Accelerate)
Proficiency in PEFT/LoRA and Reinforcement Learning techniques
Experience deploying LLMs to production environments using Triton Inference Server, vLLM, TGI, or ONNX

Roles & Responsibilities

Design and implement advanced methods in prompt orchestration, fine-tuning (SFT/RLHF/DPO), and autonomous agentic workflows
Curate high-quality training data from large-scale text and multi-modal sources
Tune hyperparameters and improve inference speed/accuracy through PEFT (LoRA/QLoRA) and advanced prompt engineering
Collaborate with Product and Data Engineering teams to seamlessly integrate LLM features into the broader ecosystem

SQUAD

About SQUAD

We are a research and delivery team working on impactful products. We are gathering top-notch minds in domains such as Research, Embedded, Hardware, Mobile, QA, Infrastructure, Delivery, Product and Design, and Analytics to collaborate on the latest in smart home security and IoT. Our modern labs feature test devices and leading optical equipment, creating a unique opportunity to work and innovate on real R&D in Ukraine. We are a growing team that operates with a startup spirit to build solutions for products and raise the bar with every detail. We bring together strong performers and foster an environment for creativity and discovery. We believe that the synergy of outstanding people and this environment can tackle any global challenge. Forget good. Do great in SQUAD.

Founded: 2018

Company size: 1001 - 5000

Job description

Team Summary

Our distributed team is looking for an experienced Applied Scientist with a strong background in Large Language Models to develop high-performance Generative AI features across Cloud and Edge environments.

Job Summary

In this role, you will drive the transition from research to production by optimizing local inference through model compression and quantization for private, real-time Edge performance, while also engineering scalable RAG architectures and multi-agent systems for Cloud deployment. Your daily responsibilities span the full research lifecycle: formulating hypotheses, generating synthetic datasets, fine-tuning LLMs, validating safety and alignment, and writing up the results in technical reports.

Responsibilities and Duties

Design and implement advanced methods in prompt orchestration, fine-tuning (SFT/RLHF/DPO), and autonomous agentic workflows
Curate high-quality training data from large-scale text and multi-modal sources
Identify patterns in model hallucinations and visualize evaluation metrics for clear interpretation
Tune hyperparameters and improve inference speed/accuracy through PEFT (LoRA/QLoRA) and advanced prompt engineering
Collaborate with Product and Data Engineering teams to seamlessly integrate LLM features into the broader ecosystem
Track and report progress using industry-standard benchmarks (MMLU, HumanEval, etc.) and custom internal KPIs
Stay at the forefront of the field (e.g., State Space Models, new Transformer variants) and evaluate cutting-edge techniques for production readiness
Engage in continuous technical growth and mentor junior colleagues to elevate the team's expertise

Qualifications and Skills

3+ years of commercial experience in Machine Learning, with a specific focus on the NLP or LLM domain
Strong knowledge of Python 3, NumPy, pandas, modern text-processing libraries, PyTorch, and Hugging Face (Transformers, PEFT, Accelerate)
Proficiency in PEFT/LoRA and Reinforcement Learning techniques
Deep understanding of attention mechanisms, tokenization, context window management, and embedding spaces
Practical experience in at least one of the following: Retrieval-Augmented Generation (RAG), Fine-tuning, or Agentic frameworks
Proven ability to manage and analyze massive datasets (>100GB) across text, image, and audio formats
Hands-on experience crafting high-fidelity datasets and building robust data pipelines
Expertise in prompt engineering, agentic framework design, and LLM pipeline orchestration
Experience deploying LLMs to production environments using Triton Inference Server, vLLM, TGI, or ONNX
Good written and spoken English

Nice to have

Practical experience with Pinecone, Weaviate, Milvus, or Chroma
Advanced quantization (GGUF, AWQ, EXL2), pruning, and knowledge distillation
Experience with LangChain, LlamaIndex, or AutoGen
Basic understanding of web/client-server architecture and streaming API responses (asyncio, aiohttp)
Familiarity with RAGAS, DeepEval, or G-Eval
Experience using Docker, Kubernetes, and cloud GPU orchestration (e.g., Run:ai, Lambda Labs)
Knowledge of C++, Triton, or CUDA for custom kernel development

We offer multiple benefits that include

An environment of equal opportunities, a transparent and value-based corporate culture, and an individual approach to each team member
Competitive compensation and perks
Gig-contract
21 paid vacation days per year and paid public holidays in accordance with Ukrainian legislation
Development opportunities such as corporate courses, knowledge hubs, free English classes, and educational leave
Medical insurance from day one, plus sick leave and medical leave
Remote work is available within Ukraine only
Free meals, fruit, and snacks when working in the office