ML Engineer (LLM)

Work set-up: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

  • At least 5 years of experience in Python development, with a focus on production-grade applications.
  • Hands-on experience in building real-time speech and audio products, including ASR, TTS, or streaming audio systems.
  • Familiarity with large language model tooling, such as fine-tuning, prompt design, and frameworks like LangChain or LlamaIndex.
  • Knowledge of systems and MLOps practices, including containerization, GPU scheduling, and cloud infrastructure.

Key responsibilities:

  • Design and implement real-time speech pipelines meeting latency and quality standards.
  • Evaluate, fine-tune, and integrate state-of-the-art ASR, LLM, and TTS models into production.
  • Optimize inference performance through techniques like quantization and hardware-aware compilation.
  • Develop and maintain scalable APIs and microservices, ensuring observability and autoscaling.

Synthflow AI https://synthflow.ai
11 - 50 Employees

Job description

Synthflow AI is a no-code platform for deploying voice AI agents that automate phone calls across contact center operations and business process outsourcing (BPO) at scale. We help mid-market and enterprise companies manage routine calls to save teams time and resources.

Our agents have already delivered measurable impact:

  • Over 5 million hours of contact center operations saved

  • 35% more calls answered compared to non-AI operators

  • 45 million calls handled with 99.9% uptime

Backed by Accel, Atlantic Labs, and Singular and trusted by over 1,000 customers, we are leading an industry shift toward sophisticated and accessible conversational AI.

The Role

We are looking for a hands‑on ML Engineer who lives at the intersection of TTS, STT and large language models. You will design and ship new low‑latency voice capabilities, working closely with product, research and infrastructure teams to push the boundaries of natural, multilingual conversation.

What You’ll Do

  • Architect & implement real‑time speech pipelines (ASR → LLM → TTS) that meet stringent latency and quality targets.

  • Evaluate and fine‑tune state‑of‑the‑art ASR, LLM and TTS models (both commercial and open‑source) and integrate the best performers into production.

  • Optimise inference through quantisation, distillation, hardware‑aware graph compilation and reinforcement‑learning‑based tuning.

  • Expose scalable APIs & micro‑services with Python/FastAPI, gRPC or WebSocket streaming, backed by robust observability and autoscaling.

  • Own deployment across cloud and on‑prem environments, collaborating on containerisation (Docker), orchestration (Kubernetes) and CI/CD workflows.

  • Stay ahead of the curve by tracking research, running experiments and sharing learnings with the broader team.

What we’re looking for

  • Python Engineering: 5+ years writing production‑grade, well‑tested Python; deep familiarity with async, typing and performance profiling

  • Speech/Audio: Hands‑on experience building real‑time ASR, TTS, voice chat or streaming audio products

  • LLM Tooling: Fine‑tuning, prompt design, evaluation, retrieval‑augmented generation; familiarity with frameworks such as OpenPipe ART, LangChain, LlamaIndex or similar

  • Systems & MLOps: Containerisation, GPU scheduling, observability, DevOps on GCP or AWS; infrastructure‑as‑code principles

  • API Design: Building and maintaining high‑throughput REST/gRPC/FastAPI services; securing and monitoring them in production

Bonus Points

  • Model compression expertise (quantisation, pruning, ONNX/TensorRT)

  • Knowledge of audio and acoustics

  • Experience with reinforcement‑learning‑from‑human‑feedback (RLHF) or direct preference optimisation

  • Contributions to open‑source ML/speech projects (share your GitHub!)

  • Familiarity with GPU inference servers (Triton, KServe) or distributed compute frameworks (Ray)

Founded in Berlin in 2023 by serial entrepreneurs Albert Astabatsyan, Hakob Astabatsyan, and Sassun Mirzakhan-Saky, Synthflow AI democratizes access to advanced voice AI with a no-code platform that lets enterprises easily create, deploy and scale natural-sounding, cost-effective voice agents tailored to their business needs.

Required profile

Experience

Level of experience: Senior (5-10 years)
Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
