Real-Time Multimodal AI Engineer (Digital Human Systems)
We are engaging a Multimodal AI Engineer to build a real-time AI digital human that combines LLMs, voice cloning (text-to-speech), and facial animation for a talking avatar. The project is led by my client, a leading global strategy-led technology build consultancy.
The goal is not a demo. It is a low-latency, production-grade system where responses are generated live, speech sounds natural and personalised, lip movement is synchronised with audio, and the entire interaction feels coherent and human.
What You'll Work On
You'll design and implement a real-time multimodal pipeline, connecting multiple AI systems into a single, synchronised experience.
What We're Actually Looking For
This is not a generic AI role. You should be comfortable working at the intersection of:
Must-Have Experience
Nice-to-Have
If you are experienced in making multiple AI models behave like a single, real-time human interaction system, we'd like to hear from you. Please contact zxie@hiredgesolutions.com for next steps and details.
