Urdu AI Avatar Kiosk

Overview

A prototype public-service kiosk: walk up, ask a question in Urdu, and a photoreal on-screen person answers — with a cloned voice and lip-synced video.

What it does

Photoreal talking avatar — lip-sync generated with Wav2Lip over studio footage of a real presenter.
Cloned natural voice — neural TTS shaped through an RVC voice model so the avatar sounds like the actual person.
Local speech pipeline — speech recognition and synthesis run on local GPU hardware, with an LLM handling the intent layer.
Intent-driven answers — a curated set of public-service intents, written for the local audience rather than translated.
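
As an illustration of the intent-driven design, here is a minimal sketch, not the kiosk's actual code. It assumes the classifier (an LLM in the real system) returns only an intent label and that the spoken answer always comes from the curated set. The intent names, the Urdu answer strings, and `keyword_classifier` are hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical curated intents; the real set is not listed here.
INTENTS: dict[str, str] = {
    "office_hours": "دفتر صبح نو بجے سے شام پانچ بجے تک کھلا رہتا ہے۔",
    "id_card_renewal": "شناختی کارڈ کی تجدید کے لیے کاؤنٹر نمبر تین پر جائیں۔",
}

# Spoken when the label is not in the curated set.
FALLBACK = "معذرت، میں یہ سوال نہیں سمجھ سکا۔ براہ کرم دوبارہ پوچھیں۔"


def answer(transcript: str, classify: Callable[[str], str]) -> str:
    """Map a transcript to a curated answer; unknown labels get FALLBACK."""
    label = classify(transcript).strip().lower()
    return INTENTS.get(label, FALLBACK)


def keyword_classifier(transcript: str) -> str:
    """Stand-in for the LLM call (assumption): naive keyword matching."""
    if "کارڈ" in transcript:
        return "id_card_renewal"
    if "بجے" in transcript or "وقت" in transcript:
        return "office_hours"
    return "unknown"


reply = answer("شناختی کارڈ کی تجدید کیسے ہوگی؟", keyword_classifier)
```

Keeping the model's output to a label means a misclassification yields a wrong but vetted answer or the fallback, never free-form generated text.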

Status

Working prototype, demoed to stakeholders. Originally scoped for Pashto, pivoted to Urdu-primary after stakeholder feedback — the architecture supports both.

My role

Everything: the speech pipeline, avatar rendering, intent design, and the GPU server it runs on. A fun reminder that the hard part of "AI kiosks" isn't the model — it's latency, audio handling, and making a pipeline of five models fail gracefully.
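
One way to picture "fail gracefully" is a per-stage latency budget with a fallback. The sketch below is an assumption about the general shape, not the kiosk's implementation: `run_stage`, `respond`, and the stage names are hypothetical, and the toy lambdas stand in for the real models.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def run_stage(fn, arg, timeout_s, fallback=None):
    """Run one stage under a latency budget; return fallback on timeout or error."""
    future = _pool.submit(fn, arg)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread is not killed; we simply stop waiting for it.
        future.cancel()
        return fallback
    except Exception:
        return fallback


def respond(audio, stages):
    """Chain stages; stop at the first failure and report which stage failed."""
    value = audio
    for name, fn, budget_s in stages:
        value = run_stage(fn, value, budget_s)
        if value is None:
            return {"ok": False, "failed_stage": name}
    return {"ok": True, "output": value}


def slow_stage(x):
    time.sleep(0.3)  # simulates a model that blows its budget
    return x


ok = respond("q", [("asr", lambda x: x + ">text", 1.0),
                   ("tts", lambda x: x + ">audio", 1.0)])
late = respond("q", [("asr", slow_stage, 0.05)])
```

With the failed stage known, the caller can degrade deliberately, for example by playing a pre-recorded "please ask again" clip instead of leaving the screen frozen.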
