
Urdu AI Avatar Kiosk

Prototype walk-up kiosk with a photoreal talking avatar: cloned voice, lip-synced video, and a local speech pipeline for public-service queries in Urdu.

Python, Wav2Lip, RVC, Whisper

Overview

A prototype public-service kiosk: walk up, ask a question in Urdu, and a photoreal on-screen person answers — with a cloned voice and lip-synced video.

What it does

  • Photoreal talking avatar — lip-sync generated with Wav2Lip over studio footage of a real presenter.
  • Cloned natural voice — neural TTS shaped through an RVC voice model so the avatar sounds like the actual person.
  • Local speech pipeline — speech recognition and synthesis run on local GPU hardware, with an LLM handling the intent layer.
  • Intent-driven answers — a curated set of public-service intents, written for the local audience rather than translated.
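As a rough illustration, the flow described above could be chained together as follows. This is a minimal sketch under stated assumptions, not the kiosk's actual code: the `Stages` container, the `answer_query` function, the intent ids, and the placeholder answer strings are all hypothetical, and each stage is a plain callable so that the real models (Whisper, the LLM, the TTS engine, RVC, Wav2Lip) can be plugged in behind it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical curated intents. The ids and answer texts are placeholders;
# in the real kiosk the answers would be written in Urdu for the local audience.
INTENT_ANSWERS = {
    "office_hours": "<curated answer about office hours>",
    "required_documents": "<curated answer about required documents>",
}
FALLBACK_INTENT = "unknown"
FALLBACK_ANSWER = "<curated prompt asking the visitor to rephrase>"


@dataclass
class Stages:
    """One callable per model, so each can be swapped or stubbed in tests."""
    transcribe: Callable[[bytes], str]       # speech recognition
    classify_intent: Callable[[str], str]    # LLM intent layer
    synthesize: Callable[[str], bytes]       # neural TTS
    convert_voice: Callable[[bytes], bytes]  # voice-model conversion
    lip_sync: Callable[[bytes], bytes]       # lip-synced video from audio


def answer_query(audio: bytes, stages: Stages) -> Tuple[str, bytes]:
    """Run one spoken question through the whole chain and return (intent, video)."""
    text = stages.transcribe(audio)
    intent = stages.classify_intent(text)
    # The LLM only picks an intent; the spoken answer always comes from the curated set.
    if intent not in INTENT_ANSWERS:
        intent = FALLBACK_INTENT
    answer = INTENT_ANSWERS.get(intent, FALLBACK_ANSWER)
    speech = stages.synthesize(answer)
    cloned = stages.convert_voice(speech)
    video = stages.lip_sync(cloned)
    return intent, video
```

Keeping the LLM limited to choosing an intent id, rather than generating the reply, is one way to make sure the avatar only ever speaks reviewed text.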

Status

Working prototype, demoed to stakeholders. Originally scoped for Pashto, pivoted to Urdu-primary after stakeholder feedback — the architecture supports both.

My role

Everything: the speech pipeline, avatar rendering, intent design, and the GPU server it runs on. A fun reminder that the hard part of "AI kiosks" isn't the model — it's latency, audio handling, and making a pipeline of five models fail gracefully.
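To make "fail gracefully" concrete, here is a minimal sketch of one possible approach: give every stage a latency budget and a fallback value, so a slow or crashing model degrades the answer instead of freezing the kiosk. The names `run_stage` and `StageResult`, the budgets, and the fallback values are hypothetical, and the prototype may handle this differently.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Any, Callable

# Shared worker pool. It is deliberately not used as a context manager,
# because leaving a `with` block would wait for a timed-out stage to finish.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


@dataclass
class StageResult:
    name: str         # stage label, useful for logging
    value: Any        # real output, or the fallback
    ok: bool          # False if the stage timed out or raised
    elapsed_s: float  # time spent waiting on the stage


def run_stage(name: str, fn: Callable[[Any], Any], arg: Any,
              budget_s: float, fallback: Any) -> StageResult:
    """Run one pipeline stage, returning `fallback` on timeout or error."""
    start = time.monotonic()
    future = _POOL.submit(fn, arg)
    try:
        value, ok = future.result(timeout=budget_s), True
    except concurrent.futures.TimeoutError:
        # A running thread cannot be interrupted; cancel() only helps if
        # the task has not started yet. We simply stop waiting for it.
        future.cancel()
        value, ok = fallback, False
    except Exception:
        value, ok = fallback, False
    return StageResult(name, value, ok, time.monotonic() - start)
```

For example, if the lip-sync stage misses its budget, the caller could pass a pre-rendered idle clip as `fallback` and still play the synthesized audio, so the visitor gets an answer even when the video model lags.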
