Realtime 3D avatars, streamed to any browser
A complete pipeline — speech, language, synthesis, animation, render — that turns voice into a photoreal Gaussian-splat avatar in under 1.2 seconds. Server-rendered, open stack, single GPU, streamed to any browser over WebRTC.
Measurable outcomes that transformed Laika Dynamics (Hawke's Bay)'s business.
Photoreal avatars exist; real-time photoreal avatars exist; voice-to-avatar pipelines exist. What didn't exist was a single open-source stack that turned voice into a streamed Gaussian-splat avatar with a sub-1.2-second end-to-end latency on a single GPU — at a price point that didn't require a hyperscaler. Every other path involves stitching together closed APIs at $0.10+ per minute with latency that breaks the illusion. We wanted the whole pipeline open, single-GPU, and cheap enough per-trained-avatar that real businesses could deploy it.
We built the full pipeline end-to-end: Whisper for speech-to-text, an LLM for the language layer, F5-TTS for synthesis, NVIDIA Audio2Face for facial animation driven by phonemes, and Gaussian Splatting Avatar Compositor (GSAC) for the photoreal render. The whole thing runs server-side on a single GPU at 30fps 720p H.264, streamed over WebRTC to any modern browser. SvelteKit frontend; Bun + Hono gateway; everything containerised. End-to-end latency: 1.18 seconds. Per-avatar training cost: $3–6.
How we approached this project to deliver exceptional results.
Designed the pipeline as a streaming graph, not a request/response chain — every stage starts producing output before the previous one finishes
Picked open models at every layer (Whisper, F5-TTS, Audio2Face, GSAC) so the whole stack can be deployed on customer-owned infrastructure
Optimised for single-GPU deployment — 13GB VRAM footprint, fits on a consumer-grade RTX 4090 or a single H100 slice
Streamed video over WebRTC at 30fps 720p H.264 — chosen for the latency budget, not the bitrate
Built the SvelteKit frontend with a clean masthead so the technical specifics (latency, footprint, frame rate, cost) are visible above the fold
Published the spec — `avatar-mvp-spec.md` is the single source of truth and every claim is reproducible
Key screens and features from the final product.
Editorial research-preview marketing site — every spec above the fold
Mobile experience preserves the data-dense layout
Next Project
SaaS
The kanban board that out-prices Trello and out-ships Jira
View Case StudyLet's create something amazing together. Tell us about your project and we'll make it happen.