AI Platform

Laika Avatar

Realtime 3D avatars, streamed to any browser

A complete pipeline — speech, language, synthesis, animation, render — that turns voice into a photoreal Gaussian-splat avatar in under 1.2 seconds. Server-rendered, open stack, single GPU, streamed to any browser over WebRTC.

Client: Laika Dynamics (Hawke's Bay)
Year: 2026
Timeline: Research preview v0.1
Bun, Hono, SvelteKit, Whisper, F5-TTS, NVIDIA Audio2Face, Gaussian Splatting, WebRTC, Single-GPU
Results

The impact we delivered

Measurable outcomes for Laika Dynamics (Hawke's Bay).

1.18 s: End-to-End Latency
13 GB: Single-GPU Footprint
30 fps: 720p H.264 / WebRTC
$3–6: Per Avatar Trained
The Challenge

Understanding the problem

Photoreal avatars exist; real-time photoreal avatars exist; voice-to-avatar pipelines exist. What didn't exist was a single open-source stack that turned voice into a streamed Gaussian-splat avatar with a sub-1.2-second end-to-end latency on a single GPU — at a price point that didn't require a hyperscaler. Every other path involves stitching together closed APIs at $0.10+ per minute with latency that breaks the illusion. We wanted the whole pipeline open, single-GPU, and cheap enough per-trained-avatar that real businesses could deploy it.

Our Solution

How we solved it

We built the full pipeline end-to-end: Whisper for speech-to-text, an LLM for the language layer, F5-TTS for synthesis, NVIDIA Audio2Face for facial animation driven by phonemes, and the Gaussian Splatting Avatar Compositor (GSAC) for the photoreal render. The whole pipeline runs server-side on a single GPU and produces 720p H.264 video at 30 fps, streamed over WebRTC to any modern browser. The frontend is SvelteKit, the gateway is Bun + Hono, and everything is containerised. End-to-end latency is 1.18 seconds. Per-avatar training cost is $3–6.
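One way to keep a latency target like this honest is to track each stage's contribution to time-to-first-frame against the overall budget. The sketch below is illustrative only: `checkLatency`, `STAGE_ORDER`, and the stage names are hypothetical, and the per-stage numbers are round placeholders, not measurements from this project.

```typescript
// The five pipeline stages named in the case study.
type StageName = "speech" | "language" | "synthesis" | "animation" | "render";

interface BudgetReport {
  totalMs: number;        // sum of per-stage time-to-first-output
  withinBudget: boolean;  // true when the total is under the budget
  slowest: StageName;     // the stage contributing the most latency
}

const STAGE_ORDER: StageName[] = ["speech", "language", "synthesis", "animation", "render"];

// Hypothetical helper: sums each stage's time-to-first-output (in ms)
// and compares the total against an end-to-end budget.
function checkLatency(perStageMs: Record<StageName, number>, budgetMs: number): BudgetReport {
  let totalMs = 0;
  let slowest: StageName = STAGE_ORDER[0];
  for (let i = 0; i < STAGE_ORDER.length; i++) {
    const current = STAGE_ORDER[i];
    totalMs += perStageMs[current];
    if (perStageMs[current] > perStageMs[slowest]) {
      slowest = current;
    }
  }
  return { totalMs: totalMs, withinBudget: totalMs < budgetMs, slowest: slowest };
}

// Placeholder values only (NOT measured figures), checked against the
// sub-1.2-second target quoted above.
const report = checkLatency(
  { speech: 200, language: 400, synthesis: 300, animation: 100, render: 100 },
  1200
);
console.log(report);
```

Because the stages overlap in a streaming design, the inputs here are each stage's delay before its first output, not its total processing time.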

Our Approach

Step by step

How we approached this project to deliver exceptional results.

1. Designed the pipeline as a streaming graph, not a request/response chain: every stage starts producing output before the previous one finishes

2. Picked open models at every layer (Whisper, F5-TTS, Audio2Face, GSAC) so the whole stack can be deployed on customer-owned infrastructure

3. Optimised for single-GPU deployment: the 13 GB VRAM footprint fits on a consumer-grade RTX 4090 or a single H100 slice

4. Streamed video over WebRTC as 720p H.264 at 30 fps, a format chosen for the latency budget, not the bitrate

5. Built the SvelteKit frontend with a clean masthead so the technical specifics (latency, footprint, frame rate, cost) are visible above the fold

6. Published the spec: `avatar-mvp-spec.md` is the single source of truth and every claim is reproducible
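The streaming-graph idea in step 1 can be illustrated with a small push-based sketch, in which each stage forwards a chunk downstream as soon as it has handled it, so the first chunk reaches the renderer before the second chunk enters the pipeline. This is a simplified stand-in, not the project's implementation: `Chunk`, `Sink`, `stage`, and `buildPipeline` are hypothetical names, and the stages only tag the data rather than running any model.

```typescript
interface Chunk {
  seq: number;   // position of the chunk in the input stream
  data: string;  // stand-in payload (audio, text, frames, ...)
}

type Sink = (chunk: Chunk) => void;

// Hypothetical stage: records that it ran, wraps the payload with its own
// label, and pushes the result to the next stage without buffering.
function stage(label: string, trace: string[], next: Sink): Sink {
  return (chunk: Chunk) => {
    trace.push(label + ":" + chunk.seq);
    next({ seq: chunk.seq, data: label + "(" + chunk.data + ")" });
  };
}

// Wires the stages back to front so each one holds a reference to its successor.
function buildPipeline(trace: string[], output: Chunk[]): Sink {
  const labels = ["stt", "llm", "tts", "animate", "render"];
  let sink: Sink = (chunk: Chunk) => {
    output.push(chunk);
  };
  for (let i = labels.length - 1; i >= 0; i--) {
    sink = stage(labels[i], trace, sink);
  }
  return sink;
}

const trace: string[] = [];
const output: Chunk[] = [];
const pushChunk = buildPipeline(trace, output);

// Two input chunks arrive one after the other.
pushChunk({ seq: 0, data: "audio" });
pushChunk({ seq: 1, data: "audio" });

console.log(trace.join(" "));
```

The trace shows chunk 0 passing through all five stages before chunk 1 is touched, which is the property that separates a streaming graph from a chain where each stage waits for the full output of the one before it.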

Gallery

Project highlights

Key screens and features from the final product.

Editorial research-preview marketing site — every spec above the fold

Mobile experience preserves the data-dense layout
