AI Platform

Laika Avatar

Realtime 3D avatars, streamed to any browser

A complete pipeline — speech, language, synthesis, animation, render — that turns voice into a photoreal Gaussian-splat avatar in under 1.2 seconds. Server-rendered, open stack, single GPU, streamed to any browser over WebRTC.

Client

Laika Dynamics (Hawke's Bay)

Year

2026

Timeline

Research preview v0.1

Bun, Hono, SvelteKit, Whisper, F5-TTS, NVIDIA Audio2Face, Gaussian Splatting, WebRTC, Single-GPU

Results

The impact we delivered

Measurable outcomes delivered for Laika Dynamics (Hawke's Bay).

1.18s

End-to-End Latency

13 GB

Single-GPU Footprint

30 fps

720p H.264 / WebRTC

$3–6

Per Avatar Trained

The Challenge

Understanding the problem

Photoreal avatars exist, real-time photoreal avatars exist, and voice-to-avatar pipelines exist. What didn't exist was a single open-source stack that turned voice into a streamed Gaussian-splat avatar with sub-1.2-second end-to-end latency on a single GPU, at a price point that didn't require a hyperscaler. Every other path involved stitching together closed APIs at $0.10+ per minute, with latency that breaks the illusion. We wanted the whole pipeline open, single-GPU, and cheap enough per trained avatar that real businesses could deploy it.

Our Solution

How we solved it

We built the full pipeline end to end: Whisper for speech-to-text, an LLM for the language layer, F5-TTS for synthesis, NVIDIA Audio2Face for facial animation driven by phonemes, and Gaussian Splatting Avatar Compositor (GSAC) for the photoreal render. The whole pipeline runs server-side on a single GPU and streams 720p H.264 at 30 fps over WebRTC to any modern browser. The frontend is SvelteKit, the gateway is Bun + Hono, and everything is containerised. End-to-end latency is 1.18 seconds; per-avatar training cost is $3–6.
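To put the headline figures in per-frame terms, here is a small arithmetic sketch. The numbers (30 fps, 1.18 s, 13 GB) come from this case study; the `StreamSpec` type, the `laikaSpec` constant, and the helper names are illustrative assumptions, not part of the published spec.

```typescript
// Figures quoted in this case study; the type and field names are hypothetical.
interface StreamSpec {
  fps: number;                // frames per second streamed over WebRTC
  endToEndLatencySec: number; // voice in to rendered frame out
  vramFootprintGb: number;    // VRAM used by the whole pipeline
}

const laikaSpec: StreamSpec = {
  fps: 30,
  endToEndLatencySec: 1.18,
  vramFootprintGb: 13,
};

// Time between consecutive frames at a steady frame rate.
function frameIntervalMs(spec: StreamSpec): number {
  return 1000 / spec.fps;
}

// End-to-end latency expressed as a number of frame intervals.
function latencyInFrames(spec: StreamSpec): number {
  return spec.endToEndLatencySec * spec.fps;
}

// VRAM left over on a card of the given capacity (negative means it does not fit).
function vramHeadroomGb(spec: StreamSpec, cardCapacityGb: number): number {
  return cardCapacityGb - spec.vramFootprintGb;
}
```

At 30 fps, `frameIntervalMs(laikaSpec)` is roughly 33 ms, `latencyInFrames(laikaSpec)` is roughly 35 frames, and `vramHeadroomGb(laikaSpec, 24)` leaves 11 GB free on a 24 GB card such as the RTX 4090.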

Our Approach

Step by step

How we approached this project to deliver exceptional results.

Designed the pipeline as a streaming graph, not a request/response chain — every stage starts producing output before the previous one finishes

Picked open models at every layer (Whisper, F5-TTS, Audio2Face, GSAC) so the whole stack can be deployed on customer-owned infrastructure

Optimised for single-GPU deployment: the 13 GB VRAM footprint fits on a consumer-grade RTX 4090 or a single H100 slice

Streamed video over WebRTC at 30 fps, 720p H.264, chosen for the latency budget, not the bitrate

Built the SvelteKit frontend with a clean masthead so the technical specifics (latency, footprint, frame rate, cost) are visible above the fold

Published the spec — `avatar-mvp-spec.md` is the single source of truth and every claim is reproducible
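The streaming-graph idea from the first step can be sketched with async generators, assuming each stage can be modelled as a function from one async stream to another. The stage names (`stt`, `llm`, `tts`, `a2f`, `render`) and the string chunks are hypothetical stand-ins for the real models and media frames; this is not the project's actual code.

```typescript
// A stage consumes a stream of chunks and produces a stream of chunks.
type Stage = (input: AsyncIterable<string>) => AsyncIterable<string>;

// Build a pass-through stage that records when it handles each chunk.
// A real stage would transform audio, text, or frames instead.
function makeStage(name: string, log: string[]): Stage {
  return async function* (input: AsyncIterable<string>) {
    for await (const chunk of input) {
      log.push(`${name}:${chunk}`);
      yield chunk; // hand the chunk downstream before reading the next one
    }
    log.push(`${name}:done`);
  };
}

// Stand-in for the incoming voice stream.
async function* source(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) yield chunk;
}

// Chain stages left to right into one streaming pipeline.
function pipeline(stages: Stage[]): Stage {
  return (input) => stages.reduce((stream, stage) => stage(stream), input);
}

async function run(chunks: string[]): Promise<{ frames: string[]; log: string[] }> {
  const log: string[] = [];
  const names = ["stt", "llm", "tts", "a2f", "render"];
  const graph = pipeline(names.map((name) => makeStage(name, log)));
  const frames: string[] = [];
  for await (const frame of graph(source(chunks))) frames.push(frame);
  return { frames, log };
}
```

Running `run(["a", "b"])` logs `render:a` before `stt:b`, so the first chunk reaches the final stage before the first stage has seen the second chunk. In a request/response chain, `stt` would have to finish every chunk first. This sketch is pull-based and runs one chunk at a time; a production pipeline would likely add buffering and run stages concurrently.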

Gallery

Project highlights

Key screens and features from the final product.

Editorial research-preview marketing site — every spec above the fold

Mobile experience preserves the data-dense layout
