Changelog

What shipped.
And what's shipping next.

Public roadmap. We commit to dates and ship on them. Items past the launch line are scheduled with locked dates; in-flight items are this week's work; shipped items are linkable.

scheduled

Per-customer LoRA fine-tunes

  • Brain-B per-customer LoRA pipeline live
  • Adapter swapping at session-start without cold-start penalty
  • Self-serve via /voice/dashboard/finetune
  • Custom voice cloning available for Enterprise tier (DP-SGD ε=4)
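Adapter swapping without a cold-start penalty comes down to keeping hot adapters resident. A minimal sketch of that idea, assuming an LRU pool keyed by customer ID (the `AdapterPool` name and loader interface are illustrative, not the shipped pipeline):

```python
from collections import OrderedDict

class AdapterPool:
    """Keep the most recently used per-customer LoRA adapters resident,
    so a session start is a dict lookup instead of a storage load."""

    def __init__(self, loader, capacity=8):
        self.loader = loader          # customer_id -> adapter weights (any object)
        self.capacity = capacity
        self._pool = OrderedDict()

    def get(self, customer_id):
        if customer_id in self._pool:
            self._pool.move_to_end(customer_id)   # warm hit: mark recently used
            return self._pool[customer_id]
        adapter = self.loader(customer_id)        # cold path: load from storage
        self._pool[customer_id] = adapter
        if len(self._pool) > self.capacity:
            self._pool.popitem(last=False)        # evict least recently used
        return adapter
```

Pre-warming a customer's adapter ahead of a scheduled call is then just an early `get()`.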
v0.4.0 · scheduled

Voice cloning + AutoVoice + Show HN #2

  • Zero-shot voice cloning (3s reference audio)
  • DP-SGD ε=4 + canary memorization tests
  • AutoVoice: agent picks register based on caller signal
  • Image input mid-call (vision)
  • First FTE hire (CS Lead at $20K MRR threshold)
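The canary memorization tests pair naturally with a Secret Sharer-style exposure score: plant known canary strings during fine-tuning, then rank the canary's loss against random reference sequences. A sketch of the metric only (the harness around it is assumed, not necessarily the shipped test suite):

```python
import math

def exposure(canary_loss, reference_losses):
    """Secret Sharer-style exposure: log2(|R|) - log2(rank), where rank is
    the canary's loss rank among |R| references (rank 1 = most memorized)."""
    rank = 1 + sum(1 for loss in reference_losses if loss < canary_loss)
    return math.log2(len(reference_losses)) - math.log2(rank)
```

High exposure means the model assigns the canary suspiciously low loss; after DP-SGD at ε=4 the score should sit near or below zero.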
v0.3.0 · scheduled

Day 90: Image input + AutoVoice

  • image_input.py event support (vision)
  • AutoVoice register-switching
  • Cartesia Sonic-3 TTS as default
  • MCP tool bridging GA
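AutoVoice's register switch can be pictured as a small decision function over caller signal. The fields, thresholds, and register names below are illustrative, not the shipped classifier:

```python
def pick_register(signal):
    """Hypothetical AutoVoice heuristic: map caller signal to a speaking register."""
    if signal.get("frustration", 0.0) > 0.7:
        return "calm-deescalate"        # angry caller: slow down, soften
    if signal.get("speech_rate_wpm", 150) > 190:
        return "brisk"                  # fast talker: match pace, skip filler
    if signal.get("hour_local", 12) < 7:
        return "quiet-early"            # early-morning call: lower energy
    return "neutral"
```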
v0.2.0 · scheduled

Day 60: Flagship reveal + outbound

  • Flagship $0.09/min tier publicly revealed (auto-route on intent)
  • Outbound dialer enabled (TCPA-gated, 7-day inbound-only minimum lifted)
  • Pathway YAML primitive (declarative call flows)
  • Brain-B L0 routing bot (intent classifier)
  • Custom adapter $99/mo tier
  • L4 backbone (Soft barge-in canonicalized)
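The L0 routing bot's job is cheap triage: keep high-confidence simple intents on the $0.04/min mini tier, escalate everything else to flagship. A sketch under that assumption (intent names and the confidence threshold are illustrative):

```python
# Intents cheap enough for the mini tier (illustrative set).
SIMPLE_INTENTS = {"hours", "address", "confirm_appointment"}

def route_tier(intent, confidence, threshold=0.85):
    """Auto-route on intent: only a confident, simple classification
    stays on mini; anything ambiguous escalates to flagship."""
    if intent in SIMPLE_INTENTS and confidence >= threshold:
        return "mini"       # $0.04/min
    return "flagship"       # $0.09/min
```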
v0.1.0 · scheduled

Day 30 launch — founding-customer beta opens

  • $0.04/min standalone voice API live (mini tier)
  • OpenAI Realtime drop-in: 74/77 events at launch
  • Phone numbers bundled ($1/mo local, $5/mo vanity)
  • 0-byte audio retention
  • 5 production personas live (Tony, Maria, Mike, Jenny, Sam)
  • @toolkit-llm/voice + toolkit-voice SDKs published
  • voicebench-10 OSS leaderboard live (Apache 2.0)
  • Founding-customer beta opens (first 50, 30% off forever)
  • Show HN: 'Voice AI at $0.04/min, phone numbers included, OpenAI Realtime drop-in'
in-flight

First production pod stable + 4-pod bench

  • PersonaPlex Blackwell image baked + GHCR pushed
  • TTFT p95 <250ms across A6000 / 5090 / PRO 6000 / 4090-failover
  • voicebench-10 dry-run with 9 providers
  • First 5 vanity DIDs acquired via Bandwidth
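The TTFT p95 gate is a standard nearest-rank percentile over the bench's latency samples (the bench harness itself is not shown here):

```python
import math

def p95(samples_ms):
    """Nearest-rank p95: the value at position ceil(0.95 * n) of the sorted list."""
    ordered = sorted(samples_ms)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]
```

Per GPU class, the gate is simply `p95(ttft_samples) < 250`.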
in-flight

Bandwidth KYC + 3-carrier provisioning

  • Bandwidth account active
  • Telnyx + SignalWire backup carriers in queue
  • +1-628-KLAW-NOW shared demo line provisioned
  • 5 vanity DIDs targeted: Tony 415-CAR-TONY, Maria 305-CASA-NOW, Mike 510-BUILD-IT, Jenny 805-PLAY-JOY, Sam 702-NEW-TIRE
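Each vanity DID maps to ordinary digits via the standard telephone keypad lettering. A quick stdlib translator:

```python
# Standard ITU E.161 keypad letter groups.
KEYPAD = {c: d for d, letters in {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}.items() for c in letters}

def vanity_to_digits(number):
    """Translate a vanity DID like '415-CAR-TONY' into its dialable digits."""
    return "".join(KEYPAD.get(c, c) for c in number.upper() if c.isalnum())
```

So Tony's line `415-CAR-TONY` dials as `4152278669`.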
shipped

Voice marketing surface — toolkit-llm.com/voice live

  • /voice lander with embedded dialer + live calculator + 5-line code diff
  • /voice/start founding-beta waitlist (D1-backed)
  • /voice/quickstart 5-step real cURL/JS/Python tutorial
  • /voice/compat full OpenAI Realtime event matrix (74/77 events)
  • /voice/migrate hub with 4 codemod paths (OpenAI, Vapi, Bland, Twilio)
  • /voice/voicebench leaderboard (Apache 2.0)
  • /voice/personas detail pages (5 SSG'd profiles)
  • /voice/trust security & compliance posture (GDPR Article 9, anti-clone clause)
  • /voice/faq — 21 honest dev objections answered
  • /voice/sdks — npm + PyPI install + working samples
  • 4 CF Pages Functions: /api/voice/{try-call, waitlist, health, waitlist-export}
  • Hardened: SHA-256 phone hashing, geo-block, premium-prefix blocklist, KV rate limits
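The phone-hashing step, sketched: normalize to digits, then salted SHA-256, so the waitlist store never holds a raw number. The salt handling and the NANP default are assumptions, not the deployed Function:

```python
import hashlib
import re

def hash_phone(raw, salt="rotate-me"):   # salt management assumed, not shown
    """Salted SHA-256 over a normalized phone number, so two formattings
    of the same number hash identically."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:                # assume NANP: prepend country code
        digits = "1" + digits
    return hashlib.sha256(f"{salt}+{digits}".encode()).hexdigest()
```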
shipped

OpenAI gpt-realtime-2 drop — competitive analysis complete

  • Honest pricing pivot: 4-8× cheaper (down from 72%) — wedge survives
  • H2 updated: '$0.04/min vs OpenAI's $0.16-0.33. We don't make you bring Twilio.'
  • Compat sprint planned: 67/70 → 74/77 events (1 week)
  • Briefing context positioning: 'fresh hourly' (counter to OpenAI's 1hr cache)
  • Audio retention positioning strengthened (we keep 0; OpenAI defaults to 1hr)
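The 4-8× figure is straight arithmetic on the per-minute prices quoted above:

```python
OURS = 0.04                            # $/min, mini tier
OPENAI_LOW, OPENAI_HIGH = 0.16, 0.33   # $/min, OpenAI range quoted above

multiple_low = OPENAI_LOW / OURS       # bottom of their range: 4x
multiple_high = OPENAI_HIGH / OURS     # top of their range: ~8.25x
```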
v0.0.7 · shipped

Brain-B selection locked: Qwen3-4B-Instruct-2507

  • Vast 4090 bench: TTFT p95 29ms, 0 leaks, 3.8GB VRAM
  • Qwen3.5-9B-NVFP4 deferred pending torch 2.7+ image
  • Qwen3.5-9B-AWQ-INT4 viable on 4090 (24GB)
  • Production-locked for Day 30
shipped

Architecture: master + dumb-worker pool

  • 2× A6000 RunPod Secure masters (always-on, customer state)
  • Ephemeral workers self-register across RunPod / Lambda / Crusoe / Latitude / 5090 / PRO 6000 / 4090-failover
  • CF Cron `gpu-scout` opportunistically scales the pool
  • 3 GHCR image variants: ampere / ada / blackwell
shipped

OpenAI Realtime compat: 67/70 events shipped

  • Full lifecycle: session.update, conversation.item.*, input_audio_buffer.*, response.*
  • Function calling (server tools + client tools)
  • Error handling with rate_limits.updated
  • MCP tool bridging (toolkit extension)
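For orientation, a minimal `session.update` payload in the Realtime event shape the compat layer accepts. Field values here are illustrative (the persona name is an assumption), and the real schema carries many more knobs:

```python
import json

# A minimal session.update event in OpenAI Realtime shape.
event = {
    "type": "session.update",
    "session": {
        "voice": "maria",                          # assumed persona name
        "turn_detection": {"type": "server_vad"},  # server-side VAD
        "tools": [],                               # function-calling tool defs
    },
}
wire = json.dumps(event)   # what actually goes over the websocket
```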
shipped

Two-brain architecture (Helium + toolkit-chat)

  • Helium = mouth (audio in/out, low-latency)
  • toolkit-chat = brain (logic, tools, memory)
  • 6 meta-tools defined; 3-lane response loop
  • Pre-warmed KV cache: 147ms avg first-token
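The 3-lane loop, heavily simplified: lane 1 is the mouth's instant acknowledgement, lane 2 the brain's full answer, lane 3 any tool output. A toy sketch with the lanes run sequentially (in production they overlap; all names are illustrative, not the shipped loop):

```python
def respond(transcript, quick_ack, think):
    """Three-lane sketch: Helium (mouth) acks immediately while
    toolkit-chat (brain) produces the real answer and tool results."""
    lanes = [quick_ack(transcript)]           # lane 1: sub-100ms acknowledgement
    answer, tool_outputs = think(transcript)  # lane 2: reasoning + tool plan
    lanes.append(answer)
    lanes.extend(tool_outputs)                # lane 3: tool outputs, if any
    return lanes
```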

Want a feature on this list?

Founding-beta customers vote on the roadmap. Your use case shapes what ships next.

Apply for founding beta →