2027-02-03 scheduled
Per-customer LoRA fine-tunes
- Brain-B per-customer LoRA pipeline live
- Adapter swapping at session-start without cold-start penalty
- Self-serve via /voice/dashboard/finetune
- Custom voice cloning available for Enterprise tier (DP-SGD ε=4)
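The "no cold-start penalty" property above typically comes from keeping resolved adapters hot in a cache keyed by customer. A minimal sketch, assuming adapters are already on local disk; the directory layout, file name, and function names are all hypothetical, and a real swap would hand the weights to the serving engine rather than just resolve a path:

```python
from functools import lru_cache

ADAPTER_DIR = "/adapters"  # hypothetical local cache path

@lru_cache(maxsize=64)
def resolve_adapter(customer_id: str) -> str:
    # A real swap would load the LoRA weights into the engine; here we
    # only resolve the path, so repeat sessions hit the warm cache.
    return f"{ADAPTER_DIR}/{customer_id}/lora.safetensors"

def start_session(customer_id: str) -> dict:
    # Session start: attach the customer's adapter without reloading
    # the base model (the "no cold-start penalty" property).
    return {"customer": customer_id, "adapter": resolve_adapter(customer_id)}
```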
2026-09-07 v0.4.0 scheduled
Voice cloning + AutoVoice + Show HN #2
- Zero-shot voice cloning (3s reference audio)
- DP-SGD ε=4 + canary memorization tests
- AutoVoice: agent picks register based on caller signal
- Image input mid-call (vision)
- First FTE hire (CS Lead at $20K MRR threshold)
2026-08-08 v0.3.0 scheduled
Day 90: Image input + AutoVoice
- image_input.py event support (vision)
- AutoVoice register-switching
- Cartesia Sonic-3 TTS as default
- MCP tool bridging GA
2026-07-09 v0.2.0 scheduled
Day 60: Flagship reveal + outbound
- Flagship $0.09/min tier publicly revealed (auto-route on intent)
- Outbound dialer enabled (TCPA-gated, 7-day inbound-only minimum lifted)
- Pathway YAML primitive (declarative call flows)
- Brain-B L0 routing bot (intent classifier)
- Custom adapter $99/mo tier
- L4 backbone (soft barge-in canonicalized)
2026-06-09 v0.1.0 scheduled
Day 30 launch — founding-customer beta opens
- $0.04/min standalone voice API live (mini tier)
- OpenAI Realtime drop-in: 74/77 events at launch
- Phone numbers bundled ($1/mo local, $5/mo vanity)
- 0-byte audio retention
- 5 production personas live (Tony, Maria, Mike, Jenny, Sam)
- @toolkit-llm/voice + toolkit-voice SDKs published
- voicebench-10 OSS leaderboard live (Apache 2.0)
- Founding-customer beta opens (first 50, 30% off forever)
- Show HN: 'Voice AI at $0.04/min, phone numbers included, OpenAI Realtime drop-in'
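The drop-in claim means clients keep sending the same OpenAI Realtime event shapes to a swapped WebSocket URL. A sketch of the standard `session.update` client event; the endpoint below is an assumption, not a documented toolkit-llm URL:

```python
import json

REALTIME_URL = "wss://api.toolkit-llm.example/v1/realtime"  # hypothetical endpoint

def session_update(voice: str, instructions: str) -> str:
    # `session.update` is a standard OpenAI Realtime client event; a
    # drop-in server accepts the same JSON payload unchanged.
    return json.dumps({
        "type": "session.update",
        "session": {"voice": voice, "instructions": instructions},
    })
```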
2026-05-15 in-flight
First production pod stable + 4-pod bench
- PersonaPlex Blackwell image baked + pushed to GHCR
- TTFT p95 <250ms across A6000 / 5090 / PRO 6000 / 4090-failover
- voicebench-10 dry-run with 9 providers
- First 5 vanity DIDs acquired via Bandwidth
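A TTFT figure like "p95 <250ms" is the 95th percentile of per-request time-to-first-token samples. A minimal sketch using the nearest-rank method; the sample numbers in the test are made up for illustration:

```python
import math

def p95(samples_ms: list[float]) -> float:
    # Nearest-rank percentile: sort ascending, take the element at
    # rank ceil(0.95 * n), 1-indexed.
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]
```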
2026-05-12 in-flight
Bandwidth KYC + 3-carrier provisioning
- Bandwidth account active
- Telnyx + SignalWire backup carriers in queue
- +1-628-KLAW-NOW shared demo line provisioned
- 5 vanity DIDs targeted: Tony 415-CAR-TONY, Maria 305-CASA-NOW, Mike 510-BUILD-IT, Jenny 805-PLAY-JOY, Sam 702-NEW-TIRE
2026-05-08 shipped
Voice marketing surface — toolkit-llm.com/voice live
- /voice lander with embedded dialer, live calculator, and 5-line code diff
- /voice/start founding-beta waitlist (D1-backed)
- /voice/quickstart: 5-step tutorial with real cURL/JS/Python
- /voice/compat: full OpenAI Realtime event matrix (74/77 events)
- /voice/migrate hub with 4 codemod paths (OpenAI, Vapi, Bland, Twilio)
- /voice/voicebench leaderboard (Apache 2.0)
- /voice/personas detail pages (5 SSG'd profiles)
- /voice/trust: security & compliance posture (GDPR Article 9, anti-clone clause)
- /voice/faq: 21 honest dev objections answered
- /voice/sdks: npm + PyPI install + working samples
- 4 CF Pages Functions: /api/voice/{try-call, waitlist, health, waitlist-export}
- Hardened: SHA-256 phone hashing, geo-block, premium-prefix blocklist, KV rate limits
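"SHA-256 phone hashing" means storing only a digest of the caller's number, never the raw E.164 string. The exact scheme isn't stated above; a salted (HMAC) variant is sketched here, since the phone-number space is small enough that an unsalted digest can be brute-forced:

```python
import hashlib
import hmac

def hash_phone(e164: str, server_salt: bytes) -> str:
    # Keyed digest (HMAC-SHA-256) rather than bare SHA-256, so the
    # small phone-number space can't be reversed without the salt.
    return hmac.new(server_salt, e164.encode("utf-8"), hashlib.sha256).hexdigest()
```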
2026-05-07 shipped
OpenAI gpt-realtime-2 drop — competitive analysis complete
- Honest pricing pivot: 4-8× cheaper (down from 72%) — wedge survives
- H2 updated: '$0.04/min vs OpenAI's $0.16-0.33. We don't make you bring Twilio.'
- Compat sprint planned: 67/70 → 74/77 events (1 week)
- Briefing context positioning: 'fresh hourly' (counter to OpenAI's 1hr cache)
- Audio retention positioning strengthened (we keep 0 bytes; OpenAI defaults to 1hr)
2026-05-07 v0.0.7 shipped
Brain-B selection locked: Qwen3-4B-Instruct-2507
- Vast 4090 bench: TTFT p95 29ms, 0 leaks, 3.8GB VRAM
- Qwen3.5-9B-NVFP4 deferred pending torch 2.7+ image
- Qwen3.5-9B-AWQ-INT4 viable on a 4090 (24GB)
- Production-locked for Day 30
2026-05-01 shipped
Architecture: master + dumb-worker pool
- 2× A6000 RunPod Secure masters (always-on, hold customer state)
- Ephemeral workers self-register across RunPod / Lambda / Crusoe / Latitude / 5090 / PRO 6000 / 4090-failover
- CF Cron `gpu-scout` opportunistically scales the pool
- 3 GHCR image variants: ampere / ada / blackwell
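Worker self-registration usually reduces to a heartbeat table on the master: workers announce themselves on boot, keep pinging, and fall out of the pool when they go quiet. A sketch under those assumptions; class, method, and field names are all hypothetical:

```python
import time

class MasterRegistry:
    """Tracks ephemeral workers by last heartbeat (names assumed)."""

    def __init__(self, timeout_s: float = 30.0):
        self.timeout_s = timeout_s
        self.pool: dict[str, dict] = {}  # worker_id -> {"gpu": ..., "seen": ...}

    def register(self, worker_id: str, gpu: str) -> None:
        # Workers call this on boot, then periodically as a heartbeat.
        self.pool[worker_id] = {"gpu": gpu, "seen": time.monotonic()}

    def live(self) -> list[str]:
        # A scout job (e.g. a CF Cron) can compare this list against
        # demand and spin pods up or down.
        now = time.monotonic()
        return [w for w, meta in self.pool.items()
                if now - meta["seen"] < self.timeout_s]
```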
2026-04-15 shipped
OpenAI Realtime compat: 67/70 events shipped
- Full lifecycle: session.update, conversation.item.*, input_audio_buffer.*, response.*
- Function calling (server tools + client tools)
- Error handling with rate_limits.updated
- MCP tool bridging (toolkit extension)
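Event families like `conversation.item.*` suggest dispatch on the event's `type` field with a wildcard fallback per family. A minimal sketch of that pattern; the handler table and return values are illustrative stubs, not the shipped implementation:

```python
def dispatch(event: dict, handlers: dict) -> str:
    # Exact event type wins; otherwise fall back to its wildcard
    # family (e.g. `conversation.item.created` -> `conversation.item.*`).
    etype = event["type"]
    if etype in handlers:
        return handlers[etype](event)
    family = etype.rsplit(".", 1)[0] + ".*"
    if family in handlers:
        return handlers[family](event)
    return "ignored"

HANDLERS = {
    "session.updated": lambda e: "session-ack",
    "conversation.item.*": lambda e: "item",
    "rate_limits.updated": lambda e: "throttle",
}
```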
2026-04-01 shipped
Two-brain architecture (Helium + toolkit-chat)
- Helium = mouth (audio in/out, low latency)
- toolkit-chat = brain (logic, tools, memory)
- 6 meta-tools defined; 3-lane response loop
- Pre-warmed KV cache: 147ms avg first token
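The mouth/brain split implies a router that keeps latency-critical audio events on the Helium path and defers anything needing logic, tools, or memory to toolkit-chat. A speculative sketch; the event names in the fast-lane set are assumptions:

```python
FAST_LANE = {"audio.delta", "vad.speech_stopped", "barge_in"}  # assumed event names

def route(event_type: str) -> str:
    # Latency-critical audio events stay on the mouth (Helium);
    # everything else is deferred to the brain (toolkit-chat).
    return "helium" if event_type in FAST_LANE else "toolkit-chat"
```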