SDKs

Two SDKs.
Same shape as OpenAI Realtime.

Native fetch. Zero runtime dependencies. We refused to ship 5MB of bundled HTTP clients like axios. The TypeScript SDK is under 250KB minified. The Python SDK is under 150KB on disk.

npm

@toolkit-llm/voice

v0.1.0-beta.1 · TypeScript · JavaScript · Deno · Bun

npm install @toolkit-llm/voice

PyPI

toolkit-voice

v0.1.0b1 · Python 3.10+ · asyncio native

pip install toolkit-voice

TypeScript / JavaScript

Install:

bash

# npm
npm install @toolkit-llm/voice

# or pnpm / yarn / bun
pnpm add @toolkit-llm/voice
yarn add @toolkit-llm/voice
bun add @toolkit-llm/voice

Realtime session:

TypeScript

import { Toolkit } from "@toolkit-llm/voice";

const client = new Toolkit({
  apiKey: process.env.TOOLKIT_API_KEY,
});

const session = await client.realtime.sessions.create({
  model: "toolkit-voice",
  instructions: "You are a helpful auto-dealer agent named Tony.",
  voice: "VARM1",
  modalities: ["text", "audio"],
});

session.on("response.audio.delta", (event) => {
  // event.delta is base64 PCM16 (or opus if you set audio_format)
  // `speakers` stands in for your audio output
  speakers.write(event.delta);
});

session.on("conversation.item.input_audio_transcription.completed", (event) => {
  console.log("Caller said:", event.transcript);
});

// `micChunk` stands in for a chunk of captured microphone audio
session.appendAudio(micChunk);

Outbound dial:

TypeScript

const call = await client.calls.create({
  to: "+14155551234",
  from: "+14152278669",  // Your provisioned number
  persona: "tony",
  max_seconds: 60,
});

// Stream events as the call progresses
for await (const event of call.events()) {
  if (event.type === "call.ended") break;
}

Webhook verification:

TypeScript

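import express from "express";
import { verifyWebhook } from "@toolkit-llm/voice/webhooks";

const app = express();

// Express: express.raw keeps the body as the exact bytes that were signed
app.post("/voice/event", express.raw({ type: "*/*" }), (req, res) => {
  try {
    const event = verifyWebhook({
      body: req.body,                                  // raw bytes
      signature: req.header("X-Toolkit-Signature"),
      secret: process.env.TOOLKIT_WEBHOOK_SECRET,
      tolerance_s: 300,                                // 5min replay window
    });
    // event is now a typed Event union
    res.sendStatus(200);
  } catch (err) {
    // Assumption: verifyWebhook throws on a bad signature or stale timestamp
    res.sendStatus(401);
  }
});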

Python

Install:

bash

pip install toolkit-voice

# Or with uv (faster):
uv pip install toolkit-voice

Realtime session (asyncio):

Python

import asyncio
from toolkit import Toolkit

async def main():
    client = Toolkit()
    session = await client.realtime.sessions.create(
        model="toolkit-voice",
        instructions="You are a helpful auto-dealer agent named Tony.",
        voice="VARM1",
    )

    @session.on("response.audio.delta")
    def on_audio(event):
        # event.delta is base64 PCM16
        # `speakers` stands in for your audio output
        speakers.write(event.delta)

    @session.on("conversation.item.input_audio_transcription.completed")
    def on_transcript(event):
        print(f"Caller said: {event.transcript}")

    # `mic_chunk` stands in for a chunk of captured microphone audio
    await session.append_audio(mic_chunk)

asyncio.run(main())
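
The `response.audio.delta` handler receives base64 text, while most audio sinks want raw samples. A minimal sketch of the decode step, assuming the PCM16 payload is little-endian signed 16-bit (the byte order is an assumption; this page does not state it). `decode_pcm16` is a hypothetical helper, not part of the SDK.

```python
import base64
import struct


def decode_pcm16(delta_b64: str) -> list[int]:
    """Decode a base64 PCM16 chunk into signed 16-bit samples.

    Assumption: samples are little-endian. Hypothetical helper, not an SDK API.
    """
    raw = base64.b64decode(delta_b64)
    # Drop a trailing odd byte, if any, so the buffer splits into whole samples.
    usable = len(raw) - (len(raw) % 2)
    count = usable // 2
    return list(struct.unpack(f"<{count}h", raw[:usable]))


# Round-trip check with three known samples.
chunk = base64.b64encode(struct.pack("<3h", 0, 1000, -1000)).decode("ascii")
samples = decode_pcm16(chunk)
```

If you set a different audio format (opus, for example), this decode step does not apply.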

Outbound dial:

Python

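import asyncio
from toolkit import Toolkit

async def main():
    client = Toolkit()
    call = await client.calls.create(
        to="+14155551234",
        from_="+14152278669",  # your provisioned number
        persona="tony",
        max_seconds=60,
    )

    # Stream events as the call progresses
    async for event in call.events():
        if event.type == "call.ended":
            break

asyncio.run(main())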
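
Both `to` and `from_` are E.164 numbers, so it can be worth rejecting malformed input before a dial request leaves your process. A small sketch of that check; `is_e164` is a hypothetical helper, not an SDK function, and it only checks the shape of the string, not whether the number is reachable.

```python
import re

# E.164: a leading "+", a non-zero first digit, at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the shape of an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None


ok = is_e164("+14155551234")
missing_plus = is_e164("4155551234")
with_spaces = is_e164("+1 415 555 1234")
```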

Webhook verification (FastAPI):

Python

import os

from fastapi import FastAPI, Request, HTTPException
from toolkit.webhooks import verify_webhook, InvalidSignature

app = FastAPI()

@app.post("/voice/event")
async def voice_event(request: Request):
    raw_body = await request.body()  # raw bytes, exactly as signed
    signature = request.headers.get("X-Toolkit-Signature")
    if signature is None:
        # Reject early instead of failing with a KeyError (HTTP 500)
        raise HTTPException(401, "Missing signature")
    try:
        event = verify_webhook(
            body=raw_body,
            signature=signature,
            secret=os.environ["TOOLKIT_WEBHOOK_SECRET"],
        )
    except InvalidSignature:
        raise HTTPException(401, "Invalid signature")
    return {"ok": True}

What's in the box

Realtime sessions: WebSocket-based session creation, lifecycle, and event handling, with the same shape as OpenAI Realtime.
Outbound calls: Dial-out by E.164 number, bind to persona, stream events as the call progresses.
Numbers (DIDs): Buy, port-in, bind to persona, hot-swap webhook URL.
Webhooks: HMAC-SHA256 signature verification with raw-body support, replay-window tolerance.
Tool calls: Same JSON-schema function-calling spec as OpenAI. Tool result streaming.
Auto-retry + backoff: Exponential backoff on 429/503, idempotency keys for POST endpoints.
Typed errors: 54 typed error subclasses (RateLimitError, AuthError, etc.), so you catch what you mean.
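
The retry behaviour above pairs exponential backoff with idempotency keys. A minimal sketch of that pattern, not the SDK's internals: `send` is a hypothetical transport callable that returns an HTTP status code, the `Idempotency-Key` header name is an assumption, and the delay values are illustrative.

```python
import random
import time
import uuid
from typing import Callable

RETRYABLE_STATUSES = {429, 503}


def backoff_delays(max_retries: int, base_s: float = 0.5, cap_s: float = 8.0) -> list[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * (2 ** attempt)) for attempt in range(max_retries)]


def post_with_retry(send: Callable[[dict[str, str]], int],
                    max_retries: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send(headers)` until it returns a non-retryable status.

    The same idempotency key is reused on every attempt so the server can
    recognise a retry of a request it may already have processed.
    """
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # header name is an assumption
    status = send(headers)
    for delay in backoff_delays(max_retries):
        if status not in RETRYABLE_STATUSES:
            break
        # Full jitter: sleep a random amount up to the scheduled delay.
        sleep(random.uniform(0, delay))
        status = send(headers)
    return status
```

Reusing one key across attempts is the point of the pairing: a POST that timed out after the server acted on it will not, for example, place a second call when retried.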

What we left out (intentionally)

axios / node-fetch / cross-fetch: We use native `fetch`. If your runtime doesn't have it, polyfill at the entry point.
EventEmitter polyfills: We use native `EventTarget`. Works in browser, Node 18+, Deno, Bun.
OAuth flows: Bearer-token auth only. No refresh logic. No token storage. Stateless.
Multi-tenancy primitives: You bring the user-mapping. We're an API, not a CRM.
TypeScript runtime types (zod, io-ts): We ship .d.ts. Validate at the boundary if you want; your call.

Source + issue tracker

TypeScript: github.com/toolkit-cli/toolkit-voice-js
Python: github.com/toolkit-cli/toolkit-voice-python
Bug reports: file an issue — we triage within 24h

Need a different language? founders@toolkit-llm.com — Go and Ruby SDKs are scoped for Day 90 if there's demand.
