Natural voice conversations with your AI assistant, from anywhere, with zero paid speech APIs
Overview
Voice Control turns your OpenClaw setup into a hands-free AI assistant. It is built on WebRTC and combines MLX-Whisper for local speech recognition, Edge-TTS for free text-to-speech, and self-hosted LiveKit for real-time audio. Generate a one-time link, open it on your phone, and start talking.
Capabilities
Every component runs locally or uses free services, so there are no ongoing API costs for speech.
Speech recognition runs entirely on your Apple Silicon Mac using MLX-Whisper large-v3-mlx-4bit. No cloud API, no usage fees, no audio leaves your machine.
Text-to-speech is powered by Microsoft's free Edge-TTS service: high-quality, natural voices with no subscription or per-character billing.
Real-time audio streaming over WebRTC, self-hosted with LiveKit. Low-latency duplex audio that works reliably from any modern browser, including iOS Safari.
Call your AI from anywhere using your iPhone. Tailscale provides a trusted HTTPS endpoint so iOS Safari connects without certificate warnings.
Ask Claude to read files, run shell commands, search your memory store, or list active sessions, all triggered naturally through conversation.
Accessible from anywhere on your Tailscale network. One-time call links expire after 1 hour, so every session starts fresh and secure.
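As a rough illustration of the two speech components listed above, the sketch below calls the `mlx-whisper` and `edge-tts` Python packages directly. The Hugging Face repo id, the voice name, and the helper names `transcribe_file` and `speak_to_file` are assumptions made for illustration; they are not taken from the plugin itself. Running it requires an Apple Silicon Mac with both packages installed, and Edge-TTS needs network access.

```python
# Assumed values: this page names "large-v3-mlx-4bit" but not a full repo id,
# and it does not say which Edge-TTS voice the plugin uses.
WHISPER_REPO = "mlx-community/whisper-large-v3-mlx-4bit"
TTS_VOICE = "en-US-AriaNeural"


def transcribe_file(audio_path: str, repo: str = WHISPER_REPO) -> str:
    """Transcribe an audio file locally with MLX-Whisper and return the text."""
    import mlx_whisper  # imported lazily: only installable on Apple Silicon

    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=repo)
    return result["text"].strip()


async def speak_to_file(text: str, out_path: str, voice: str = TTS_VOICE) -> None:
    """Synthesize `text` with Edge-TTS and write the MP3 audio to `out_path`."""
    import edge_tts  # imported lazily: contacts Microsoft's service when called

    await edge_tts.Communicate(text, voice).save(out_path)
```

A round trip would look like `heard = transcribe_file("question.wav")` followed by `asyncio.run(speak_to_file(heard, "reply.mp3"))`, where both file names are placeholders.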
Architecture
A self-hosted audio pipeline from your mic to Claude's voice: your recorded speech is transcribed on your own hardware and never leaves your network.
Every call goes through a deterministic six-step pipeline. Speech is detected by Silero VAD, transcribed on-device by MLX-Whisper, processed by Claude, then converted back to audio by Edge-TTS, all in real time over WebRTC.
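The flow above can be pictured as a chain of stages in which each stage consumes the previous stage's output. The sketch below is a minimal illustration with stand-in callables; the `VoicePipeline` class, its field names, and the stub stages are hypothetical. A real deployment would plug Silero VAD, MLX-Whisper, Claude, and Edge-TTS in behind the same four interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """One conversational turn: audio in, audio out (hypothetical structure)."""

    has_speech: Callable[[bytes], bool]  # VAD: does this chunk contain speech?
    transcribe: Callable[[bytes], str]   # STT: audio -> text
    respond: Callable[[str], str]        # LLM: user text -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def handle_turn(self, audio: bytes) -> Optional[bytes]:
        """Run one turn; return reply audio, or None if nothing was said."""
        if not self.has_speech(audio):
            return None  # silence: skip the expensive stages entirely
        text = self.transcribe(audio).strip()
        if not text:
            return None  # speech was detected but nothing was recognized
        return self.synthesize(self.respond(text))


# Stub stages so the sketch runs without any models or network access.
pipeline = VoicePipeline(
    has_speech=lambda audio: len(audio) > 0,
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)
```

Gating on the VAD result first means the transcription, language-model, and synthesis stages run only when there is something to process.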
Getting on a call takes seconds. A single script generates a signed JWT with a fresh room name, bundles it into a Tailscale HTTPS URL, and prints a ready-to-open link.
One-Time Link Flow
Run ./call.sh to generate a signed JWT and a unique room
Link delivered via Tailscale DNS (trusted Let's Encrypt cert)
Open the link on an iPhone or in a browser; the WebRTC handshake goes through the token server
Audio streams through LiveKit to the voice agent
The link expires after 1 hour; the next call gets a fresh link
STT: MLX-Whisper (local, Apple Silicon)
TTS: Edge-TTS (free Microsoft service)
Transport: LiveKit (self-hosted WebRTC)
Access: Tailscale (zero-config VPN)
Coverage
The voice agent has full access to OpenClaw tools: anything you can type, you can now say.
Chat naturally with Claude: ask anything, brainstorm ideas, or get quick answers hands-free.
Execute shell commands on your Mac mini by voice, no keyboard needed.
Query your OpenClaw memory store out loud and get spoken answers back instantly.
Ask Claude to read any file on your system and summarize or explain its contents.
Find out what agents are active, what tasks are running, or what sessions exist. Just ask.
Steer sub-agents, check task status, and orchestrate your OpenClaw setup from anywhere.
Coming Soon
Be the first to know when this plugin launches.