vupai
Voice UI for AI panes: push-to-talk voice control for your tmux agent panes on macOS, with on-device speech.
vupai (say "voo-pie") is a Voice UI for your AI panes.
Hold a key, speak, and what you say is typed into the right tmux pane: the one you're looking at, or an agent you call by name ("atlas, run the tests"). Speech-to-text runs on-device with NVIDIA Parakeet (via Apple MLX): no cloud, no API keys.
Built for a tmux-centric workflow where you keep several coding agents and
shells open at once and want to drive them by voice without reaching for the
mouse. New panes launch an agent by default (claude out of the box) and should
work with other agentic coding tools (Codex, Gemini, …), though testing so far
has focused on Claude Code.
🎬 Every feature has a narrated clip in See it in action.
Why not plain tmux?
vupai runs on tmux: it doesn't replace it, it adds a voice layer on top. tmux already gives you panes, splits, and a way to keep many agents on screen. What it can't do is let you talk to them. That's the gap vupai fills.
| With plain tmux | With vupai |
|---|---|
| Switch panes with `<prefix>`-arrow, then type | Hold a key and talk to the focused pane |
| Manually track which pane is which agent | Panes auto-name themselves; address them by name ("atlas, run the tests") |
| Re-type the same command in each pane | Broadcast by voice to every agent at once ("everyone, pull main") |
| Split / resize / re-layout with prefix chords | Voice commands: "create 3 panes", "focus atlas", "swap atlas and sage", "layout tile" |
| Read each pane yourself to see what agents are doing | Supervision board + "read atlas" speaks a one-line summary aloud |
| Every pane is a blank agent you brief by hand | Squad: named specialists with predefined roles; "open sage" launches your frontend expert, brief and all |
| n/a | On-device speech (Parakeet via Apple MLX): no cloud, no API keys |
If you only have one shell open, you don't need vupai. It earns its keep when you are juggling several agents and want to drive them without keeping your hands on the keyboard.
Because input is voice-first, it can also ease the typing load for anyone with RSI or hand strain, though vupai isn't built or tested as a dedicated accessibility tool.
How it works
hold dictation key (Right-Option) → record (sox) → transcribe (Parakeet) → route → paste into a tmux pane → Enter
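The chain above can be read as a composition of four stages. The following is a minimal sketch, not vupai's actual code: the stage callables (`record`, `transcribe`, `route`, `inject`), the helper names, and the 16 kHz mono recording settings are all assumptions made for illustration. Only the `rec` front end and its `-q`, `-c`, and `-r` options come from sox itself.

```python
from typing import Callable, List, Tuple


def rec_command(wav_path: str) -> List[str]:
    """Build an argv that records from the default input device with sox's `rec`.

    16 kHz mono is an assumed setting for illustration; the real tool may differ.
    """
    return ["rec", "-q", "-c", "1", "-r", "16000", wav_path]


def run_once(
    record: Callable[[], str],                # returns the path of the recorded audio
    transcribe: Callable[[str], str],         # audio path -> transcript
    route: Callable[[str], Tuple[str, str]],  # transcript -> (target pane, message)
    inject: Callable[[str, str], None],       # (target pane, message) -> paste, then Enter
) -> Tuple[str, str]:
    """Run one push-to-talk cycle: record -> transcribe -> route -> inject."""
    audio_path = record()
    text = transcribe(audio_path)
    target, message = route(text)
    inject(target, message)
    return target, message
```

Passing the stages in as callables keeps each one replaceable, so the flow can be exercised with stubs before any audio or tmux session is involved.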
- Routing is hybrid. By default your speech goes to the focused pane. If you start with an agent's name, it goes there instead, even when it isn't focused. Say a number ("two, …") to hit a pane by its position in the current window.
- Injection is safe. vupai pastes your text and waits until it actually appears in the pane before pressing Enter; it never blindly submits.
- Local-first speech. Speech-to-text runs entirely on-device via Apple MLX (NVIDIA Parakeet): no cloud service, no API key, no account, so your voice and transcripts never leave your Mac. vupai itself makes no network calls beyond a one-time model download (~2 GB) on first use. The agents you drive, and the optional board summarizer, use whatever model you point them at, which may be a cloud service.
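The routing and injection rules in the list above can be sketched as follows. This is a minimal illustration, not vupai's actual code: `route`, `inject`, `NUMBER_WORDS`, the `index:N` target format, and the assumption that the transcript carries a comma or colon after a spoken name are all hypothetical. The tmux subcommands used (`load-buffer`, `paste-buffer`, `capture-pane`, `send-keys`) are standard tmux.

```python
import re
import subprocess
import time
from typing import List, Tuple

# Hypothetical mapping from spoken numbers to pane positions.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def route(text: str, agents: List[str], focused: str) -> Tuple[str, str]:
    """Return (target, message): a named agent, 'index:N', or the focused pane."""
    match = re.match(r"\s*([A-Za-z]+)\s*[,:]\s*(.+)", text, flags=re.DOTALL)
    if match:
        head, rest = match.group(1).lower(), match.group(2).strip()
        if head in agents:            # "atlas, run the tests"
            return head, rest
        if head in NUMBER_WORDS:      # "two, pull main"
            return f"index:{NUMBER_WORDS[head]}", rest
    return focused, text.strip()      # default: the focused pane


def inject(pane: str, message: str, timeout: float = 2.0, run=subprocess.run) -> bool:
    """Paste `message` into a tmux pane; press Enter only once the text is visible.

    Returns False, without submitting, if the text never shows up before `timeout`.
    """
    run(["tmux", "load-buffer", "-"], input=message.encode(), check=True)
    run(["tmux", "paste-buffer", "-t", pane], check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # -p prints the pane contents to stdout; -J joins wrapped lines.
        shown = run(
            ["tmux", "capture-pane", "-p", "-J", "-t", pane],
            capture_output=True,
            check=True,
        ).stdout.decode()
        if message in shown:
            run(["tmux", "send-keys", "-t", pane, "Enter"], check=True)
            return True
        time.sleep(0.05)
    return False
```

`inject` expects a tmux pane target such as `%3`; resolving a routed agent name or position to a pane id is left out here. The substring check is deliberately simple and could miss text that an agent's UI reflows.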
Where next
- Install and set up vupai.
- Learn the usage basics and voice commands.
- Supervise many agents with the board, the activity ledger, and review.
- Capture ideas in the task pile and keep specialists in your squad.