Offline Voice Dictation on Linux: My Whisper-Based Alternative to Wispr Flow
A few months ago, I wrote about discovering Wispr Flow and how it changed my workflow on Mac. The problem? I also work on Linux, and Wispr Flow isn’t available there.
I needed something that worked offline, respected my privacy, and just… worked.
This mattered more than you might think. I use Claude Code constantly, and I’d grown so accustomed to dictating my thoughts on Mac that working on Linux without it felt painful. Every time I sat down at my Linux machine, I’d instinctively reach for the dictation hotkey—and nothing would happen. Finding a solution that actually works has been a game-changer for my Linux workflow.
The Linux Gap
Voice dictation on Linux has always been a second-class citizen. Most solutions send your audio to cloud services (privacy nightmare), don’t work properly with Wayland, or are just unreliable.
After some research, I found a solution that actually delivers: faster-whisper + ydotool.
How It Compares to Wispr Flow
Let me be honest about the differences.
Wispr Flow feels instantaneous—you speak, and the text appears almost immediately. My Linux setup types word by word, which initially seemed like a limitation.
But here’s the thing: I’ve actually grown to enjoy the slower pace.
Watching each word appear gives me time to double-check as it types. If something’s off, I notice immediately rather than finding mistakes after a wall of text has appeared. It’s a different rhythm, but not a worse one.
The transcription quality is excellent—surprisingly close to Wispr Flow for everyday speech. Technical terms and proper nouns sometimes need correction, but for general dictation it’s solid.
One thing that surprised me: the punctuation is handled beautifully. It knows when to put full stops, commas, and question marks without you having to say them explicitly. The text comes out feeling natural and readable, matching what I’d expect from Wispr Flow.
The Setup
The workflow is simple:
- Press Ctrl+Shift+Space to start recording
- Speak naturally
- Press Ctrl+Shift+Space again to stop
- Text gets transcribed and typed into the active window
Everything runs locally on CPU. No cloud services, no subscriptions, no internet required.
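The press-to-start, press-to-stop behaviour can be sketched as a small toggle. This is not my actual listener code, just a minimal illustration under a few assumptions: the recorder is `parecord` (from pulseaudio-utils), the function name `toggle_recording` and the PID-file approach are made up for the example, and the recorder is expected to finish the WAV file cleanly when it receives SIGINT.

```python
import os
import signal
import subprocess
from pathlib import Path


def toggle_recording(pid_file, wav_path, recorder=("parecord", "--file-format=wav")):
    """Start the recorder on the first call, stop it on the next one.

    The PID file is the only state: if it exists, a recording is in progress.
    `recorder` is the command prefix; the output path is appended to it.
    """
    pid_file = Path(pid_file)
    if pid_file.exists():
        # Second press: ask the recorder to stop and forget its PID.
        pid = int(pid_file.read_text())
        pid_file.unlink()
        try:
            os.kill(pid, signal.SIGINT)
        except ProcessLookupError:
            pass  # the recorder already exited on its own
        return "stopped"
    # First press: start recording in the background and remember the PID.
    proc = subprocess.Popen([*recorder, str(wav_path)])
    pid_file.write_text(str(proc.pid))
    return "recording"
```

Binding a function like this to Ctrl+Shift+Space is left to whatever listens for the hotkey. When it returns "stopped", the WAV file is ready to be handed to the transcription step.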
The Components
ydotool is the key piece for Wayland support. Unlike xdotool, it injects input through the kernel’s uinput device, so it doesn’t care whether you’re on X11 or Wayland. I built it from source from the GitHub repo.
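To give a feel for the word-by-word typing, here is a minimal sketch that feeds text to `ydotool type` one word at a time. The helper names (`word_chunks`, `type_words`) and the pause length are assumptions for illustration, not my actual script. It also assumes the `ydotoold` daemon is already running and that your ydotool build accepts `--` to mark the end of options.

```python
import subprocess
import time


def word_chunks(text):
    """Split text into words, each followed by one space except the last."""
    words = text.split()
    return [w + " " for w in words[:-1]] + words[-1:]


def type_words(text, pause=0.05, runner=subprocess.run):
    """Type text into the active window one word at a time via ydotool.

    `runner` is injectable so the function can be exercised without ydotool.
    """
    for chunk in word_chunks(text):
        # "--" (assumed supported) keeps words starting with "-" from
        # being parsed as options.
        runner(["ydotool", "type", "--", chunk], check=True)
        time.sleep(pause)
```

Calling `type_words("Hello from Linux")` would run `ydotool type` three times, once per word.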
faster-whisper handles the actual transcription—it’s a CTranslate2-based implementation of OpenAI’s Whisper model that runs efficiently on CPU.
I’m using the small model (466 MB, needs about 2 GB RAM). It’s a good balance between speed and accuracy. You can go smaller for faster results or larger for better accuracy:
| Model | Size | RAM | Notes |
|---|---|---|---|
| tiny | 75 MB | ~1 GB | Fast but less accurate |
| base | 142 MB | ~1 GB | Decent for quick notes |
| small | 466 MB | ~2 GB | What I use - good balance |
| medium | 1.5 GB | ~5 GB | Better accuracy, slower |
| large-v3 | 3 GB | ~10 GB | Best accuracy, needs patience |
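For reference, transcribing a recorded file with faster-whisper on CPU looks roughly like this. It's a sketch rather than my exact code: the function names are made up, and `compute_type="int8"` is an assumption that tends to suit CPU-only machines. Swap `"small"` for any model name from the table.

```python
def segments_to_text(segments):
    """Join the text of transcription segments into a single clean string."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path, model_size="small"):
    """Transcribe an audio file locally with faster-whisper on CPU."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    # Loading the model is slow; a long-running listener would do this once.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus metadata;
    # the actual work happens while iterating over the segments.
    segments, _info = model.transcribe(path)
    return segments_to_text(segments)
```

The model files are downloaded the first time a given size is requested, so that first run does need a network connection; after that everything stays local.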
System Dependencies
sudo apt install portaudio19-dev python3-venv python3-pip git pulseaudio-utils libnotify-bin cmake libevdev-dev scdoc
The One Gotcha: Permissions
ydotool needs access to /dev/uinput to inject keystrokes. This requires:
- A udev rule at /etc/udev/rules.d/60-uinput.rules: KERNEL=="uinput", MODE="0660", GROUP="input"
- Your user in the input group
- A logout/login for the group membership to take effect
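If keystrokes aren't arriving, a quick check like the one below can tell you which of the three requirements is missing. It's a small diagnostic sketch (the function name `uinput_report` is my own), using only the Python standard library. It looks at the groups of the current session, so it reports False for the group until you have logged out and back in.

```python
import grp
import os


def uinput_report(device="/dev/uinput", group="input"):
    """Report whether the current process can write to the uinput device."""
    report = {
        "device_exists": os.path.exists(device),
        "writable": os.access(device, os.W_OK),
    }
    try:
        gid = grp.getgrnam(group).gr_gid
        # os.getgroups() reflects this session's groups, not /etc/group,
        # so a freshly added membership only shows up after re-login.
        report["in_group"] = gid in os.getgroups() or gid == os.getgid()
    except KeyError:
        report["in_group"] = False  # the group does not exist on this system
    return report


if __name__ == "__main__":
    for check, ok in uinput_report().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```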
Running as Services
Everything runs through systemd user services so it starts automatically on login:
# Check if it's running
systemctl --user status ydotoold
systemctl --user status dictation-listener
# View logs if something's wrong
journalctl --user -u dictation-listener -f
Was It Worth It?
Absolutely. Once it’s set up, it just works. I use it for:
- Quick notes and drafts
- Writing first passes of documentation
- Responding to messages when my hands are tired
- Brain dumps when I need to think out loud
The word-by-word typing that seemed like a limitation? It’s become a feature. I’m more present with what I’m dictating, catching errors in real-time rather than after the fact.
Different from Wispr Flow, but solving the same problem: removing the input bottleneck so I can work at the speed of thought.
Resources
- faster-whisper - The CTranslate2-based Whisper implementation
- ydotool - Linux automation that works on Wayland
- OpenAI Whisper - The original model
Setup completed January 2026 on Ubuntu 24.04 with Wayland.