Fast, local speech-to-text with a global hotkey
TomChat is a desktop application that transcribes your speech and types it directly into any application. Press Right Shift to start recording, speak, and watch your words appear. All processing happens locally on your machine - no cloud, no API keys, no network round trips.
Built with Tauri, NVIDIA Parakeet, and Silero VAD.
- Fast Local Transcription - NVIDIA Parakeet TDT 0.6B runs entirely on your CPU
- Voice Activity Detection - Automatically stops recording after silence
- Global Hotkey - Right Shift works from any application
- System Tray - Runs quietly in the background
- Visual Feedback - Floating bubble shows recording state
- Direct Text Injection - Types transcribed text into the active window
- OS: Linux (Ubuntu 22.04+ recommended)
- RAM: 4GB minimum (8GB recommended)
- Disk: ~1.5GB for models
git clone https://github.com/sixteen-dev/tomchat-app.git
cd tomchat-app
bash setup.sh
The setup script handles everything: system dependencies, Node.js, Rust, AI models (~1.1GB download), building, and installing the .deb package. It's idempotent, so it is safe to run again if interrupted.
Once installed, launch TomChat from your application menu or run tom-chat. Press Right Shift to start/stop recording.
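A script is idempotent when each step first checks whether its work is already done. The following is a minimal sketch of that pattern, not code taken from setup.sh; the helper name `ensure_command` is hypothetical.

```shell
#!/bin/sh
# Illustrative only: setup.sh may implement its checks differently.
# ensure_command NAME CMD [ARGS...]
# Runs CMD only when NAME is not already on PATH, so re-running is harmless.
ensure_command() {
  name="$1"
  shift
  if command -v "$name" >/dev/null 2>&1; then
    echo "skip: $name already installed"
    return 0
  fi
  echo "install: $name"
  "$@"
}
```

For example, `ensure_command node sudo apt install -y nodejs` would only call apt when `node` is missing.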
Manual setup (if you prefer not to use the script)
sudo apt update
sudo apt install -y \
build-essential \
libasound2-dev \
libssl-dev \
pkg-config \
libsoup2.4-dev \
libwebkit2gtk-4.0-dev \
libayatana-appindicator3-dev \
wget
# Install Node.js 18+
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
The global hotkey requires access to input devices:
sudo usermod -a -G input $USER
Important: Log out and log back in for this to take effect.
mkdir -p resources/models
cd resources/models
# Download Parakeet TDT 0.6B v2 (FP16) - ~1.1GB
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
tar -xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
# Download Silero VAD - ~2MB
wget https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
cd ../..
npm install
npm run dev
# Build the application
npm run tauri build
# The .deb package will be at:
# src-tauri/target/release/bundle/deb/tom-chat_*.deb
# Install the package
sudo dpkg -i src-tauri/target/release/bundle/deb/tom-chat_*.deb
# Install the shared libraries (required)
sudo cp src-tauri/target/release/libsherpa-onnx-c-api.so \
src-tauri/target/release/libsherpa-onnx-cxx-api.so \
src-tauri/target/release/libonnxruntime.so \
/usr/lib/
sudo ldconfig
- Launch - TomChat icon appears in your system tray
- Press Right Shift - Recording starts (bubble turns red)
- Speak - Your voice is captured
- Press Right Shift again or wait for silence - Recording stops
- Text appears - Transcription is typed into the active application
- Left Click: Open settings window
- Right Click: Show menu (Settings, Toggle Recording, Show/Hide Bubble, Quit)
- Silence Timeout: How long to wait after speech stops (default: 1500ms)
- Start/Stop Service: Enable or disable the hotkey
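The Silence Timeout setting can be pictured as a counter that resets whenever speech is detected. The sketch below illustrates that logic in shell; it is not TomChat's actual Rust implementation. The function name `silence_stop_frame`, the one-flag-per-line input format, and the 32 ms default frame length are assumptions made for illustration. Only the 1500 ms default comes from the settings above.

```shell
#!/bin/sh
# Illustrative only: TomChat's real VAD handling lives in the Rust backend.
# silence_stop_frame [TIMEOUT_MS] [FRAME_MS]
# Reads one VAD decision per line on stdin (1 = speech, 0 = silence) and
# prints the index of the frame at which recording would stop, or "none".
silence_stop_frame() {
  timeout_ms="${1:-1500}"   # default from the Silence Timeout setting
  frame_ms="${2:-32}"       # assumed frame length, not from the source
  silent_ms=0
  heard_speech=0
  index=0
  while read -r flag; do
    if [ "$flag" = "1" ]; then
      # Speech resets the silence counter.
      heard_speech=1
      silent_ms=0
    elif [ "$heard_speech" -eq 1 ]; then
      # Only silence that follows speech counts toward the timeout.
      silent_ms=$((silent_ms + frame_ms))
      if [ "$silent_ms" -ge "$timeout_ms" ]; then
        echo "$index"
        return 0
      fi
    fi
    index=$((index + 1))
  done
  echo "none"
}
```

With these defaults, 47 consecutive silent frames after speech (47 x 32 ms = 1504 ms) are needed before the stop is triggered.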
tomchat-app/
├── src/                        # Frontend (TypeScript)
│   ├── main.ts                 # Settings UI
│   ├── bubble.html             # Recording indicator
│   └── bubble.js               # Bubble logic
├── src-tauri/                  # Backend (Rust)
│   ├── src/
│   │   ├── main.rs             # App entry, hotkey, events
│   │   ├── audio/
│   │   │   ├── capture.rs      # Microphone input
│   │   │   └── vad.rs          # Voice activity detection
│   │   ├── speech/
│   │   │   └── transcriber.rs  # Parakeet transcription
│   │   └── input/
│   │       └── injector.rs     # Keyboard simulation
│   ├── Cargo.toml
│   └── tauri.conf.json
├── resources/
│   └── models/                 # Model files (not in git)
├── index.html                  # Settings page
└── package.json
# Verify you're in the input group
groups | grep input
# If not, add yourself and re-login
sudo usermod -a -G input $USER
# For development
export LD_LIBRARY_PATH=./src-tauri/target/release:$LD_LIBRARY_PATH
# For production, install libraries system-wide
sudo cp src-tauri/target/release/lib*.so /usr/lib/
sudo ldconfig
# Test your microphone
arecord -f cd -t wav -d 3 test.wav && aplay test.wav
# Add yourself to the audio group if needed
sudo usermod -a -G audio $USER
Ensure models are in one of these locations:
- ./resources/models/ (development)
- /usr/lib/tom-chat/_up_/resources/models/ (installed .deb)
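To check which of these locations exists on your machine, a small helper like the one below may be handy. It is a sketch: `find_models_dir` is a hypothetical name, and the lookup order (development path first) is an assumption rather than TomChat's confirmed behavior.

```shell
#!/bin/sh
# Illustrative only: TomChat's own model lookup may differ.
# find_models_dir [DIR...]
# Prints the first existing directory and returns 0, or returns 1 if none exist.
find_models_dir() {
  # With no arguments, fall back to the two documented locations.
  if [ "$#" -eq 0 ]; then
    set -- "./resources/models" "/usr/lib/tom-chat/_up_/resources/models"
  fi
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

Run `find_models_dir || echo "models not found"` from the repository root to see which directory would be picked up.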
TomChat auto-detects model precision. You can use the INT8 version for a smaller download:
# INT8 version (~600MB instead of 1.1GB)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
- Tauri - Desktop app framework
- sherpa-rs - Rust bindings for sherpa-onnx
- rdev - Global keyboard input
- cpal - Audio capture
- enigo - Keyboard simulation
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- NVIDIA NeMo for the Parakeet speech recognition model
- Silero for the voice activity detection model
- sherpa-onnx for making these models easy to use
- Named after Tommy