TomChat

Fast, local speech-to-text with a global hotkey

TomChat is a desktop application that transcribes your speech and types it directly into any application. Press Right Shift to start recording, speak, and watch your words appear. All processing happens locally on your machine - no cloud, no API keys, no network round-trips.

Built with Tauri, NVIDIA Parakeet, and Silero VAD.

Features

Fast Local Transcription - NVIDIA Parakeet TDT 0.6B runs entirely on your CPU
Voice Activity Detection - Automatically stops recording after silence
Global Hotkey - Right Shift works from any application
System Tray - Runs quietly in the background
Visual Feedback - Floating bubble shows recording state
Direct Text Injection - Types transcribed text into the active window

Requirements

OS: Linux (Ubuntu 22.04+ recommended)
RAM: 4GB minimum (8GB recommended)
Disk: ~1.5GB for models

Quick Start

git clone https://github.com/sixteen-dev/tomchat-app.git
cd tomchat-app
bash setup.sh

The setup script handles everything: system dependencies, Node.js, Rust, AI models (~1.1GB download), building, and installing the .deb package. It is idempotent, so it is safe to run again if interrupted.

Once installed, launch TomChat from your application menu or run tom-chat. Press Right Shift to start/stop recording.

Manual setup (if you prefer not to use the script)

1. Install System Dependencies

sudo apt update
sudo apt install -y \
    build-essential \
    libasound2-dev \
    libssl-dev \
    pkg-config \
    libsoup2.4-dev \
    libwebkit2gtk-4.0-dev \
    libayatana-appindicator3-dev \
    curl \
    wget

# Install Node.js 18+
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

2. Add User to Input Group

The global hotkey requires access to input devices:

sudo usermod -a -G input $USER

Important: Log out and log back in for this to take effect.

3. Download Models

mkdir -p resources/models
cd resources/models

# Download Parakeet TDT 0.6B v2 (FP16) - ~1.1GB
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
tar -xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2

# Download Silero VAD - ~2MB
wget https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx

cd ../..

4. Install Dependencies & Run

npm install
npm run dev

Building for Production

# Build the application
npm run tauri build

# The .deb package will be at:
# src-tauri/target/release/bundle/deb/tom-chat_*.deb

Installing the .deb Package

# Install the package
sudo dpkg -i src-tauri/target/release/bundle/deb/tom-chat_*.deb

# Install the shared libraries (required)
sudo cp src-tauri/target/release/libsherpa-onnx-c-api.so \
       src-tauri/target/release/libsherpa-onnx-cxx-api.so \
       src-tauri/target/release/libonnxruntime.so \
       /usr/lib/
sudo ldconfig

Usage

Launch - TomChat icon appears in your system tray
Press Right Shift - Recording starts (bubble turns red)
Speak - Your voice is captured
Press Right Shift again or wait for silence - Recording stops
Text appears - Transcription is typed into the active application
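The usage cycle above amounts to a two-state toggle. The Rust sketch below models it as a pure transition function. The names `RecorderState`, `Event`, `Action`, and `step` are hypothetical and not taken from TomChat's source, which may structure this logic differently.

```rust
/// Hypothetical recorder states (not TomChat's actual types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecorderState {
    Idle,
    Recording,
}

/// Hypothetical inputs to the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    HotkeyPressed,  // Right Shift pressed
    SilenceTimeout, // VAD reported silence longer than the configured timeout
}

/// What the caller should do after a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    StartCapture,
    StopAndTranscribe,
    Nothing,
}

/// Pure transition function: the hotkey toggles recording, and silence only
/// matters while a recording is in progress.
fn step(state: RecorderState, event: Event) -> (RecorderState, Action) {
    match (state, event) {
        (RecorderState::Idle, Event::HotkeyPressed) => {
            (RecorderState::Recording, Action::StartCapture)
        }
        (RecorderState::Recording, Event::HotkeyPressed)
        | (RecorderState::Recording, Event::SilenceTimeout) => {
            (RecorderState::Idle, Action::StopAndTranscribe)
        }
        // A late silence timeout after the user already stopped is ignored.
        (RecorderState::Idle, Event::SilenceTimeout) => (RecorderState::Idle, Action::Nothing),
    }
}
```

Keeping the transitions pure like this makes them easy to unit-test apart from the hotkey listener and the audio thread.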

System Tray Menu

Left Click: Open settings window
Right Click: Show menu (Settings, Toggle Recording, Show/Hide Bubble, Quit)

Settings

Silence Timeout: How long to wait after you stop speaking before recording stops automatically (default: 1500ms)
Start/Stop Service: Enable or disable the hotkey
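One plausible way to implement the silence timeout is to accumulate the duration of consecutive non-speech VAD frames once speech has been heard. This is a minimal sketch, assuming 32 ms frames (512 samples at 16 kHz, the window size Silero VAD uses at that rate) and assuming the timer never fires before any speech is detected. `SilenceTimer` is a hypothetical name; TomChat's actual vad.rs may work differently.

```rust
use std::time::Duration;

/// Hypothetical helper: turns per-frame VAD decisions into a "stop recording" signal.
struct SilenceTimer {
    timeout: Duration,  // the Silence Timeout setting, e.g. 1500 ms
    frame: Duration,    // audio duration covered by one VAD decision
    heard_speech: bool, // assumption: never stop before the user has said anything
    silence: Duration,  // silence accumulated since the last speech frame
}

impl SilenceTimer {
    fn new(timeout: Duration, frame: Duration) -> Self {
        SilenceTimer {
            timeout,
            frame,
            heard_speech: false,
            silence: Duration::ZERO,
        }
    }

    /// Feeds one frame's VAD result; returns true once trailing silence reaches the timeout.
    fn push(&mut self, is_speech: bool) -> bool {
        if is_speech {
            // Any speech resets the countdown.
            self.heard_speech = true;
            self.silence = Duration::ZERO;
            return false;
        }
        if !self.heard_speech {
            return false;
        }
        self.silence += self.frame;
        self.silence >= self.timeout
    }
}
```

With the default 1500ms timeout and 32 ms frames, this sketch stops on the 47th consecutive silent frame (46 × 32 ms = 1472 ms is still short; 47 × 32 ms = 1504 ms is not).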

Project Structure

tomchat-app/
├── src/                          # Frontend (TypeScript)
│   ├── main.ts                   # Settings UI
│   ├── bubble.html               # Recording indicator
│   └── bubble.js                 # Bubble logic
├── src-tauri/                    # Backend (Rust)
│   ├── src/
│   │   ├── main.rs               # App entry, hotkey, events
│   │   ├── audio/
│   │   │   ├── capture.rs        # Microphone input
│   │   │   └── vad.rs            # Voice activity detection
│   │   ├── speech/
│   │   │   └── transcriber.rs    # Parakeet transcription
│   │   └── input/
│   │       └── injector.rs       # Keyboard simulation
│   ├── Cargo.toml
│   └── tauri.conf.json
├── resources/
│   └── models/                   # Model files (not in git)
├── index.html                    # Settings page
└── package.json
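The module layout suggests a capture → VAD → transcription → injection pipeline. The sketch below shows one way to wire the last two stages behind traits, with audio frames arriving from a capture thread over a `std::sync::mpsc` channel. The `Transcriber` and `Injector` traits, `spawn_fake_capture`, and `run_session` are hypothetical stand-ins for audio/capture.rs, speech/transcriber.rs, and input/injector.rs, not their real interfaces.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for speech/transcriber.rs.
trait Transcriber {
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

// Hypothetical stand-in for input/injector.rs.
trait Injector {
    fn type_text(&mut self, text: &str);
}

/// Hypothetical capture stand-in: sends `count` frames of silence, then exits.
/// Dropping the sender when the thread ends is what closes the session.
fn spawn_fake_capture(
    count: usize,
    frame_len: usize,
) -> (mpsc::Receiver<Vec<f32>>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        for _ in 0..count {
            if tx.send(vec![0.0f32; frame_len]).is_err() {
                break; // receiver hung up
            }
        }
    });
    (rx, handle)
}

/// Collects frames until the capture side closes the channel, then transcribes
/// the whole utterance once and types the result into the active window.
fn run_session<T: Transcriber, I: Injector>(
    frames: mpsc::Receiver<Vec<f32>>,
    transcriber: &mut T,
    injector: &mut I,
) {
    let mut utterance = Vec::new();
    for frame in frames {
        utterance.extend_from_slice(&frame);
    }
    let text = transcriber.transcribe(&utterance);
    if !text.trim().is_empty() {
        injector.type_text(&text);
    }
}
```

In the real app the capture side would presumably be a cpal input stream and the stages backed by sherpa-rs and enigo; the stubs here only illustrate the data flow.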

Troubleshooting

Hotkey Not Working

# Verify you're in the input group
groups | grep input

# If not, add yourself and re-login
sudo usermod -a -G input $USER
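`groups` with no arguments reports the groups of your current session, so right after `usermod` it will not yet show `input`. To tell whether the change was saved and you only need to log out, you can check the configured membership in /etc/group. A minimal sketch of that check follows; `listed_in_group` is a hypothetical helper, and it only looks at supplementary members, not a user's primary group from /etc/passwd.

```rust
/// Returns true if `user` is listed as a supplementary member of `group` in text
/// formatted like /etc/group ("name:password:GID:user1,user2").
fn listed_in_group(etc_group: &str, group: &str, user: &str) -> bool {
    etc_group.lines().any(|line| {
        let mut fields = line.split(':');
        let name = fields.next();
        // Skip the password and GID fields; the third remaining field is the member list.
        let members = fields.nth(2);
        name == Some(group)
            && members.map_or(false, |m| m.split(',').any(|u| u.trim() == user))
    })
}
```

Call it with the contents of `std::fs::read_to_string("/etc/group")`. If it returns true while `groups` does not list `input`, logging out and back in should make the hotkey work.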

Library Not Found Error

# For development
export LD_LIBRARY_PATH=./src-tauri/target/release:$LD_LIBRARY_PATH

# For production, install libraries system-wide
sudo cp src-tauri/target/release/lib*.so /usr/lib/
sudo ldconfig
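To see which of the three libraries is likely missing, a rough diagnostic can look for the exact filenames from the install step in each LD_LIBRARY_PATH entry and in /usr/lib. This is only an approximation, assuming the unversioned .so names: the real dynamic loader also consults /etc/ld.so.cache and other default directories, which this sketch ignores. `candidate_dirs` and `missing_libs` are hypothetical helpers.

```rust
use std::path::PathBuf;

/// The shared libraries the install step copies to /usr/lib.
const REQUIRED_LIBS: [&str; 3] = [
    "libsherpa-onnx-c-api.so",
    "libsherpa-onnx-cxx-api.so",
    "libonnxruntime.so",
];

/// Splits an LD_LIBRARY_PATH-style value into directories and appends /usr/lib.
fn candidate_dirs(ld_library_path: Option<&str>) -> Vec<PathBuf> {
    let mut dirs: Vec<PathBuf> = ld_library_path
        .unwrap_or("")
        .split(':')
        .filter(|s| !s.is_empty()) // empty entries are skipped for simplicity
        .map(PathBuf::from)
        .collect();
    dirs.push(PathBuf::from("/usr/lib"));
    dirs
}

/// Returns the required libraries not found as files in any of `dirs`.
fn missing_libs(dirs: &[PathBuf]) -> Vec<&'static str> {
    REQUIRED_LIBS
        .iter()
        .copied()
        .filter(|lib| !dirs.iter().any(|dir| dir.join(lib).is_file()))
        .collect()
}
```

For example, `missing_libs(&candidate_dirs(std::env::var("LD_LIBRARY_PATH").ok().as_deref()))` lists anything still absent from those directories.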

Microphone Not Working

# Test your microphone
arecord -f cd -t wav -d 3 test.wav && aplay test.wav

# Add yourself to the audio group if needed
sudo usermod -a -G audio $USER

Model Not Found

Ensure models are in one of these locations:

./resources/models/ (development)
/usr/lib/tom-chat/_up_/resources/models/ (installed .deb)
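A lookup over those two locations might look like the sketch below. It assumes a directory counts as valid when it contains silero_vad.onnx, which step 3 places directly in resources/models. `default_candidates` and `find_models_dir` are hypothetical names; TomChat's own lookup may check different files or resolve the installed path through Tauri's resource path resolution.

```rust
use std::path::PathBuf;

/// The two locations listed above, in the order they are tried.
/// The relative path is resolved against the current working directory.
fn default_candidates() -> Vec<PathBuf> {
    vec![
        PathBuf::from("./resources/models"),                      // development
        PathBuf::from("/usr/lib/tom-chat/_up_/resources/models"), // installed .deb
    ]
}

/// Returns the first candidate that contains silero_vad.onnx (assumed marker file).
fn find_models_dir(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates
        .iter()
        .find(|dir| dir.join("silero_vad.onnx").is_file())
        .cloned()
}
```

Because the relative path depends on the working directory, launching the development build from outside the repository root would skip the first candidate.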

Alternative Models

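TomChat auto-detects model precision. To use the smaller INT8 model, download and extract it in resources/models:

# INT8 version (~600MB instead of 1.1GB)
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar -xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2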
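Precision auto-detection could be as simple as checking which extracted Parakeet directory is present. This sketch assumes detection by the directory-name suffix (`fp16` or `int8`) and a preference for FP16 when both exist. Both are assumptions for illustration, and `detect_parakeet` is a hypothetical name rather than TomChat's implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
enum Precision {
    Fp16,
    Int8,
}

/// Directory-name prefix shared by both archives from the download steps.
const PARAKEET_PREFIX: &str = "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-";

/// Scans `models_dir` for an extracted Parakeet directory and reports its precision.
/// Assumption: FP16 wins when both variants are present.
fn detect_parakeet(models_dir: &Path) -> io::Result<Option<(Precision, PathBuf)>> {
    let mut int8 = None;
    for entry in fs::read_dir(models_dir)? {
        let entry = entry?;
        let path = entry.path();
        if !path.is_dir() {
            continue; // skip leftover archives and silero_vad.onnx
        }
        let name = entry.file_name().to_string_lossy().into_owned();
        match name.strip_prefix(PARAKEET_PREFIX) {
            Some("fp16") => return Ok(Some((Precision::Fp16, path))),
            Some("int8") => int8 = Some((Precision::Int8, path)),
            _ => {}
        }
    }
    Ok(int8)
}
```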

Tech Stack

Tauri - Desktop app framework
sherpa-rs - Rust bindings for sherpa-onnx
rdev - Global keyboard input
cpal - Audio capture
enigo - Keyboard simulation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

NVIDIA NeMo for the Parakeet speech recognition model
Silero for the voice activity detection model
sherpa-onnx for making these models easy to use
Named after Tommy
