sixteen-dev/tomchat-app

TomChat

Fast, local speech-to-text with a global hotkey

TomChat is a desktop application that transcribes your speech and types it directly into any application. Press Right Shift to start recording, speak, and watch your words appear. All processing happens locally on your machine - no cloud, no API keys, no network round-trips.

Built with Tauri, NVIDIA Parakeet, and Silero VAD.

Features

  • Fast Local Transcription - NVIDIA Parakeet TDT 0.6B runs entirely on your CPU
  • Voice Activity Detection - Automatically stops recording after silence
  • Global Hotkey - Right Shift works from any application
  • System Tray - Runs quietly in the background
  • Visual Feedback - Floating bubble shows recording state
  • Direct Text Injection - Types transcribed text into the active window

Requirements

  • OS: Linux (Ubuntu 22.04+ recommended)
  • RAM: 4GB minimum (8GB recommended)
  • Disk: ~1.5GB for models

Quick Start

git clone https://github.com/sixteen-dev/tomchat-app.git
cd tomchat-app
bash setup.sh

The setup script handles everything: system dependencies, Node.js, Rust, AI models (~1.1GB download), building, and installing the .deb package. It's idempotent — safe to run again if interrupted.

Once installed, launch TomChat from your application menu or run tom-chat. Press Right Shift to start/stop recording.

Manual setup (if you prefer not to use the script)

1. Install System Dependencies

sudo apt update
sudo apt install -y \
    build-essential \
    libasound2-dev \
    libssl-dev \
    pkg-config \
    libsoup2.4-dev \
    libwebkit2gtk-4.0-dev \
    libayatana-appindicator3-dev \
    wget

# Install Node.js 18+
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt install -y nodejs

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

2. Add User to Input Group

The global hotkey requires access to input devices:

sudo usermod -a -G input $USER

Important: Log out and log back in for this to take effect.

3. Download Models

mkdir -p resources/models
cd resources/models

# Download Parakeet TDT 0.6B v2 (FP16) - ~1.1GB
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
tar -xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16.tar.bz2

# Download Silero VAD - ~2MB
wget https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx

cd ../..

4. Install Dependencies & Run

npm install
npm run dev

Building for Production

# Build the application
npm run tauri build

# The .deb package will be at:
# src-tauri/target/release/bundle/deb/tom-chat_*.deb

Installing the .deb Package

# Install the package
sudo dpkg -i src-tauri/target/release/bundle/deb/tom-chat_*.deb

# Install the shared libraries (required)
sudo cp src-tauri/target/release/libsherpa-onnx-c-api.so \
       src-tauri/target/release/libsherpa-onnx-cxx-api.so \
       src-tauri/target/release/libonnxruntime.so \
       /usr/lib/
sudo ldconfig

Usage

  1. Launch - TomChat icon appears in your system tray
  2. Press Right Shift - Recording starts (bubble turns red)
  3. Speak - Your voice is captured
  4. Press Right Shift again or wait for silence - Recording stops
  5. Text appears - Transcription is typed into the active application
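The Usage steps above amount to a small state machine. The Rust sketch below models it; the `State` and `Event` names, and the rule that a keypress during transcription is ignored, are assumptions for illustration, not TomChat's actual code.

```rust
/// Hypothetical model of the recording cycle described in Usage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Hypothetical events that drive the cycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,  // Right Shift
    SilenceTimeout, // VAD saw no speech for the configured timeout
    TextTyped,      // transcription finished and was typed into the active window
}

/// Returns the state after handling `event` in `state`.
pub fn next(state: State, event: Event) -> State {
    match (state, event) {
        // Right Shift from idle starts a new recording.
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        // A second Right Shift or trailing silence both end the recording.
        (State::Recording, Event::HotkeyPressed | Event::SilenceTimeout) => State::Transcribing,
        // Once the text has been typed, wait for the next hotkey press.
        (State::Transcribing, Event::TextTyped) => State::Idle,
        // Anything else (e.g. Right Shift mid-transcription) is ignored in this sketch.
        (s, _) => s,
    }
}
```

Keeping the transitions in one pure function makes the hotkey and VAD paths easy to reason about: both simply emit events, and neither needs to know what the other is doing.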

System Tray Menu

  • Left Click: Open settings window
  • Right Click: Show menu (Settings, Toggle Recording, Show/Hide Bubble, Quit)

Settings

  • Silence Timeout: How long to wait after speech stops before recording ends automatically (default: 1500ms)
  • Start/Stop Service: Enable or disable the hotkey
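The Silence Timeout setting can be thought of as a count of consecutive non-speech VAD frames. This sketch assumes the VAD emits one speech/no-speech decision per 32 ms frame (512 samples at 16 kHz) and that silence before the first word is ignored; `SilenceTimer` is a hypothetical name, not part of TomChat.

```rust
/// Hypothetical end-of-speech detector driven by per-frame VAD decisions.
pub struct SilenceTimer {
    frames_needed: u32,
    silent_frames: u32,
    heard_speech: bool,
}

impl SilenceTimer {
    /// `timeout_ms`: the Silence Timeout setting.
    /// `frame_ms`: duration of one VAD frame (assumed 32 ms; must be > 0).
    pub fn new(timeout_ms: u32, frame_ms: u32) -> Self {
        // Round up so recording never stops earlier than the configured timeout.
        let frames_needed = (timeout_ms + frame_ms - 1) / frame_ms;
        SilenceTimer { frames_needed, silent_frames: 0, heard_speech: false }
    }

    /// Feed one VAD decision; returns true when recording should stop.
    pub fn push(&mut self, is_speech: bool) -> bool {
        if is_speech {
            // Any speech resets the silence counter.
            self.heard_speech = true;
            self.silent_frames = 0;
            return false;
        }
        if !self.heard_speech {
            // Leading silence before the user starts talking does not count.
            return false;
        }
        self.silent_frames += 1;
        self.silent_frames >= self.frames_needed
    }
}
```

With the default setting, `SilenceTimer::new(1500, 32)` stops after 47 consecutive silent frames (1504 ms), since 1500 ms is not a whole number of 32 ms frames.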

Project Structure

tomchat-app/
├── src/                          # Frontend (TypeScript)
│   ├── main.ts                   # Settings UI
│   ├── bubble.html               # Recording indicator
│   └── bubble.js                 # Bubble logic
├── src-tauri/                    # Backend (Rust)
│   ├── src/
│   │   ├── main.rs               # App entry, hotkey, events
│   │   ├── audio/
│   │   │   ├── capture.rs        # Microphone input
│   │   │   └── vad.rs            # Voice activity detection
│   │   ├── speech/
│   │   │   └── transcriber.rs    # Parakeet transcription
│   │   └── input/
│   │       └── injector.rs       # Keyboard simulation
│   ├── Cargo.toml
│   └── tauri.conf.json
├── resources/
│   └── models/                   # Model files (not in git)
├── index.html                    # Settings page
└── package.json
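The tree above suggests a pipeline from microphone samples to typed text. The sketch below shows one way the `speech` and `input` roles could be wired together behind traits; the trait names, the `f32` sample format, and the whitespace trimming are assumptions. Per the Tech Stack list, the real app fills these roles with sherpa-rs and enigo rather than these stand-ins.

```rust
/// Hypothetical interface for speech/transcriber.rs.
pub trait Transcriber {
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

/// Hypothetical interface for input/injector.rs.
pub trait Injector {
    fn type_text(&mut self, text: &str);
}

/// One recording cycle: captured samples -> text -> keystrokes.
/// Returns the text that was typed, or None if there was nothing to type.
pub fn finish_recording<T: Transcriber, I: Injector>(
    samples: &[f32],
    transcriber: &mut T,
    injector: &mut I,
) -> Option<String> {
    if samples.is_empty() {
        return None; // nothing was captured
    }
    let text = transcriber.transcribe(samples);
    let text = text.trim();
    if text.is_empty() {
        return None; // avoid typing stray whitespace into the active window
    }
    injector.type_text(text);
    Some(text.to_string())
}
```

Putting the model and the keyboard behind traits also lets the cycle be exercised with fakes, without a microphone or a display.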

Troubleshooting

Hotkey Not Working

# Verify you're in the input group (reflects your current login session)
groups | grep -w input

# If not, add yourself and re-login
sudo usermod -a -G input $USER
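`groups` only shows the groups of your current session, so right after `usermod` it will not list `input` until you log back in. To check whether the change itself was recorded, you can read `/etc/group` on systems with local accounts. This hypothetical helper parses its `name:password:GID:members` lines; note that it only sees supplementary members, not a user's primary group.

```rust
/// Returns true if `user` is listed as a supplementary member of `group`
/// in the given /etc/group contents (lines of `name:password:GID:user1,user2`).
pub fn listed_in_group(etc_group: &str, group: &str, user: &str) -> bool {
    etc_group
        .lines()
        .filter_map(|line| {
            let mut fields = line.split(':');
            let name = fields.next()?;
            let members = fields.nth(2)?; // skip password and GID
            Some((name, members))
        })
        .any(|(name, members)| name == group && members.split(',').any(|m| m.trim() == user))
}
```

For example, pass it the result of `std::fs::read_to_string("/etc/group")`. If it returns true for `input` but `groups` does not show `input`, you still need to log out and back in. The same check works for the `audio` group.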

Library Not Found Error

# For development
export LD_LIBRARY_PATH=./src-tauri/target/release:$LD_LIBRARY_PATH

# For production, install libraries system-wide
sudo cp src-tauri/target/release/lib*.so /usr/lib/
sudo ldconfig
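To see which of the three shared libraries from the install step are missing, a quick check like this hypothetical `missing_libs` helper can scan candidate directories. It is only a rough check: the dynamic loader also consults `LD_LIBRARY_PATH` and the `ldconfig` cache, so `ldconfig -p | grep -E 'sherpa|onnxruntime'` is the authoritative answer for a system-wide install.

```rust
use std::path::Path;

/// The shared libraries the install step copies to /usr/lib.
pub const REQUIRED_LIBS: [&str; 3] = [
    "libsherpa-onnx-c-api.so",
    "libsherpa-onnx-cxx-api.so",
    "libonnxruntime.so",
];

/// Returns the required libraries that are not present in any of `dirs`.
pub fn missing_libs(dirs: &[&Path]) -> Vec<&'static str> {
    REQUIRED_LIBS
        .iter()
        .copied()
        .filter(|lib| !dirs.iter().any(|d| d.join(lib).is_file()))
        .collect()
}
```

For example, `missing_libs(&[Path::new("/usr/lib"), Path::new("./src-tauri/target/release")])` lists anything that still needs to be built or copied.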

Microphone Not Working

# Test your microphone
arecord -f cd -t wav -d 3 test.wav && aplay test.wav

# Add yourself to the audio group if needed
sudo usermod -a -G audio $USER

Model Not Found

Ensure models are in one of these locations:

  • ./resources/models/ (development)
  • /usr/lib/tom-chat/_up_/resources/models/ (installed .deb)
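The lookup order above could be implemented as a first-match search. This sketch treats a directory as valid if it contains `silero_vad.onnx` (the file downloaded in step 3); `MODEL_DIRS` and `find_models_dir` are hypothetical names, and TomChat's real check may look at different files.

```rust
use std::path::PathBuf;

/// Candidate model directories, in the order listed above.
pub const MODEL_DIRS: [&str; 2] = [
    "./resources/models",                      // development checkout
    "/usr/lib/tom-chat/_up_/resources/models", // installed .deb
];

/// Returns the first candidate directory that contains the Silero VAD model.
pub fn find_models_dir(candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .map(|c| PathBuf::from(*c))
        .find(|dir| dir.join("silero_vad.onnx").is_file())
}
```

Calling `find_models_dir(&MODEL_DIRS)` returns `None` when neither location has the models, which is the situation this troubleshooting entry describes.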

Alternative Models

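TomChat auto-detects model precision. For a smaller download, you can use the INT8 version instead:

# INT8 version (~600MB instead of 1.1GB); run from resources/models
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar -xjf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
rm sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2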
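One way to auto-detect precision is to look for the extracted model directories by name. This sketch assumes each archive extracts to a directory named after the tarball and prefers FP16 when both are present; the `Precision` enum, `detect_parakeet`, and that preference are illustrative assumptions, not documented TomChat behavior.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Fp16,
    Int8,
}

/// Candidate directory names, in order of preference (assumed).
const VARIANTS: [(Precision, &str); 2] = [
    (Precision::Fp16, "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-fp16"),
    (Precision::Int8, "sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8"),
];

/// Returns the precision and path of the first Parakeet variant found in `models_dir`.
pub fn detect_parakeet(models_dir: &Path) -> Option<(Precision, PathBuf)> {
    VARIANTS
        .iter()
        .map(|&(precision, name)| (precision, models_dir.join(name)))
        .find(|candidate| candidate.1.is_dir())
}
```

Detecting by directory name means switching precision only requires extracting the other archive into the models directory; no setting needs to change.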

Tech Stack

  • Tauri - Desktop app framework
  • sherpa-rs - Rust bindings for sherpa-onnx
  • rdev - Global keyboard input
  • cpal - Audio capture
  • enigo - Keyboard simulation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • NVIDIA NeMo for the Parakeet speech recognition model
  • Silero for the voice activity detection model
  • sherpa-onnx for making these models easy to use
  • Named after Tommy

About

Talk to your computer. TomChat transcribes your voice directly into any application - entirely offline, entirely private.
