Semantic Search from Scratch: Building a FAQ Chatbot

You might have wondered: how does Google know to show you exactly what you need even when you type something slightly different?

Meanwhile, your SQL queries return null when even one word is off.

The answer is semantic search.

You might have used it before when building RAG systems or working with AI tools, but do you actually know how it works?

"Semantics" is all about meaning and context — and here you'll learn how to encode them in a way that computers can understand so we can make the best use of it.

In this guide, we'll build a fully functional semantic search system from scratch (mostly lol), and by the end, we'll hook it up to a chatbot for RAG.

What You'll Build

By the end of this guide, you'll have:

A semantic search engine that understands meaning
An FAQ chatbot powered by vector embeddings
A working RAG (Retrieval-Augmented Generation) system
Deep understanding of how it all works

Although we'll build our system with SQLite, the patterns, architecture, and design principles you'll learn here are not SQLite-specific. They apply to any semantic search system you want to build — whether you later move to Postgres with pgvector, specialized vector databases, or even in-memory solutions.

SQLite is a solid choice for this because it has such a low resource footprint and negligible setup cost. It exists on most devices, so that's a plus if you ever want to deploy such an app.

Who This Guide Is For

This guide assumes you already have:

Basic programming skills
Some familiarity with databases (SQL basics)
Curiosity about how AI search works

Optional but helpful:

Experience with Rust or Python (Python version coming soon)
Prior exposure to embeddings or vector concepts
Familiarity with chatbots or LLMs

If you need a refresher:

Setup

git clone https://github.com/olorikendrick/vector-search-with-rust-and-sqlite
cd vector-search-with-rust-and-sqlite/embeddings

Each chapter may introduce breaking changes. To follow along cleanly, we encourage you to branch per chapter:

git checkout -b chapter-2
cargo run

Optionally replace faq.txt with your own FAQ file before running.

Chapters

Chapter 1: Where Traditional Queries Fail
We explore the limits of traditional SQL and what we can use in its place.

Chapter 2: Building Vector Search
Generate embeddings, store them in SQLite, calculate cosine similarity, and search.

Chapter 3: Refactoring (coming soon)
Clean up the codebase, swap in Rig for provider-agnostic embeddings, and add proper error handling.

Chapter 4: The FAQ Chatbot (coming soon)
Hook it all up to build a complete RAG system.

This is very much a work in progress, so it's still rough around the edges.

Any support, contributions, or corrections are appreciated — it keeps me going.

If you find it helpful, star and share!

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.github/workflows		.github/workflows
embeddings		embeddings
src		src
.gitignore		.gitignore
README.md		README.md
book.toml		book.toml
embeddings.db		embeddings.db

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Semantic Search from Scratch: Building a FAQ Chatbot

What You'll Build

Who This Guide Is For

Setup

Chapters

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Semantic Search from Scratch: Building a FAQ Chatbot

What You'll Build

Who This Guide Is For

Setup

Chapters

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages