AzkaSahar/AI-Web-Scraper

🌐 AI-Powered Web Scraper + Q&A with Ollama + FAISS

Scrape website content, store it in a vector DB, and ask questions about it using a local LLM (Mistral via Ollama). Built with LangChain, FAISS, and Streamlit.


🛠 Features

  • 🌍 Web scraping using requests + BeautifulSoup
  • 🔍 Embedding text chunks via sentence-transformers
  • 💾 Semantic search using FAISS vector database
  • 🤖 Local LLM (Mistral via Ollama) for Q&A
  • 🖥️ Easy-to-use Streamlit UI

📦 Installation

1. Clone the repository

git clone https://github.com/AzkaSahar/AI-Web-Scraper.git
cd AI-Web-Scraper

2. Install dependencies

pip install -r requirements.txt

🧠 Prerequisites

  • Install and run Ollama on your machine
  • Pull the Mistral model:
ollama pull mistral
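
To confirm that Ollama is running and the model responds before launching the app, a quick check along these lines may help. This is a minimal sketch, not code from this repository: it assumes Ollama is listening on its default local port (11434) and talks to the REST API directly with the standard library. The helper names `build_payload` and `ask_ollama` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "mistral") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask_ollama(prompt: str, model: str = "mistral", timeout: float = 60.0) -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # With "stream": False the server returns one JSON object
        # whose "response" field holds the full answer.
        return json.loads(reply.read().decode("utf-8"))["response"]
```

Calling `ask_ollama("Say hello in one sentence.")` should return a short string if the server is up and the model has been pulled; a connection error usually means Ollama is not running.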

🚀 Usage

streamlit run ai_webscraper.py

💡 How it works:

  1. Enter a website URL
  2. The app scrapes the page, splits the text into chunks, embeds them, and stores them in a FAISS index
  3. Ask a question; the app retrieves the most relevant chunks and passes them to the LLM
  4. The LLM answers based on the retrieved content
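
The retrieval steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `ai_webscraper.py`: a toy word-count vector stands in for the sentence-transformers embedding, a brute-force cosine ranking stands in for the FAISS index, and all function names, chunk sizes, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for sentence-transformers)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (brute force, stand-in for FAISS)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt for the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In the real app, the embedding model and FAISS do the same job at scale: every chunk becomes a dense vector, the question is embedded the same way, and the nearest chunks are placed in the prompt that goes to Mistral.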

🗂️ Folder Structure


├── ai_webscraper.py             # Main Streamlit script
├── requirements.txt
└── README.md

📝 License

MIT — free to use, modify, and distribute.
