The Anti-Hallucination data layer for B2B Sourcing. Deep-verified global supply chain entities designed for RAG and LLM instruction tuning.
Updated Apr 17, 2026
AI-powered Q&A system for U.S. affordable housing policy, using RAG over 2,500+ HUD documents and 24 CFR.
A comprehensive Python tool for extracting, processing, and analyzing RPG scenarios from the Era of the Imperial Republic (EOTIR) forums. Features automated web scraping, NLP-powered content analysis, character extraction, timeline generation, and LLM dataset preparation with an interactive HTML dashboard.
ONE SYSTEM Knowledge Dataset: structured training data from 11 AI, legal & technology brands by David Sanker. UAPK Business Compiler, Legal AI, AI Governance.
This repository aims to provide a structured and easily accessible dataset of laws in Bangladesh. The data is primarily sourced from the Bangladesh Law (BDLAW) website.
Gittxt is an AI-focused CLI and plugin tool for extracting, filtering, and packaging text from GitHub repos. Build LLM-compatible datasets, prep code for prompt engineering, and power AI workflows with structured .txt, .json, .md, or .zip outputs.
Autonomous MCP server for M2M patent intelligence. Delivers structured JSON datasets (CPC A-H) enriched with biz_value_prop, tech stacks, and importance scoring. Supports instant autonomous data purchasing via ROSE cryptocurrency.
Prepares the Kleister NDA dataset for LLM-based extraction. Validates labels against a Pydantic schema and delivers partitioned Parquet with co-located PDFs.