Dolphin 3.0 🐬: Versatile AI for coding, math, and more
Updated Mar 12, 2025 · Python
Chat data cleaning, filtering and deduplication pipeline.
A 138M-parameter ChatML training stack optimized for Apple Silicon via MLX. Features a curated Quality2K continuation curriculum and v18 SFT alignment.
Deepseek-Dataset-Generator creates conversational datasets for LLM fine-tuning via DeepSeek API. Supports various formats (ChatML, ShareGPT, Alpaca, JSON, CSV), easy configuration via YAML and detailed logs. Ideal for generating realistic and customized data quickly.
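As a minimal sketch of two of the layouts such generators emit (the function names here are hypothetical, not the generator's actual API), a single instruction/response pair might be rendered into Alpaca-style and ShareGPT-style records like this:

```python
import json

def to_alpaca(instruction, response, context=""):
    # Alpaca layout: flat record with instruction/input/output fields.
    return {"instruction": instruction, "input": context, "output": response}

def to_sharegpt(instruction, response):
    # ShareGPT layout: a list of {from, value} conversation turns.
    return {"conversations": [
        {"from": "human", "value": instruction},
        {"from": "gpt", "value": response},
    ]}

pair = ("What is the capital of France?", "Paris.")
print(json.dumps(to_alpaca(*pair)))
print(json.dumps(to_sharegpt(*pair)))
```

Each record would then be written as one line of a JSONL file, which is the shape most fine-tuning toolkits expect.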
Fine-tuned small language models (Qwen3-0.6B, Gemma3-1B) to detect prompt injection attacks using reasoning-augmented supervised fine-tuning with ChatML templates. Achieves 95-99% accuracy on adversarial prompts including goal hijacking, DAN jailbreaks, and obfuscation attacks.
Notes on prompting OpenAI models, also covering other common prompt patterns such as the Alpaca prompt and the [INST] prompt.
Paste your function, hit convert, and get a clean summary ready for use in LLM-based systems.
LLM Scribe is a toolkit for quickly and easily creating hand-written datasets for LLM fine-tuning. Automatically outputs to multiple common fine-tuning formats such as ChatML, Alpaca, and more.
Generate instruction-tuning datasets (JSONL) from structured data using Claude
Week 5 project: build a hybrid retriever that fuses FAISS dense vectors with SQLite FTS5/BM25 keyword search (RRF/weighted fusion), plus a Supervised Fine-Tuning (SFT) pipeline (Full FT vs LoRA/QLoRA) using TRL/PEFT/DeepSpeed.
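The RRF fusion step mentioned above is simple enough to sketch. This is a generic Reciprocal Rank Fusion implementation, not the project's code: each retriever contributes 1 / (k + rank) per document, and documents found by both the dense and the keyword retriever rise to the top (the doc IDs below are made up for illustration).

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank_d),
    # where rank is 1-based. k=60 is the value commonly used in the literature.
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # e.g. FAISS dense-vector results
sparse = ["d1", "d3", "d4"]  # e.g. SQLite FTS5/BM25 keyword results
fused = rrf_fuse([dense, sparse])
# d1 and d3 appear in both lists, so they outrank d2 and d4.
```

A weighted variant simply multiplies each list's contribution by a per-retriever weight before summing.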
A Python-based interactive CLI interface for chatting with Hugging Face language models, optimized for casual, Discord-style conversation using ChatML. Supports both quantized and full-precision models, live token streaming with color formatting, and dynamic generation parameter adjustment.
Upload data to PostHog-LLM
A dataset toolbox for preparing and analyzing conversational datasets, including CSV splitting, CSV-to-Parquet conversion, dataset statistics, Parquet cleaning and sorting, HuggingFace-style metadata generation, and batched chain insertion into PostgreSQL — with Rich progress bars, multiprocessing, and batching sized to fit in 32 GB of RAM.
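The CSV-splitting step in a toolbox like this can be sketched with a small chunking generator (this is an illustrative sketch, not the toolbox's actual code): rows stream through in fixed-size chunks, each carrying a copy of the header, so an arbitrarily large file never has to fit in memory at once.

```python
def split_rows(rows, header, chunk_size):
    # Yield chunks of at most chunk_size data rows, each prefixed with the
    # header row, so every output chunk is itself a valid CSV table.
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [header] + chunk
            chunk = []
    if chunk:  # final partial chunk
        yield [header] + chunk

header = ["id", "text"]
data = [[str(i), f"msg {i}"] for i in range(5)]
chunks = list(split_rows(data, header, chunk_size=2))
# 5 rows with chunk_size=2 -> three chunks of 2, 2, and 1 data rows.
```

In practice `rows` would come from `csv.reader` over the input file and each chunk would be written out with `csv.writer`, keeping peak memory bounded by `chunk_size`.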
A flexible TypeScript framework for ingesting, processing, and exporting chat/conversation data for LLM training and analysis.
Standardized spec and vendor-specific transforms for ChatML
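For readers new to the topic, the ChatML wire format itself is compact enough to show inline. This is a minimal serializer sketch (not the spec repo's code): each turn is wrapped in `<|im_start|>role` and `<|im_end|>` markers.

```python
def to_chatml(messages):
    # Serialize role-tagged messages into the ChatML text format:
    # <|im_start|>{role}\n{content}<|im_end|>, one block per turn.
    return "\n".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    )

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(to_chatml(msgs))
```

Vendor-specific transforms then map this standardized message list onto each provider's own template (e.g. Alpaca or [INST] layouts) instead of the ChatML markers.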
The Anti-Hallucination data layer for B2B Sourcing. Deep-verified global supply chain entities designed for RAG and LLM instruction tuning.