- Shanghai Jiao Tong University
- Shanghai
- https://orcid.org/0009-0002-2842-2259
Stars
The original implementation of "AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning".
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
A fast + lightweight implementation of the GCG algorithm in PyTorch
An AI-native PPT generator based on nano banana pro 🍌, aiming for a true "Vibe PPT": supports uploading any template image, uploading arbitrary assets with intelligent parsing, auto-generating slides from a one-sentence prompt, an outline, or per-page descriptions, making spoken edits to specified regions, and one-click export.
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Aligning pretrained language models with instruction data generated by themselves.
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue LLM)
[New Preprint] PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Universal and Transferable Attacks on Aligned Language Models
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
[NAACL 2025] SIUO: Cross-Modality Safety Alignment
A multi-platform proxy client based on ClashMeta, simple and easy to use, open-source and ad-free.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Code and documentation to train Stanford's Alpaca models, and generate the data.
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
The official repository for guided jailbreak benchmark
Finetune Llama-3-8b on the MathInstruct dataset
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem.
A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step