- Shanghai Jiao Tong University
- Shanghai
- https://orcid.org/0009-0002-2842-2259
Stars
The original implementation of "AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning".
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
A fast + lightweight implementation of the GCG algorithm in PyTorch
An AI-native PPT generator based on nano banana pro 🍌, aiming for a true "Vibe PPT": supports uploading any template image, uploading arbitrary assets with intelligent parsing, auto-generating slides from a one-sentence prompt, an outline, or per-page descriptions, making spoken edits to specified regions, and one-click export.
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Aligning pretrained language models with instruction data generated by themselves.
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue LLM)
[New Preprint] PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Universal and Transferable Attacks on Aligned Language Models
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
[NAACL 2025] SIUO: Cross-Modality Safety Alignment
A multi-platform proxy client based on ClashMeta, simple and easy to use, open-source and ad-free.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Code and documentation to train Stanford's Alpaca models, and generate the data.
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
The official repository for guided jailbreak benchmark
Finetune Llama-3-8b on the MathInstruct dataset
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem.
A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step