Zhejiang University
Stars
Claw-R1: Empowering OpenClaw with Advanced Agentic RL.
AgentEvolver: Towards Efficient Self-Evolving Agent System
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
[ICLR 2026] The official repository for the paper "AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning".
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
SETA: Scaling Environments for Terminal Agents
An End-to-End Infrastructure for Training and Evaluating Various LLM Agents
Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Scalable toolkit for efficient model reinforcement
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
Extrapolating RLVR to General Domains without Verifiers
EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents
rl from zero pretrain, can it be done? yes.
Process Consistency Filter: Improve Reasoning Quality for LLM Reinforcement Learning
This is the official GitHub repo for the paper "Implicit User Feedback in Human-LLM Dialogues: Informative to Understand Users yet Noisy as a Learning Signal"
Interleaving Reasoning: Next-Generation Reasoning Systems for AGI
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond
Official repository for ACL 2025 Main Conference Paper "Keys to Robust Edits: From Theoretical Insights to Practical Advances"