
Showing 1–50 of 113 results for author: Mi, H

Searching in archive cs.
  1. arXiv:2604.02642  [pdf, ps, other]

    cs.HC

    Engagement Is Not Transfer: A Withdrawal Study of a Consumer Social Robot with Autistic Children at Home

    Authors: Yibo Meng, Guangrui Fan, Bingyi Liu, Yingfangzhong Sun, Ruiqi Chen, Haipeng Mi

    Abstract: This study examines whether engagement with social robots translates into improved human-directed social abilities in autistic children. We conducted an 8-week home-based randomized controlled trial with 40 children aged 5--9 using a commercial social robot (Qrobot). Families were assigned to either continued robot access or robot withdrawal. Quantitative measures and caregiver interviews assessed…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: Accepted by IDC 2026

    ACM Class: H.5.2

  2. arXiv:2602.22604  [pdf, ps, other]

    cs.HC

    DuoMorph: Synergistic Integration of FDM Printing and Pneumatic Actuation for Shape-Changing Interfaces

    Authors: Xueqing Li, Danqi Huang, Tianyu Yu, Shuzi Yin, Bingjie Gao, Anna Matsumoto, Zhihao Yao, Yiwei Zhao, Shiqing Lyu, Yuchen Tian, Lining Yao, Haipeng Mi, Qiuyu Lu

    Abstract: We introduce DuoMorph, a design and fabrication method that synergistically integrates Fused Deposition Modeling (FDM) printing and pneumatic actuation to create novel shape-changing interfaces. In DuoMorph, the printed structures and heat-sealed pneumatic elements are mutually designed to actuate and constrain each other, enabling functions that are difficult for either component to achieve in is…

    Submitted 25 February, 2026; originally announced February 2026.

  3. arXiv:2602.12108  [pdf, ps, other]

    cs.AI

    The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

    Authors: Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang

    Abstract: In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This…

    Submitted 12 February, 2026; originally announced February 2026.

  4. arXiv:2602.08335  [pdf, ps, other]

    cs.AI

    Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

    Authors: Yanming Li, Xuelin Zhang, WenJie Lu, Ziye Tang, Maodong Wu, Haotian Luo, Tongtong Wu, Zijie Peng, Hongze Mi, Yibo Feng, Naiqiang Tan, Chao Huang, Hong Chen, Li Shen

    Abstract: Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing met…

    Submitted 9 February, 2026; originally announced February 2026.

  5. arXiv:2602.08030  [pdf, ps, other]

    cs.AI cs.CL

    Free(): Learning to Forget in Malloc-Only Reasoning Models

    Authors: Yilun Zheng, Dongyang Ma, Tian Liang, Jiahao Xu, Xinting Huang, Lihui Chen, Haitao Mi, Yan Wang

    Abstract: Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To brea…

    Submitted 10 February, 2026; v1 submitted 8 February, 2026; originally announced February 2026.

  6. arXiv:2602.05085  [pdf, ps, other]

    cs.CL

    Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

    Authors: Sidi Lu, Zhenwen Liang, Dongyang Ma, Yan Wang, Haitao Mi, Dong Yu

    Abstract: In this paper, we aim to bridge test-time-training with a new type of parametric memory that can be flexibly offloaded from or merged into model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly permanentized into the model parameters while supporting efficient continual learning. We discuss t…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: Tencent AI Lab Technical Report

  7. arXiv:2602.03412  [pdf, ps, other]

    cs.CL

    Verified Critical Step Optimization for LLM Agents

    Authors: Mukai Li, Qingcheng Zeng, Tianqing Fang, Zhenwen Liang, Linfeng Song, Qi Liu, Haitao Mi, Dong Yu

    Abstract: As large language model agents tackle increasingly complex long-horizon tasks, effective post-training becomes critical. Prior work faces fundamental challenges: outcome-only rewards fail to precisely attribute credit to intermediate steps, estimated step-level rewards introduce systematic noise, and Monte Carlo sampling approaches for step reward estimation incur prohibitive computational cost. I…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: Work in progress

  8. arXiv:2602.00585  [pdf, ps, other]

    cs.AI

    Exploring Information Seeking Agent Consolidation

    Authors: Guochen Yan, Jialong Wu, Zhengwei Tao, Bo Li, Qintong Zhang, Jiahao Xu, Haitao Mi, Yuejian Fang, Qingni Shen, Wentao Zhang, Zhonghai Wu

    Abstract: Information-seeking agents have emerged as a powerful paradigm for solving knowledge-intensive tasks. Existing information-seeking agents are typically specialized for open web, documents, or local knowledge bases, which constrains scalability and cross-domain generalization. In this work, we investigate how to consolidate heterogeneous information-seeking agents into a single foundation agentic m…

    Submitted 31 January, 2026; originally announced February 2026.

  9. arXiv:2601.22528  [pdf, ps, other]

    cs.AI

    Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution

    Authors: Hongze Mi, Yibo Feng, WenJie Lu, Song Cao, Jinyuan Li, Yanming Li, Xuelin Zhang, Haotian Luo, Songyang Peng, He Cui, Tengfei Tian, Jun Fang, Hua Chai, Naiqiang Tan

    Abstract: Multimodal Large Language Model (MLLM) agents facilitate Graphical User Interface (GUI) automation but struggle with long-horizon, cross-application tasks due to limited context windows. While memory systems provide a viable solution, existing paradigms struggle to adapt to dynamic GUI environments, suffering from a granularity mismatch between high-level intent and low-level execution, and contex…

    Submitted 29 January, 2026; originally announced January 2026.

  10. arXiv:2601.19280  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

    Authors: Kishan Panaganti, Zhenwen Liang, Wenhao Yu, Haitao Mi, Dong Yu

    Abstract: Recent progress in Large Language Model (LLM) reasoning is increasingly driven by the refinement of post-training loss functions and alignment strategies. However, standard Reinforcement Learning (RL) paradigms like Group Relative Policy Optimization (GRPO) remain constrained by static uniformity: uniform prompt sampling and a fixed number of rollouts per prompt. For heterogeneous, heavy-tailed re…

    Submitted 27 January, 2026; originally announced January 2026.

    Comments: Keywords: Large Language Models, Reasoning Models, Reinforcement Learning, Distributionally Robust Optimization, GRPO

  11. arXiv:2601.18984  [pdf, ps, other]

    cs.LG cs.CL

    Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

    Authors: Haolin Liu, Dian Yu, Sidi Lu, Yujun Zhou, Rui Liu, Zhenwen Liang, Haitao Mi, Chen-Yu Wei, Dong Yu

    Abstract: Reinforcement learning (RL) has emerged as a powerful framework for improving the reasoning capabilities of large language models (LLMs). However, most existing RL approaches rely on sparse outcome rewards, which fail to credit correct intermediate steps in partially successful solutions. Process reward models (PRMs) offer fine-grained step-level supervision, but their scores are often noisy and d…

    Submitted 26 January, 2026; originally announced January 2026.

  12. arXiv:2601.15808  [pdf, ps, other]

    cs.AI

    Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

    Authors: Yuxuan Wan, Tianqing Fang, Zaitang Li, Yintong Huo, Wenxuan Wang, Haitao Mi, Dong Yu, Michael R. Lyu

    Abstract: Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. While the majority of existing efforts focus on enhancing policy capabilities via post-training, we propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. This approach gives rise…

    Submitted 22 January, 2026; originally announced January 2026.

  13. arXiv:2601.08699  [pdf, ps, other]

    cs.CL

    RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis

    Authors: Zhengwei Tao, Bo Li, Jialong Wu, Guochen Yan, Huanyao Zhang, Jiahao Xu, Haitao Mi, Wentao Zhang

    Abstract: Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. However, the development of robust agents is hindered by the scarcity of high-quality training data that reflects the noise and complexity of real-world retrieval environments. Conventional manual annotation is unscalable and often fails to capture…

    Submitted 13 January, 2026; originally announced January 2026.

  14. arXiv:2601.05163  [pdf, ps, other]

    cs.CL

    DocDancer: Towards Agentic Document-Grounded Information Seeking

    Authors: Qintong Zhang, Xinjie Lv, Jialong Wu, Baixuan Li, Zhengwei Tao, Guochen Yan, Huanyao Zhang, Bin Wang, Jiahao Xu, Haitao Mi, Wentao Zhang

    Abstract: Document Question Answering (DocQA) focuses on answering questions grounded in given documents, yet existing DocQA agents lack effective tool utilization and largely rely on closed-source models. In this work, we introduce DocDancer, an end-to-end trained open-source Doc agent. We formulate DocQA as an information-seeking problem and propose a tool-driven agent framework that explicitly models doc…

    Submitted 8 January, 2026; originally announced January 2026.

  15. arXiv:2512.24873  [pdf, ps, other]

    cs.AI cs.CL

    Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

    Authors: Weixun Wang, XiaoXiao Xu, Wanhe An, Fangwen Dai, Wei Gao, Yancheng He, Ju Huang, Qiang Ji, Hanqi Jin, Xiaoyang Li, Yang Li, Zhongwen Li, Shirong Lin, Jiashun Liu, Zenan Liu, Tao Luo, Dilxat Muhtar, Yuanbin Qu, Jiaqiang Shi, Qinghui Sun, Yingshui Tan, Hao Tang, Runze Wang, Yi Wang, Zhaoguo Wang, et al. (65 additional authors not shown)

    Abstract: Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, the open-source community lacks a principled, end-to-end ecosystem to streamline agent development. We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production p…

    Submitted 11 March, 2026; v1 submitted 31 December, 2025; originally announced December 2025.

    Comments: 36 pages, 15 figures

  16. arXiv:2512.18215  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CV

    Stable and Efficient Single-Rollout RL for Multimodal Reasoning

    Authors: Rui Liu, Dian Yu, Lei Ke, Haolin Liu, Yujun Zhou, Zhenwen Liang, Haitao Mi, Pratap Tokekar, Dong Yu

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm to improve the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, prevalent group-based algorithms such as GRPO require multi-rollout sampling for each prompt. While more efficient single-rollout variants have recently been explored in text-only settings, we find that they suffer from severe i…

    Submitted 20 December, 2025; originally announced December 2025.

  17. arXiv:2512.15687  [pdf, ps, other]

    cs.LG cs.AI

    Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

    Authors: Zhenwen Liang, Sidi Lu, Wenhao Yu, Kishan Panaganti, Yujun Zhou, Haitao Mi, Dong Yu

    Abstract: Reinforcement learning has become essential for strengthening the reasoning abilities of large language models, yet current exploration mechanisms remain fundamentally misaligned with how these models actually learn. Entropy bonuses and external semantic comparators encourage surface level variation but offer no guarantee that sampled trajectories differ in the update directions that shape optimiz…

    Submitted 17 December, 2025; originally announced December 2025.

  18. arXiv:2512.15086  [pdf, ps, other]

    cs.LG physics.comp-ph

    PIP$^2$ Net: Physics-informed Partition Penalty Deep Operator Network

    Authors: Hongjin Mi, Huiqiang Lun, Changhong Mou, Yeyu Zhang

    Abstract: Operator learning has become a powerful tool for accelerating the solution of parameterized partial differential equations (PDEs), enabling rapid prediction of full spatiotemporal fields for new initial conditions or forcing functions. Existing architectures such as DeepONet and the Fourier Neural Operator (FNO) show strong empirical performance but often require large training datasets, lack expl…

    Submitted 17 December, 2025; originally announced December 2025.

  19. arXiv:2512.02472  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Guided Self-Evolving LLMs with Minimal Human Supervision

    Authors: Wenhao Yu, Zhenwen Liang, Chengsong Huang, Kishan Panaganti, Tianqing Fang, Haitao Mi, Dong Yu

    Abstract: AI self-evolution has long been envisioned as a path toward superintelligence, where models autonomously acquire, refine, and internalize knowledge from their own learning experiences. Yet in practice, unguided self-evolving systems often plateau quickly or even degrade as training progresses. These failures arise from issues such as concept drift, diversity collapse, and mis-evolution, as models…

    Submitted 2 December, 2025; originally announced December 2025.

  20. arXiv:2511.08151   

    cs.AI cs.CL cs.MA

    SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning

    Authors: Xuchen Li, Ruitao Wu, Xuanbo Liu, Xukai Wang, Jinbo Hu, Zhixin Bai, Bohan Zeng, Hao Liang, Leheng Chen, Mingrui Chen, Haitian Zhong, Xuanlin Yang, Xu-Yao Zhang, Liu Liu, Jia Li, Kaiqi Huang, Jiahao Xu, Haitao Mi, Wentao Zhang, Bin Dong

    Abstract: Recent advances in large language models have enabled AI systems to achieve expert-level performance on domain-specific scientific tasks, yet these systems remain narrow and handcrafted. We introduce SciAgent, a unified multi-agent system designed for generalistic scientific reasoning: the ability to adapt reasoning strategies across disciplines and difficulty levels. SciAgent organizes problem sol…

    Submitted 17 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: 1. To ensure result rigor, the model outputs require further evaluation by human experts. 2. The results may affect our conclusions and methods, thus necessitating a more detailed review. 3. We anticipate subsequent revisions may be substantial, potentially involving major adjustments to the methodology. Given the uncertainty surrounding the revision process, we decide to request a withdrawal

  21. arXiv:2510.27419  [pdf, ps, other]

    cs.AI cs.CL

    DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

    Authors: Tian Liang, Wenxiang Jiao, Zhiwei He, Jiahao Xu, Haitao Mi, Dong Yu

    Abstract: Large Reasoning Models (LRMs) have demonstrated impressive capabilities but suffer from cognitive inefficiencies like "overthinking" simple problems and "underthinking" complex ones. While existing methods that use supervised fine-tuning (SFT) or reinforcement learning (RL) with token-length rewards can improve efficiency, they often do so at the cost of accuracy. This paper introduces DeepCompres…

    Submitted 23 March, 2026; v1 submitted 31 October, 2025; originally announced October 2025.

    Comments: ICLR 2026

  22. arXiv:2510.26697  [pdf, ps, other]

    cs.CL cs.AI

    The End of Manual Decoding: Towards Truly End-to-End Language Models

    Authors: Zhichao Wang, Dongyang Ma, Xinting Huang, Deng Cai, Tian Lan, Jiahao Xu, Haitao Mi, Xiaoying Tang, Yan Wang

    Abstract: The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight head…

    Submitted 31 October, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

  23. arXiv:2510.20187  [pdf, ps, other]

    cs.LG cs.CL

    Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

    Authors: Dian Yu, Yulai Zhao, Kishan Panaganti, Linfeng Song, Haitao Mi, Dong Yu

    Abstract: We propose Reinforcement Learning with Explicit Human Values (RLEV), a method that aligns Large Language Model (LLM) optimization directly with quantifiable human value signals. While Reinforcement Learning with Verifiable Rewards (RLVR) effectively trains models in objective domains using binary correctness rewards, it overlooks that not all tasks are equally significant. RLEV extends this framew…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: 15 pages, 4 figures

  24. arXiv:2510.14438  [pdf, ps, other]

    cs.CL

    Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents

    Authors: Rui Wang, Ce Zhang, Jun-Yu Ma, Jianshu Zhang, Hongru Wang, Yi Chen, Boyang Xue, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu, Kam-Fai Wong

    Abstract: Deep research web agents not only retrieve information from diverse sources such as web environments, files, and multimodal inputs, but more importantly, they need to rigorously analyze and aggregate knowledge for insightful research. However, existing open-source deep research agents predominantly focus on enhancing information-seeking capabilities of web agents to locate specific information, wh…

    Submitted 16 October, 2025; originally announced October 2025.

  25. arXiv:2510.01591  [pdf, ps, other]

    cs.CL

    CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

    Authors: Zhenwen Liang, Ruosen Li, Yujun Zhou, Linfeng Song, Dian Yu, Xinya Du, Haitao Mi, Dong Yu

    Abstract: Assessing the quality of Large Language Model (LLM) outputs presents a critical challenge. Previous methods either rely on text-level information (e.g., reward models, majority voting), which can overfit to superficial cues, or on calibrated confidence from token probabilities, which would fail on less-calibrated models. Yet both of these signals are, in fact, partial projections of a richer sourc…

    Submitted 1 October, 2025; originally announced October 2025.

  26. arXiv:2510.01444  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning

    Authors: Rui Liu, Dian Yu, Tong Zheng, Runpeng Dai, Zongxia Li, Wenhao Yu, Zhenwen Liang, Linfeng Song, Haitao Mi, Pratap Tokekar, Dong Yu

    Abstract: Reinforcement learning with verifiable rewards (RLVR) has advanced reasoning capabilities in multimodal large language models. However, existing methods typically treat visual inputs as deterministic, overlooking the perceptual ambiguity inherent to the visual modality. Consequently, they fail to distinguish whether a model's uncertainty stems from complex reasoning or ambiguous perception, preven…

    Submitted 15 January, 2026; v1 submitted 1 October, 2025; originally announced October 2025.

  27. arXiv:2509.21799  [pdf, ps, other]

    cs.AI

    D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents

    Authors: Hongze Mi, Yibo Feng, Wenjie Lu, Yuqi Wang, Jinyuan Li, Song Cao, He Cui, Tengfei Tian, Xuelin Zhang, Haotian Luo, Di Sun, Jun Fang, Hua Chai, Naiqiang Tan, Gang Pan

    Abstract: Graphical User Interface (GUI) agents aim to automate a wide spectrum of human tasks by emulating user interaction. Despite rapid advancements, current approaches are hindered by several critical challenges: data bottleneck in end-to-end training, high cost of delayed error detection, and risk of contradictory guidance. Inspired by the human cognitive loop of Thinking, Alignment, and Reflection, w…

    Submitted 6 January, 2026; v1 submitted 25 September, 2025; originally announced September 2025.

  28. arXiv:2509.21766  [pdf, ps, other]

    cs.AI cs.CL

    UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios

    Authors: Haotian Luo, Huaisong Zhang, Xuelin Zhang, Haoyu Wang, Zeyu Qin, Wenjie Lu, Guozheng Ma, Haiying He, Yingsha Xie, Qiyang Zhou, Zixuan Hu, Hongze Mi, Yibo Wang, Naiqiang Tan, Hong Chen, Yi R. Fung, Chun Yuan, Li Shen

    Abstract: Autonomous agents have recently achieved remarkable progress across diverse domains, yet most evaluations focus on short-horizon, fully observable tasks. In contrast, many critical real-world tasks, such as large-scale software development, commercial investment, and scientific discovery, unfold in long-horizon and partially observable scenarios where success hinges on sustained reasoning, plannin…

    Submitted 25 September, 2025; originally announced September 2025.

  29. arXiv:2509.15763  [pdf, ps, other]

    cs.CL

    UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression

    Authors: Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Tianqing Fang, Hongming Zhang, Haitao Mi, Dong Yu, Zhicheng Dou

    Abstract: Large language models are increasingly capable of handling long-context inputs, but the memory overhead of key-value (KV) cache remains a major bottleneck for general-purpose deployment. While various compression strategies have been explored, sequence-level compression, which drops the full KV caches for certain tokens, is particularly challenging as it can lead to the loss of important contextua…

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: 15 pages, 7 figures

  30. arXiv:2509.15194  [pdf, ps, other]

    cs.LG cs.CL

    Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

    Authors: Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu

    Abstract: Large language models (LLMs) are increasingly trained with reinforcement learning from verifiable rewards (RLVR), yet real-world deployment demands models that can self-improve without labels or external judges. Existing self-improvement approaches primarily rely on self-confirmation signals (e.g., confidence, entropy, or consistency) to generate rewards. This reliance drives models toward over-co…

    Submitted 17 February, 2026; v1 submitted 18 September, 2025; originally announced September 2025.

  31. arXiv:2509.12603  [pdf, ps, other]

    cs.CL cs.AI

    EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

    Authors: Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, Dong Yu

    Abstract: Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP), attaining substantial performance gains through widely adopted test-time scaling strategies, notably reflective Chain-of-Thought (CoT) reasoning and increased sampling passes. However, they both introduce significant computational overhead for inference. Moreover, existing cost analyses typically regu…

    Submitted 15 September, 2025; originally announced September 2025.

  32. arXiv:2509.09675  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

    Authors: Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is a powerful paradigm for enhancing the reasoning ability of Large Language Models (LLMs). Yet current RLVR methods often explore poorly, leading to premature convergence and entropy collapse. To address this challenge, we introduce Curiosity-Driven Exploration (CDE), a framework that leverages the model's own intrinsic sense of curiosity to g…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: 21 pages

  33. arXiv:2508.19652  [pdf, ps, other]

    cs.CV

    Self-Rewarding Vision-Language Model via Reasoning Decomposition

    Authors: Zongxia Li, Wenhao Yu, Chengsong Huang, Rui Liu, Zhenwen Liang, Fuxiao Liu, Jingxi Che, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu

    Abstract: Vision-Language Models (VLMs) often suffer from visual hallucinations, saying things that are not actually in the image, and language shortcuts, where they skip the visual part and just rely on text priors. These issues arise because most post-training methods for VLMs rely on simple verifiable answer matching and supervise only final outputs, leaving intermediate visual reasoning without explicit…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: 16 pages, two figures

  34. arXiv:2508.05004  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    R-Zero: Self-Evolving Reasoning LLM from Zero Data

    Authors: Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu

    Abstract: Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward…

    Submitted 13 February, 2026; v1 submitted 6 August, 2025; originally announced August 2025.

  35. arXiv:2508.00414  [pdf, ps, other]

    cs.AI cs.CL

    Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

    Authors: Tianqing Fang, Zhisong Zhang, Xiaoyang Wang, Rui Wang, Can Qin, Yuxuan Wan, Jun-Yu Ma, Ce Zhang, Jiaqi Chen, Xiyun Li, Hongming Zhang, Haitao Mi, Dong Yu

    Abstract: General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence, enabling complex reasoning, web interaction, coding, and autonomous research capabilities. However, current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools, limiting accessibility and reproducibility for the research…

    Submitted 12 August, 2025; v1 submitted 1 August, 2025; originally announced August 2025.

    Comments: 16 pages

  36. arXiv:2507.08794  [pdf, ps, other]

    cs.LG cs.CL

    One Token to Fool LLM-as-a-Judge

    Authors: Yulai Zhao, Haolin Liu, Dian Yu, Sunyuan Kung, Meijia Chen, Haitao Mi, Dong Yu

    Abstract: Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluation and providing reward signals for training other models, particularly in reference-based settings like Reinforcement Learning with Verifiable Rewards (RLVR). However, we uncover a critical vulnerability even in this reference-based paradigm: generative reward models are systematically susceptible to rewa…

    Submitted 26 September, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

  37. arXiv:2507.06804  [pdf, ps, other]

    cs.LO cs.AI

    Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

    Authors: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu

    Abstract: Automated Theorem Proving (ATP) in formal languages is a foundational challenge for AI. While Large Language Models (LLMs) have driven remarkable progress, a significant gap remains between their powerful informal reasoning capabilities and their weak formal proving performance. Recent studies show that the informal accuracy exceeds 80% while formal success remains below 8% on benchmarks like Putn…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Work in progress

  38. arXiv:2507.05720  [pdf, ps, other]

    cs.LG cs.CL

    MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

    Authors: Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, Dong Yu

    Abstract: Recently, there has been a surge of vision-based GUI agents designed to automate everyday mobile and web tasks. These agents interpret raw GUI screenshots and autonomously decide where to click, scroll, or type, which bypasses handcrafted rules and app-specific APIs. However, most existing methods trained GUI agent in the offline environment using pre-collected trajectories. This approach limits s…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 17 pages, 4 figures

  39. arXiv:2507.03839  [pdf, ps, other]

    cs.AI cs.GR

    Participatory Evolution of Artificial Life Systems via Semantic Feedback

    Authors: Shuowen Li, Kexin Wang, Minglu Fang, Danqi Huang, Ali Asadipour, Haipeng Mi, Yitong Sun

    Abstract: We present a semantic feedback framework that enables natural language to guide the evolution of artificial life systems. Integrating a prompt-to-parameter encoder, a CMA-ES optimizer, and CLIP-based evaluation, the system allows user intent to modulate both visual outcomes and underlying behavioral rules. Implemented in an interactive ecosystem simulation, the framework supports prompt refinement…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 10 pages

  40. arXiv:2506.15683  [pdf, ps, other]

    cs.CL cs.CY

    PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning

    Authors: Yuhui Shi, Yehan Yang, Qiang Sheng, Hao Mi, Beizhe Hu, Chaoxi Xu, Juan Cao

    Abstract: With the popularity of large language models (LLMs), undesirable societal problems like misinformation production and academic misconduct have been more severe, making LLM-generated text detection now of unprecedented importance. Although existing methods have made remarkable progress, a new challenge posed by text from privately tuned LLMs remains underexplored. Users could easily possess private…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 17 pages, 3 figures, 6 tables

  41. arXiv:2505.23754  [pdf, ps, other

    cs.CL cs.AI

    DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

    Authors: Ziyin Zhang, Jiahao Xu, Zhiwei He, Tian Liang, Qiuzhi Liu, Yansi Li, Linfeng Song, Zhenwen Liang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

    Abstract: Theorem proving serves as a major testbed for evaluating complex reasoning abilities in large language models (LLMs). However, traditional automated theorem proving (ATP) approaches rely heavily on formal proof systems that poorly align with LLMs' strength derived from informal, natural language knowledge acquired during pre-training. In this work, we propose DeepTheorem, a comprehensive informal… ▽ More

    Submitted 3 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  42. arXiv:2505.22654  [pdf, ps, other

    cs.CV cs.CL

    VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models

    Authors: Ce Zhang, Kaixin Ma, Tianqing Fang, Wenhao Yu, Hongming Zhang, Zhisong Zhang, Haitao Mi, Dong Yu

    Abstract: Recent Large Vision-Language Models (LVLMs) have advanced multi-modal understanding by incorporating finer-grained visual perception and encoding. However, such methods incur significant computational costs due to longer visual token sequences, posing challenges for real-time deployment. To mitigate this, prior studies have explored pruning unimportant visual tokens either at the output layer of t… ▽ More
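As a rough illustration of the kind of visual token reduction discussed here, the sketch below keeps only the tokens that receive the most attention from the [CLS] token. This is a generic attention-guided pruning heuristic, not VScan's actual algorithm; all names and shapes are illustrative.

```python
import numpy as np

def prune_visual_tokens(tokens, cls_attention, keep_ratio=0.25):
    """Generic importance-based pruning: keep the visual tokens that
    receive the most [CLS] attention, preserving original order.

    tokens:        (N, D) visual token embeddings
    cls_attention: (N,) attention weights from [CLS] to each token
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the n_keep highest-attention tokens, sorted to keep
    # the surviving tokens in their original sequence order.
    keep = np.sort(np.argsort(cls_attention)[-n_keep:])
    return tokens[keep], keep

rng = np.random.default_rng(0)
toks = rng.standard_normal((16, 8))
attn = rng.random(16)
kept, idx = prune_visual_tokens(toks, attn, keep_ratio=0.25)
```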

    Submitted 29 January, 2026; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted at TMLR 2026. Project page: https://zhangce01.github.io/VScan/

  43. arXiv:2505.22156  [pdf, ps, other

    cs.CL

    InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing

    Authors: Shuaiyi Li, Zhisong Zhang, Yang Deng, Chenlong Deng, Tianqing Fang, Hongming Zhang, Haitao Mi, Dong Yu, Wai Lam

    Abstract: Although existing model editing methods perform well in recalling exact edit facts, they often struggle in complex scenarios that require deeper semantic understanding rather than mere knowledge regurgitation. Leveraging the strong contextual reasoning abilities of large language models (LLMs), in-context learning (ICL) has emerged as a promising editing method that comprehends edit information through c… ▽ More

    Submitted 7 January, 2026; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: 18 pages, 5 figures

  44. arXiv:2505.20013  [pdf, ps, other

    cs.CL

    WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback

    Authors: Minda Hu, Tianqing Fang, Jianshu Zhang, Junyu Ma, Zhisong Zhang, Jingyan Zhou, Hongming Zhang, Haitao Mi, Dong Yu, Irwin King

    Abstract: Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection & lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing t… ▽ More

    Submitted 18 September, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: 18 pages

  45. arXiv:2505.14681  [pdf, other

    cs.AI cs.CL cs.CV cs.IR cs.LG

    Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

    Authors: Mengru Wang, Xingyu Chen, Yue Wang, Zhiwei He, Jiahao Xu, Tian Liang, Qiuzhi Liu, Yunzhi Yao, Wenxuan Wang, Ruotian Ma, Haitao Mi, Ningyu Zhang, Zhaopeng Tu, Xiaolong Li, Dong Yu

    Abstract: Mixture-of-Experts (MoE) architectures within Large Reasoning Models (LRMs) have achieved impressive reasoning capabilities by selectively activating experts to facilitate structured cognitive processes. Despite notable advances, existing reasoning models often suffer from cognitive inefficiencies like overthinking and underthinking. To address these limitations, we introduce a novel inference-tim… ▽ More

    Submitted 27 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Work in progress

  46. arXiv:2505.13445  [pdf, other

    cs.AI cs.CL

    Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

    Authors: Xiaoyuan Liu, Tian Liang, Zhiwei He, Jiahao Xu, Wenxuan Wang, Pinjia He, Zhaopeng Tu, Haitao Mi, Dong Yu

    Abstract: Large Language Models (LLMs) show great promise in complex reasoning, with Reinforcement Learning with Verifiable Rewards (RLVR) being a key enhancement strategy. However, a prevalent issue is ``superficial self-reflection'', where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this.… ▽ More
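The idea of rewarding both a verifiable answer and the model's own verification of it could be shaped roughly as follows. The weights and reward structure are hypothetical illustrations of the general pattern, not RISE's exact formulation.

```python
def rlvr_reward(answer, verification_verdict, gold, w_verify=0.5):
    """Hypothetical reward shaping in the spirit of self-verification
    RLVR: the policy earns a verifiable task reward for a correct
    answer, plus a bonus when its own verification verdict agrees
    with ground-truth correctness."""
    correct = (answer == gold)
    task_reward = 1.0 if correct else 0.0
    # Verification reward: did the model correctly judge its own output?
    verify_reward = 1.0 if verification_verdict == correct else 0.0
    return task_reward + w_verify * verify_reward

# A wrong answer that the model honestly flags as wrong still earns
# the verification bonus, discouraging superficial self-reflection.
r = rlvr_reward(answer="42", verification_verdict=False, gold="41")
```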

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: code available at https://github.com/xyliu-cs/RISE

  47. arXiv:2505.10962  [pdf, ps, other

    cs.AI

    MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

    Authors: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu

    Abstract: Automated Theorem Proving (ATP) in formal languages remains a formidable challenge in AI, demanding rigorous logical deduction and navigating vast search spaces. While large language models (LLMs) have shown promising performance, existing stepwise provers often suffer from biased search guidance, leading to inefficiencies and suboptimal proof strategies. This paper introduces the Multi-Perspectiv… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Work in Progress

  48. arXiv:2505.03320  [pdf, ps, other

    cs.CL

    Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

    Authors: Junyu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu

    Abstract: Mamba's theoretical infinite-context potential is limited in practice when sequences far exceed training lengths. This work explores unlocking Mamba's long-context memory ability with a simple yet effective method, Recall with Reasoning (RwR), by distilling chain-of-thought (CoT) summarization from a teacher model. Specifically, RwR prepends these summarizations as CoT prompts during fine-tuning, tea… ▽ More
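The prepend-a-summary recipe might assemble training examples along these lines. The field names, prompt template, and `summarize` teacher are illustrative stand-ins, not the paper's actual interface.

```python
def build_rwr_example(segments, summarize, question, answer):
    """Sketch of an RwR-style training example: a summary of earlier
    context (distilled from a teacher model) is prepended as a CoT
    prompt before the question, teaching the model to compress long
    context into an active working memory."""
    summary = summarize(segments)
    prompt = (
        f"Summary of context so far: {summary}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return {"prompt": prompt, "target": f" {answer}"}

# Toy teacher: keep the first sentence of each context segment.
toy_teacher = lambda segs: " ".join(s.split(".")[0] + "." for s in segs)
ex = build_rwr_example(
    ["Alice met Bob. They argued.", "Bob left town. It rained."],
    toy_teacher, "Who left town?", "Bob",
)
```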

    Submitted 3 June, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  49. arXiv:2504.21024  [pdf, ps, other

    cs.CL

    WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

    Authors: Tianqing Fang, Hongming Zhang, Zhisong Zhang, Kaixin Ma, Wenhao Yu, Haitao Mi, Dong Yu

    Abstract: Agent self-improvement, where the agent's backbone Large Language Model (LLM) is trained on trajectories sampled autonomously under its own policy, has emerged as a promising approach for enhancing performance. Recent advancements, particularly in web environments, face a critical limitation: performance reaches a stagnation point during autonomous learning cycles, hindering… ▽ More

    Submitted 21 August, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: EMNLP 2025 Main Conference

  50. arXiv:2504.11788  [pdf, ps, other

    cs.CL cs.AI

    WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms

    Authors: Zhisong Zhang, Tianqing Fang, Kaixin Ma, Wenhao Yu, Hongming Zhang, Haitao Mi, Dong Yu

    Abstract: With recent advancements in large language models, web agents have been greatly improved. However, dealing with complex and dynamic web environments requires more advanced planning and search abilities. Previous studies usually adopt a greedy one-way search strategy, which may struggle to recover from erroneous states. In this work, we enhance web agents with an explicit rollback mechanism, enabli… ▽ More
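An explicit rollback mechanism of the kind contrasted here with greedy one-way search can be sketched as a checkpoint stack: advance while states look promising, pop back and try the next alternative when they don't. Function names and the judging predicates are illustrative, not the paper's interface.

```python
def run_with_rollback(start, propose, is_dead_end, is_goal, max_steps=50):
    """Minimal rollback-enabled search: checkpoints are pushed onto a
    stack as the agent advances; when the current state is judged
    unpromising, the agent pops back to the previous checkpoint and
    tries the next untried alternative."""
    # Each stack frame: (state, untried alternative next states).
    stack = [(start, propose(start))]
    for _ in range(max_steps):
        if not stack:
            return None  # exhausted all alternatives
        state, alternatives = stack[-1]
        if is_goal(state):
            return [s for s, _ in stack]  # path of checkpoints
        if is_dead_end(state) or not alternatives:
            stack.pop()  # explicit rollback to the previous state
            continue
        nxt = alternatives.pop(0)
        stack.append((nxt, propose(nxt)))
    return None

# Toy task: reach 3 from 0; overshooting past 3 is a dead end that
# forces a rollback (the greedy first move 0 -> 2 -> 4 fails).
path = run_with_rollback(
    start=0,
    propose=lambda s: [s + 2, s + 1],  # candidate next states
    is_dead_end=lambda s: s > 3,
    is_goal=lambda s: s == 3,
)
```

A greedy one-way strategy would commit to 0 → 2 → 4 and fail; the rollback lets the agent recover and find 0 → 2 → 3.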

    Submitted 14 January, 2026; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: EACL 2026