Showing 1–50 of 137 results for author: Xiong, F

Searching in archive cs.
  1. arXiv:2604.10096  [pdf, ps, other]

    cs.CV

    ABot-Claw: A Foundation for Persistent, Cooperative, and Self-Evolving Robotic Agents

    Authors: Dongjie Huo, Haoyun Liu, Guoqing Liu, Dekang Qi, Zhiming Sun, Maoguo Gao, Jianxin He, Yandan Yang, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu

    Abstract: Current embodied intelligent systems still face a substantial gap between high-level reasoning and low-level physical execution in open-world environments. Although Vision-Language-Action (VLA) models provide strong perception and intuitive responses, their open-loop nature limits long-horizon performance. Agents incorporating System 2 cognitive mechanisms improve planning, but usually operate in…

    Submitted 11 April, 2026; originally announced April 2026.

    MSC Class: 68T45; 68T40; 68T42; 68T50 ACM Class: I.2.10; I.2.9; I.2.11; I.2.7

  2. arXiv:2604.09349  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Visually-Guided Policy Optimization for Multimodal Reasoning

    Authors: Zengbin Wang, Feng Xiong, Liang Lin, Xuecai Hu, Yong Wang, Yanlin Wang, Man Zhang, Xiangxiang Chu

    Abstract: Reinforcement learning with verifiable rewards (RLVR) has significantly advanced the reasoning ability of vision-language models (VLMs). However, the inherent text-dominated nature of VLMs often leads to insufficient visual faithfulness, characterized by sparse attention activation to visual tokens. More importantly, our empirical analysis reveals that temporal visual forgetting along reasoning st…

    Submitted 10 April, 2026; originally announced April 2026.

    Comments: ACL 2026

  3. arXiv:2604.07877  [pdf, ps, other]

    cs.CL

    MemReader: From Passive to Active Extraction for Long-Term Agent Memory

    Authors: Jingyi Kang, Chunyu Li, Ding Chen, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: Long-term memory is fundamental for personalized and autonomous agents, yet populating it remains a bottleneck. Existing systems treat memory extraction as a one-shot, passive transcription from context to structured entries, which struggles with noisy dialogue, missing references, and cross-turn dependencies, leading to memory pollution, low-value writes, and inconsistency. In this paper, we intr…

    Submitted 10 April, 2026; v1 submitted 9 April, 2026; originally announced April 2026.

  4. arXiv:2603.29493  [pdf, ps, other]

    cs.CL cs.AI

    MemFactory: Unified Inference & Training Framework for Agent Memory

    Authors: Ziliang Guo, Ziheng Li, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: Memory-augmented Large Language Models (LLMs) are essential for developing capable, long-term AI agents. Recently, applying Reinforcement Learning (RL) to optimize memory operations, such as extraction, updating, and retrieval, has emerged as a highly promising research direction. However, existing implementations remain highly fragmented and task-specific, lacking a unified infrastructure to stre…

    Submitted 7 April, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

    Comments: fixed Figure 1 typos, clarified ambiguous wording in the abstract, added 1 missing citation, Code: https://github.com/MemTensor/MemFactory

  5. arXiv:2603.23376  [pdf, ps, other]

    cs.CV cs.RO

    ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

    Authors: Yuzhi Chen, Ronghan Chen, Dongjie Huo, Yandan Yang, Dekang Qi, Haoyun Liu, Tong Lin, Shuang Zeng, Junjin Xiao, Xinyuan Chang, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu

    Abstract: Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations - such as object penetration and anti-gravity motion - due to training on generic visual data and likelihood-based objectives that ignore physical laws. We present ABot-PhysWorld, a 14B Diffusion Transformer model that generates vi…

    Submitted 27 March, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

    Comments: Code: https://github.com/amap-cvlab/ABot-PhysWorld.git

  6. arXiv:2603.23231  [pdf, ps, other]

    cs.AI

    PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

    Authors: Shuochen Liu, Junyi Zhu, Long Shu, Junda Lin, Yuhao Chen, Haotian Zhang, Chao Zhang, Derong Xu, Jia Li, Bo Tang, Zhiyu Li, Feiyu Xiong, Enhong Chen, Tong Xu

    Abstract: Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. However, prior evaluations typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events that drive the evolution of user preferences. Such settings overlook a fu…

    Submitted 24 March, 2026; originally announced March 2026.

  7. arXiv:2603.18554  [pdf, ps, other]

    quant-ph cs.CV

    End-to-End QGAN-Based Image Synthesis via Neural Noise Encoding and Intensity Calibration

    Authors: Xue Yang, Rigui Zhou, Shizheng Jia, Dax Enshan Koh, Siong Thye Goh, Yaochong Li, Hongyu Chen, Fuhui Xiong

    Abstract: Quantum Generative Adversarial Networks (QGANs) offer a promising path for learning data distributions on near-term quantum devices. However, existing QGANs for image synthesis avoid direct full-image generation, relying on classical post-processing or patch-based methods. These approaches dilute the quantum generator's role and struggle to capture global image semantics. To address this, we propo…

    Submitted 19 March, 2026; originally announced March 2026.

  8. arXiv:2603.09716  [pdf, ps, other]

    cs.AI

    AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents

    Authors: Xiaoxing Wang, Ning Liao, Shikun Wei, Chen Tang, Feiyu Xiong

    Abstract: Autonomous agent frameworks still struggle to reconcile long-term experiential learning with real-time, context-sensitive decision-making. In practice, this gap appears as static cognition, rigid workflow dependence, and inefficient context usage, which jointly limit adaptability in open-ended and non-stationary environments. To address these limitations, we present AutoAgent, a self-evolving mult…

    Submitted 10 March, 2026; originally announced March 2026.

  9. arXiv:2603.04448  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.MA

    SkillNet: Create, Evaluate, and Connect AI Skills

    Authors: Yuan Liang, Ruobin Zhong, Haoming Xu, Chen Jiang, Yi Zhong, Runnan Fang, Jia-Chen Gu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Xin Xu, Tongtong Wu, Kun Wang, Yang Liu, Zhen Bi, Jungang Lou, Yuchen Eleanor Jiang, Hangcheng Zhu, Gang Yu, Haiwen Hong, Longtao Huang, Hui Xue, Chenxi Wang, Yijun Wang, et al. (24 additional authors not shown)

    Abstract: Current AI agents can flexibly invoke tools and execute complex tasks, yet their long-term advancement is hindered by the lack of systematic accumulation and transfer of skills. Without a unified mechanism for skill consolidation, agents frequently "reinvent the wheel", rediscovering solutions in isolated contexts without leveraging prior strategies. To overcome this limitation, we introduce Ski…

    Submitted 26 February, 2026; originally announced March 2026.

    Comments: http://skillnet.openkg.cn/

  10. arXiv:2603.01766  [pdf, ps, other]

    cs.RO

    Neural Implicit Action Fields: From Discrete Waypoints to Continuous Functions for Vision-Language-Action Models

    Authors: Haoyun Liu, Jianzhuang Zhao, Xinyuan Chang, Tianle Shi, Chuanzhang Meng, Jiayuan Tan, Feng Xiong, Tong Lin, Dongjie Huo, Mu Xu, SongLin Dong, Zhiheng Ma, Yihong Gong, Sheng Zhong

    Abstract: Despite the rapid progress of Vision-Language-Action (VLA) models, the prevailing paradigm of predicting discrete waypoints remains fundamentally misaligned with the intrinsic continuity of physical motion. This discretization imposes rigid sampling rates, lacks high-order differentiability, and introduces quantization artifacts that hinder precise, compliant interaction. We propose Neural Implici…

    Submitted 2 March, 2026; originally announced March 2026.

  11. arXiv:2602.23632  [pdf, ps, other]

    cs.AI

    MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs

    Authors: Lun Zhan, Feng Xiong, Huanyong Liu, Feng Zhang, Yuhui Yin

    Abstract: Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches still fall short in functionality, granularity, customizability, and evaluation. To address these issues, we propose MMKG-RDS, a flexible framework f…

    Submitted 26 February, 2026; originally announced February 2026.

  12. arXiv:2602.11241  [pdf, ps, other]

    cs.CV cs.LG

    Active Zero: Self-Evolving Vision-Language Models through Active Environment Exploration

    Authors: Jinghan He, Junfeng Fang, Feng Xiong, Zijun Yao, Fei Shen, Haiyun Guo, Jinqiao Wang, Tat-Seng Chua

    Abstract: Self-play has enabled large language models to autonomously improve through self-generated challenges. However, existing self-play methods for vision-language models rely on passive interaction with static image collections, resulting in strong dependence on initial datasets and inefficient learning. Without the ability to actively seek visual data tailored to their evolving capabilities, agents w…

    Submitted 11 February, 2026; originally announced February 2026.

  13. arXiv:2602.11236  [pdf, ps, other]

    cs.CV cs.CL cs.RO

    ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning

    Authors: Yandan Yang, Shuang Zeng, Tong Lin, Xinyuan Chang, Dekang Qi, Junjin Xiao, Haoyun Liu, Ronghan Chen, Yuzhi Chen, Dongjie Huo, Feng Xiong, Xing Wei, Zhiheng Ma, Mu Xu

    Abstract: Building general-purpose embodied agents across diverse hardware remains a central challenge in robotics, often framed as the "one-brain, many-forms" paradigm. Progress is hindered by fragmented data, inconsistent representations, and misaligned training objectives. We present ABot-M0, a framework that builds a systematic data curation pipeline while jointly optimizing model architecture and tra…

    Submitted 14 April, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

    Comments: Project website: https://amap-cvlab.github.io/ABot-Manipulation/ . Code: https://github.com/amap-cvlab/ABot-Manipulation . 22 pages, 10 figures, 10 tables

  14. arXiv:2602.07262  [pdf, ps, other]

    cs.CV

    TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

    Authors: Junbo Jacob Lian, Feng Xiong, Yujun Sun, Kaichen Ouyang, Zong Ke, Mingyang Yu, Shengwei Fu, Zhong Rui, Zhang Yujun, Huiling Chen

    Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices capture global channel correlations but collapse spatial structure, while self-attention models spatial context through weighted aggregation rather than explicit pairwise feature interactions. We introduce TwistNet-2D, a lightweight module that comp…

    Submitted 10 February, 2026; v1 submitted 6 February, 2026; originally announced February 2026.

    Comments: Code is available at https://github.com/junbolian/TwistNet-2D

  15. arXiv:2602.05467  [pdf, ps, other]

    cs.CV cs.CL cs.RO

    MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

    Authors: Dekang Qi, Shuang Zeng, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Mu Xu

    Abstract: Visual Language Navigation (VLN) is one of the fundamental capabilities for embodied intelligence and a critical challenge that urgently needs to be addressed. However, existing methods are still unsatisfactory in terms of both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, but…

    Submitted 12 April, 2026; v1 submitted 5 February, 2026; originally announced February 2026.

    Comments: 9 pages, 2 figures, 5 tables, conference

    MSC Class: 68T45; 68T40; 68T42; 68T50 ACM Class: I.2.10; I.2.9; I.2.11; I.2.7

  16. arXiv:2602.02178  [pdf, ps, other]

    cs.CL

    AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models?

    Authors: Liang Lin, Feng Xiong, Zengbin Wang, Kun Wang, Junhao Dong, Xuecai Hu, Yong Wang, Xiangxiang Chu

    Abstract: Diffusion Large Language Models (DLLMs) have emerged as a powerful alternative to autoregressive models, enabling parallel token generation across multiple positions. However, preference alignment of DLLMs remains challenging due to high variance introduced by Evidence Lower Bound (ELBO)-based likelihood estimation. In this work, we propose AR-MAP, a novel transfer learning framework that leverage…

    Submitted 2 February, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

  17. arXiv:2601.20354  [pdf, ps, other]

    cs.CV

    Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models

    Authors: Zengbin Wang, Xuecai Hu, Yong Wang, Feng Xiong, Man Zhang, Xiangxiang Chu

    Abstract: Text-to-image (T2I) models have achieved remarkable success in generating high-fidelity images, but they often fail in handling complex spatial relationships, e.g., spatial perception, reasoning, or interaction. These critical aspects are largely overlooked by current benchmarks due to their short or information-sparse prompt design. In this paper, we introduce SpatialGenEval, a new benchmark desi…

    Submitted 29 January, 2026; v1 submitted 28 January, 2026; originally announced January 2026.

    Comments: Accepted by ICLR 2026, URL: https://github.com/AMAP-ML/SpatialGenEval

  18. arXiv:2601.19325  [pdf, ps, other]

    cs.CV cs.AI

    Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

    Authors: Zichen Wen, Boxue Yang, Shuang Chen, Yaojie Zhang, Yuhang Han, Junlong Ke, Cong Wang, Yicheng Fu, Jiawang Zhao, Jiangchao Yao, Xi Fang, Zhen Wang, Henxing Cai, Lin Yao, Zhifeng Gao, Yanhui Hong, Nang Yuan, Yixuan Li, Guojiang Zhao, Haoyi Tao, Nan Wang, Han Lyu, Guolin Ke, Ning Liao, Xiaoxing Wang, et al. (9 additional authors not shown)

    Abstract: We present Innovator-VL, a scientific multimodal large language model designed to advance understanding and reasoning across diverse scientific domains while maintaining excellent performance on general vision tasks. Contrary to the trend of relying on massive domain-specific pretraining and opaque pipelines, our work demonstrates that principled training design and transparent methodology can yie…

    Submitted 27 January, 2026; originally announced January 2026.

    Comments: Innovator-VL tech report

  19. arXiv:2601.05171  [pdf, ps, other]

    cs.CL

    Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems

    Authors: Jihao Zhao, Ding Chen, Zhaoxin Fan, Kerun Xu, Mengting Hu, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: Existing long-term personalized dialogue systems struggle to reconcile unbounded interaction streams with finite context constraints, often succumbing to memory noise accumulation, reasoning degradation, and persona inconsistency. To address these challenges, this paper proposes Inside Out, a framework that utilizes a globally maintained PersonaTree as the carrier of long-term user profiling. By c…

    Submitted 25 January, 2026; v1 submitted 8 January, 2026; originally announced January 2026.

  20. arXiv:2601.04562  [pdf, ps, other]

    cs.AI

    Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation

    Authors: Dongyi Lv, Qiuyu Ding, Heng-Da Xu, Zhaoxu Sun, Zhi Wang, Feng Xiong, Mu Xu

    Abstract: Generative recommendation with large language models (LLMs) reframes prediction as sequence generation, yet existing LLM-based recommenders remain limited in leveraging geographic signals that are crucial in mobility and local-services scenarios. Here, we present Reasoning Over Space (ROS), a framework that utilizes geography as a vital decision variable within the reasoning process. ROS introduce…

    Submitted 7 January, 2026; originally announced January 2026.

  21. arXiv:2601.03192  [pdf, ps, other]

    cs.CL

    MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

    Authors: Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, Muning Wen

    Abstract: The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we pro…

    Submitted 12 February, 2026; v1 submitted 6 January, 2026; originally announced January 2026.

    Comments: 41 pages, 11 figures

  22. arXiv:2512.24074  [pdf, ps, other]

    cs.CV

    Balanced Hierarchical Contrastive Learning with Decoupled Queries for Fine-grained Object Detection in Remote Sensing Images

    Authors: Jingzhou Chen, Dexin Chen, Fengchao Xiong, Yuntao Qian, Liang Xiao

    Abstract: Fine-grained remote sensing datasets often use hierarchical label structures to differentiate objects in a coarse-to-fine manner, with each object annotated across multiple levels. However, embedding this semantic hierarchy into the representation learning space to improve fine-grained detection performance remains challenging. Previous studies have applied supervised contrastive learning at diffe…

    Submitted 30 December, 2025; originally announced December 2025.

  23. arXiv:2512.20061  [pdf, ps, other]

    cs.AI

    Scaling Reinforcement Learning for Content Moderation with Large Language Models

    Authors: Hamed Firooz, Rui Liu, Yuchen Lu, Zhenyu Hou, Fangzhou Xiong, Xiaoyang Zhang, Changshu Jian, Zhicheng Zhu, Jiayuan Ma, Jacob Tao, Chaitali Gupta, Xiaochang Peng, Shike Mei, Hang Cui, Yang Qin, Shuo Tang, Jason Gaedtke, Arpit Mittal

    Abstract: Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for policy violations. Although recent advances in large language models (LLMs) have demonstrated strong potential for policy-grounded moderation, the practical challenges of training these systems to achieve expert-…

    Submitted 23 December, 2025; originally announced December 2025.

  24. arXiv:2512.19150  [pdf, ps, other]

    cs.CV

    AMap: Distilling Future Priors for Ahead-Aware Online HD Map Construction

    Authors: Ruikai Li, Xinrun Li, Mengwei Xie, Hao Shan, Shoumeng Qiu, Xinyuan Chang, Yizhe Fan, Feng Xiong, Han Jiang, Yilong Ren, Haiyang Yu, Mu Xu, Yang Long, Varun Ojha, Zhiyong Cui

    Abstract: Online High-Definition (HD) map construction is pivotal for autonomous driving. While recent approaches leverage historical temporal fusion to improve performance, we identify a critical safety flaw in this paradigm: it is inherently "spatially backward-looking." These methods predominantly enhance map reconstruction in traversed areas, offering minimal improvement for the unseen road ahead. Cruc…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: 19 pages, 11 figures

  25. arXiv:2512.14718  [pdf, ps, other]

    cs.LG cs.AI

    SEED: Spectral Entropy-Guided Evaluation of SpatialTemporal Dependencies for Multivariate Time Series Forecasting

    Authors: Feng Xiong, Zongxia Xie, Yanru Sun, Haoyu Wang, Jianhong Lin

    Abstract: Effective multivariate time series forecasting often benefits from accurately modeling complex inter-variable dependencies. However, existing attention- or graph-based methods face three key issues: (a) strong temporal self-dependencies are often disrupted by irrelevant variables; (b) softmax normalization ignores and reverses negative correlations; (c) variables struggle to perceive their tempora…

    Submitted 17 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

  26. arXiv:2511.12520  [pdf, ps, other]

    cs.CL

    TAdaRAG: Task Adaptive Retrieval-Augmented Generation via On-the-Fly Knowledge Graph Construction

    Authors: Jie Zhang, Bo Tang, Wanzi Shao, Wenqiang Wei, Jihao Zhao, Jianqing Zhu, Zhiyu Li, Wen Xi, Zehao Lin, Feiyu Xiong, Yanchao Tan

    Abstract: Retrieval-Augmented Generation (RAG) improves large language models by retrieving external knowledge, often truncated into smaller chunks due to the input context window, which leads to information loss, resulting in response hallucinations and broken reasoning chains. Moreover, traditional RAG retrieves unstructured knowledge, introducing irrelevant details that hinder accurate reasoning. To addr…

    Submitted 16 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  27. arXiv:2511.09478  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting

    Authors: Renda Li, Hailang Huang, Fei Wei, Feng Xiong, Yong Wang, Xiangxiang Chu

    Abstract: Reinforcement learning (RL) has demonstrated considerable potential for enhancing reasoning in large language models (LLMs). However, existing methods suffer from Gradient Starvation and Policy Degradation when training directly on samples with mixed difficulty. To mitigate this, prior approaches leverage Chain-of-Thought (CoT) data, but the construction of high-quality CoT annotations remains lab…

    Submitted 12 November, 2025; originally announced November 2025.

  28. arXiv:2511.07023  [pdf, ps, other]

    cs.LG

    Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time

    Authors: Junjun Pan, Yixin Liu, Chuan Zhou, Fei Xiong, Alan Wee-Chung Liew, Shirui Pan

    Abstract: Graph anomaly detection (GAD), which aims to detect outliers in graph-structured data, has received increasing research attention recently. However, existing GAD methods assume identical training and testing distributions, which is rarely valid in practice. In real-world scenarios, unseen but normal samples may emerge during deployment, leading to a normality shift that degrades the performance of…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: 9 pages, 5 figures, accepted by AAAI 2026

  29. arXiv:2511.03506  [pdf, ps, other]

    cs.CL

    HaluMem: Evaluating Hallucinations in Memory Systems of Agents

    Authors: Ding Chen, Simin Niu, Kehang Li, Peng Liu, Xiangping Zheng, Bo Tang, Xinchi Li, Feiyu Xiong, Zhiyu Li

    Abstract: Memory systems are key components that enable AI systems such as LLMs and AI agents to achieve long-term learning and sustained interaction. However, during memory storage and retrieval, these systems frequently exhibit memory hallucinations, including fabrication, errors, conflicts, and omissions. Existing evaluations of memory hallucinations are primarily end-to-end question answering, which mak…

    Submitted 4 January, 2026; v1 submitted 5 November, 2025; originally announced November 2025.

  30. arXiv:2510.24677  [pdf, ps, other]

    cs.CL cs.AI

    Dissecting Role Cognition in Medical LLMs via Neuronal Ablation

    Authors: Xun Liang, Huayi Lai, Hanyu Wang, Wentao Zhang, Linfeng Zhang, Yanfang Chen, Feiyu Xiong, Zhiyu Li

    Abstract: Large language models (LLMs) have gained significant traction in medical decision support systems, particularly in the context of medical question answering and role-playing simulations. A common practice, Prompt-Based Role Playing (PBRP), instructs models to adopt different clinical roles (e.g., medical students, residents, attending physicians) to simulate varied professional behaviors. Ho…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: 15 pages, 9 figures

  31. arXiv:2510.21714  [pdf, ps, other]

    cs.IR

    Practice on Long Behavior Sequence Modeling in Tencent Advertising

    Authors: Xian Hu, Ming Yue, Zhixiang Feng, Junwei Pan, Junjie Zhai, Ximei Wang, Xinrui Miao, Qian Li, Xun Liu, Shangyu Zhang, Letian Wang, Hua Lu, Zijian Zeng, Chen Cai, Wei Wang, Fei Xiong, Pengfei Xiong, Jintao Zhang, Zhiyuan Wu, Chunhui Zhang, Anan Liu, Jiulong You, Chao Deng, Yuekui Yang, Shudong Huang, et al. (2 additional authors not shown)

    Abstract: Long-sequence modeling has become an indispensable frontier in recommendation systems for capturing users' long-term preferences. However, user behaviors within advertising domains are inherently sparse, posing a significant barrier to constructing long behavioral sequences using data from a single advertising domain alone. This motivates us to collect users' behaviors not only across diverse adve…

    Submitted 10 September, 2025; originally announced October 2025.

  32. arXiv:2510.14252  [pdf, ps, other]

    cs.CL

    MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

    Authors: Jihao Zhao, Zhiyuan Ji, Simin Niu, Hanyu Wang, Feiyu Xiong, Zhiyu Li

    Abstract: The traditional RAG paradigm, which typically engages in the comprehension of relevant text chunks in response to received queries, inherently restricts both the depth of knowledge internalization and reasoning capabilities. To address this limitation, our research transforms the text processing in RAG from passive chunking to proactive understanding, defining this process as document memory extra…

    Submitted 15 October, 2025; originally announced October 2025.

  33. arXiv:2510.13907  [pdf, ps, other]

    cs.CL stat.ML

    LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization

    Authors: Yuanchen Wu, Saurabh Verma, Justin Lee, Fangzhou Xiong, Poppy Zhang, Amel Awadelkarim, Xu Chen, Yubai Yuan, Shawndra Hill

    Abstract: Large language models (LLMs) are highly sensitive to prompts, but most automatic prompt optimization (APO) methods assume access to ground-truth references (e.g., labeled validation data) that are costly to obtain. We propose the Prompt Duel Optimizer (PDO), a sample-efficient framework for label-free prompt optimization based on pairwise preference feedback from an LLM judge. PDO casts prompt sel…

    Submitted 9 April, 2026; v1 submitted 14 October, 2025; originally announced October 2025.

    Comments: Accepted to Findings of ACL 2026. Camera-ready version

  34. arXiv:2510.02271  [pdf, ps, other]

    cs.CL cs.AI

    InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents

    Authors: Yaxin Du, Yuanshuo Zhang, Xiyuan Yang, Yifan Zhou, Cheng Wang, Gongyi Zou, Xianghe Pang, Wenhao Wang, Menglan Chen, Shuo Tang, Zhiyu Li, Feiyu Xiong, Siheng Chen

    Abstract: Information seeking is a fundamental requirement for humans. However, existing LLM agents rely heavily on open-web search, which exposes two fundamental weaknesses: online content is noisy and unreliable, and many real-world tasks require precise, domain-specific knowledge unavailable from the web. The emergence of the Model Context Protocol (MCP) now allows agents to interface with thousands of s…

    Submitted 4 October, 2025; v1 submitted 2 October, 2025; originally announced October 2025.

  35. arXiv:2509.26251  [pdf, ps, other]

    cs.CV

    Seeing Space and Motion: Enhancing Latent Actions with Geometric and Dynamic Awareness for Vision-Language-Action Models

    Authors: Zhejia Cai, Yandan Yang, Xinyuan Chang, Shiyi Liang, Ronghan Chen, Feng Xiong, Mu Xu, Ruqi Huang

    Abstract: Latent Action Models (LAMs) enable Vision-Language-Action (VLA) systems to learn semantic action representations from large-scale unannotated data. Yet, we identify two bottlenecks of LAMs: 1) the commonly adopted end-to-end trained image encoder suffers from poor spatial understanding; 2) LAMs can be fragile when input frames are temporally distant, leading to limited temporal perception. Such…

    Submitted 11 March, 2026; v1 submitted 30 September, 2025; originally announced September 2025.

    Comments: 8 pages, correct errors, clarify details

  36. arXiv:2509.24948  [pdf, ps, other]

    cs.RO

    World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training

    Authors: Junjin Xiao, Yandan Yang, Xinyuan Chang, Ronghan Chen, Feng Xiong, Mu Xu, Wei-Shi Zheng, Qing Zhang

    Abstract: Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environ…

    Submitted 18 March, 2026; v1 submitted 29 September, 2025; originally announced September 2025.

    Comments: Accepted to CVPR 2026

  37. arXiv:2509.22548  [pdf, ps, other]

    cs.CV cs.RO

    JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation

    Authors: Shuang Zeng, Dekang Qi, Xinyuan Chang, Feng Xiong, Shichao Xie, Xiaolong Wu, Shiyi Liang, Mu Xu, Xing Wei, Ning Guo

    Abstract: Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream. Recent advances in VLN have been driven by the powerful semantic understanding of Multimodal Large Language Models. However, these methods typically rely on explicit semantic memory, such as building textual cognitive maps or stor…

    Submitted 25 February, 2026; v1 submitted 26 September, 2025; originally announced September 2025.

    Comments: Accepted to ICLR 2026. Project page: https://miv-xjtu.github.io/JanusVLN.github.io/

  38. arXiv:2509.11145  [pdf, ps, other]

    cs.CL cs.PL

    Text2Mem: A Unified Memory Operation Language for Memory Operating System

    Authors: Yi Wang, Lihai Yang, Boyu Chen, Gongyi Zou, Kerun Xu, Bo Tang, Feiyu Xiong, Siheng Chen, Zhiyu Li

    Abstract: Large language model agents increasingly depend on memory to sustain long horizon interaction, but existing frameworks remain limited. Most expose only a few basic primitives such as encode, retrieve, and delete, while higher order operations like merge, promote, demote, split, lock, and expire are missing or inconsistently supported. Moreover, there is no formal and executable specification for m…

    Submitted 23 October, 2025; v1 submitted 14 September, 2025; originally announced September 2025.

    Comments: 12 pages, 3 figures, 2 tables

  39. arXiv:2509.09995  [pdf, ps, other]

    cs.CE

    QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

    Authors: Fei Xiong, Xiang Zhang, Aosong Feng, Siqi Sun, Chenyu You

    Abstract: Recent advances in Large Language Models (LLMs) have shown remarkable capabilities in financial reasoning and market understanding. Multi-agent LLM frameworks such as TradingAgent and FINMEM augment these models to long-horizon investment tasks by leveraging fundamental and sentiment-based inputs for strategic decision-making. However, these approaches are ill-suited for the high-speed, precision-…

    Submitted 26 September, 2025; v1 submitted 12 September, 2025; originally announced September 2025.

  40. Iterative Low-rank Network for Hyperspectral Image Denoising

    Authors: Jin Ye, Fengchao Xiong, Jun Zhou, Yuntao Qian

    Abstract: Hyperspectral image (HSI) denoising is a crucial preprocessing step for subsequent tasks. The clean HSI usually reside in a low-dimensional subspace, which can be captured by low-rank and sparse representation, known as the physical prior of HSI. It is generally challenging to adequately use such physical properties for effective denoising while preserving image details. This paper introduces a no…

    Submitted 30 August, 2025; originally announced September 2025.

    Journal ref: TGRS 2024

  41. arXiv:2508.15709  [pdf, ps, other

    cs.CL

    Position Bias Mitigates Position Bias: Mitigate Position Bias Through Inter-Position Knowledge Distillation

    Authors: Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng

    Abstract: Positional bias (PB), manifesting as non-uniform sensitivity across different contextual locations, significantly impairs long-context comprehension and processing capabilities. Previous studies have addressed PB either by modifying the underlying architectures or by employing extensive contextual awareness training. However, the former approach fails to effectively eliminate the substantial perfo…

    Submitted 17 September, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

    Comments: EMNLP 2025 Oral

  42. arXiv:2508.15553  [pdf, ps, other

    eess.IV cs.CV

    Deep Equilibrium Convolutional Sparse Coding for Hyperspectral Image Denoising

    Authors: Jin Ye, Jingran Wang, Fengchao Xiong, Jingzhou Chen, Yuntao Qian

    Abstract: Hyperspectral images (HSIs) play a crucial role in remote sensing but are often degraded by complex noise patterns. Ensuring the physical property of the denoised HSIs is vital for robust HSI denoising, giving the rise of deep unfolding-based methods. However, these methods map the optimization of a physical model to a learnable network with a predefined depth, which lacks convergence guarantees.…

    Submitted 22 December, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

  43. arXiv:2508.07250  [pdf, ps, other

    cs.CV

    SUIT: Spatial-Spectral Union-Intersection Interaction Network for Hyperspectral Object Tracking

    Authors: Fengchao Xiong, Zhenxing Wu, Sen Jia, Yuntao Qian

    Abstract: Hyperspectral videos (HSVs), with their inherent spatial-spectral-temporal structure, offer distinct advantages in challenging tracking scenarios such as cluttered backgrounds and small objects. However, existing methods primarily focus on spatial interactions between the template and search regions, often overlooking spectral interactions, leading to suboptimal performance. To address this issue,…

    Submitted 10 August, 2025; originally announced August 2025.

  44. arXiv:2507.18671  [pdf, ps, other

    cs.LG cs.AI

    Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling

    Authors: Ning Liao, Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, Shixiang Song, Geng Yu, Zihua Zhao, Sitao Xie, Longxuan Wei, Xiangqi Jin, Xiaohan Qin, Jiale Ma, Kai Chen, Jiangchao Yao, Zhouhan Lin, Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang

    Abstract: A large language model (LLM) with knowledge in both scientific and general tasks is the foundation of science general intelligence. However, directly continued pretraining an LLM using science data usually leads to catastrophic forgetting, which indicates severe degradation in general ability. In this report, we present Innovator, which solves this problem by upcycling a pre-trained dense LLM into…

    Submitted 16 October, 2025; v1 submitted 24 July, 2025; originally announced July 2025.

    Comments: Technical Report

  45. arXiv:2507.03724  [pdf, ps, other

    cs.CL

    MemOS: A Memory OS for AI System

    Authors: Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen, Boyu Chen, Shichao Song, Simin Niu, Hanyu Wang, Jiawei Yang, Chen Tang, Qingchen Yu, Jihao Zhao, Yezhaohui Wang, Peng Liu, Zehao Lin, Pengyuan Wang, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhen Tao, Huayi Lai, Hao Wu, Bo Tang, Zhengren Wang, et al. (14 additional authors not shown)

    Abstract: Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI), yet their lack of well-defined memory management systems hinders the development of long-context reasoning, continual personalization, and knowledge consistency. Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user prefer…

    Submitted 2 December, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: 36 pages, 10 figures, 5 tables

  46. arXiv:2506.23071  [pdf, ps, other

    cs.CL

    Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries

    Authors: Zhengren Wang, Dongwen Yao, Bozhou Li, Dongsheng Ma, Bo Li, Zhiyu Li, Feiyu Xiong, Bin Cui, Linpeng Tang, Wentao Zhang

    Abstract: The proliferation of unstructured data poses a fundamental challenge to traditional database interfaces. While Text-to-SQL has democratized access to structured data, it remains incapable of interpreting semantic or multi-modal queries. Concurrently, vector search has emerged as the de facto standard for querying unstructured data, but its integration with SQL (termed VectorSQL) still relies on manu…

    Submitted 6 November, 2025; v1 submitted 28 June, 2025; originally announced June 2025.

    Comments: Manuscript

  47. arXiv:2506.18382  [pdf, ps, other

    cs.IR cs.AI cs.LG

    PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching

    Authors: Haotong Du, Yaqing Wang, Fei Xiong, Lei Shao, Ming Liu, Hao Gu, Quanming Yao, Zhen Wang

    Abstract: With the expansion of business scales and scopes on online platforms, multi-scenario matching has become a mainstream solution to reduce maintenance costs and alleviate data sparsity. The key to effective multi-scenario recommendation lies in capturing both user preferences shared across all scenarios and scenario-aware preferences specific to each scenario. However, existing methods often overloo…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted by KDD 2025

  48. arXiv:2506.05779  [pdf, ps, other

    cs.NI cs.LG

    Pegasus: A Universal Framework for Scalable Deep Learning Inference on the Dataplane

    Authors: Yinchao Zhang, Su Yao, Yong Feng, Kang Chen, Tong Li, Zhuotao Liu, Yi Zhao, Lexuan Zhang, Xiangyu Gao, Feng Xiong, Qi Li, Ke Xu

    Abstract: The paradigm of Intelligent DataPlane (IDP) embeds deep learning (DL) models on the network dataplane to enable intelligent traffic analysis at line-speed. However, the current use of the match-action table (MAT) abstraction on the dataplane is misaligned with DL inference, leading to several key limitations, including accuracy degradation, limited scale, and lack of generality. This paper propose…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: To be published in SIGCOMM 2025

  49. arXiv:2506.04805  [pdf, ps, other

    cs.LG

    Adaptive Preconditioners Trigger Loss Spikes in Adam

    Authors: Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Loss spikes emerge commonly during training across neural networks of varying architectures and scales when using the Adam optimizer. In this work, we investigate the underlying mechanism responsible for Adam spikes. While previous explanations attribute these phenomena to the lower-loss-as-sharper characteristics of the loss landscape, our analysis reveals that Adam's adaptive preconditioners the…

    Submitted 5 June, 2025; originally announced June 2025.

  50. arXiv:2505.24369  [pdf, ps, other

    cs.LG cs.AI

    Adversarial Preference Learning for Robust LLM Alignment

    Authors: Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, Yijun Niu, Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang

    Abstract: Modern language models often rely on Reinforcement Learning from Human Feedback (RLHF) to encourage safe behaviors. However, they remain vulnerable to adversarial attacks due to three key limitations: (1) the inefficiency and high cost of human annotation, (2) the vast diversity of potential adversarial attacks, and (3) the risk of feedback bias and reward hacking. To address these challenges, we…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted at ACL 2025 Findings