
Showing 1–50 of 653 results for author: Xiong, H

Searching in archive cs.
  1. Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging

    Authors: Xinyu Peng, Ziyang Zheng, Wenrui Dai, Duoduo Xue, Shaohui Li, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Task-adapted compressed sensing magnetic resonance imaging (CS-MRI) is emerging to address the specific demands of downstream clinical tasks with significantly fewer k-space measurements than required by Nyquist sampling. However, existing task-adapted CS-MRI methods suffer from the uncertainty problem for medical diagnosis and cannot achieve adaptive sampling in end-to-end optimization with recon…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: 68 pages, 15 figures, accepted by IEEE TPAMI

  2. arXiv:2604.12332

    cs.IT math.CO

    Turán-Theoretic Bounds on Several Elementary Trapping Sets in LDPC Codes

    Authors: Ziyang Zhao, Haoran Xiong, Zicheng Ye, Guiying Yan

    Abstract: LDPC codes have attracted significant attention because of their superior performance close to the Shannon limit. Elementary trapping sets are the main cause of the error floor phenomenon in LDPC codes. We consider typical graphs related to trapping sets, including theta graphs, dumbbell graphs, and short cycles with chords. Based on the Turán numbers of $\theta(2,2,2)$, $\theta(1,3,3)$ and $D(4,4;0)$, we p…

    Submitted 14 April, 2026; originally announced April 2026.

  3. arXiv:2604.10017

    cs.CV

    What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters

    Authors: Shaobo Liu, Haobo Xiong, Kai Liu, Yuna Lin

    Abstract: Parameter-efficient fine-tuning of pre-trained codecs is a promising direction in image compression for human and machine vision. While most existing works have primarily focused on tuning the feature structure within the encoder-decoder backbones, the adaptation of the statistical semantics within the entropy model has received limited attention despite its function of predicting the probability…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: Accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition Findings, 2026

  4. arXiv:2604.09670

    cs.LG cs.AI

    Human-like Working Memory Interference in Large Language Models

    Authors: Hua-Dong Xiong, Li Ji-An, Jiaqi Huang, Robert C. Wilson, Kwonjoon Lee, Xue-Xin Wei

    Abstract: Intelligent systems must maintain and manipulate task-relevant information online to adapt to dynamic environments and changing goals. This capacity, known as working memory, is fundamental to human reasoning and intelligence. Despite having on the order of 100 billion neurons, both biological and artificial systems exhibit limitations in working memory. This raises a key question: why do large la…

    Submitted 1 April, 2026; originally announced April 2026.

  5. arXiv:2604.09668

    cs.IR cs.CV

    Decoding Ancient Oracle Bone Script via Generative Dictionary Retrieval

    Authors: Yin Wu, Gangjian Zhang, Jiayu Chen, Chang Xu, Yuyu Luo, Nan Tang, Hui Xiong

    Abstract: Understanding humanity's earliest writing systems is crucial for reconstructing civilization's origins, yet many ancient scripts remain undeciphered. Oracle Bone Script (OBS) from China's Shang dynasty exemplifies this challenge: only approximately 1,500 of roughly 4,600 characters have been decoded, and a substantial portion of these 3,000-year-old inscriptions remains only partially understood.…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 19 pages, 4 figures. Under review at Nature Machine Intelligence

  6. arXiv:2604.09568

    cs.HC cs.CL cs.CV

    EvoDiagram: Agentic Editable Diagram Creation via Design Expertise Evolution

    Authors: Tianfu Wang, Leilei Ding, Ziyang Tao, Yi Zhan, Zhiyuan Ma, Wei Wu, Yuxuan Lei, Yuan Feng, Junyang Wang, Yin Wu, Yizhao Xu, Hongyuan Zhu, Qi Liu, Nicholas Jing Yuan, Yanyong Zhang, Hui Xiong

    Abstract: High-fidelity diagram creation requires the complex orchestration of semantic topology, visual styling, and spatial layout, posing a significant challenge for automated systems. Existing methods also suffer from a representation gap: pixel-based models often lack precise control, while code-based synthesis limits intuitive flexibility. To bridge this gap, we introduce EvoDiagram, an agentic framew…

    Submitted 20 February, 2026; originally announced April 2026.

  7. arXiv:2604.07607

    cs.RO cs.CV

    EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

    Authors: Ryan Punamiya, Simar Kareer, Zeyi Liu, Josh Citron, Ri-Zhao Qiu, Xiongyi Cai, Alexey Gavryushin, Jiaqi Chen, Davide Liconti, Lawrence Y. Zhu, Patcharapong Aphiwetsa, Baoyu Li, Aniketh Cheluva, Pranav Kuppili, Yangcen Liu, Dhruv Patel, Aidan Gao, Hye-Young Chung, Ryan Co, Renee Zbizika, Jeff Liu, Xiaomeng Xu, Haoyu Xiong, Geng Chen, Sebastiano Oliani, et al. (14 additional authors not shown)

    Abstract: Robot learning increasingly depends on large and diverse data, yet robot data collection remains expensive and difficult to scale. Egocentric human data offer a promising alternative by capturing rich manipulation behavior across everyday environments. However, existing human datasets are often limited in scope, difficult to extend, and fragmented across institutions. We introduce EgoVerse, a coll…

    Submitted 8 April, 2026; originally announced April 2026.

  8. arXiv:2604.03660

    cs.AI

    TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

    Authors: Xiaoyu Chen, Lu Dai, Hanqing Wang, Zhuoyu Li, Wenbin Dai, Yanzong Zheng, Zhenggang Xia, Junyong Lin, Hui Xiong

    Abstract: Structured tables are essential for conveying high-density information in professional domains such as finance, healthcare, and scientific research. Despite the progress in Multimodal Large Language Models (MLLMs), reasoning performance remains limited for complex tables with hierarchical layouts. In this paper, we identify a critical Perception Bottleneck through quantitative analysis. We find th…

    Submitted 4 April, 2026; originally announced April 2026.

  9. arXiv:2603.28971

    eess.SY cs.LG

    A Pontryagin Method of Model-based Reinforcement Learning via Hamiltonian Actor-Critic

    Authors: Chengyang Gu, Yuxin Pan, Hui Xiong, Yize Chen

    Abstract: Model-based reinforcement learning (MBRL) improves sample efficiency by leveraging learned dynamics models for policy optimization. However, the effectiveness of methods such as actor-critic is often limited by compounding model errors, which degrade long-horizon value estimation. Existing approaches, such as Model-Based Value Expansion (MVE), partially mitigate this issue through multi-step rollo…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: 18 pages, 4 figures, in submission

  10. arXiv:2603.16781

    cs.CV cs.AI

    IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

    Authors: Huimin Xiong, Zijie Meng, Tianxiang Hu, Chenyi Zhou, Yang Feng, Zuozhu Liu

    Abstract: 3D intraoral scans (IOS) are increasingly adopted in routine dentistry due to abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. While recent works introduce dental vision-language models (VLMs) to enable unified diagnosis and report generation on 2D images or multi-view images rendered from IOS, they do not fully leverage na…

    Submitted 17 March, 2026; originally announced March 2026.

  11. arXiv:2603.14245

    cs.LG cs.AI

    GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies

    Authors: He Zhang, Ying Sun, Hui Xiong

    Abstract: Flow-matching policies hold great promise for reinforcement learning (RL) by capturing complex, multi-modal action distributions. However, their practical application is often hindered by prohibitive inference latency and ineffective online exploration. Although recent works have employed one-step distillation for fast inference, the structure of the initial noise distribution remains an overlooke…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: 23 pages, 13 figures

    MSC Class: 68T05

    ACM Class: I.2.6; I.2.8

  12. arXiv:2603.12759

    cs.CV

    SAP: Segment Any 4K Panorama

    Authors: Lutao Jiang, Zidong Cao, Weikai Chen, Xu Zheng, Yuanhuiyi Lyu, Zhenyang Li, Zeyu HU, Yingda Yin, Keyang Luo, Runze Zhang, Kai Yan, Shengju Qian, Haidi Fan, Yifan Peng, Xin Wang, Hui Xiong, Ying-Cong Chen

    Abstract: Promptable instance segmentation is widely adopted in embodied and AR systems, yet the performance of foundation models trained on perspective imagery often degrades on 360° panoramas. In this paper, we introduce Segment Any 4K Panorama (SAP), a foundation model for 4K high-resolution panoramic instance-level segmentation. We reformulate panoramic segmentation as fixed-trajectory perspective video…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: Project Page: https://lutao2021.github.io/SAP_Page/

  13. arXiv:2603.10473

    cs.CL cs.AI

    Aligning Large Language Models with Searcher Preferences

    Authors: Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong

    Abstract: The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set item ranking in e-commerce, research and deployment of open-ended generative search on large content platforms remain limited. This setting introduces challenges, including robustness to noisy retrieval,…

    Submitted 11 March, 2026; originally announced March 2026.

  14. arXiv:2603.09385

    cs.CV

    EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation

    Authors: Yinrui Ren, Jinjing Zhu, Kanghao Chen, Zhuoxiao Li, Jing Ou, Zidong Cao, Tongyan Hua, Peilun Shi, Yingchun Fu, Wufan Zhao, Hui Xiong

    Abstract: Event cameras offer superior sensitivity to high-speed motion and extreme lighting, making event-based monocular depth estimation a promising approach for robust 3D perception in challenging conditions. However, progress is severely hindered by the scarcity of dense depth annotations. While recent annotation-free approaches mitigate this by distilling knowledge from Vision Foundation Models (VFMs)…

    Submitted 10 March, 2026; originally announced March 2026.

  15. arXiv:2603.06766

    eess.IV cs.CV cs.MM

    HiDE: Hierarchical Dictionary-Based Entropy Modeling for Learned Image Compression

    Authors: Haoxuan Xiong, Yuanyuan Xu, Kun Zhu, Yiming Wang, Baoliu Ye

    Abstract: Learned image compression (LIC) has achieved remarkable coding efficiency, where entropy modeling plays a pivotal role in minimizing bitrate through informative priors. Existing methods predominantly exploit internal contexts within the input image, yet the rich external priors embedded in large-scale training data remain largely underutilized. Recent advances in dictionary-based entropy models ha…

    Submitted 6 March, 2026; originally announced March 2026.

  16. arXiv:2603.06165

    cs.CV cs.AI

    Reflective Flow Sampling Enhancement

    Authors: Zikai Zhou, Muyao Wang, Shitong Shao, Lichen Bai, Haoyi Xiong, Bo Han, Zeke Xie

    Abstract: The growing demand for text-to-image generation has led to rapid advances in generative modeling. Recently, text-to-image diffusion models trained with flow matching algorithms, such as FLUX, have achieved remarkable progress and emerged as strong alternatives to conventional diffusion models. At the same time, inference-time enhancement strategies have been shown to improve the generation quality…

    Submitted 6 March, 2026; originally announced March 2026.

  17. arXiv:2603.03839

    cs.CV

    All-in-One Image Restoration via Causal-Deconfounding Wavelet-Disentangled Prompt Network

    Authors: Bingnan Wang, Bin Qin, Jiangmeng Li, Fanjiang Xu, Fuchun Sun, Hui Xiong

    Abstract: Image restoration represents a promising approach for addressing the inherent defects of image content distortion. Standard image restoration approaches suffer from high storage cost and the requirement towards the known degradation pattern, including type and degree, which can barely be satisfied in dynamic practical scenarios. In contrast, all-in-one image restoration (AiOIR) eliminates multiple…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: Accepted by IEEE TIP 2026

  18. arXiv:2602.22923

    cs.CV cs.RO

    WaterVideoQA: ASV-Centric Perception and Rule-Compliant Reasoning via Multi-Modal Agents

    Authors: Runwei Guan, Shaofeng Liang, Ningwei Ouyang, Weichen Fei, Shanliang Yao, Wei Dai, Chenhao Ge, Penglei Sun, Xiaohui Zhu, Tao Huang, Ryan Wen Liu, Hui Xiong

    Abstract: While autonomous navigation has achieved remarkable success in passive perception (e.g., object detection and segmentation), it remains fundamentally constrained by a void in knowledge-driven, interactive environmental cognition. In the high-stakes domain of maritime navigation, the ability to bridge the gap between raw visual perception and complex cognitive reasoning is not merely an enhancement…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: 11 pages, 8 figures

  19. arXiv:2602.22025

    cs.CV

    Olbedo: An Albedo and Shading Aerial Dataset for Large-Scale Outdoor Environments

    Authors: Shuang Song, Debao Huang, Deyan Deng, Haolin Xiong, Yang Tang, Yajie Zhao, Rongjun Qin

    Abstract: Intrinsic image decomposition (IID) of outdoor scenes is crucial for relighting, editing, and understanding large-scale environments, but progress has been limited by the lack of real-world datasets with reliable albedo and shading supervision. We introduce Olbedo, a large-scale aerial dataset for outdoor albedo–shading decomposition in the wild. Olbedo contains 5,664 UAV images captured across f…

    Submitted 27 March, 2026; v1 submitted 24 February, 2026; originally announced February 2026.

    Comments: CVPR 2026

  20. arXiv:2602.21099

    cs.IR

    Turning Semantics into Topology: LLM-Driven Attribute Augmentation for Collaborative Filtering

    Authors: Junjie Meng, Ranxu zhang, Wei Wu, Rui Zhang, Chuan Qin, Qi Zhang, Qi Liu, Hui Xiong, Chao Wang

    Abstract: Large Language Models (LLMs) have shown great potential for enhancing recommender systems through their extensive world knowledge and reasoning capabilities. However, effectively translating these semantic signals into traditional collaborative embeddings remains an open challenge. Existing approaches typically fall into two extremes: direct inference methods are computationally prohibitive for la…

    Submitted 24 February, 2026; originally announced February 2026.

  21. arXiv:2602.20566

    cs.RO cs.CV

    BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model

    Authors: Haosheng Li, Weixin Mao, Zihan Lan, Hongwei Xiong, Hongan Wang, Chenyang Si, Ziwei Liu, Xiaoming Deng, Hua Chen

    Abstract: Vision-Language-Action (VLA) models have achieved significant breakthroughs by leveraging Large Vision Language Models (VLMs) to jointly interpret instructions and visual inputs. However, the substantial increase in visual tokens, particularly from multi-view inputs, poses serious challenges to real-time robotic manipulation. Existing acceleration techniques for VLMs, such as token pruning, often…

    Submitted 24 February, 2026; originally announced February 2026.

    Comments: 9 pages, 10 figures

  22. arXiv:2602.13665

    cs.AI

    HyFunc: Accelerating LLM-based Function Calls for Agentic AI through Hybrid-Model Cascade and Dynamic Templating

    Authors: Weibin Liao, Jian-guang Lou, Haoyi Xiong

    Abstract: While agentic AI systems rely on LLMs to translate user intent into structured function calls, this process is fraught with computational redundancy, leading to high inference latency that hinders real-time applications. This paper identifies and addresses three key redundancies: (1) the redundant processing of a large library of function descriptions for every request; (2) the redundant use of a…

    Submitted 14 February, 2026; originally announced February 2026.

    Comments: Accepted by KDD'26

  23. arXiv:2602.10847

    cs.LG cs.AI

    Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval

    Authors: Fanpu Cao, Lu Dai, Jindong Han, Hui Xiong

    Abstract: Multivariate time series forecasting (MTSF) plays a vital role in numerous real-world applications, yet existing models remain constrained by their reliance on a limited historical context. This limitation prevents them from effectively capturing global periodic patterns that often span cycles significantly longer than the input horizon, despite such patterns carrying strong predictive signals. N…

    Submitted 11 February, 2026; originally announced February 2026.

    Comments: ICLR 2026

  24. arXiv:2602.10016

    cs.IR cs.AI

    Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

    Authors: Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, et al. (4 additional authors not shown)

    Abstract: Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify…

    Submitted 13 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

    Comments: 10 pages, 4 figures

  25. arXiv:2602.09657

    cs.RO

    AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild

    Authors: Xiaolou Sun, Wufei Si, Wenhui Ni, Yuntian Li, Dongming Wu, Fei Xie, Runwei Guan, He-Yang Xu, Henghui Ding, Yuan Wu, Yutao Yue, Yongming Huang, Hui Xiong

    Abstract: Vision-language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned aerial vehicles (UAVs) relies on detailed, pre-specified instructions to guide the UAV along predetermined routes. However, real-world outdoor exploration typicall…

    Submitted 10 February, 2026; originally announced February 2026.

    Comments: Accepted by ICLR 2026

  26. arXiv:2602.09638

    cs.CV

    VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model

    Authors: Hanqing Wang, Mingyu Liu, Xiaoyu Chen, Chengwei MA, Yiming Zhong, Wenti Yin, Yuhao Liu, Zhiqing Cui, Jiahao Yuan, Lu Dai, Zhiyuan Ma, Hui Xiong

    Abstract: 3D affordance grounding aims to highlight the actionable regions on 3D objects, which is crucial for robotic manipulation. Previous research primarily focused on learning affordance knowledge from static cues such as language and images, which struggle to provide sufficient dynamic interaction context that can reveal temporal and causal cues. To alleviate this predicament, we collect a comprehensi…

    Submitted 10 February, 2026; originally announced February 2026.

  27. arXiv:2602.07953

    cs.IT

    OFDM Enabled Over-the-Air Computation Systems with Two-Dimensional Fluid Antennas

    Authors: Heyang Xiong, Quanzhong Li, Qi Zhang

    Abstract: Fluid antenna system (FAS) is able to exploit spatial degrees of freedom (DoFs) in wireless channels. In this letter, to exploit spatial DoFs in frequency-selective environments, we investigate an orthogonal frequency division multiplexing enabled over-the-air computation system, where the access point is equipped with a two-dimensional FAS to enhance performance. We solve the computation mean squ…

    Submitted 8 February, 2026; originally announced February 2026.

  28. arXiv:2602.07565

    cs.CV

    Human Identification at a Distance: Challenges, Methods and Results on the Competition HID 2025

    Authors: Jingzhe Ma, Meng Zhang, Jianlong Yu, Kun Liu, Zunxiao Xu, Xue Cheng, Junjie Zhou, Yanfei Wang, Jiahang Li, Zepeng Wang, Kazuki Osamura, Rujie Liu, Narishige Abe, Jingjie Wang, Shunli Zhang, Haojun Xie, Jiajun Wu, Weiming Wu, Wenxiong Kang, Qingshuo Gao, Jiaming Xiong, Xianye Ben, Lei Chen, Lichen Song, Junjian Cui, et al. (12 additional authors not shown)

    Abstract: Human identification at a distance (HID) is challenging because traditional biometric modalities such as face and fingerprints are often difficult to acquire in real-world scenarios. Gait recognition provides a practical alternative, as it can be captured reliably at a distance. To promote progress in gait recognition and provide a fair evaluation platform, the International Competition on Human I…

    Submitted 7 February, 2026; originally announced February 2026.

    Comments: Accepted by IJCB 2025 (https://ijcb2025.ieee-biometrics.org/competitions/)

  29. arXiv:2602.07026

    cs.CV cs.AI cs.MM

    Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models

    Authors: Xiaomin Yu, Yi Xin, Wenjie Zhang, Chonghan Liu, Hanzhen Zhao, Xiaoxing Hu, Xinlei Yu, Ziyue Qiao, Hao Tang, Xue Yang, Xiaobin Hu, Chengwei Qin, Hui Xiong, Yu Qiao, Shuicheng Yan

    Abstract: Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct modalities expressing identical semantics occupy systematically offset regions. Prior approaches to bridge this gap are largely limited by oversimplified isotropic assumptions, hindering their application in larg…

    Submitted 2 February, 2026; originally announced February 2026.

  30. arXiv:2602.06475

    cs.LG

    Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning

    Authors: Jingyao Wang, Peizheng Guo, Wenwen Qiang, Jiahuan Zhou, Huijie Guo, Changwen Zheng, Hui Xiong

    Abstract: Large language models (LLMs) excel at complex tasks with advances in reasoning capabilities. However, existing reward mechanisms remain tightly coupled to final correctness and pay little attention to the underlying reasoning process: trajectories with sound reasoning but wrong answers receive low credit, while lucky guesses with flawed logic may be highly rewarded, affecting reasoning generalizat…

    Submitted 6 February, 2026; originally announced February 2026.

  31. arXiv:2602.06453

    cs.LG

    On the Plasticity and Stability for Post-Training Large Language Models

    Authors: Wenwen Qiang, Ziyin Gu, Jiahuan Zhou, Jie Hu, Jingyao Wang, Changwen Zheng, Hui Xiong

    Abstract: Training stability remains a critical bottleneck for Group Relative Policy Optimization (GRPO), often manifesting as a trade-off between reasoning plasticity and general capability retention. We identify a root cause as the geometric conflict between plasticity and stability gradients, which leads to destructive interference. Crucially, we argue that deterministic projection methods are suboptimal…

    Submitted 6 February, 2026; originally announced February 2026.

  32. arXiv:2602.05444

    cs.CL

    Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs

    Authors: Yao Zhou, Zeen Song, Wenwen Qiang, Fengge Wu, Shuyi Zhou, Changwen Zheng, Hui Xiong

    Abstract: Safety alignment mechanisms in Large Language Models (LLMs) often operate as latent internal states, obscuring the model's inherent capabilities. Building on this observation, we model the safety mechanism as an unobserved confounder from a causal perspective. Then, we propose the Causal Front-Door Adjustment Attack (CFA$^2$) to jailbreak LLM, which is a framework that leverages Pearl's Front-Do…

    Submitted 6 February, 2026; v1 submitted 5 February, 2026; originally announced February 2026.

  33. arXiv:2602.01077

    cs.CV

    PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers

    Authors: Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Hui Xiong, Zeke Xie

    Abstract: Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only critical key-value blocks, it suffers from degradation at high sparsity by discarding context. In this work, we discover that attention scores of non-critical blocks exhibit distr…

    Submitted 3 February, 2026; v1 submitted 1 February, 2026; originally announced February 2026.

    Comments: 17 pages

  34. arXiv:2601.18692

    cs.RO cs.CV

    A Pragmatic VLA Foundation Model

    Authors: Wei Wu, Fan Lu, Yunnan Wang, Shuai Yang, Shi Liu, Fangjing Wang, Qian Zhu, He Sun, Yong Wang, Shuailei Ma, Yiyu Ren, Kejia Zhang, Hui Yu, Jingmei Zhao, Shuai Zhou, Zhenqi Qiu, Houlong Xiong, Ziyu Wang, Zechen Wang, Ran Cheng, Yong-Lu Li, Yongtao Huang, Xing Zhu, Yujun Shen, Kecheng Zheng

    Abstract: Offering great potential in robotic manipulation, a capable Vision-Language-Action (VLA) foundation model is expected to faithfully generalize across tasks and platforms while ensuring cost efficiency (e.g., data and GPU hours required for adaptation). To this end, we develop LingBot-VLA with around 20,000 hours of real-world data from 9 popular dual-arm robot configurations. Through a systematic…

    Submitted 25 February, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

    Comments: Project Webpage: https://technology.robbyant.com/lingbot-vla/, Code: https://github.com/Robbyant/lingbot-vla/, GM-100: https://huggingface.co/datasets/robbyant/lingbot-GM-100

  35. arXiv:2601.14959

    cs.CV

    Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers

    Authors: Xinyu Peng, Han Li, Yuyang Huang, Ziyang Zheng, Yaoming Wang, Xin Chen, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Existing video frame interpolation (VFI) methods often adopt a frame-centric approach, processing videos as independent short segments (e.g., triplets), which leads to temporal inconsistencies and motion artifacts. To overcome this, we propose a holistic, video-centric paradigm named Local Diffusion Forcing for Video Frame Interpolation (LDF-VFI). Our framework is built upon an auto-regressive dif…

    Submitted 30 March, 2026; v1 submitted 21 January, 2026; originally announced January 2026.

  36. arXiv:2601.14628

    cs.RO cs.AI

    A Brain-inspired Embodied Intelligence for Fluid and Fast Reflexive Robotics Control

    Authors: Weiyu Guo, He Zhang, Pengteng Li, Tiefu Cai, Ziyang Chen, Yandong Guo, Xiao He, Yongkui Yang, Ying Sun, Hui Xiong

    Abstract: Recent advances in embodied intelligence have leveraged massive scaling of data and model parameters to master natural-language command following and multi-task control. In contrast, biological systems demonstrate an innate ability to acquire skills rapidly from sparse experience. Crucially, current robotic policies struggle to replicate the dynamic stability, reflexive responsiveness, and tempora…

    Submitted 20 January, 2026; originally announced January 2026.

  37. arXiv:2601.11634

    cs.CV

    When Rules Fall Short: Agent-Driven Discovery of Emerging Content Issues in Short Video Platforms

    Authors: Chenghui Yu, Hongwei Wang, Junwen Chen, Zixuan Wang, Bingfeng Deng, Zhuolin Hao, Hongyu Xiong, Yang Song

    Abstract: Trends on short-video platforms evolve at a rapid pace, with new content issues emerging every day that fall outside the coverage of existing annotation policies. However, traditional human-driven discovery of emerging issues is too slow, which leads to delayed updates of annotation policies and poses a major challenge for effective content governance. In this work, we propose an automatic issue d…

    Submitted 14 January, 2026; originally announced January 2026.

  38. arXiv:2601.10193

    cs.AI

    GFM4GA: Graph Foundation Model for Group Anomaly Detection

    Authors: Jiujiu Chen, Weijun Zeng, Shaofeng Hu, Sihong Xie, Hui Xiong

    Abstract: Group anomaly detection is crucial in many network applications, but faces challenges due to diverse anomaly patterns. Motivated by the success of large language models (LLMs) in natural language processing, graph foundation models (GFMs) have been proposed to handle few-shot learning tasks with fewer labeling efforts. GFMs have been successfully applied to detection of individual anomalies but cannot be…

    Submitted 15 January, 2026; originally announced January 2026.

  39. arXiv:2601.08107

    cs.LG cs.AI eess.SY

    STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order

    Authors: Chengyang Gu, Yuxin Pan, Hui Xiong, Yize Chen

    Abstract: Offline reinforcement learning (RL) enables policy learning from pre-collected datasets, avoiding costly and risky online interactions, but it often struggles with long-horizon tasks involving sparse rewards. Existing goal-conditioned and hierarchical offline RL methods decompose such tasks and generate intermediate rewards to mitigate limitations of traditional offline RL, but usually overlook te…

    Submitted 12 January, 2026; originally announced January 2026.

    Comments: Accepted at International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

  40. arXiv:2601.03969

    cs.AI cs.CL

    Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models

    Authors: Wei Wu, Liyi Chen, Congxi Xiao, Tianfu Wang, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong

    Abstract: Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved significant performance gains by extending their chain-of-thought. However, this paradigm incurs substantial deployment costs as models often exhibit excessive verbosity on simple queries. Existing efficient reasoning methods relying on explicit length penalties often introduce optimization conflicts an…

    Submitted 7 January, 2026; originally announced January 2026.

  41. arXiv:2601.00644

    cs.DC

    FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding

    Authors: Yuchen Li, Rui Kong, Zhonghao Lyu, Qiyang Li, Xinran Chen, Hengyi Cai, Lingyong Yan, Shuaiqiang Wang, Jiashu Zhao, Guangxu Zhu, Linghe Kong, Guihai Chen, Haoyi Xiong, Dawei Yin

    Abstract: Deploying large language models (LLMs) in mobile and edge computing environments is constrained by limited on-device resources, scarce wireless bandwidth, and frequent model evolution. Although edge-cloud collaborative inference with speculative decoding (SD) can reduce end-to-end latency by executing a lightweight draft model at the edge and verifying it with a cloud-side target model, existing f…

    Submitted 2 January, 2026; originally announced January 2026.

  42. arXiv:2601.00451

    cs.LG

    Controllable Concept Bottleneck Models

    Authors: Hongbin Lin, Chenyang Ren, Juangui Xu, Zhengyu Hu, Cheng-Long Wang, Yao Shu, Hui Xiong, Jingfeng Zhang, Di Wang, Lijie Hu

    Abstract: Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studies focused on static scenarios where the data and concepts are assumed to be fixed and clean. In real-world applications, deployed models require continuous maintenance: we often need to remove erroneous or sen…

    Submitted 1 January, 2026; originally announced January 2026.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.15476

  43. arXiv:2512.22972

    cs.CV eess.SP

    Wavelet-based Multi-View Fusion of 4D Radar Tensor and Camera for Robust 3D Object Detection

    Authors: Runwei Guan, Jianan Liu, Shaofeng Liang, Fangqiang Ding, Shanliang Yao, Xiaokai Bai, Daizong Liu, Tao Huang, Guoqiang Mao, Hui Xiong

    Abstract: 4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robot perception due to its low cost and all-weather robustness. However, point-cloud-based radar representations suffer from information loss due to multi-stage signal processing, while directly utilizing raw 4D radar tensors incurs prohibitive computational costs. To address these challenges, we propose WRCFormer…

    Submitted 15 January, 2026; v1 submitted 28 December, 2025; originally announced December 2025.

    Comments: 10 pages, 10 figures

  44. arXiv:2512.18411

    cs.CV cs.AI

    AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning

    Authors: Fei Song, Yi Li, Jiangmeng Li, Rui Wang, Changwen Zheng, Fanjiang Xu, Hui Xiong

    Abstract: Multi-prompt learning methods have emerged as an effective approach for facilitating the rapid adaptation of vision-language models to downstream tasks with limited resources. Existing multi-prompt learning methods primarily focus on utilizing various meticulously designed prompts within a single foundation vision-language model to achieve superior performance. However, the overlooked model-prompt…

    Submitted 20 December, 2025; originally announced December 2025.

    Comments: Accepted by IJCV 2025

  45. arXiv:2512.16842

    cs.CV cs.AI cs.RO

    OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction

    Authors: Yuxin Ray Song, Jinzhou Li, Rao Fu, Devin Murphy, Kaichen Zhou, Rishi Shiv, Yaqi Li, Haoyu Xiong, Crystal Elaine Owens, Yilun Du, Yiyue Luo, Xianyi Cheng, Antonio Torralba, Wojciech Matusik, Paul Pu Liang

    Abstract: The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocent…

    Submitted 18 December, 2025; originally announced December 2025.

    Comments: https://opentouch-tactile.github.io/

  46. arXiv:2512.14200

    cs.CV

    Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination

    Authors: Zhuoxiao Li, Wenzong Ma, Taoyu Wu, Jinjing Zhu, Zhenchao Q, Shuai Zhang, Jing Ou, Yinrui Ren, Weiqing Qi, Guobin Shen, Hui Xiong, Wufan Zhao

    Abstract: Recent advances in Neural Radiance Fields and 3D Gaussian Splatting have demonstrated strong potential for large-scale UAV-based 3D reconstruction tasks by fitting the appearance of images. However, real-world large-scale captures often rely on multi-temporal data capture, where illumination inconsistencies across different times of day can cause significant color artifacts, geometric ina…

    Submitted 16 December, 2025; originally announced December 2025.

  47. arXiv:2512.13120

    cs.IR cs.LG

    Towards Practical Large-scale Dynamical Heterogeneous Graph Embedding: Cold-start Resilient Recommendation

    Authors: Mabiao Long, Jiaxi Liu, Yufeng Li, Hao Xiong, Junchi Yan, Kefan Wang, Yi Cao, Jiandong Ding

    Abstract: Deploying dynamic heterogeneous graph embeddings in production faces key challenges of scalability, data freshness, and cold-start. This paper introduces a practical, two-stage solution that balances deep graph representation with low-latency incremental updates. Our framework combines HetSGFormer, a scalable graph transformer for static learning, with Incremental Locally Linear Embedding (ILLE),…

    Submitted 15 December, 2025; originally announced December 2025.

  48. arXiv:2512.10450

    cs.CV

    Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment

    Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Xinlong Pan, Haipeng Wang, Junni Zou, Hongkai Xiong

    Abstract: Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation for motion estimation and compensation (ME/MC). The separate-transform framework employs distinct transforms for intra-frame and inter-frame compression to yield impressive rate-distortion (R-D) performance but causes evident error propagation, while the unified-trans…

    Submitted 11 December, 2025; originally announced December 2025.

  49. arXiv:2512.04313

    cs.CV

    Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding

    Authors: Haolin Xiong, Tianwen Fu, Pratusha Bhuvana Prasad, Yunxuan Cai, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao

    Abstract: Current expressive avatar systems rely heavily on visual cues, failing when faces are occluded or when emotions remain internal. We present Mind-to-Face, the first framework that decodes non-invasive electroencephalogram (EEG) signals directly into high-fidelity facial expressions. We build a dual-modality recording setup to obtain synchronized EEG and multi-view facial video during emotion-elicit…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: 16 pages, 11 figures

  50. arXiv:2512.00030

    cs.RO cs.AI

    Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning

    Authors: Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong

    Abstract: The robustness of Unmanned Surface Vehicles (USVs) is crucial when facing unknown and complex marine environments, especially when heteroscedastic observational noise poses significant challenges to sensor-based navigation tasks. Recently, Distributional Reinforcement Learning (DistRL) has shown promising results in some challenging autonomous navigation tasks without prior environmental informatio…

    Submitted 7 November, 2025; originally announced December 2025.