
Showing 1–50 of 372 results for author: Pan, C

Searching in archive cs.
  1. arXiv:2604.07958 [pdf, ps, other]

    cs.CV

    ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

    Authors: Jiayang Xu, Fan Zhuo, Majun Zhang, Changhao Pan, Zehan Wang, Siyu Chen, Xiaoda Yang, Tao Jin, Zhou Zhao

    Abstract: Current video editing models often rely on expensive paired video data, which limits their practical scalability. In essence, most video editing tasks can be formulated as a decoupled spatiotemporal process, where the temporal dynamics of the pretrained model are preserved while spatial content is selectively and precisely modified. Based on this insight, we propose ImVideoEdit, an efficient frame…

    Submitted 9 April, 2026; originally announced April 2026.

  2. WeatherRemover: All-in-one Adverse Weather Removal with Multi-scale Feature Map Compression

    Authors: Weikai Qu, Sijun Liang, Cheng Pan, Zikuan Yang, Guanchi Zhou, Xianjun Fu, Bo Liu, Changmiao Wang, Ahmed Elazab

    Abstract: Photographs taken in adverse weather conditions often suffer from blurriness, occlusion, and low brightness due to interference from rain, snow, and fog. These issues can significantly hinder the performance of subsequent computer vision tasks, making the removal of weather effects a crucial step in image enhancement. Existing methods primarily target specific weather conditions, with only a few c…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: Accepted by IEEE Transactions on Artificial Intelligence

  3. Balancing Efficiency and Restoration: Lightweight Mamba-Based Model for CT Metal Artifact Reduction

    Authors: Weikai Qu, Sijun Liang, Xianfeng Li, Cheng Pan, An Yan, Ahmed Elazab, Shanzhou Niu, Dong Zeng, Xiang Wan, Changmiao Wang

    Abstract: In computed tomography imaging, metal implants frequently generate severe artifacts that compromise image quality and hinder diagnostic accuracy. There are three main challenges in the existing methods: the deterioration of organ and tissue structures, dependence on sinogram data, and an imbalance between resource use and restoration efficiency. Addressing these issues, we introduce MARMamba, whic…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: Accepted by IEEE Transactions on Radiation and Plasma Medical Sciences

  4. arXiv:2604.02289 [pdf, ps, other]

    cs.CV cs.AI

    Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

    Authors: Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han

    Abstract: Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. Existing methods often rely on indirect pipelines that edit in 2D and lift resul…

    Submitted 2 April, 2026; originally announced April 2026.

  5. arXiv:2603.22866 [pdf, ps, other]

    cs.IT

    Aerial Agentic AI: Synergizing LLM and SLM for Low-Altitude Wireless Networks

    Authors: Li Dong, Feibo Jiang, Kezhi Wang, Cunhua Pan, Dong In Kim, Ekram Hossain

    Abstract: Low-Altitude Wireless Networks (LAWNs), composed of Unmanned Aerial Vehicles (UAVs) and mobile terminals, are emerging as a critical extension of 6G. However, applying Large Language Models in LAWNs faces three major challenges: 1) Computational and energy constraints; 2) Communication and bandwidth limitations; 3) Real-time and reliability conflicts. To address these challenges, we propose Aerial…

    Submitted 24 March, 2026; originally announced March 2026.

  6. arXiv:2603.18273 [pdf, ps, other]

    cs.AI

    EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

    Authors: Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao

    Abstract: In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first inst…

    Submitted 18 March, 2026; originally announced March 2026.

  7. arXiv:2603.14889 [pdf, ps, other]

    eess.AS cs.CL cs.LG

    Modeling and Benchmarking Spoken Dialogue Rewards with Modality and Colloquialness

    Authors: Jingyu Lu, Yuhan Wang, Fan Zhuo, Xize Cheng, Changhao Pan, Xueyi Pu, Yifu Chen, Chenyuhao Wen, Tianle Liang, Zhou Zhao

    Abstract: The rapid evolution of end-to-end spoken dialogue systems demands transcending mere textual semantics to incorporate paralinguistic nuances and the spontaneous nature of human conversation. However, current methods struggle with two critical gaps: the modality gap, involving prosody and emotion, and the colloquialness gap, distinguishing written scripts from natural speech. To address these challe…

    Submitted 16 March, 2026; originally announced March 2026.

  8. arXiv:2603.09293 [pdf, ps, other]

    cs.IT

    Tensor Train Decomposition-based Channel Estimation for MIMO-AFDM Systems with Fractional Delay and Doppler

    Authors: Ruizhe Wang, Cunhua Pan, Hong Ren, Haisu Wu, Jiangzhou Wang

    Abstract: Affine Frequency Division Multiplexing (AFDM) has emerged as a promising chirp-based multicarrier technology for high-speed communication systems. To fully exploit the diversity gain offered by AFDM, accurate channel estimation is essential. However, existing studies have mainly focused on the integer-delay-tap scenario and single-symbol pilot-based estimation. Since delay taps in practice are gen…

    Submitted 19 March, 2026; v1 submitted 10 March, 2026; originally announced March 2026.

  9. arXiv:2603.06887 [pdf, ps, other]

    cs.RO

    VertiAdaptor: Online Kinodynamics Adaptation for Vertically Challenging Terrain

    Authors: Tong Xu, Chenhui Pan, Aniket Datar, Xuesu Xiao

    Abstract: Autonomous driving in off-road environments presents significant challenges due to the dynamic and unpredictable nature of unstructured terrain. Traditional kinodynamic models often struggle to generalize across diverse geometric and semantic terrain types, underscoring the need for real-time adaptation to ensure safe and reliable navigation. We propose VertiAdaptor (VA), a novel online adaptation…

    Submitted 20 March, 2026; v1 submitted 6 March, 2026; originally announced March 2026.

  10. arXiv:2603.06866 [pdf, ps, other]

    cs.RO

    CAR: Cross-Vehicle Kinodynamics Adaptation via Mobility Representation

    Authors: Tong Xu, Chenhui Pan, Xuesu Xiao

    Abstract: Developing autonomous off-road mobility typically requires either extensive, platform-specific data collection or relies on simplified abstractions, such as unicycle or bicycle models, that fail to capture the complex kinodynamics of diverse platforms, ranging from wheeled to tracked vehicles. This limitation hinders scalability across evolving heterogeneous autonomous robot fleets. To address thi…

    Submitted 20 March, 2026; v1 submitted 6 March, 2026; originally announced March 2026.

  11. arXiv:2603.00482 [pdf, ps, other]

    cs.CV cs.IT

    TokenCom: Vision-Language Model for Multimodal and Multitask Token Communications

    Authors: Feibo Jiang, Siwei Tu, Li Dong, Xiaolong Li, Kezhi Wang, Cunhua Pan, Zhu Han, Jiangzhou Wang

    Abstract: Visual-Language Models (VLMs), with their strong capabilities in image and text understanding, offer a solid foundation for intelligent communications. However, their effectiveness is constrained by limited token granularity, overlong visual token sequences, and inadequate cross-modal alignment. To overcome these challenges, we propose TaiChi, a novel VLM framework designed for token communication…

    Submitted 28 February, 2026; originally announced March 2026.

  12. arXiv:2602.09336 [pdf, ps, other]

    cs.CL

    FM SO.P: A Progressive Task Mixture Framework with Automatic Evaluation for Cross-Domain SOP Understanding

    Authors: Siyuan Huang, Ziyu Wang, Chao Pan, Han Zhao

    Abstract: Standard Operating Procedures (SOPs) are critical for enterprise operations, yet existing language models struggle with SOP understanding and cross-domain generalization. Current methods fail because joint training cannot differentiate between reasoning capabilities that SOPs require: terminology precision, sequential ordering, and constraint reasoning. We propose FM SO.P, solving these challenges…

    Submitted 9 February, 2026; originally announced February 2026.

  13. arXiv:2602.09065 [pdf, ps, other]

    cs.LG cs.AI

    Enhanced Graph Transformer with Serialized Graph Tokens

    Authors: Ruixiang Wang, Yuyang Hong, Shiming Xiang, Chunhong Pan

    Abstract: Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single-token paradigm fails to fully leverage the inherent strength of self-attention in encoding token sequences, and degenerates into a weighted sum of node signals. To address this is…

    Submitted 9 February, 2026; originally announced February 2026.

    Comments: ICASSP 2026

  14. arXiv:2602.06485 [pdf, ps, other]

    cs.AI

    AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents

    Authors: Haotian Chen, Xin Cong, Shengda Fan, Yuyang Fu, Ziqin Gong, Yaxi Lu, Yishan Li, Boye Niu, Chengjun Pan, Zijun Song, Huadong Wang, Yesai Wu, Yueying Wu, Zihao Xie, Yukun Yan, Zhong Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study on training agentic models at the 4B-parameter scale. We identify three primary bottlenecks hindering the perf…

    Submitted 6 February, 2026; originally announced February 2026.

  15. arXiv:2602.00148 [pdf, ps, other]

    cs.CV cs.AI

    Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields

    Authors: Shiqian Li, Ruihong Shen, Junfeng Ni, Chang Pan, Chi Zhang, Yixin Zhu

    Abstract: Predicting physical dynamics from raw visual data remains a major challenge in AI. While recent video generation models have achieved impressive visual quality, they still cannot consistently generate physically plausible videos due to a lack of modeling of physical laws. Recent approaches combining 3D Gaussian splatting and physics engines can produce physically plausible videos, but are hindered…

    Submitted 12 February, 2026; v1 submitted 29 January, 2026; originally announced February 2026.

    Comments: 43 pages, ICLR 2026

  16. arXiv:2601.20331 [pdf, ps, other]

    cs.CV

    GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction

    Authors: Mai Su, Qihan Yu, Zhongtao Wang, Yilong Li, Chengwei Pan, Yisong Chen, Guoping Wang, Fei Zhu

    Abstract: 3D Gaussian Splatting (3DGS) enables efficient rendering, yet accurate surface reconstruction remains challenging due to unreliable geometric supervision. Existing approaches predominantly rely on depth-based reprojection to infer visibility and enforce multi-view consistency, leading to a fundamental circular dependency: visibility estimation requires accurate depth, while depth supervision itsel…

    Submitted 2 April, 2026; v1 submitted 28 January, 2026; originally announced January 2026.

  17. arXiv:2601.16935 [pdf, ps, other]

    cs.AR cs.OS

    AERO: Adaptive and Efficient Runtime-Aware OTA Updates for Energy-Harvesting IoT

    Authors: Wei Wei, Jingye Xu, Sahidul Islam, Dakai Zhu, Chen Pan, Mimi Xie

    Abstract: Energy-harvesting (EH) Internet of Things (IoT) devices operate under intermittent energy availability, which disrupts task execution and makes energy-intensive over-the-air (OTA) updates particularly challenging. Conventional OTA update mechanisms rely on reboots and incur significant overhead, rendering them unsuitable for intermittently powered systems. Recent live OTA update techniques reduce…

    Submitted 23 January, 2026; originally announced January 2026.

    Comments: Accepted at DATE 2026

  18. arXiv:2601.14127 [pdf, ps, other]

    cs.CV cs.CL

    The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

    Authors: Renmiao Chen, Yida Lu, Shiyao Cui, Xuan Ouyang, Victor Shea-Jay Huang, Shumin Zhang, Chengwei Pan, Han Qiu, Minlie Huang

    Abstract: As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19…

    Submitted 20 January, 2026; originally announced January 2026.

    Comments: *15 pages, 5 figures. Introduces MIR-SafetyBench (2,676 instances; 9 multi-image relations). Equal contribution; †Corresponding author. Code/data: https://github.com/thu-coai/MIR-SafetyBench

  19. arXiv:2601.13910 [pdf, ps, other]

    eess.AS cs.SD

    Synthetic Singers: A Review of Deep-Learning-based Singing Voice Synthesis Approaches

    Authors: Changhao Pan, Dongyu Yao, Yu Zhang, Wenxiang Guo, Jingyu Lu, Zhiyuan Zhu, Zhou Zhao

    Abstract: Recent advances in singing voice synthesis (SVS) have attracted substantial attention from both academia and industry. With the advent of large language models and novel generative paradigms, producing controllable, high-fidelity singing voices has become an attainable goal. Yet the field still lacks a comprehensive survey that systematically analyzes deep-learning-based singing voice synthesis sy…

    Submitted 20 January, 2026; originally announced January 2026.

    Comments: Accepted by IJCNLP-AACL 2025 (Oral)

  20. arXiv:2601.09988 [pdf, ps, other]

    cs.RO

    In-the-Wild Compliant Manipulation with UMI-FT

    Authors: Hojung Choi, Yifan Hou, Chuer Pan, Seongheon Hong, Austin Patel, Xiaomeng Xu, Mark R. Cutkosky, Shuran Song

    Abstract: Many manipulation tasks require careful force modulation. With insufficient force, the task may fail, while excessive force could cause damage. The high cost, bulky size and fragility of commercial force/torque (F/T) sensors have limited large-scale, force-aware policy learning. We introduce UMI-FT, a handheld data-collection platform that mounts compact, six-axis force/torque sensors on each finge…

    Submitted 14 January, 2026; originally announced January 2026.

    Comments: submitted to ICRA 2026

  21. arXiv:2601.08363 [pdf, ps, other]

    cs.IR cs.CL

    PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark

    Authors: Ziyang Zeng, Dun Zhang, Yu Yan, Xu Sun, Cuiqiaoshu Pan, Yudong Zhou, Yuqing Yang

    Abstract: In real-world documents, the information relevant to a user query may reside anywhere from the beginning to the end. This makes position bias (a systematic tendency of retrieval models to favor or neglect content based on its location) a critical concern. Although recent studies have identified such bias, existing analyses focus predominantly on English, fail to disentangle document length fro…

    Submitted 12 March, 2026; v1 submitted 13 January, 2026; originally announced January 2026.

    Comments: Work in progress

  22. arXiv:2601.07280 [pdf, ps, other]

    cs.CL

    ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios

    Authors: Changzai Pan, Jie Zhang, Kaiwen Wei, Chenshuo Pan, Yu Zhao, Jingwang Huang, Jian Yang, Zhenhe Wu, Haoyang Zeng, Xiaoyan Gu, Weichao Sun, Yanbo Zhai, Yujie Mao, Zhuoru Jiang, Jiang Zhong, Shuangyong Song, Yongxiang Li, Zhongjiang He

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly catalyzed table-based question answering (TableQA). However, existing TableQA benchmarks often overlook the intricacies of industrial scenarios, which are characterized by multi-table structures, nested headers, and massive scales. These environments demand robust table reasoning through deep structured inference, presenting a…

    Submitted 12 January, 2026; originally announced January 2026.

  23. arXiv:2512.22131 [pdf]

    cs.AR eess.IV

    An Energy-Efficient RFET-Based Stochastic Computing Neural Network Accelerator

    Authors: Sheng Lu, Qianhou Qu, Sungyong Jung, Qilian Liang, Chenyun Pan

    Abstract: Stochastic computing (SC) offers significant reductions in hardware complexity for traditional convolutional neural networks (CNNs). However, despite its advantages, stochastic computing neural networks (SCNNs) often suffer from high resource consumption due to components such as stochastic number generators (SNGs) and accumulative parallel counters (APCs), which limit overall performance. This pap…

    Submitted 27 January, 2026; v1 submitted 5 December, 2025; originally announced December 2025.

  24. arXiv:2512.14222 [pdf, ps, other]

    cs.CV cs.RO

    History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation

    Authors: Xichen Ding, Jianzhe Gao, Cong Pan, Wenguan Wang, Jie Qin

    Abstract: Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. To address…

    Submitted 16 December, 2025; v1 submitted 16 December, 2025; originally announced December 2025.

  25. arXiv:2512.09663 [pdf, ps, other]

    cs.CV

    IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

    Authors: Tao Zhang, Yuyang Hong, Yang Xia, Kun Ding, Zeyu Zhang, Ying Wang, Shiming Xiang, Chunhong Pan

    Abstract: Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unexplored. To address this gap, we introduce IF-Bench, the first high-quality benchmark designed for evaluating multimodal understanding of infrared images. IF-Bench consists of 499 images sourced from 23 infrared…

    Submitted 10 December, 2025; originally announced December 2025.

  26. arXiv:2512.04521 [pdf, ps, other]

    cs.CV eess.SP

    WiFi-based Cross-Domain Gesture Recognition Using Attention Mechanism

    Authors: Ruijing Liu, Cunhua Pan, Jiaming Zeng, Hong Ren, Kezhi Wang, Lei Kong, Jiangzhou Wang

    Abstract: While fulfilling communication tasks, wireless signals can also be used to sense the environment. Among various types of sensing media, WiFi signals offer advantages such as widespread availability, low hardware cost, and strong robustness to environmental conditions like light, temperature, and humidity. By analyzing WiFi signals in the environment, it is possible to capture dynamic changes of t…

    Submitted 4 December, 2025; originally announced December 2025.

  27. arXiv:2512.02461 [pdf, ps, other]

    cs.IT

    Artificial Noise Aided Physical Layer Security for Near-Field MIMO with Fluid Antenna Systems

    Authors: Peng Zhang, Jian Dang, Miaowen Wen, Ziyang Liu, Chen Zhao, Huaifeng Shi, Chengsheng Pan, Zaichen Zhang

    Abstract: With the evolution of wireless systems toward large-scale arrays and high-frequency reconfigurable architectures, fluid antenna systems (FAS) operating in the near-field (NF) regime provide new degrees of freedom (DoF) for physical layer security (PLS). This paper proposes an artificial-noise (AN)-aided PLS scheme for NF fluid-antenna multiple-input multiple-output (FA-MIMO) systems, with joint be…

    Submitted 2 December, 2025; originally announced December 2025.

  28. arXiv:2512.02353 [pdf, ps, other]

    cs.IT

    A Cyclic Shift Embedded Pilot based Channel Estimation for Multi-User MIMO-OTFS systems with fractional delay and Doppler

    Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Ruisong Weng, Jiangzhou Wang

    Abstract: Orthogonal time frequency space (OTFS) modulation has been proposed to meet the demand for reliable communication in high-mobility scenarios for future wireless networks. However, in multi-user OTFS systems, conventional embedded pilot schemes require independent pilot allocation for each user, leading to linearly increasing pilot overhead. To address these issues, in this paper, we investigate th…

    Submitted 1 December, 2025; originally announced December 2025.

  29. arXiv:2512.01809 [pdf, ps, other]

    cs.RO cs.LG

    Much Ado About Noising: Dispelling the Myths of Generative Robotic Control

    Authors: Chaoyi Pan, Giri Anantharaman, Nai-Chieh Huang, Claire Jin, Daniel Pfrommer, Chenyang Yuan, Frank Permenter, Guannan Qu, Nicholas Boffi, Guanya Shi, Max Simchowitz

    Abstract: Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs…

    Submitted 23 February, 2026; v1 submitted 1 December, 2025; originally announced December 2025.

  30. arXiv:2512.01128 [pdf, ps, other]

    cs.CV

    OmniFD: A Unified Model for Versatile Face Forgery Detection

    Authors: Haotian Liu, Haoyu Chen, Chenhui Pan, You Hu, Guoying Zhao, Xiaobai Li

    Abstract: Face forgery detection encompasses multiple critical tasks, including identifying forged images and videos and localizing manipulated regions and temporal segments. Current approaches typically employ task-specific models with independent architectures, leading to computational redundancy and ignoring potential correlations across related tasks. We introduce OmniFD, a unified framework that jointl…

    Submitted 30 November, 2025; originally announced December 2025.

  31. arXiv:2511.21796 [pdf, ps, other]

    cs.ET

    Sneak Path Current Modeling in Memristor Crossbar Arrays for Analog In-Memory Computing

    Authors: Shah Zayed Riam, Zhenlin Pei, Kyle Mooney, Chenyun Pan, Na Gong, Jinhui Wang

    Abstract: Memristor crossbar arrays have emerged as a key component for next-generation non-volatile memories, artificial neural networks, and analog in-memory computing (IMC) systems. By minimizing data transfer between the processor and memory, they offer substantial energy savings. However, a major design challenge in memristor crossbar arrays is the presence of sneak path currents, which degrade electri…

    Submitted 14 January, 2026; v1 submitted 26 November, 2025; originally announced November 2025.

  32. arXiv:2511.18794 [pdf, ps, other]

    cs.GR cs.CV

    ChronoGS: Disentangling Invariants and Changes in Multi-Period Scenes

    Authors: Zhongtao Wang, Jiaqi Dai, Qingtian Zhu, Yilong Li, Mai Su, Fei Zhu, Meng Gai, Shaorong Wang, Chengwei Pan, Yisong Chen, Guoping Wang

    Abstract: Multi-period image collections are common in real-world applications. Cities are re-scanned for mapping, construction sites are revisited for progress tracking, and natural regions are monitored for environmental change. Such data form multi-period scenes, where geometry and appearance evolve. Reconstructing such scenes is an important yet underexplored problem. Existing pipelines rely on incompat…

    Submitted 24 November, 2025; originally announced November 2025.

    MSC Class: 68U05

  33. arXiv:2511.18538 [pdf, ps, other]

    cs.SE cs.CL

    From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence

    Authors: Jian Yang, Xianglong Liu, Weifeng Lv, Ken Deng, Shawn Guo, Lin Jing, Yizhi Li, Shark Liu, Xianzhen Luo, Yuyu Luo, Changzai Pan, Ensheng Shi, Yingshui Tan, Renshuai Tao, Jiajun Wu, Xianjie Wu, Zhenhe Wu, Daoguang Zan, Chenchen Zhang, Wei Zhang, He Zhu, Terry Yue Zhuo, Kerui Cao, Xianfu Cheng, Jun Dong, et al. (46 additional authors not shown)

    Abstract: Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). While the field has evolved dramatically from rule-based systems to Transformer-b…

    Submitted 6 December, 2025; v1 submitted 23 November, 2025; originally announced November 2025.

  34. arXiv:2511.09484 [pdf, ps, other]

    cs.RO cs.CV

    SPIDER: Scalable Physics-Informed Dexterous Retargeting

    Authors: Chaoyi Pan, Changhao Wang, Haozhi Qi, Zixi Liu, Homanga Bharadhwaj, Akash Sharma, Tingfan Wu, Guanya Shi, Jitendra Malik, Francois Hogan

    Abstract: Learning dexterous and agile policies for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, abundant human motion data is readily available from motion capture, videos, and virtual reality, which could help address the data scarcity problem. However, due to the embodiment gap and missing dynamic inform…

    Submitted 5 February, 2026; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: Project website: https://jc-bao.github.io/spider-project/

  35. arXiv:2511.04029 [pdf, ps, other]

    cs.CV cs.GR

    Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface

    Authors: Yihao Luo, Xianglong He, Chuanyu Pan, Yiwen Chen, Jiaqi Wu, Yangguang Li, Wanli Ouyang, Yuanming Hu, Guang Yang, ChoonHwai Yap

    Abstract: Accurate and efficient voxelized representations of 3D meshes are the foundation of 3D reconstruction and generation. However, existing representations based on iso-surface heavily rely on water-tightening or rendering optimization, which inevitably compromise geometric fidelity. We propose Faithful Contouring, a sparse voxelized representation that supports 2048+ resolutions for arbitrary meshes,…

    Submitted 12 November, 2025; v1 submitted 5 November, 2025; originally announced November 2025.

  36. arXiv:2510.22339 [pdf, ps, other]

    cs.RO

    Estimating Continuum Robot Shape under External Loading using Spatiotemporal Neural Networks

    Authors: Enyi Wang, Zhen Deng, Chuanchuan Pan, Bingwei He, Jianwei Zhang

    Abstract: This paper presents a learning-based approach for accurately estimating the 3D shape of flexible continuum robots subjected to external loads. The proposed method introduces a spatiotemporal neural network architecture that fuses multi-modal inputs, including current and historical tendon displacement data and RGB images, to generate point clouds representing the robot's deformed configuration. Th…

    Submitted 25 October, 2025; originally announced October 2025.

    Comments: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  37. arXiv:2510.18477 [pdf, ps, other]

    cs.AI cs.CR cs.DC cs.MA

    LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources

    Authors: Haichao Ji, Zibo Wang, Cheng Pan, Meng Han, Yifei Zhu, Dan Wang, Zhu Han

    Abstract: Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving com…

    Submitted 30 October, 2025; v1 submitted 21 October, 2025; originally announced October 2025.

    Comments: This paper has been accepted by the 16th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2025)

  38. arXiv:2510.13995 [pdf, ps, other]

    cs.CV cs.AI

    Finding Holes: Pathologist Level Performance Using AI for Cribriform Morphology Detection in Prostate Cancer

    Authors: Kelvin Szolnoky, Anders Blilie, Nita Mulliqi, Toyonori Tsuzuki, Hemamali Samaratunga, Matteo Titus, Xiaoyi Ji, Sol Erika Boman, Einar Gudlaugsson, Svein Reidar Kjosavik, José Asenjo, Marcello Gambacorta, Paolo Libretti, Marcin Braun, Radisław Kordek, Roman Łowicki, Brett Delahunt, Kenneth A. Iczkowski, Theo van der Kwast, Geert J. L. H. van Leenders, Katia R. M. Leite, Chin-Chen Pan, Emiel Adrianus Maria Janssen, Martin Eklund, Lars Egevad, et al. (1 additional author not shown)

    Abstract: Background: Cribriform morphology in prostate cancer is a histological feature that indicates poor prognosis and contraindicates active surveillance. However, it remains underreported and subject to significant interobserver variability amongst pathologists. We aimed to develop and validate an AI-based system to improve cribriform pattern detection. Methods: We created a deep learning model usin…

    Submitted 15 October, 2025; originally announced October 2025.

  39. arXiv:2510.10396 [pdf, ps, other]

    cs.SD

    MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations

    Authors: Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Xintong Hu, Yu Zhang, Li Tang, Rui Yang, Han Wang, Zongbao Zhang, Yuhan Wang, Yixuan Chen, Hankun Xu, Ke Xu, Pengfei Fan, Zhetao Chen, Yanhao Yu, Qiange Huang, Fei Wu, Zhou Zhao

    Abstract: Humans rely on multisensory integration to perceive spatial environments, where auditory cues enable sound source localization in three-dimensional space. Despite the critical role of spatial audio in immersive technologies such as VR/AR, most existing multimodal datasets provide only monaural audio, which limits the development of spatial audio generation and understanding. To address these chall…

    Submitted 17 October, 2025; v1 submitted 11 October, 2025; originally announced October 2025.

    Comments: 24 pages

  40. arXiv:2510.09266 [pdf, ps, other]

    cs.CL

    CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation

    Authors: Kaiwen Wei, Xiao Liu, Jie Zhang, Zijian Wang, Ruida Liu, Yuming Yang, Xin Xiao, Xiao Sun, Haoyang Zeng, Changzai Pan, Yidan Zhang, Jiang Zhong, Peijin Wang, Yingchao Feng

    Abstract: Multimodal Retrieval-Augmented Generation (MRAG) enables Multimodal Large Language Models (MLLMs) to generate responses with external multimodal evidence, and numerous video-based MRAG benchmarks have been proposed to evaluate model capabilities across retrieval and generation stages. However, existing benchmarks remain limited in modality coverage and format diversity, often focusing on single- o…

    Submitted 10 October, 2025; originally announced October 2025.

  41. arXiv:2510.02614

    cs.RO

    UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies

    Authors: Harsh Gupta, Xiaofeng Guo, Huy Ha, Chuer Pan, Muqing Cao, Dongjae Lee, Sebastian Scherer, Shuran Song, Guanya Shi

    Abstract: We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in c…

    Submitted 13 March, 2026; v1 submitted 2 October, 2025; originally announced October 2025.

    Comments: Result videos can be found at umi-on-air.github.io

  42. arXiv:2509.01312

    cs.CL

    TableZoomer: A Collaborative Agent Framework for Large-scale Table Question Answering

    Authors: Sishi Xiong, Ziyang He, Zhongjiang He, Yu Zhao, Changzai Pan, Jie Zhang, Zhenhe Wu, Shuangyong Song, Yongxiang Li

    Abstract: While large language models (LLMs) have shown promise in the table question answering (TQA) task through prompt engineering, they face challenges in industrial applications, including structural heterogeneity, difficulties in target data localization, and bottlenecks in complex reasoning. To address these limitations, this paper presents TableZoomer, a novel LLM-powered, programming-based agent fr…

    Submitted 1 September, 2025; originally announced September 2025.

  43. arXiv:2508.19813

    cs.CL

    T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

    Authors: Jie Zhang, Changzai Pan, Kaiwen Wei, Sishi Xiong, Yu Zhao, Xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Yongxiang Li, Xuelong Li

    Abstract: Extensive research has been conducted to explore the capabilities of large language models (LLMs) in table reasoning. However, the essential task of transforming table information into reports remains a significant challenge for industrial applications. This task is plagued by two critical issues: 1) the complexity and diversity of tables lead to suboptimal reasoning outcomes; and 2) existing tab…

    Submitted 23 September, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

  44. arXiv:2508.16379

    cs.IT eess.SP

    Agentic AI Empowered Multi-UAV Trajectory Optimization in Low-Altitude Economy Networks

    Authors: Feibo Jiang, Li Dong, Xitao Pan, Kezhi Wang, Cunhua Pan

    Abstract: This paper proposes a novel Agentic Retrieval-augmented generation with Mamba-Attention Integrated Transformer (ARMAIT) framework for multi-Unmanned Aerial Vehicle (UAV) trajectory optimization. The framework is built upon Large Language Models (LLMs), incorporating Retrieval-Augmented Generation (RAG) empowered by Agentic AI and integrated with a UAV-specific knowledge base. Through the Agentic R…

    Submitted 22 August, 2025; originally announced August 2025.

  45. arXiv:2508.10924

    eess.AS cs.SD

    ASAudio: A Survey of Advanced Spatial Audio Research

    Authors: Zhiyuan Zhu, Yu Zhang, Wenxiang Guo, Changhao Pan, Zhou Zhao

    Abstract: With the rapid development of spatial audio technologies today, applications in AR, VR, and other scenarios have garnered extensive attention. Unlike traditional mono sound, spatial audio offers a more realistic and immersive auditory experience. Despite notable progress in the field, there remains a lack of comprehensive surveys that systematically organize and analyze these methods and their und…

    Submitted 20 August, 2025; v1 submitted 8 August, 2025; originally announced August 2025.

  46. arXiv:2508.08606

    cs.LG math.OC stat.ML

    Distributed optimization: designed for federated learning

    Authors: Wenyou Guo, Ting Qu, Chunrong Pan, George Q. Huang

    Abstract: Federated learning (FL), as a distributed collaborative machine learning (ML) framework under privacy-preserving constraints, has garnered increasing research attention in cross-organizational data collaboration scenarios. This paper proposes a class of distributed optimization algorithms based on the augmented Lagrangian technique, designed to accommodate diverse communication topologies in both…

    Submitted 30 October, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

    Comments: 16 pages, 6 figures

  47. arXiv:2508.05087

    cs.MM cs.AI cs.CL cs.CR

    JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering

    Authors: Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang

    Abstract: Jailbreak attacks against multimodal large language models (MLLMs) are a significant research focus. Current research predominantly focuses on maximizing attack success rate (ASR), often overlooking whether the generated responses actually fulfill the attacker's malicious intent. This oversight frequently leads to low-quality outputs that bypass safety filters but lack substantial harmful content.…

    Submitted 7 August, 2025; originally announced August 2025.

    Comments: 10 pages, 3 tables, 2 figures, to appear in the Proceedings of the 33rd ACM International Conference on Multimedia (MM '25)

    ACM Class: I.2.7; K.4.1; K.6.5

  48. arXiv:2507.19017

    cs.LG cs.AI

    MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster

    Authors: Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, Jianxiang Zhang, Xinyang Liu, Beirong Zhou, Zeng Shu, Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang

    Abstract: Reinforcement learning (RL) is a paradigm increasingly used to align large language models. Popular RL algorithms utilize multiple workers and can be modeled as a graph, where each node is the status of a worker and each edge represents dataflow between nodes. Owing to the heavy cross-node dependencies, the RL training system usually suffers from poor cluster scalability and low memory utilization…

    Submitted 25 July, 2025; originally announced July 2025.

    Comments: 9 pages

    MSC Class: CS

  49. arXiv:2507.16666

    cs.IT eess.SP

    Reconfigurable Intelligent Surface-Enabled Green and Secure Offloading for Mobile Edge Computing Networks

    Authors: Tong-Xing Zheng, Xinji Wang, Xin Chen, Di Mao, Jia Shi, Cunhua Pan, Chongwen Huang, Haiyang Ding, Zan Li

    Abstract: This paper investigates a multi-user uplink mobile edge computing (MEC) network, where the users offload partial tasks securely to an access point under the non-orthogonal multiple access policy with the aid of a reconfigurable intelligent surface (RIS) against a multi-antenna eavesdropper. We formulate a non-convex optimization problem of minimizing the total energy consumption subject to secure…

    Submitted 22 July, 2025; originally announced July 2025.

    Comments: 15 pages, 9 figures, accepted by IEEE Internet of Things Journal

  50. arXiv:2507.09061

    cs.LG eess.SY stat.ML

    Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control

    Authors: Thomas T. Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, Max Simchowitz

    Abstract: This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can su…

    Submitted 26 November, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

    Comments: Updated manuscript. New visualization figures and control-theory primer