
Showing 1–50 of 535 results for author: Cao, L

Searching in archive cs.
  1. arXiv:2604.05096 [pdf, ps, other]

    cs.CL

    RAG or Learning? Understanding the Limits of LLM Adaptation under Continuous Knowledge Drift in the Real World

    Authors: Hanbing Liu, Lang Cao, Yang Li

    Abstract: Large language models (LLMs) acquire most of their knowledge during pretraining, which ties them to a fixed snapshot of the world and makes adaptation to continuously evolving knowledge challenging. As facts, entities, and events change over time, models may experience continuous knowledge drift, resulting not only in outdated predictions but also in temporally inconsistent reasoning. Although exi…

    Submitted 6 April, 2026; originally announced April 2026.

  2. arXiv:2604.04295 [pdf, ps, other]

    cs.CL

    Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Validation

    Authors: Yongmin Yoo, Qiongkai Xu, Longbing Cao

    Abstract: Automated validation of patent claims demands zero-defect tolerance, as even a single structural flaw can render a claim legally defective. Existing evaluation paradigms suffer from a rigidity-resource dilemma: lightweight encoders struggle with nuanced legal dependencies, while exhaustive verification via Large Language Models (LLMs) is prohibitively costly. To bridge this gap, we propose ACE (Ad…

    Submitted 5 April, 2026; originally announced April 2026.

  3. arXiv:2604.01881 [pdf, ps, other]

    cs.CV cs.CL

    HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models

    Authors: Yansong Guo, Chaoyang Zhu, Jiayi Ji, Jianghang Lin, Liujuan Cao

    Abstract: Video Large Language Models (VideoLLMs) have demonstrated impressive capabilities in video understanding, yet the massive number of input video tokens incurs a significant computational burden for deployment. Existing methods mainly prune video tokens at input level while neglecting the inherent information structure embedded in videos and large language models (LLMs). To address this, we propose…

    Submitted 2 April, 2026; originally announced April 2026.

  4. arXiv:2604.01542 [pdf, ps, other]

    cs.CV physics.optics

    Universal computational thermal imaging overcoming the ghosting effect

    Authors: Hongyi Xu, Du Wang, Chenjun Zhao, Jiashuo Chen, Jiale Lin, Liqin Cao, Yanfei Zhong, Yiyuan She, Fanglin Bao

    Abstract: Thermal imaging is crucial for night vision but fundamentally hampered by the ghosting effect, a loss of detailed texture in cluttered photon streams. While conventional ghosting mitigation has relied on data post-processing, the recent breakthrough in heat-assisted detection and ranging (HADAR) opens a promising frontier for hyperspectral computational thermal imaging that produces night vision w…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 9 pages, 6 figures

  5. arXiv:2603.29368 [pdf, ps, other]

    cs.CV

    StereoVGGT: A Training-Free Visual Geometry Transformer for Stereo Vision

    Authors: Ziyang Chen, Yansong Qu, You Shen, Xuan Cheng, Liujuan Cao

    Abstract: Driven by the advancement of 3D devices, stereo vision tasks including stereo matching and stereo conversion have emerged as a critical research frontier. Contemporary stereo vision backbones typically rely on either monocular depth estimation (MDE) models or visual foundation models (VFMs). Crucially, these models are predominantly pretrained without explicit supervision of camera poses. Given th…

    Submitted 31 March, 2026; originally announced March 2026.

  6. arXiv:2603.27195 [pdf, ps, other]

    cs.AI

    AutoMS: Multi-Agent Evolutionary Search for Cross-Physics Inverse Microstructure Design

    Authors: Zhenyuan Zhao, Yu Xing, Tianyang Xue, Lingxin Cao, Xin Yan, Lin Lu

    Abstract: Designing microstructures that satisfy coupled cross-physics objectives is a fundamental challenge in material science. This inverse design problem involves a vast, discontinuous search space where traditional topology optimization is computationally prohibitive, and deep generative models often suffer from "physical hallucinations," lacking the capability to ensure rigorous validity. To address t…

    Submitted 28 March, 2026; originally announced March 2026.

  7. arXiv:2603.20811 [pdf, ps, other]

    cs.CV

    Lean Learning Beyond Clouds: Efficient Discrepancy-Conditioned Optical-SAR Fusion for Semantic Segmentation

    Authors: Chenxing Meng, Wuzhou Quan, Yingjie Cai, Liqun Cao, Liyan Zhang, Mingqiang Wei

    Abstract: Cloud occlusion severely degrades the semantic integrity of optical remote sensing imagery. While incorporating Synthetic Aperture Radar (SAR) provides complementary observations, achieving efficient global modeling and reliable cross-modal fusion under cloud interference remains challenging. Existing methods rely on dense global attention to capture long-range dependencies, yet such aggregation i…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 14 pages, 7 figures

  8. arXiv:2603.19621 [pdf, ps, other]

    cs.LG cs.AI

    DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management

    Authors: Yaqi Xie, Xinru Hao, Jiaxi Liu, Will Ma, Linwei Xin, Lei Cao, Yidong Zhang

    Abstract: Deep Reinforcement Learning (DRL) provides a general-purpose methodology for training inventory policies that can leverage big data and compute. However, off-the-shelf implementations of DRL have seen mixed success, often plagued by high sensitivity to the hyperparameters used during training. In this paper, we show that by imposing policy regularizations, grounded in classical inventory concepts…

    Submitted 19 March, 2026; originally announced March 2026.

  9. arXiv:2603.19121 [pdf, ps, other]

    cs.CV cs.AI

    CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization

    Authors: Weilin Chen, Jiahao Rao, Wenhao Wang, Xinyang Li, Xuan Cheng, Liujuan Cao

    Abstract: The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge. While text-driven methods offer flexibility, they lack the precision for fine-grained, instance-level control, and often produce textures with insufficient quality, artifacts, and baked-in shading. To overcome these limitations, we introduce CustomTex, a novel framework for instance-level, high-fi…

    Submitted 19 March, 2026; v1 submitted 19 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026. This version integrates the main paper and supplementary material

  10. arXiv:2603.18432 [pdf, ps, other]

    cs.LG

    MLOW: Interpretable Low-Rank Frequency Magnitude Decomposition of Multiple Effects for Time Series Forecasting

    Authors: Runze Yang, Longbing Cao, Xiaoming Wu, Xin You, Kun Fang, Jianxun Li, Jie Yang

    Abstract: Separating multiple effects in time series is fundamental yet challenging for time-series forecasting (TSF). However, existing TSF models cannot effectively learn interpretable multi-effect decomposition by their smoothing-based temporal techniques. Here, a new interpretable frequency-based decomposition pipeline MLOW captures the insight: a time series can be represented as a magnitude spectrum m…

    Submitted 18 March, 2026; originally announced March 2026.

  11. arXiv:2603.16436 [pdf, ps, other]

    cs.LG

    DISCOVER: A Solver for Distributional Counterfactual Explanations

    Authors: Yikai Gu, Lele Cao, Bo Zhao, Lei Lei, Lei You

    Abstract: Counterfactual explanations (CE) explain model decisions by identifying input modifications that lead to different predictions. Most existing methods operate at the instance level. Distributional Counterfactual Explanations (DCE) extend this setting by optimizing an optimal transport objective that balances proximity to a factual input distribution and alignment to a target output distribution, wi…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 20 pages, 8 figures, 4 tables

  12. arXiv:2603.15409 [pdf, ps, other]

    cs.CL

    SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia

    Authors: Pengfei Yue, Xingran Zhao, Juntao Chen, Peng Hou, Wang Longchao, Jianghang Lin, Shengchuan Zhang, Anxiang Zeng, Liujuan Cao

    Abstract: Multilingual document and scene text understanding plays an important role in applications such as search, finance, and public services. However, most existing benchmarks focus on high-resource languages and fail to evaluate models in realistic multilingual environments. In Southeast Asia, the diversity of languages, complex writing systems, and highly varied document types make this challenge eve…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  13. arXiv:2603.13788 [pdf, ps, other]

    cs.RO

    ST-VLA: Enabling 4D-Aware Spatiotemporal Understanding for General Robot Manipulation

    Authors: You Wu, Zixuan Chen, Cunxu Ou, Wenxuan Wang, Wenbo Huang, Lin Cao, Yangtao Chen, Weichao Qiu, Xingyue Quan, Jieqi Shi, Jing Huo, Yang Gao

    Abstract: Robotic manipulation in open-world environments requires reasoning across semantics, geometry, and long-horizon action dynamics. Existing hierarchical Vision-Language-Action (VLA) frameworks typically use 2D representations to connect high-level reasoning with low-level control, but lack depth awareness and temporal consistency, limiting robustness in complex 3D scenes. We propose ST-VLA, a hierar…

    Submitted 14 March, 2026; originally announced March 2026.

    Comments: 25 pages, under review

  14. arXiv:2603.12365 [pdf, ps, other]

    cond-mat.mtrl-sci cs.LG math.NA physics.comp-ph stat.CO

    Optimal Experimental Design for Reliable Learning of History-Dependent Constitutive Laws

    Authors: Kaushik Bhattacharya, Lianghao Cao, Andrew Stuart

    Abstract: History-dependent constitutive models serve as macroscopic closures for the aggregated effects of micromechanics. Their parameters are typically learned from experimental data. With a limited experimental budget, eliciting the full range of responses needed to characterize the constitutive relation can be difficult. As a result, the data can be well explained by a range of parameter choices, leadi…

    Submitted 12 March, 2026; originally announced March 2026.

  15. arXiv:2603.07625 [pdf, ps, other]

    cs.CV

    Duala: Dual-Level Alignment of Subjects and Stimuli for Cross-Subject fMRI Decoding

    Authors: Shumeng Li, Jintao Guo, Jian Zhang, Yulin Zhou, Luyang Cao, Yinghuan Shi

    Abstract: Cross-subject visual decoding aims to reconstruct visual experiences from brain activity across individuals, enabling more scalable and practical brain-computer interfaces. However, existing methods often suffer from degraded performance when adapting to new subjects with limited data, as they struggle to preserve both the semantic consistency of stimuli and the alignment of brain responses. To ad…

    Submitted 8 March, 2026; originally announced March 2026.

  16. arXiv:2603.07264 [pdf, ps, other]

    cs.RO cs.AI

    Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

    Authors: Jiazhuo Li, Linjiang Cao, Qi Liu, Xi Xiong

    Abstract: Data-efficient learning remains a central challenge in autonomous driving due to the high cost and safety risks of large-scale real-world interaction. Although world-model-based reinforcement learning enables policy optimization through latent imagination, existing approaches often lack explicit mechanisms to encode spatial and kinematic structure essential for driving tasks. In this work, we buil…

    Submitted 7 March, 2026; originally announced March 2026.

    Comments: 6 pages, 5 figures. Under review at IEEE ITSC

  17. arXiv:2603.03616 [pdf, ps, other]

    cs.CV

    LeafInst - Unified Instance Segmentation Network for Fine-Grained Forestry Leaf Phenotype Analysis: A New UAV based Benchmark

    Authors: Taige Luo, Junru Xie, Chenyang Fan, Bingrong Liu, Ruisheng Wang, Yang Shao, Sheng Xu, Lin Cao

    Abstract: Intelligent forest tree breeding has advanced plant phenotyping, yet existing research largely focuses on large-leaf agricultural crops, with limited attention to fine-grained leaf analysis of sapling trees in open-field environments. Natural scenes introduce challenges including scale variation, illumination changes, and irregular leaf morphology. To address these issues, we collected UAV RGB ima…

    Submitted 3 March, 2026; originally announced March 2026.

  18. arXiv:2603.02896 [pdf, ps, other]

    cs.CV

    3D-DRES: Detailed 3D Referring Expression Segmentation

    Authors: Qi Chen, Changli Wu, Jiayi Ji, Yiwei Ma, Liujuan Cao

    Abstract: Current 3D visual grounding tasks only process sentence level detection or segmentation, which critically fails to leverage the rich compositional contextual reasonings within natural language expressions. To address this challenge, we introduce Detailed 3D Referring Expression Segmentation (3D-DRES), a new task that provides a phrase to 3D instance mapping, aiming at enhancing fine-grained 3D vis…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: AAAI 2026

  19. arXiv:2602.23711 [pdf, ps, other]

    cs.CV

    Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities?

    Authors: Hongbo Jiang, Jie Li, Yunhang Shen, Pingyang Dai, Xing Sun, Haoyu Cao, Liujuan Cao

    Abstract: Unified Multimodal Large Language Models (U-MLLMs) integrate understanding and generation within a single architecture. However, existing evaluations typically assess these capabilities separately, overlooking semantic equivalence, i.e., the ability to manifest consistent reasoning results regardless of the output modality. In this work, we investigate whether current U-MLLMs satisfy this premise.…

    Submitted 27 February, 2026; originally announced February 2026.

    Comments: Equal contribution by Jie Li

  20. arXiv:2602.22674 [pdf, ps, other]

    cs.CV

    SPMamba-YOLO: An Underwater Object Detection Network Based on Multi-Scale Feature Enhancement and Global Context Modeling

    Authors: Guanghao Liao, Zhen Liu, Liyuan Cao, Yonghui Yang, Qi Li

    Abstract: Underwater object detection is a critical yet challenging research problem owing to severe light attenuation, color distortion, background clutter, and the small scale of underwater targets. To address these challenges, we propose SPMamba-YOLO, a novel underwater object detection network that integrates multi-scale feature enhancement with global context modeling. Specifically, a Spatial Pyramid P…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: 31 pages, 10 figures, 6 tables. This paper presents SPMamba-YOLO, an underwater object detection framework integrating multi-scale feature enhancement and global context modeling. The work is under review

    ACM Class: I.4.8; I.4.6; I.2.10

  21. arXiv:2602.19944 [pdf, ps, other]

    cs.CV

    Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation

    Authors: Yilong Yang, Jianxin Tian, Shengchuan Zhang, Liujuan Cao

    Abstract: Current zero-shot Camouflaged Object Segmentation methods typically employ a two-stage pipeline (discover-then-segment): using MLLMs to obtain visual prompts, followed by SAM segmentation. However, relying solely on MLLMs for camouflaged object discovery often leads to inaccurate localization, false positives, and missed detections. To address these issues, we propose the Discover-…

    Submitted 23 February, 2026; originally announced February 2026.

    Comments: Accepted by CVPR 2026 (main conference)

  22. arXiv:2602.19505 [pdf, ps, other]

    cs.CV

    Test-Time Computing for Referring Multimodal Large Language Models

    Authors: Mingrui Wu, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Zhiyuan Liu, Liujuan Cao, Ming-Ming Cheng, Rongrong Ji

    Abstract: We propose ControlMLLM++, a novel test-time adaptation framework that injects learnable visual prompts into frozen multimodal large language models (MLLMs) to enable fine-grained region-based visual reasoning without any model retraining or fine-tuning. Leveraging the insight that cross-modal attention maps intrinsically encode semantic correspondences between textual tokens and visual regions, Co…

    Submitted 22 February, 2026; originally announced February 2026.

    Comments: arXiv admin note: substantial text overlap with arXiv:2407.21534

  23. arXiv:2602.17568 [pdf, ps, other]

    cs.LG cs.AI

    Be Wary of Your Time Series Preprocessing

    Authors: Sofiane Ennadir, Tianze Wang, Oleg Smirnov, Sahar Asadi, Lele Cao

    Abstract: Normalization and scaling are fundamental preprocessing steps in time series modeling, yet their role in Transformer-based models remains underexplored from a theoretical perspective. In this work, we present the first formal analysis of how different normalization strategies, specifically instance-based and global scaling, impact the expressivity of Transformer-based architectures for time series…

    Submitted 19 February, 2026; originally announced February 2026.

    Comments: Accepted at the AI4TS workshop at AAAI-26

  24. arXiv:2602.13656 [pdf, ps, other]

    cs.RO

    A Kung Fu Athlete Bot That Can Do It All Day: Highly Dynamic, Balance-Challenging Motion Dataset and Autonomous Fall-Resilient Tracking

    Authors: Zhongxiang Lei, Lulu Cao, Xuyang Wang, Tianyi Qian, Jinyan Liu, Xuesong Li

    Abstract: Current humanoid motion tracking systems can execute routine and moderately dynamic behaviors, yet significant gaps remain near hardware performance limits and algorithmic robustness boundaries. Martial arts represent an extreme case of highly dynamic human motion, characterized by rapid center-of-mass shifts, complex coordination, and abrupt posture transitions. However, datasets tailored to such…

    Submitted 14 February, 2026; originally announced February 2026.

    Comments: 18 pages, 8 figures, 5 tables

  25. arXiv:2602.13594 [pdf, ps, other]

    cs.AI

    Hippocampus: An Efficient and Scalable Memory Module for Agentic AI

    Authors: Yi Li, Lianjie Cao, Faraz Ahmed, Puneet Sharma, Bingzhe Li

    Abstract: Agentic AI require persistent memory to store user-specific histories beyond the limited context window of LLMs. Existing memory systems use dense vector databases or knowledge-graph traversal (or hybrid), incurring high retrieval latency and poor storage scalability. We introduce Hippocampus, an agentic memory management system that uses compact binary signatures for semantic search and lossless…

    Submitted 13 February, 2026; originally announced February 2026.

  26. arXiv:2602.13259 [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition

    Authors: Xu Zhang, Longbing Cao, Runze Yang, Zhangkai Wu

    Abstract: Speech emotion recognition (SER) is essential for humanoid robot tasks such as social robotic interactions and robotic psychological diagnosis, where interpretable and efficient models are critical for safety and performance. Existing deep models trained on large datasets remain largely uninterpretable, often insufficiently modeling underlying emotional acoustic signals and failing to capture and…

    Submitted 2 February, 2026; originally announced February 2026.

    Comments: 13 pages, 5 figures

  27. arXiv:2602.12936 [pdf, ps, other]

    cs.CV

    Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation

    Authors: Hongbo Jiang, Jie Li, Xinqi Cai, Tianyu Xie, Yunhang Shen, Pingyang Dai, Liujuan Cao

    Abstract: Practical cloud-edge deployment of Cross-Modal Re-identification (CM-ReID) faces challenges due to maintaining a fragmented ecosystem of specialized cloud models for diverse modalities. While Multi-Modal Large Language Models (MLLMs) offer strong unification potential, existing approaches fail to adapt them into a single end-to-end backbone and lack effective knowledge distillation strategies for…

    Submitted 13 February, 2026; originally announced February 2026.

    Comments: Equal contribution by Jie Li

  28. arXiv:2602.12285 [pdf, ps, other]

    cs.CL cs.AI

    From Biased Chatbots to Biased Agents: Examining Role Assignment Effects on LLM Agent Robustness

    Authors: Linbo Cao, Lihao Sun, Yang Yue

    Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents capable of actions with real-world impacts beyond text generation. While persona-induced biases in text generation are well documented, their effects on agent task performance remain largely unexplored, even though such effects pose more direct operational risks. In this work, we present the first systematic case study sho…

    Submitted 20 January, 2026; originally announced February 2026.

    Comments: Accepted to the AAAI 2026 TrustAgent Workshop. 6 pages, 4 figures

  29. arXiv:2602.07506 [pdf, ps, other]

    cs.RO cs.AI cs.HC

    VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots

    Authors: Peizhen Li, Longbing Cao, Xiao-Ming Wu, Yang Zhang

    Abstract: Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and affective human-robot interaction. Existing progress in humanoid facial expression imitation remains limited, often failing to achieve either real-time performance or realistic expressiveness due to offline video…

    Submitted 14 February, 2026; v1 submitted 7 February, 2026; originally announced February 2026.

    Comments: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)

  30. arXiv:2602.07342 [pdf, ps, other]

    cs.AI

    SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

    Authors: Shengyue Guan, Yihao Liu, Lang Cao

    Abstract: Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this settin…

    Submitted 6 February, 2026; originally announced February 2026.

  31. arXiv:2602.07303 [pdf, ps, other]

    cs.DB cs.AI cs.SE

    KRONE: Hierarchical and Modular Log Anomaly Detection

    Authors: Lei Ma, Jinyang Liu, Tieying Zhang, Peter M. VanNostrand, Dennis M. Hofmann, Lei Cao, Elke A. Rundensteiner, Jianjun Chen

    Abstract: Log anomaly detection is crucial for uncovering system failures and security risks. Although logs originate from nested component executions with clear boundaries, this structure is lost when stored as flat sequences. As a result, state-of-the-art methods often miss true dependencies within executions while learning spurious correlations across unrelated events. We propose KRONE, the first hierarc…

    Submitted 25 March, 2026; v1 submitted 6 February, 2026; originally announced February 2026.

    Comments: Accepted at ICDE 2026

  32. arXiv:2602.05250 [pdf, ps, other]

    cs.CV

    Active Label Cleaning for Reliable Detection of Electron Dense Deposits in Transmission Electron Microscopy Images

    Authors: Jieyun Tan, Shuo Liu, Guibin Zhang, Ziqi Li, Jian Geng, Lei Zhang, Lei Cao

    Abstract: Automated detection of electron dense deposits (EDD) in glomerular disease is hindered by the scarcity of high-quality labeled data. While crowdsourcing reduces annotation cost, it introduces label noise. We propose an active label cleaning method to efficiently denoise crowdsourced datasets. Our approach uses active learning to select the most valuable noisy samples for expert re-annotation, buil…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: 10 pages, 6 figures

  33. arXiv:2602.04399 [pdf, ps, other]

    cs.CL

    Swordsman: Entropy-Driven Adaptive Block Partition for Efficient Diffusion Language Models

    Authors: Yu Zhang, Xinchen Li, Jialei Zhou, Hongnan Ma, Zhongwei Wan, Yiwei Shi, Duoqian Miao, Qi Zhang, Longbing Cao

    Abstract: Block-wise decoding effectively improves the inference speed and quality in diffusion language models (DLMs) by combining inter-block sequential denoising and intra-block parallel unmasking. However, existing block-wise decoding methods typically partition blocks in a rigid and fixed manner, which inevitably fragments complete semantic or syntactic constituents, leading to suboptimal performance.…

    Submitted 4 February, 2026; originally announced February 2026.

  34. arXiv:2602.03673 [pdf, ps, other]

    cs.CV

    Referring Industrial Anomaly Segmentation

    Authors: Pengfei Yue, Xiaokang Jiang, Yilin Lu, Jianghang Lin, Shengchuan Zhang, Liujuan Cao

    Abstract: Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield rough localizations requiring manual thresholds, while supervised methods overfit due to scarce, imbalanced data. Both suffer from the "One Anomaly Class, One Model" limitation. To address this, we propose Referring Industrial Anomaly Segmentation (RIAS)…

    Submitted 3 February, 2026; originally announced February 2026.

  35. arXiv:2602.02171 [pdf]

    cs.CV

    Lung Nodule Image Synthesis Driven by Two-Stage Generative Adversarial Networks

    Authors: Lu Cao, Xiquan He, Junying Zeng, Chaoyun Mai, Min Luo

    Abstract: The limited sample size and insufficient diversity of lung nodule CT datasets severely restrict the performance and generalization ability of detection models. Existing methods generate images with insufficient diversity and controllability, suffering from issues such as monotonous texture features and distorted anatomical structures. Therefore, we propose a two-stage generative adversarial networ…

    Submitted 2 February, 2026; originally announced February 2026.

  36. arXiv:2602.01345 [pdf, ps, other]

    cs.CV

    Adaptive Visual Autoregressive Acceleration via Dual-Linkage Entropy Analysis

    Authors: Yu Zhang, Jingyi Liu, Feng Liu, Duoqian Miao, Qi Zhang, Kexue Fu, Changwei Wang, Longbing Cao

    Abstract: Visual AutoRegressive modeling (VAR) suffers from substantial computational cost due to the massive token count involved. Failing to account for the continuous evolution of modeling dynamics, existing VAR token reduction methods face three key limitations: heuristic stage partition, non-adaptive schedules, and limited acceleration scope, thereby leaving significant acceleration potential untapped.…

    Submitted 1 February, 2026; originally announced February 2026.

    Comments: 11 pages, 8 figures

  37. arXiv:2601.22505 [pdf]

    cs.DL

    Constructing BERT Models: How Team Dynamics and Focus Shape AI Model Impact

    Authors: Likun Cao, Kai Li

    Abstract: The rapid evolution of AI technologies, exemplified by BERT-family models, has transformed scientific research, yet little is known about their production and recognition dynamics in the scientific system. This study investigates the development and impact of BERT-family models, focusing on team size, topic specialization, and citation patterns behind the models. Using a dataset of 4,208 BERT-rela…

    Submitted 29 January, 2026; originally announced January 2026.

    Comments: The paper has been accepted by Quantitative Science Studies

  38. arXiv:2601.21340 [pdf, ps, other]

    cs.AI

    EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation

    Authors: Lang Cao, Qingyu Chen, Yue Guo

    Abstract: Electronic Health Records (EHRs) provide rich longitudinal clinical evidence that is central to medical decision-making, motivating the use of retrieval-augmented generation (RAG) to ground large language model (LLM) predictions. However, long-horizon EHRs often exceed LLM context limits, and existing approaches commonly rely on truncation or vanilla retrieval strategies that discard clinically re…

    Submitted 29 January, 2026; originally announced January 2026.

  39. arXiv:2601.18195 [pdf, ps, other]

    cs.CV

    QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding

    Authors: Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, Xiongkuo Min

    Abstract: Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding, a paradigm that demands fine-grained spatiotemporal perception and auxiliary contextual information. Current approaches rely on supervised fine-tuning or reinforcement learning on curated instruction datasets, which involve labor-intensive annotation…

    Submitted 26 January, 2026; originally announced January 2026.

  40. arXiv:2601.16909 [pdf, ps, other]

    cs.AI

    Preventing the Collapse of Peer Review Requires Verification-First AI

    Authors: Lei You, Lele Cao, Iryna Gurevych

    Abstract: This paper argues that AI-assisted peer review should be verification-first rather than review-mimicking. We propose truth-coupling, i.e. how tightly venue scores track latent scientific truth, as the right objective for review tools. We formalize two forces that drive a phase transition toward proxy-sovereign evaluation: verification pressure, when claims outpace verification capacity, and signal…

    Submitted 12 February, 2026; v1 submitted 23 January, 2026; originally announced January 2026.

  41. arXiv:2601.15772 [pdf]

    cs.CV

    LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting

    Authors: Yuhan Chen, Wenxuan Yu, Guofa Li, Yijun Xu, Ying Fang, Yicui Shi, Long Cao, Wenbo Chu, Keqiang Li

    Abstract: 2D Gaussian Splatting (2DGS) is an emerging explicit scene representation method with significant potential for image compression due to high fidelity and high compression ratios. However, existing low-light enhancement algorithms operate predominantly within the pixel domain. Processing 2DGS-compressed images necessitates a cumbersome decompression-enhancement-recompression pipeline, which compro…

    Submitted 22 January, 2026; originally announced January 2026.

  42. arXiv:2601.10379 [pdf, ps, other]

    cs.RO eess.SY

    Online identification of nonlinear time-varying systems with uncertain information

    Authors: He Ren, Gaowei Yan, Hang Liu, Lifeng Cao, Zhijun Zhao, Gang Dang

    Abstract: Digital twins (DTs), serving as the core enablers for real-time monitoring and predictive maintenance of complex cyber-physical systems, impose critical requirements on their virtual models: high predictive accuracy, strong interpretability, and online adaptive capability. However, existing techniques struggle to meet these demands simultaneously: Bayesian methods excel in uncertainty quantificati…

    Submitted 15 January, 2026; originally announced January 2026.

  43. HGATSolver: A Heterogeneous Graph Attention Solver for Fluid-Structure Interaction

    Authors: Qin-Yi Zhang, Hong Wang, Siyao Liu, Haichuan Lin, Linying Cao, Xiao-Hu Zhou, Chen Chen, Shuangyi Wang, Zeng-Guang Hou

    Abstract: Fluid-structure interaction (FSI) systems involve distinct physical domains, fluid and solid, governed by different partial differential equations and coupled at a dynamic interface. While learning-based solvers offer a promising alternative to costly numerical simulations, existing methods struggle to capture the heterogeneous dynamics of FSI within a unified framework. This challenge is further…

    Submitted 14 January, 2026; originally announced January 2026.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1534-1542 (2026)

  44. arXiv:2601.06874

    cs.CV

    MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation

    Authors: Changli Wu, Haodong Wang, Jiayi Ji, Yutian Yao, Chunsai Du, Jihua Kang, Yanwei Fu, Liujuan Cao

    Abstract: Most existing 3D referring expression segmentation (3DRES) methods rely on dense, high-quality point clouds, while real-world agents such as robots and mobile phones operate with only a few sparse RGB views and strict latency constraints. We introduce Multi-view 3D Referring Expression Segmentation (MV-3DRES), where the model must recover scene structure and segment the referred object directly fr…

    Submitted 31 March, 2026; v1 submitted 11 January, 2026; originally announced January 2026.

    Comments: Accepted to CVPR 2026; Project Website: https://mvggt.github.io/

  45. arXiv:2601.03703

    cs.LG cs.AI

    TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL

    Authors: Lang Cao, Hui Ruan, Yongqian Li, Peng Chao, Wu Ning, Haonan Song, Renhong Chen, Yitong Li

    Abstract: Reinforcement learning with group-based objectives, such as Group Relative Policy Optimization (GRPO), is a common framework for aligning large language models on complex reasoning tasks. However, standard GRPO treats each rollout trajectory as an independent flat sequence and assigns a single sequence-level advantage to all tokens, which leads to sample inefficiency and a length bias toward verbo…

    Submitted 9 April, 2026; v1 submitted 7 January, 2026; originally announced January 2026.

  46. arXiv:2601.00871

    physics.soc-ph cs.DL cs.LG cs.SI

    Deep versus Broad Technology Search and the Timing of Innovation Impact

    Authors: Likun Cao, James Evans

    Abstract: This study offers a new perspective on the depth-versus-breadth debate in innovation strategy, by modeling inventive search within dynamic collective knowledge systems, and underscoring the importance of timing for technological impact. Using frontier machine learning to project patent citation networks in hyperbolic space, we analyze 4.9 million U.S. patents to examine how search strategies give…

    Submitted 30 December, 2025; originally announced January 2026.

    Comments: 47 pages, 8 figures, 3 tables

  47. arXiv:2512.24702

    cs.CV cs.AI

    Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting

    Authors: Kai Ye, Xiaotong You, Jianghang Lin, Jiayi Ji, Pingyang Dai, Liujuan Cao

    Abstract: Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). However, SFT suffers from catastrophic forgetting and domain dependency, while RL is often hindered by training instability and rigid reliance on predefined rew…

    Submitted 31 December, 2025; originally announced December 2025.

  48. arXiv:2512.23982

    cs.SE cs.AI

    Coding With AI: From a Reflection on Industrial Practices to Future Computer Science and Software Engineering Education

    Authors: Hung-Fu Chang, MohammadShokrolah Shirazi, Lizhou Cao, Supannika Koolmanojwong Mobasser

    Abstract: Recent advances in large language models (LLMs) have introduced new paradigms in software development, including vibe coding, AI-assisted coding, and agentic coding, fundamentally reshaping how software is designed, implemented, and maintained. Prior research has primarily examined AI-based coding at the individual level or in educational settings, leaving industrial practitioners' perspectives un…

    Submitted 29 December, 2025; originally announced December 2025.

    Comments: 21 pages, 5 figures

    ACM Class: D.2; K.6.3; K.3.2

  49. arXiv:2512.22560

    cs.DC cs.AI cs.LG

    RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

    Authors: Wei Gao, Yuheng Zhao, Tianyuan Wu, Shaopan Xiong, Weixun Wang, Dakai An, Lunxi Cao, Dilxat Muhtar, Zichen Liu, Haizhou Zhao, Ju Huang, Siran Yang, Yongbin Li, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang

    Abstract: Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulations. We argue that efficient agentic RL training requires disaggregat…

    Submitted 27 December, 2025; originally announced December 2025.

    Comments: 17 pages, 17 figures

  50. arXiv:2512.22535

    cs.NI

    A Lightweight Coordinate-Conditioned Diffusion Approach for 6G C-V2X Radio Environment Maps

    Authors: Liu Cao, Zhaoyu Liu, Dongyu Wei, Yuan Yang, Yukun Pan, Lyutianyang Zhang

    Abstract: Transmitter vehicles that broadcast 6G Cellular Vehicle-to-Everything (C-V2X)-based messages, e.g., Basic Safety Messages (BSMs), are prone to be impacted by PHY issues due to the lack of dynamic high-fidelity Radio Environment Map (REM) with dynamic location variation. This paper explores a lightweight diffusion-based generative approach, the Coordinate-Conditioned Denoising Diffusion Probabilist…

    Submitted 30 December, 2025; v1 submitted 27 December, 2025; originally announced December 2025.

    Comments: 5 pages, 5 figures