Showing 1–50 of 296 results for author: Yan, W

Searching in archive cs.
  1. arXiv:2604.12254  [pdf, ps, other]

    cs.CR cs.AI

    SpanKey: Dynamic Key Space Conditioning for Neural Network Access Control

    Authors: WenBin Yan

    Abstract: SpanKey is a lightweight way to gate inference without encrypting weights or chasing leaderboard accuracy on gated inference. The idea is to condition activations on secret keys. A basis matrix $B$ defines a low-dimensional key subspace $Span(B)$; during training we sample coefficients $α$ and form keys $k=α^\top B$, then inject them into intermediate activations with additive or multiplicative ma…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: 15 pages, 1 figure, multiple tables. Preprint (not yet published in a journal). Affiliation: University of Colorado Boulder. Code: https://github.com/mindmemory-ai/dksc

  2. arXiv:2604.10634  [pdf, ps, other]

    cs.CV

    NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Suhang Yao, Beibei Lin, Zhaoxin Fan, Wending Yan, Xin Jin, Zongwei Wu, Bingchen Li, Peishu Shi, Yufei Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Runzhe Li, Kui Jiang, Zhaocheng Yu, Yiang Chen, Junjun Jiang, Xianming Liu, Hongde Gu, Zeliang Li, Mache You, et al. (73 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset~\cite{jin2024raindrop}. For this edition, we adjust the dataset with 14,139 images for train…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPR2026 Workshop; NTIRE 2026 Challenge Report

  3. arXiv:2604.10597  [pdf, ps, other]

    cs.CV cs.AI

    COREY: A Prototype Study of Entropy-Guided Operator Fusion with Hadamard Reparameterization for Selective State Space Models

    Authors: Bo Ma, Jinsong Wu, Hongjiang Wei, Weiqi Yan

    Abstract: State Space Models (SSMs), represented by the Mamba family, provide linear-time sequence modeling and are attractive for long-context inference. Yet practical deployments remain memory-bandwidth limited because selective state updates are often decomposed into fragmented kernels with repeated intermediate tensor materialization. We present COREY, a prototype framework that combines memory-aware op…

    Submitted 12 April, 2026; originally announced April 2026.

  4. arXiv:2604.09511  [pdf, ps, other]

    cs.CV

    RIRF: Reasoning Image Restoration Framework

    Authors: Wending Yan, Rongkai Zhang, Kaihua Tang, Yu Cheng, Qiankun Liu

    Abstract: Universal image restoration (UIR) aims to recover clean images from diverse and unknown degradations using a unified model. Existing UIR methods primarily focus on pixel reconstruction and often lack explicit diagnostic reasoning over degradation composition, severity, and scene semantics prior to restoration. We propose Reason and Restore (R&R), a novel framework that integrates structured Chain…

    Submitted 10 April, 2026; originally announced April 2026.

  5. arXiv:2604.06672  [pdf]

    cs.CY

    Rhythm-consistent semi-Markov simulation of tourist mobility rhythms with probabilistic event-to-POI assignment: Hakone, Japan

    Authors: Jianhao Shi, Tomio Miwa, Wanglin Yan

    Abstract: Understanding the timing and sequencing of activity participation in tourist mobility is central to travel behavior research, yet GPS trajectories are noisy, irregularly sampled, and only weakly linked to activity locations, which limits interpretation and scenario analysis. We address this by mapping each stay event to candidate points of interest (POIs) probabilistically, using explicit prior-li…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: Preprint. Under review

  6. arXiv:2604.05793  [pdf, ps, other]

    cs.CR cs.CV

    BodhiPromptShield: Pre-Inference Prompt Mediation for Suppressing Privacy Propagation in LLM/VLM Agents

    Authors: Bo Ma, Jinsong Wu, Weiqi Yan

    Abstract: In LLM/VLM agents, prompt privacy risk propagates beyond a single model call because raw user content can flow into retrieval queries, memory writes, tool calls, and logs. Existing de-identification pipelines address document boundaries but not this cross-stage propagation. We propose BodhiPromptShield, a policy-aware framework that detects sensitive spans, routes them via typed placeholders, sema…

    Submitted 7 April, 2026; originally announced April 2026.

  7. arXiv:2604.05172  [pdf, ps, other]

    cs.AI

    ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

    Authors: Xiangyi Li, Kyoung Whan Choe, Yimin Liu, Xiaokun Chen, Chujun Tao, Bingran You, Wenbo Chen, Zonglin Di, Jiankai Sun, Shenghan Zheng, Jiajun Bao, Yuanli Wang, Weixiang Yan, Yiyuan Li, Han-chung Lee

    Abstract: Large language model (LLM) agents are increasingly deployed to automate productivity tasks (e.g., email, scheduling, document management), but evaluating them on live services is risky due to potentially irreversible changes. Existing benchmarks rely on simplified environments and fail to capture realistic, stateful, multi-service workflows. We introduce ClawsBench, a benchmark for evaluating and…

    Submitted 8 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

    Comments: 25 pages, 5 figures

  8. arXiv:2604.00985  [pdf, ps, other]

    cs.CV

    Maximizing T2-Only Prostate Cancer Localization from Expected Diffusion Weighted Imaging

    Authors: Weixi Yi, Yipei Wang, Wen Yan, Hanyuan Zhang, Natasha Thorley, Alexander Ng, Shonit Punwani, Fernando Bianco, Mark Emberton, Veeru Kasivisvanathan, Dean C. Barratt, Shaheer U. Saeed, Yipeng Hu

    Abstract: Multiparametric MRI is increasingly recommended as a first-line noninvasive approach to detect and localize prostate cancer, requiring at minimum diffusion-weighted (DWI) and T2-weighted (T2w) MR sequences. Early machine learning attempts using only T2w images have shown promising diagnostic performance in segmenting radiologist-annotated lesions. Such uni-modal T2-only approaches deliver substant…

    Submitted 1 April, 2026; originally announced April 2026.

  9. arXiv:2603.29167  [pdf, ps, other]

    cs.CV

    CT-to-X-ray Distillation Under Tiny Paired Cohorts: An Evidence-Bounded Reproducible Pilot Study

    Authors: Bo Ma, Jinsong Wu, Weiqi Yan, Hongjiang Wei

    Abstract: Chest X-ray and computed tomography (CT) provide complementary views of thoracic disease, yet most computer-aided diagnosis models are trained and deployed within a single imaging modality. The concrete question studied here is narrower and deployment-oriented: on a patient-level paired chest cohort, can CT act as training-only supervision for a binary disease versus non-disease X-ray classifier w…

    Submitted 30 March, 2026; originally announced March 2026.

  10. arXiv:2603.28618  [pdf, ps, other]

    cs.AI

    Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning

    Authors: Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao

    Abstract: Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the final answer. This shared reward blurs credit assignment, frequently improving rea…

    Submitted 9 April, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

    Comments: 21 pages, 15 figures, 6 tables

  11. From Passersby to Placemaking: Designing Autonomous Vehicle-Pedestrian Encounters for an Urban Shared Space

    Authors: Yiyuan Wang, Martin Tomitsch, Marius Hoggenmüller, Senuri Wijenayake, Wai Yan, Luke Hespanhol

    Abstract: Autonomous vehicles (AVs) tend to disrupt the atmosphere and pedestrian experience in urban shared spaces, undermining the focus of these spaces on people and placemaking. We investigate how external human-machine interfaces (eHMIs) supporting AV-pedestrian interaction can be extended to consider the characteristics of an urban shared space. Inspired by urban HCI, we devised three place-based eHMI…

    Submitted 29 March, 2026; originally announced March 2026.

    Journal ref: Multimedia Tools and Applications, 84(21), 24379-24403 (2025)

  12. arXiv:2603.23666  [pdf]

    cs.RO physics.app-ph

    Quadrature Oscillation System for Coordinated Motion in Crawling Origami Robot

    Authors: Sean Liu, Ankur Mehta, Wenzhong Yan

    Abstract: Origami-inspired robots offer rapid, accessible design and manufacture with diverse functionalities. In particular, origami robots without conventional electronics have the unique advantage of functioning in extreme environments such as ones with high radiation or large magnetic fields. However, the absence of sophisticated control systems limits these robots to simple autonomous behaviors. In our…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: 8 pages, 11 figures, Accepted to ICRA 2026

  13. arXiv:2603.18012  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    DynaRAG: Bridging Static and Dynamic Knowledge in Retrieval-Augmented Generation

    Authors: Penghao Liang, Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan, Yichao Wu

    Abstract: We present DynaRAG, a retrieval-augmented generation (RAG) framework designed to handle both static and time-sensitive information needs through dynamic knowledge integration. Unlike traditional RAG pipelines that rely solely on static corpora, DynaRAG selectively invokes external APIs when retrieved documents are insufficient for answering a query. The system employs an LLM-based reranker to asse…

    Submitted 23 February, 2026; originally announced March 2026.

  14. arXiv:2603.16940  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    On the Degrees of Freedom of Gridded Control Points in Learning-Based Medical Image Registration

    Authors: Wen Yan, Qianye Yang, Yipei Wang, Shonit Punwani, Mark Emberton, Vasilis Stavrinides, Yipeng Hu, Dean Barratt

    Abstract: Many registration problems are ill-posed in homogeneous or noisy regions, and dense voxel-wise decoders can be unnecessarily high-dimensional. A sparse control-point parameterisation provides a compact, smooth deformation representation while reducing memory and improving stability. This work investigates the required control points for learning-based registration network development. We present G…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: 27 pages; 8 figures

  15. arXiv:2603.14418  [pdf, ps, other]

    cs.CV cs.AI

    Deep EM with Hierarchical Latent Label Modelling for Multi-Site Prostate Lesion Segmentation

    Authors: Wen Yan, Yipei Wang, Shiqi Huang, Natasha Thorley, Mark Emberton, Vasilis Stavrinides, Yipeng Hu, Dean Barratt

    Abstract: Label variability is a major challenge for prostate lesion segmentation. In multi-site datasets, annotations often reflect centre-specific contouring protocols, causing segmentation networks to overfit to local styles and generalise poorly to unseen sites in inference. We treat each observed annotation as a noisy observation of an underlying latent 'clean' lesion mask, and propose a hierarchical e…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: 10 pages, 2 figures

    MSC Class: I.2.0

  16. arXiv:2603.13728  [pdf, ps, other]

    cs.CV cs.CR

    Bodhi VLM: Privacy-Alignment Modeling for Hierarchical Visual Representations in Vision Backbones and VLM Encoders via Bottom-Up and Top-Down Feature Search

    Authors: Bo Ma, Wei Qi Yan, Jinsong Wu

    Abstract: Learning systems that preserve privacy often inject noise into hierarchical visual representations; a central challenge is to model how such perturbations align with a declared privacy budget in a way that is interpretable and applicable across vision backbones and vision-language models (VLMs). We propose Bodhi VLM, a privacy-alignment modeling framework for hierarchic…

    Submitted 18 March, 2026; v1 submitted 13 March, 2026; originally announced March 2026.

  17. arXiv:2603.13709  [pdf, ps, other]

    cs.CR cs.CV

    REAEDP: Entropy-Calibrated Differentially Private Data Release with Formal Guarantees and Attack-Based Evaluation

    Authors: Bo Ma, Jinsong Wu, Wei Qi Yan

    Abstract: Sensitive data release is vulnerable to output-side privacy threats such as membership inference, attribute inference, and record linkage. This creates a practical need for release mechanisms that provide formal privacy guarantees while preserving utility in measurable ways. We propose REAEDP, a differential privacy framework that combines entropy-calibrated histogram release, a synthetic-data rel…

    Submitted 13 March, 2026; originally announced March 2026.

  18. arXiv:2603.13667  [pdf, ps, other]

    cs.CV

    TSDCRF: Balancing Privacy and Multi-Object Tracking via Time-Series CRF and Normalized Control Penalty

    Authors: Bo Ma, Jinsong Wu, Weiqi Yan

    Abstract: Multi-object tracking in video often requires appearance or location cues that can reveal sensitive identity information, while adding privacy-preserving noise typically disrupts cross-frame association and causes ID switches or target loss. We propose TSDCRF, a plug-in refinement framework that balances privacy and tracking by combining three components: (i) $(\varepsilon,\delta)$-differential privacy…

    Submitted 13 March, 2026; originally announced March 2026.

  19. arXiv:2603.12719  [pdf, ps, other]

    cs.CV cs.AI

    IGASA: Integrated Geometry-Aware and Skip-Attention Modules for Enhanced Point Cloud Registration

    Authors: Dongxu Zhang, Jihua Zhu, Shiqi Li, Wenbiao Yan, Haoran Xu, Peilin Fan, Huimin Lu

    Abstract: Point cloud registration (PCR) is a fundamental task in 3D vision and provides essential support for applications such as autonomous driving, robotics, and environmental modeling. Despite its widespread use, existing methods often fail when facing real-world challenges like heavy noise, significant occlusions, and large-scale transformations. These limitations frequently result in compromised regi…

    Submitted 13 March, 2026; originally announced March 2026.

  20. arXiv:2603.09358  [pdf, ps, other]

    cs.CR

    ProvAgent: Threat Detection Based on Identity-Behavior Binding and Multi-Agent Collaborative Attack Investigation

    Authors: Wenhao Yan, Ning An, Linxu Li, Bingsheng Bi, Bo Jiang, Zhigang Lu, Baoxu Liu, Junrong Liu, Cong Dong

    Abstract: Advanced Persistent Threats (APTs) pose critical challenges to modern cybersecurity due to their multi-stage and stealthy nature. While provenance-based detection approaches show promise in capturing causal attack semantics, current threat provenance practices face two paradoxical issues: (1) expert skepticism, where human analysts doubt the capability of traditional detection models to identify c…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: The code of ProvAgent is publicly available at https://github.com/Win7ery/ProvAgent

  21. arXiv:2603.09297  [pdf]

    cs.IR cs.CL

    TA-Mem: Tool-Augmented Autonomous Memory Retrieval for LLM in Long-Term Conversational QA

    Authors: Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan, Yichao Wu, Penghao Liang

    Abstract: Large Language Model (LLM) has exhibited strong reasoning ability in text-based contexts across various domains, yet the limitation of context window poses challenges for the model on long-range inference tasks and necessitates a memory storage system. While many current storage approaches have been proposed with episodic notes and graph representations of memory, retrieval methods still primarily…

    Submitted 10 March, 2026; originally announced March 2026.

  22. arXiv:2603.05143  [pdf, ps, other]

    cs.CL cs.LG

    Feature Resemblance: Towards a Theoretical Understanding of Analogical Reasoning in Transformers

    Authors: Ruichen Xu, Wenjing Yan, Ying-Jun Angela Zhang

    Abstract: Understanding reasoning in large language models is complicated by evaluations that conflate multiple reasoning types. We isolate analogical reasoning (inferring shared properties between entities based on known similarities) and analyze its emergence in transformers. We theoretically prove three key results: (1) Joint training on similarity and attribution premises enables analogical reasoning th…

    Submitted 22 March, 2026; v1 submitted 5 March, 2026; originally announced March 2026.

  23. arXiv:2603.03961  [pdf, ps, other]

    cs.CV

    ProFound: A moderate-sized vision foundation model for multi-task prostate imaging

    Authors: Yipei Wang, Yinsong Xu, Weixi Yi, Shaheer Ullah Saeed, Natasha Thorley, Alexander Ng, Yukun Zhou, Wen Yan, Dean Barratt, Shonit Punwani, Veeru Kasivisvanathan, Mark Emberton, Daniel C. Alexander, Yipeng Hu

    Abstract: Many diagnostic and therapeutic clinical tasks for prostate cancer increasingly rely on multi-parametric MRI. Automating these tasks is challenging because they necessitate expert interpretations, which are difficult to scale to capitalise on modern deep learning. Although modern automated systems achieve expert-level performance in isolated tasks, their general clinical utility remains limited by…

    Submitted 4 March, 2026; originally announced March 2026.

  24. arXiv:2603.03447  [pdf, ps, other]

    cs.CV

    Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

    Authors: Weicai Yan, Yuhong Dai, Qi Ran, Haodong Li, Wang Lin, Hao Liao, Xing Xie, Tao Jin, Jianxun Lian

    Abstract: Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated content to meet real-time constraints. In this work, we instantiate AI companions through two gaming sce…

    Submitted 22 March, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

  25. arXiv:2603.01593   

    cs.CV

    PPEDCRF: Privacy-Preserving Enhanced Dynamic CRF for Location-Privacy Protection for Sequence Videos with Minimal Detection Degradation

    Authors: Bo Ma, Jinsong Wu, Weiqi Yan, Catherine Shi, Minh Nguyen

    Abstract: Dashcam videos collected by autonomous or assisted-driving systems are increasingly shared for safety auditing and model improvement. Even when explicit GPS metadata are removed, an attacker can still infer the recording location by matching background visual cues (e.g., buildings and road layouts) against large-scale street-view imagery. This paper studies location-privacy leakage under a backgro…

    Submitted 2 April, 2026; v1 submitted 2 March, 2026; originally announced March 2026.

    Comments: We would like to withdraw this paper due to identified issues in the experimental design and insufficient supporting data, which affect the reliability of the reported results. A substantially revised version with corrected experiments and extended evaluations will be prepared and submitted in the future

  26. arXiv:2603.01073  [pdf, ps, other]

    cs.CV

    Flow Matching-enabled Test-Time Refinement for Unsupervised Cardiac MR Registration

    Authors: Yunguan Fu, Wenjia Bai, Wen Yan, Matthew J Clarkson, Rhodri Huw Davies, Yipeng Hu

    Abstract: Diffusion-based unsupervised image registration has been explored for cardiac cine MR, but expensive multi-step inference limits practical use. We propose FlowReg, a flow-matching framework in displacement field space that achieves strong registration in as few as two steps and supports further refinement with more steps. FlowReg uses warmup-reflow training: a single-step network first acts as a t…

    Submitted 3 March, 2026; v1 submitted 1 March, 2026; originally announced March 2026.

  27. arXiv:2603.00907  [pdf, ps, other]

    cs.CL

    KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging

    Authors: Lianjun Liu, Hongli An, Weiqi Yan, Xin Du, Shengchuan Zhang, Huazhong Liu, Yunshan Zhong

    Abstract: The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical observations of KV asymmetry and gradient-based Hessian approximations lack a theoretical foundation and incur suboptimal compression and inference overhead. To bridge…

    Submitted 8 March, 2026; v1 submitted 28 February, 2026; originally announced March 2026.

  28. arXiv:2603.00846  [pdf, ps, other]

    cs.IR cs.LG

    Tiny-Critic RAG: Empowering Agentic Fallback with Parameter-Efficient Small Language Models

    Authors: Yichao Wu, Penghao Liang, Yafei Xiang, Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan

    Abstract: Retrieval-Augmented Generation (RAG) grounds Large Language Models (LLMs) to mitigate factual hallucinations. Recent paradigms shift from static pipelines to Modular and Agentic RAG frameworks, granting models autonomy for multi-hop reasoning or self-correction. However, current reflective RAG heavily relies on massive LLMs as universal evaluators. In high-throughput systems, executing complete fo…

    Submitted 28 February, 2026; originally announced March 2026.

  29. arXiv:2602.23945  [pdf, ps, other]

    cs.CV cs.AI cs.MM

    PointCoT: A Multi-modal Benchmark for Explicit 3D Geometric Reasoning

    Authors: Dongxu Zhang, Yiding Sun, Pengcheng Li, Yumou Liu, Hongqiang Lin, Haoran Xu, Xiaoxuan Mu, Liang Lin, Wenbiao Yan, Ning Yang, Chaowei Fang, Juanjuan Zhao, Jihua Zhu, Conghui He, Cheng Tan

    Abstract: While Multimodal Large Language Models (MLLMs) demonstrate proficiency in 2D scenes, extending their perceptual intelligence to 3D point cloud understanding remains a significant challenge. Current approaches focus primarily on aligning 3D features with pre-trained models. However, they typically treat geometric reasoning as an implicit mapping process. These methods bypass intermediate logical st…

    Submitted 27 February, 2026; originally announced February 2026.

  30. arXiv:2602.18735  [pdf, ps, other]

    cs.CV cs.RO

    LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency

    Authors: Weilong Yan, Haipeng Li, Hao Xu, Nianjin Ye, Yihao Ai, Shuaicheng Liu, Jingyu Hu

    Abstract: This paper introduces LaS-Comp, a zero-shot and category-agnostic approach that leverages the rich geometric priors of 3D foundation models to enable 3D shape completion across diverse types of partial observations. Our contributions are threefold: First, LaS-Comp harnesses these powerful generative priors for completion through a complementary two-stage design: (i) an explicit replacement stage…

    Submitted 18 March, 2026; v1 submitted 21 February, 2026; originally announced February 2026.

    Comments: Accepted by CVPR2026

  31. Temporal Consistency-Aware Text-to-Motion Generation

    Authors: Hongsong Wang, Wenjing Yan, Qiuxia Lai, Xin Geng

    Abstract: Text-to-Motion (T2M) generation aims to synthesize realistic human motion sequences from natural language descriptions. While two-stage frameworks leveraging discrete motion representations have advanced T2M research, they often neglect cross-sequence temporal consistency, i.e., the shared temporal structures present across different instances of the same action. This leads to semantic misalignmen…

    Submitted 20 February, 2026; originally announced February 2026.

    Comments: Code is on https://github.com/Giat995/TCA-T2M/

    Journal ref: Visual Intelligence, 2026

  32. arXiv:2602.12268  [pdf, ps, other]

    cs.AI

    CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use

    Authors: Zhen Zhang, Kaiqiang Song, Xun Wang, Yebowen Hu, Weixiang Yan, Chenyang Zhao, Henry Peng Zou, Haoyun Deng, Sathish Reddy Indurthi, Shujian Liu, Simin Ma, Xiaoyang Wang, Xin Eric Wang, Song Wang

    Abstract: AI agents are increasingly used to solve real-world tasks by reasoning over multi-turn user interactions and invoking external tools. However, applying reinforcement learning to such settings remains difficult: realistic objectives often lack verifiable rewards and instead emphasize open-ended behaviors; moreover, RL for multi-turn, multi-step agentic tool use is still underexplored; and building…

    Submitted 20 February, 2026; v1 submitted 12 February, 2026; originally announced February 2026.

  33. arXiv:2602.11700  [pdf, ps, other]

    cs.LG cs.AI

    TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction

    Authors: Yongyao Wang, Ziqi Miao, Lu Yang, Haonan Jia, Wenting Yan, Chen Qian, Lijun Li

    Abstract: Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference and LLM-based prompting is often brittle. Models do not consistently leverage relevant rows, and noisy context can degrade performance. To address this challenge, we propose TabSieve, a select-then-predict framework that makes evidence usage explicit and aud…

    Submitted 12 February, 2026; originally announced February 2026.

    Comments: 13 pages

  34. arXiv:2602.04159  [pdf, ps, other]

    cs.HC

    Paint by Odor: An Exploration of Odor Visualization through Large Language Model and Generative AI

    Authors: Gang Yu, Yuchi Sun, Weining Yan, Xinyu Wang, Qi Lu

    Abstract: Odor visualization translates odor information and perception into visual outcomes and arouses the corresponding olfactory synesthesia, surpassing the spatial limitation that odors can only be perceived where they are present. Traditional odor visualization has typically relied on unidimensional mappings, such as odor-to-color associations, and has required extensive manual design efforts. However…

    Submitted 3 February, 2026; originally announced February 2026.

  35. arXiv:2602.03890  [pdf, ps, other]

    cs.CV

    4DPC$^2$hat: Towards Dynamic Point Cloud Understanding with Failure-Aware Bootstrapping

    Authors: Xindan Zhang, Weilong Yan, Yufei Shi, Xuerui Qiu, Tao He, Ying Li, Ming Li, Hehe Fan

    Abstract: Point clouds provide a compact and expressive representation of 3D objects, and have recently been integrated into multimodal large language models (MLLMs). However, existing methods primarily focus on static objects, while understanding dynamic point cloud sequences remains largely unexplored. This limitation is mainly caused by the lack of large-scale cross-modal datasets and the difficulty of m…

    Submitted 3 February, 2026; originally announced February 2026.

  36. arXiv:2601.21750  [pdf, ps, other]

    cs.LG

    FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer

    Authors: Chenrui Xu, Wenjing Yan, Ying-Jun Angela Zhang

    Abstract: Training large-scale neural networks requires solving nonconvex optimization where the choice of optimizer fundamentally determines both convergence behavior and computational efficiency. While adaptive methods like Adam have long dominated practice, the recently proposed Muon optimizer achieves superior performance through orthogonalized momentum updates that enforce isotropic geometry with unifo…

    Submitted 29 January, 2026; originally announced January 2026.

  37. arXiv:2601.20601  [pdf, ps, other]

    cs.CV cs.AI

    CLEAR-Mamba: Towards Accurate, Adaptive and Trustworthy Multi-Sequence Ophthalmic Angiography Classification

    Authors: Zhuonan Wang, Wenjie Yan, Wenqiao Zhang, Xiaohui Song, Jian Ma, Ke Yao, Yibo Yu, Beng Chin Ooi

    Abstract: Medical image classification is a core task in computer-aided diagnosis (CAD), playing a pivotal role in early disease detection, treatment planning, and patient prognosis assessment. In ophthalmic practice, fluorescein fundus angiography (FFA) and indocyanine green angiography (ICGA) provide hemodynamic and lesion-structural information that conventional fundus photography cannot capture. However…

    Submitted 10 March, 2026; v1 submitted 28 January, 2026; originally announced January 2026.

    Comments: 12 pages, 7 figures

  38. arXiv:2601.13879  [pdf, ps, other]

    cs.MM cs.CL cs.CV

    Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring

    Authors: Dongxu Zhang, Yiding Sun, Cheng Tan, Wenbiao Yan, Ning Yang, Jihua Zhu, Haijun Zhang

    Abstract: While Chain-of-Thought (CoT) reasoning significantly enhances the performance of Multimodal Large Language Models (MLLMs), its autoregressive nature incurs prohibitive latency constraints. Current efforts to mitigate this via token compression often fail by blindly applying text-centric metrics to multimodal contexts. We identify a critical failure mode termed Visual Amnesia, where linguistically…

    Submitted 11 March, 2026; v1 submitted 20 January, 2026; originally announced January 2026.

  39. arXiv:2601.08602  [pdf, ps, other]

    cs.CV cs.AI

    WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

    Authors: Zishan Shu, Juntong Wu, Wei Yan, Xudong Liu, Hongyu Zhang, Chang Liu, Youdong Mao, Jie Chen

    Abstract: Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped w…

    Submitted 13 January, 2026; originally announced January 2026.

  40. arXiv:2512.07114  [pdf]

    cs.RO eess.SY

    Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots

    Authors: Jue Wang, Mingsong Jiang, Luis A. Ramirez, Bilige Yang, Mujun Zhang, Esteban Figueroa, Wenzhong Yan, Rebecca Kramer-Bottiglio

    Abstract: Adaptive morphogenetic robots adapt their morphology and control policies to meet changing tasks and environmental conditions. Many such systems leverage soft components, which enable shape morphing but also introduce simulation and control challenges. Soft-body simulators remain limited in accuracy and computational tractability, while rigid-body simulators cannot capture soft-material dynamics.…

    Submitted 7 December, 2025; originally announced December 2025.

  41. arXiv:2512.05905  [pdf, ps, other]

    cs.CV

    SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

    Authors: Wenhao Yan, Sheng Ye, Zhuoyi Yang, Jiayan Teng, ZhenHui Dong, Kairui Wen, Xiaotao Gu, Yong-Jin Liu, Jie Tang

    Abstract: Achieving controllable character animation that meets studio-grade standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a reference image, but often fail to preserve structural fidelity and temporal consistency in wild scenarios involving complex motion and cross-identity animations. In this work, we present SCAIL (a framew…

    Submitted 23 March, 2026; v1 submitted 5 December, 2025; originally announced December 2025.

  42. arXiv:2511.17943  [pdf, ps, other]

    cs.CV

    SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System

    Authors: Zhiyu Xu, Weilong Yan, Yufei Shi, Xin Meng, Tao He, Huiping Zhuang, Ming Li, Hehe Fan

    Abstract: Recent advancements in multimodal large language models (MLLMs) and video agent systems have significantly improved general video understanding. However, when applied to scientific video understanding and educating, a domain that demands external professional knowledge integration and rigorous step-wise reasoning, existing approaches often struggle. To bridge this gap, we propose SciEducator, the…

    Submitted 22 November, 2025; originally announced November 2025.

  43. arXiv:2511.13733  [pdf, ps, other]

    eess.SP cs.LG q-bio.NC

    THD-BAR: Topology Hierarchical Derived Brain Autoregressive Modeling for EEG Generic Representations

    Authors: Wenchao Yang, Weidong Yan, Wenkang Liu, Yulan Ma, Yang Li

    Abstract: Large-scale pre-trained models hold significant potential for learning universal EEG representations. However, most existing methods, particularly autoregressive (AR) frameworks, primarily rely on straightforward temporal sequencing of multi-channel EEG data, which fails to capture the rich physiological characteristics inherent to EEG signals. Moreover, their time-centered modeling approach also…

    Submitted 5 November, 2025; originally announced November 2025.

  44. arXiv:2511.13110  [pdf, ps, other]

    cs.CV

    Learning Implicit Neural Degradation Representation for Unpaired Image Dehazing

    Authors: Shuaibin Fan, Senming Zhong, Wenchao Yan, Minglong Xue

    Abstract: Image dehazing is an important task in the field of computer vision, aiming at restoring clear and detail-rich visual content from haze-affected images. However, when dealing with complex scenes, existing methods often struggle to strike a balance between fine-grained feature representation of inhomogeneous haze distribution and global consistency modeling. Furthermore, to better learn the common…

    Submitted 17 November, 2025; originally announced November 2025.

  45. arXiv:2511.06722  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View

    Authors: Jianyu Qi, Ding Zou, Wenrui Yan, Rui Ma, Jiaxu Li, Zhijie Zheng, Zhiguo Yang, Rongchang Zhao

    Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have spurred significant progress in Chain-of-Thought (CoT) reasoning. Building on the success of Deepseek-R1, researchers extended multimodal reasoning to post-training paradigms based on reinforcement learning (RL), focusing predominantly on mathematical datasets. However, existing post-training paradigms tend to neglect two critical as…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  46. arXiv:2511.06250  [pdf, ps, other]

    cs.LG cs.CV

    Test-Time Iterative Error Correction for Efficient Diffusion Models

    Authors: Yunshan Zhong, Weiqi Yan, Yuxin Zhang

    Abstract: With the growing demand for high-quality image generation on resource-constrained devices, efficient diffusion models have received increasing attention. However, such models suffer from approximation errors introduced by efficiency techniques, which significantly degrade generation quality. Once deployed, these errors are difficult to correct, as modifying the model is typically infeasible in dep…

    Submitted 9 February, 2026; v1 submitted 9 November, 2025; originally announced November 2025.

    Comments: Accepted by ICLR 2026

  47. Robustness study of the bio-inspired musculoskeletal arm robot based on the data-driven iterative learning algorithm

    Authors: Jianbo Yuan, Jing Dai, Yerui Fan, Yaxiong Wu, Yunpeng Liang, Weixin Yan

    Abstract: The human arm exhibits remarkable capabilities, including both explosive power and precision, which demonstrate dexterity, compliance, and robustness in unstructured environments. Developing robotic systems that emulate human-like operational characteristics through musculoskeletal structures has long been a research focus. In this study, we designed a novel lightweight tendon-driven musculoskelet…

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: 20 pages, 13 figures

    Journal ref: SCIENCE CHINA Information Sciences 2025, 68(12): 222203

  48. arXiv:2511.02314  [pdf, ps, other]

    cs.LG physics.med-ph

    Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning

    Authors: Jueye Zhang, Chao Yang, Youfang Lai, Kai-Wen Li, Wenting Yan, Yunzhou Xia, Haimei Zhang, Jingjing Zhou, Gen Yang, Chen Lin, Tian Li, Yibao Zhang

    Abstract: Head-and-neck cancer (HNC) planning is difficult because multiple critical organs-at-risk (OARs) are close to complex targets. Intensity-modulated carbon-ion therapy (IMCT) offers superior dose conformity and OAR sparing but remains slow due to relative biological effectiveness (RBE) modeling, leading to laborious, experience-based, and often suboptimal tuning of many treatment-planning parameters…

    Submitted 4 November, 2025; originally announced November 2025.

  49. arXiv:2510.24727  [pdf, ps, other]

    cs.CE cs.LG

    Stiff Circuit System Modeling via Transformer

    Authors: Weiman Yan, Yi-Chia Chang, Wanyu Zhao

    Abstract: Accurate and efficient circuit behavior modeling is a cornerstone of modern electronic design automation. Among different types of circuits, stiff circuits are challenging to model using previous frameworks. In this work, we propose a new approach using Crossformer, which is a current state-of-the-art Transformer model for time-series prediction tasks, combined with Kolmogorov-Arnold Networks (KAN…

    Submitted 23 March, 2026; v1 submitted 5 October, 2025; originally announced October 2025.

  50. arXiv:2510.24288  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Problem-Parameter-Free Decentralized Bilevel Optimization

    Authors: Zhiwei Zhai, Wenjing Yan, Ying-Jun Angela Zhang

    Abstract: Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior knowledge of problem parameters-such as smoothness, convexity, or communication network topologies-to determine appropriate stepsizes. In practice, these problem parameters are typically unavailable, leading t…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: Accepted by NeurIPS 2025