
Showing 1–50 of 1,415 results for author: Cheng, Y

Searching in archive cs.
  1. arXiv:2604.08159

    cs.CV cs.AI

    Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection

    Authors: Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin

    Abstract: The rapid advancement of facial forgery techniques poses severe threats to public trust and information security, making facial DeepFake detection a critical research priority. Continual learning provides an effective approach to adapt facial DeepFake detection models to evolving forgery patterns. However, existing methods face two key bottlenecks in real-world continual learning scenarios: insuff…

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2604.08044

    cs.AR

    A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

    Authors: Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun

    Abstract: Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this emerging technology provides strong performance gains over existing hardware, current 3D-DRAM accelerators (3D-Accelerators) rely on closed-source evaluation tools…

    Submitted 9 April, 2026; originally announced April 2026.

  3. arXiv:2604.06696

    cs.AI

    AgentGate: A Lightweight Structured Routing Engine for the Internet of Agents

    Authors: Yujun Cheng, Enfang Cui, Hao Qin, Zhiyuan Liang, Qi Xu

    Abstract: The rapid development of AI agent systems is leading to an emerging Internet of Agents, where specialized agents operate across local devices, edge nodes, private services, and cloud platforms. Although recent efforts have improved agent naming, discovery, and interaction, efficient request dispatch remains an open systems problem under latency, privacy, and cost constraints. In this paper, we pre…

    Submitted 8 April, 2026; originally announced April 2026.

  4. arXiv:2604.05081

    cs.AI

    MedGemma 1.5 Technical Report

    Authors: Andrew Sellergren, Chufan Gao, Fereshteh Mahvar, Timo Kohlberger, Fayaz Jamil, Madeleine Traverse, Alberto Tono, Bashir Sadjad, Lin Yang, Charles Lau, Liron Yatziv, Tiffany Chen, Bram Sterling, Kenneth Philbrick, Richa Tiwari, Yun Liu, Madhuram Jajoo, Chandrashekar Sankarapu, Swapnil Vispute, Harshad Purandare, Abhishek Bijay Mishra, Sam Schmidgall, Tao Tu, Anil Palepu, Chunjong Park, et al. (17 additional authors not shown)

    Abstract: We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis, and improved medical document understanding (lab reports, electronic health rec…

    Submitted 6 April, 2026; originally announced April 2026.

  5. arXiv:2604.04750

    cs.AR cs.DC

    DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

    Authors: Zhiwen Mo, Guoyu Li, Hao Mark Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan

    Abstract: Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across multiple 3D chips becomes essential. With cross-stack co-design increasingly critical, we propose DeepStack, an accurate and efficient performance model and too…

    Submitted 9 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

    Comments: fix typo

  6. arXiv:2604.04658

    cs.CV

    Synthesis4AD: Synthetic Anomalies are All You Need for 3D Anomaly Detection

    Authors: Yihan Sun, Yuqi Cheng, Junjie Zu, Yuxiang Tan, Guoyang Xie, Yucheng Wang, Yunkang Cao, Weiming Shen

    Abstract: Industrial 3D anomaly detection performance is fundamentally constrained by the scarcity and long-tailed distribution of abnormal samples. To address this challenge, we propose Synthesis4AD, an end-to-end paradigm that leverages large-scale, high-fidelity synthetic anomalies to learn more discriminative representations for 3D anomaly detection. At the core of Synthesis4AD is 3D-DefectStudio, a sof…

    Submitted 6 April, 2026; originally announced April 2026.

  7. arXiv:2604.04503

    cs.AI cs.MA

    Memory Intelligence Agent

    Authors: Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie

    Abstract: Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, while suffering from key limitations of ineffective memory evolution and increasing storage and retrieval c…

    Submitted 7 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

  8. arXiv:2604.03198

    cs.CV

    The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Yan Shu, Jiaqi Ma, Ziteng Cui, Shuhong Liu, Guofeng Mei, Lei Sun, Zongwei Wu, Fahad Shahbaz Khan, Salman Khan, Radu Timofte, Yawei Li, Hongyuan Yu, Pufan Xu, Chen Wu, Long Peng, Jiaojiao Yi, Siyang Yi, Yuning Cui, Jingyuan Xia, Xing Mou, Keji He, Jinlin Wu, Zongang Gao, et al. (38 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2026 challenge on efficient single-image super-resolution with a focus on the proposed solutions and results. The aim of this challenge is to devise a network that reduces one or several aspects, such as runtime, parameters, and FLOPs, while maintaining PSNR of around 26.90 dB on the DIV2K_LSDIR_valid dataset, and 26.99 dB on the DIV2K_LSDIR_test dataset. The challenge…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: CVPR 2026 NTIRE Workshop Paper, Efficient Super Resolution Technical Report

  9. arXiv:2604.02355

    cs.LG cs.CV

    From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

    Authors: Han Song, Yucheng Zhou, Jianbing Shen, Yu Cheng

    Abstract: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward i…

    Submitted 12 March, 2026; originally announced April 2026.

  10. arXiv:2604.01988

    cs.AI

    SenseMath: Do LLMs Have Number Sense? Evaluating Shortcut Use, Judgment, and Generation

    Authors: Haomin Zhuang, Xiangqi Wang, Yili Shen, Ying Cheng, Xiangliang Zhang

    Abstract: Large language models often default to step-by-step computation even when efficient numerical shortcuts are available. This raises a basic question: do they exhibit number sense in a human-like behavioral sense, i.e., the ability to recognize numerical structure, apply shortcuts when appropriate, and avoid them when they are not? We introduce SenseMath, a controlled benchmark for evaluating struct…

    Submitted 2 April, 2026; originally announced April 2026.

  11. arXiv:2603.29620

    cs.CV cs.MM

    Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

    Authors: Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, Dasen Dai, Bolin Jiang, Manyuan Zhang, Shi-Xue Zhang, Zhengkai Jiang, Lucas Wang, Zhao Zhong, Yu Cheng, Nanyun Peng

    Abstract: Unified multimodal models provide a natural and promising architecture for understanding diverse and complex real-world knowledge while generating high-quality images. However, they still rely primarily on frozen parametric knowledge, which makes them struggle with real-world image generation involving long-tail and knowledge-intensive concepts. Inspired by the broad success of agents on real-worl…

    Submitted 1 April, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

    Comments: Project Page: https://github.com/shawn0728/Unify-Agent

  12. arXiv:2603.28088

    cs.CV

    GEMS: Agent-Native Multimodal Generation with Memory and Skills

    Authors: Zefeng He, Siyuan Huang, Xiaoye Qu, Yafu Li, Tong Zhu, Yu Cheng, Yang Yang

    Abstract: Recent multimodal generation models have achieved remarkable progress on general-purpose generation tasks, yet continue to struggle with complex instructions and specialized downstream tasks. Inspired by the success of advanced agent frameworks such as Claude Code, we propose GEMS (Agent-Native Multimodal GEneration with Memory and Skills), a framework that push…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Project Page: https://gems-gen.github.io

  13. arXiv:2603.27991

    cs.HC cs.AI

    ViviDoc: Generating Interactive Documents through Human-Agent Collaboration

    Authors: Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Jiale Lao, Yue Cheng, Wei Chen

    Abstract: Interactive documents help readers engage with complex ideas through dynamic visualization, interactive animations, and exploratory interfaces. However, creating such documents remains costly, as it requires both domain expertise and web development skills. Recent Large Language Model (LLM)-based agents can automate content creation, but directly applying them to interactive document generation of…

    Submitted 29 March, 2026; originally announced March 2026.

  14. arXiv:2603.27965

    cs.CV

    ExFusion: Efficient Transformer Training via Multi-Experts Fusion

    Authors: Jiacheng Ruan, Daize Dong, Xiaoye Qu, Tong Zhu, Ting Liu, Yuzhuo Fu, Yu Cheng, Suncheng Xiang

    Abstract: Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and introduces extra overhead in parameter storage and deployment. Therefore, it is critical to develop an approach that leverages the multi-expert capability of MoE to enhance performance while incu…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: Accepted by IEEE TMM2026

  15. arXiv:2603.27765

    cs.AI

    Let the Agent Steer: Closed-Loop Ranking Optimization via Influence Exchange

    Authors: Yin Cheng, Liao Zhou, Xiyu Liang, Dihao Luo, Tewei Lee, Kailun Zheng, Weiwei Zhang, Mingchen Cai, Jian Dong, Andy Zhang

    Abstract: Recommendation ranking is fundamentally an influence allocation problem: a sorting formula distributes ranking influence among competing factors, and the business outcome depends on finding the optimal "exchange rates" among them. However, offline proxy metrics systematically misjudge how influence reallocation translates to online impact, with asymmetric bias across metrics that a single calibrat…

    Submitted 9 April, 2026; v1 submitted 29 March, 2026; originally announced March 2026.

  16. arXiv:2603.27499

    cs.SC

    A Dataset of Nonlinear Equations for Subdivision

    Authors: Juan Xu, Huilong Lai, Yingying Cheng, Wenqiang Yang, Changbo Chen

    Abstract: In this paper, we report on the largest labelled dataset constructed so far for solving zero-dimensional square nonlinear systems with subdivision-based methods. A brief, non-exhaustive survey with emphasis on the literature from the past two decades is also provided to accompany with the dataset. The value of the dataset has been demonstrated through benchmarking several solvers as well as being…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: 49 pages, 11 figures

  17. arXiv:2603.27460

    cs.CV cs.AI

    Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

    Authors: Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, et al. (102 additional authors not shown)

    Abstract: Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thrive of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: 157 pages, 19 figures, 26 tables. Project repo: https://github.com/uni-medical/Project-Imaging-X

  18. arXiv:2603.26722

    cs.NE cs.AI cs.AR cs.OS

    Brain-inspired AI for Edge Intelligence: a systematic review

    Authors: Yingchao Cheng, Meijia Wang, Zhifeng Hao, Rajkumar Buyya

    Abstract: While Spiking Neural Networks (SNNs) promise to circumvent the severe Size, Weight, and Power (SWaP) constraints of edge intelligence, the field currently faces a "Deployment Paradox" where theoretical energy gains are frequently negated by the inefficiencies of mapping asynchronous, event-driven dynamics onto traditional von Neumann substrates. Transcending the reductionism of algorithm-only revi…

    Submitted 19 March, 2026; originally announced March 2026.

  19. arXiv:2603.25388

    cs.CV

    Multimodal Dataset Distillation via Phased Teacher Models

    Authors: Shengbin Guo, Hang Zhao, Senqiao Yang, Chenyang Jiang, Yuhang Cheng, Xiangru Peng, Rui Shao, Zhuotao Tian

    Abstract: Multimodal dataset distillation aims to construct compact synthetic datasets that enable efficient compression and knowledge transfer from large-scale image-text data. However, existing approaches often fail to capture the complex, dynamically evolving knowledge embedded in the later training stages of teacher models. This limitation leads to degraded student performance and compromises the qualit…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: Accepted to ICLR 2026

  20. arXiv:2603.25218

    cs.CV

    SDD-YOLO: A Small-Target Detection Framework for Ground-to-Air Anti-UAV Surveillance with Edge-Efficient Deployment

    Authors: Pengyu Chen, Haotian Sa, Yiwei Hu, Yuhan Cheng, Junbo Wang

    Abstract: Detecting small unmanned aerial vehicles (UAVs) from a ground-to-air (G2A) perspective presents significant challenges, including extremely low pixel occupancy, cluttered aerial backgrounds, and strict real-time constraints. Existing YOLO-based detectors are primarily optimized for general object detection and often lack adequate feature resolution for sub-pixel targets, while introducing complexi…

    Submitted 26 March, 2026; originally announced March 2026.

  21. arXiv:2603.25040

    cs.LG cs.CL cs.CV

    Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

    Authors: Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang, Wenlong Zhang, Bo Zhang, Chao Zhang, et al. (152 additional authors not shown)

    Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertis…

    Submitted 2 April, 2026; v1 submitted 26 March, 2026; originally announced March 2026.

  22. arXiv:2603.24500

    cs.LG physics.flu-dyn

    Project and Generate: Divergence-Free Neural Operators for Incompressible Flows

    Authors: Xigui Li, Hongwei Zhang, Ruoxi Jiang, Deshu Chen, Chensen Lin, Limei Han, Yuan Qi, Xin Guo, Yuan Cheng

    Abstract: Learning-based models for fluid dynamics often operate in unconstrained function spaces, leading to physically inadmissible, unstable simulations. While penalty-based methods offer soft regularization, they provide no structural guarantees, resulting in spurious divergence and long-term collapse. In this work, we introduce a unified framework that enforces the incompressible continuity equation as…

    Submitted 25 March, 2026; originally announced March 2026.

  23. arXiv:2603.23104

    cs.CV

    NeuroSeg Meets DINOv3: Transferring 2D Self-Supervised Visual Priors to 3D Neuron Segmentation via DINOv3 Initialization

    Authors: Yik San Cheng, Runkai Zhao, Weidong Cai

    Abstract: 2D visual foundation models, such as DINOv3, a self-supervised model trained on large-scale natural images, have demonstrated strong zero-shot generalization, capturing both rich global context and fine-grained structural cues. However, an analogous 3D foundation model for downstream volumetric neuroimaging remains lacking, largely due to the challenges of 3D image acquisition and the scarcity of…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: 17 pages, 12 figures, and 11 tables. Accepted to CVPR 2026

  24. arXiv:2603.21169

    cs.LG

    Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective

    Authors: Chen Zhang, Yuxin Cheng, Chenchen Ding, Shuqi Wang, Jingreng Lei, Runsheng Yu, Yik-Chung WU, Ngai Wong

    Abstract: Zeroth-order (ZO) optimization enables memory-efficient training of neural networks by estimating gradients via forward passes only, eliminating the need for backpropagation. However, the stochastic nature of gradient estimation significantly obscures the training dynamics, in contrast to the well-characterized behavior of first-order methods under Neural Tangent Kernel (NTK) theory. To address th…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: ICLR 2026 Workshop on Scientific Methods for Understanding Deep Learning (20 pages, 18 figures)

  25. arXiv:2603.19583

    cs.SE cs.AI

    Skilled AI Agents for Embedded and IoT Systems Development

    Authors: Yiming Li, Yuhan Cheng, Mingchen Ma, Yihang Zou, Ningyuan Yang, Wei Cheng, Hai "Helen" Li, Yiran Chen, Tingjun Chen

    Abstract: Large language models (LLMs) and agentic systems have shown promise for automated software development, but applying them to hardware-in-the-loop (HIL) embedded and Internet-of-Things (IoT) systems remains challenging due to the tight coupling between software logic and physical hardware behavior. Code that compiles successfully may still fail when deployed on real devices because of timing constr…

    Submitted 19 March, 2026; originally announced March 2026.

  26. arXiv:2603.19286

    q-fin.ST cs.AI cs.CL cs.LG

    Generalized Stock Price Prediction for Multiple Stocks Combined with News Fusion

    Authors: Pei-Jun Liao, Hung-Shin Lee, Yao-Fei Cheng, Li-Wei Chen, Hung-yi Lee, Hsin-Min Wang

    Abstract: Predicting stock prices presents challenges in financial forecasting. While traditional approaches such as ARIMA and RNNs are prevalent, recent developments in Large Language Models (LLMs) offer alternative methodologies. This paper introduces an approach that integrates LLMs with daily financial news for stock price prediction. To address the challenge of processing news data and identifying rele…

    Submitted 8 March, 2026; originally announced March 2026.

    Comments: Accepted to Journal of Information Science and Engineering (JISE)

  27. arXiv:2603.18979

    cs.RO cs.AI

    PRIOR: Perceptive Learning for Humanoid Locomotion with Reference Gait Priors

    Authors: Chenxi Han, Shilu He, Yi Cheng, Linqi Ye, Houde Liu

    Abstract: Training perceptive humanoid locomotion policies that traverse complex terrains with natural gaits remains an open challenge, typically demanding multi-stage training pipelines, adversarial objectives, or extensive real-world calibration. We present PRIOR, an efficient and reproducible framework built on Isaac Lab that achieves robust terrain traversal with human-like gaits through a simple yet ef…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: https://prior-iros2026.github.io/

  28. arXiv:2603.13963

    cs.SE

    SeqTG: Scalable Combinatorial Test Generation via Sequential Integer Linear Programming

    Authors: Sitong Yang, Wanying Bao, Yinyin Song, Yueting Cheng, Qian Li, Chao Wei

    Abstract: Combinatorial Testing (CT) is essential for detecting interaction-triggered faults, yet generating minimal Covering Arrays under complex constraints remains an unresolved NP-hard challenge. Current greedy algorithms are highly scalable but suffer from severe "diminishing returns": they efficiently cover initial interactions but produce bloated, redundant test suites when struggling to pack the f…

    Submitted 14 March, 2026; originally announced March 2026.

  29. arXiv:2603.13725

    cs.CL

    Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality

    Authors: Taiqiang Wu, Yuxin Cheng, Chenchen Ding, Runming Yang, Xincheng Feng, Wenyong Zhou, Zhengwu Liu, Ngai Wong

    Abstract: Memristor-based analog compute-in-memory (CIM) architectures provide a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to superior energy efficiency and computational density. However, these architectures suffer from precision issues caused by intrinsic non-idealities of memristors. In this paper, we first conduct a comprehensive investigation into the impac…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: 7 figures, 3 tables

  30. arXiv:2603.13617

    cs.LG cs.CE

    Privacy-Preserving Federated Fraud Detection in Payment Transactions with NVIDIA FLARE

    Authors: Holger R. Roth, Sarthak Tickoo, Mayank Kumar, Isaac Yang, Andrew Liu, Amit Varshney, Sayani Kundu, Iustina Vintila, Peter Madsgaard, Juraj Milcak, Chester Chen, Yan Cheng, Andrew Feng, Jeff Savio, Vikram Singh, Craig Stancill, Gloria Wan, Evan Powell, Anwar Ul Haq, Sudhir Upadhyay, Jisoo Lee

    Abstract: Fraud-related financial losses continue to rise, while regulatory, privacy, and data-sovereignty constraints increasingly limit the feasibility of centralized fraud detection systems. Federated Learning (FL) has emerged as a promising paradigm for enabling collaborative model training across institutions without sharing raw transaction data. Yet, its practical effectiveness under realistic, non-II…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: 16 pages, 6 figures, 5 tables, technical report

  31. arXiv:2603.13406

    cs.CV cs.AI

    Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

    Authors: Liang Tang, Hongda Li, Jiayu Zhang, Long Chen, Shuxian Li, Siqi Pei, Tiaonan Duan, Yuhao Cheng

    Abstract: Emotion recognition in videos is a pivotal task in affective computing, where identifying subtle psychological states such as Ambivalence and Hesitancy holds significant value for behavioral intervention and digital health. Ambivalence and Hesitancy states often manifest through cross-modal inconsistencies such as discrepancies between facial expressions, vocal tones, and textual semantics, posing…

    Submitted 23 March, 2026; v1 submitted 12 March, 2026; originally announced March 2026.

    Comments: 5 pages, 1 figures

  32. arXiv:2603.12971

    cs.HC

    Generative Horcrux: Designing AI Carriers for Afterlife Selves

    Authors: Zhen-Chi Lai, Yu-Ting Cheng, Pei-Ying Lin, Chiao-Wei Ho, Janet Yi-Ching Huang

    Abstract: As generative AI technologies rapidly advance, AI agents are gaining the ability not only to collect data and perform tasks but also to respond to environments and evolve over time. This shift opens new possibilities for reimagining digital legacy - raising critical questions about how we remember, commemorate, and interact with the traces of the deceased. The forms of these AI agents are particul…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: 6 pages

    Journal ref: Proceedings of IASDR 2025: Design Next

  33. arXiv:2603.11627

    cs.CV

    Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography

    Authors: Yichi Zhang, Le Xue, Wenbo Zhang, Lanlan Li, Feiyang Xiao, Yuchen Liu, Xiaohui Zhang, Hongwei Zhang, Shuqi Wang, Gang Feng, Liling Peng, Xin Gao, Yuanfan Xu, Yuan Qi, Kuangyu Shi, Hong Zhang, Yuan Cheng, Mei Tian, Zixin Hu

    Abstract: Positron emission tomography (PET) is a key nuclear medicine imaging modality that visualizes radiotracer distributions to quantify in vivo physiological and metabolic processes, playing an irreplaceable role in disease management. Despite its clinical importance, the development of deep learning models for quantitative PET image analysis remains severely limited, driven by both the inherent segme…

    Submitted 12 March, 2026; originally announced March 2026.

  34. arXiv:2603.10444

    cs.LG cs.AI

    The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

    Authors: Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zhang, Jixian Zhou, Anrui Chen, Mingzhi Dong, Yujiang Wang, Jinlong Hou, Qin Lv, Yuan Cheng, Tun Lu, Fan Yang, Li Shang

    Abstract: Large language models trained on natural language exhibit pronounced anisotropy: a small number of directions concentrate disproportionate energy, while the remaining dimensions form a broad semantic tail. In low-bit training regimes, this geometry becomes numerically unstable. Because blockwise quantization scales are determined by extreme elementwise magnitudes, dominant directions stretch the d…

    Submitted 11 March, 2026; originally announced March 2026.

  35. arXiv:2603.08823

    cs.SD cs.AI cs.CL

    Fish Audio S2 Technical Report

    Authors: Shijia Liao, Yuxuan Wang, Songting Liu, Yifan Cheng, Ruoyi Zhang, Tianyu Li, Shidong Li, Yisheng Zheng, Xingwei Liu, Qingzheng Wang, Zhizhuo Zhou, Jiahua Liu, Xin Chen, Dawei Han

    Abstract: We introduce Fish Audio S2, an open-sourced text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions. To scale training, we develop a multi-stage training recipe together with a staged data pipeline covering video captioning and speech captioning, voice-quality assessment, and reward modeling. To pu…

    Submitted 11 March, 2026; v1 submitted 9 March, 2026; originally announced March 2026.

  36. arXiv:2603.08342

    cs.RO

    PhaForce: Phase-Scheduled Visual-Force Policy Learning with Slow Planning and Fast Correction for Contact-Rich Manipulation

    Authors: Mingxin Wang, Zhirun Yue, Renhao Lu, Yizhe Li, Zihan Wang, Guoping Pan, Kangkang Dong, Jun Cheng, Yi Cheng, Houde Liu

    Abstract: Contact-rich manipulation requires not only vision-dominant task semantics but also closed-loop reactions to force/torque (F/T) transients. Yet, generative visuomotor policies are typically constrained to low-frequency updates due to inference latency and action chunking, underutilizing F/T for control-rate feedback. Furthermore, existing force-aware methods often inject force continuously and ind…

    Submitted 9 March, 2026; originally announced March 2026.

  37. arXiv:2603.08258

    cs.CV

    WaDi: Weight Direction-aware Distillation for One-step Image Synthesis

    Authors: Lei Wang, Yang Cheng, Senmao Li, Ge Wu, Yaxing Wang, Jian Yang

    Abstract: Despite the impressive performance of diffusion models such as Stable Diffusion (SD) in image generation, their slow inference limits practical deployment. Recent works accelerate inference by distilling multi-step diffusion into one-step generators. To better understand the distillation mechanism, we analyze U-Net/DiT weight changes between one-step students and their multi-step teacher counterpa…

    Submitted 9 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026; Code: https://github.com/gudaochangsheng/WaDi

  38. arXiv:2603.08117

    cs.AI cs.IR

    UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

    Authors: Chang Liu, Chuqiao Kuang, Tianyi Zhuang, Yuxin Cheng, Huichi Zhou, Xiaoguang Li, Lifeng Shang

    Abstract: Recent advancements in LLM-based information-seeking agents have achieved record-breaking performance on established benchmarks. However, these agents remain heavily reliant on search-engine-indexed knowledge, leaving a critical blind spot: Unindexed Information Seeking (UIS). This paper identifies and explores the UIS problem, where vital information is not captured by search engine crawlers, suc…

    Submitted 17 March, 2026; v1 submitted 9 March, 2026; originally announced March 2026.

    Comments: 21 pages, 5 figures, ICLR 2026

  39. arXiv:2603.06985

    cs.CV

    Perception-Aware Multimodal Spatial Reasoning from Monocular Images

    Authors: Yanchun Cheng, Rundong Wang, Xulei Yang, Alok Prakash, Daniela Rus, Marcelo H Ang Jr, ShiJie Li

    Abstract: Spatial reasoning from monocular images is essential for autonomous driving, yet current Vision-Language Models (VLMs) still struggle with fine-grained geometric perception, particularly under large scale variation and ambiguous object appearance. We propose a simple yet effective perception-aware multimodal reasoning framework that equips VLMs with explicit object-centric grounding ability. Inste…

    Submitted 6 March, 2026; originally announced March 2026.

  40. arXiv:2603.06065

    cs.IR

    ChatShopBuddy: Towards Reliable Conversational Shopping Agents via Reinforcement Learning

    Authors: Yiruo Cheng, Kelong Mao, Tianhao Li, Jiejun Tan, Ji-Rong Wen, Zhicheng Dou

    Abstract: Conversational shopping agents represent a critical consumer-facing application of Large Language Model (LLM)-powered agents, yet how to effectively apply post-training Reinforcement Learning (RL) to optimize such agents remains underexplored. This work investigates RL-based optimization for shopping agents in real-world scenarios, where agents must simultaneously satisfy multiple interdependent o…

    Submitted 6 March, 2026; originally announced March 2026.

  41. arXiv:2603.03379  [pdf, ps, other]

    cs.IR cs.AI

    MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

    Authors: Jiejun Tan, Zhicheng Dou, Liancheng Zhang, Yuyang Hu, Yiruo Cheng, Ji-Rong Wen

    Abstract: As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Fu…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: Code and datasets are available at https://github.com/plageon/MemSifter

  42. arXiv:2603.03187  [pdf, ps, other]

    cs.CV

    ProSMA-UNet: Decoder Conditioning for Proximal-Sparse Skip Feature Selection

    Authors: Chun-Wun Cheng, Yanqi Cheng, Peiyuan Jing, Guang Yang, Javier A. Montoya-Zegarra, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

    Abstract: Medical image segmentation commonly relies on U-shaped encoder-decoder architectures such as U-Net, where skip connections preserve fine spatial detail by injecting high-resolution encoder features into the decoder. However, these skip pathways also propagate low-level textures, background clutter, and acquisition noise, allowing irrelevant information to bypass deeper semantic filtering -- an iss…

    Submitted 4 March, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

  43. arXiv:2603.01893  [pdf, ps, other]

    cs.CV

    Generative Visual Chain-of-Thought for Image Editing

    Authors: Zijin Yin, Tiankai Hang, Yiji Cheng, Shiyi Zhang, Runze He, Yu Xu, Chunyu Wang, Bing Li, Zheng Chang, Kongming Liang, Qinglin Lu, Zhanyu Ma

    Abstract: Existing image editing methods struggle to perceive where to edit, especially under complex scenes and nuanced spatial instructions. To address this issue, we propose Generative Visual Chain-of-Thought (GVCoT), a unified framework that performs native visual reasoning by first generating spatial cues to localize the target region and then executing the edit. Unlike prior text-only CoT or tool-depe…

    Submitted 16 March, 2026; v1 submitted 2 March, 2026; originally announced March 2026.

    Comments: Project page: https://pris-cv.github.io/GVCoT/

  44. arXiv:2603.01284  [pdf, ps, other]

    cs.CV

    FoSS: Modeling Long Range Dependencies and Multimodal Uncertainty in Trajectory Prediction via Fourier State Space Integration

    Authors: Yizhou Huang, Gengze Jiang, Yihua Cheng, Kezhi Wang

    Abstract: Accurate trajectory prediction is vital for safe autonomous driving, yet existing approaches struggle to balance modeling power and computational efficiency. Attention-based architectures incur quadratic complexity with increasing agents, while recurrent models struggle to capture long-range dependencies and fine-grained local dynamics. Building upon this, we present FoSS, a dual-branch framework…

    Submitted 1 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  45. arXiv:2603.01128  [pdf, ps, other]

    cs.RO

    A Deployable Bio-inspired Compliant Leg Design for Enhanced Leaping in Quadruped Robots

    Authors: Yiyang Chen, Yuxin Liu, Jinzheng Zhou, Fanxin Wang, Qinglei Bu, Jie Sun, Yikun Cheng

    Abstract: Quadruped robots are becoming increasingly essential for various applications, including industrial inspection and catastrophe search and rescue. These scenarios require robots to possess enhanced agility and obstacle-navigation skills. Nonetheless, the performance of current platforms is often constrained by insufficient peak motor power, limiting their ability to perform explosive jumps. To addr…

    Submitted 1 March, 2026; originally announced March 2026.

  46. arXiv:2602.20556  [pdf, ps, other]

    cs.CV

    WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos

    Authors: Hanhui Li, Xuan Huang, Wanquan Liu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang, Chenqiang Gao

    Abstract: Despite recent progress in 3D hand reconstruction from monocular videos, most existing methods rely on data captured in well-controlled environments and therefore degrade in real-world settings with severe perturbations, such as hand-object interactions, extreme poses, illumination changes, and motion blur. To tackle these issues, we introduce WildGHand, an optimization-based framework that enable…

    Submitted 24 February, 2026; originally announced February 2026.

  47. arXiv:2602.19891  [pdf]

    eess.IV cs.CV

    Using Unsupervised Domain Adaptation Semantic Segmentation for Pulmonary Embolism Detection in Computed Tomography Pulmonary Angiogram (CTPA) Images

    Authors: Wen-Liang Lin, Yun-Chien Cheng

    Abstract: While deep learning has demonstrated considerable promise in computer-aided diagnosis for pulmonary embolism (PE), practical deployment in Computed Tomography Pulmonary Angiography (CTPA) is often hindered by "domain shift" and the prohibitive cost of expert annotations. To address these challenges, an unsupervised domain adaptation (UDA) framework is proposed, utilizing a Transformer backbone and…

    Submitted 23 February, 2026; originally announced February 2026.

  48. arXiv:2602.18116  [pdf, ps, other]

    cs.LG cs.AI

    Cut Less, Fold More: Model Compression through the Lens of Projection Geometry

    Authors: Olga Saukh, Dong Wang, Haris Šikić, Yun Cheng, Lothar Thiele

    Abstract: Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal operators and show that, within a rank distance of one, folding provably yields sm…

    Submitted 20 February, 2026; originally announced February 2026.

    Comments: Accepted by ICLR 2026

  49. arXiv:2602.16590  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification

    Authors: Qi You, Yitai Cheng, Zichao Zeng, James Haworth

    Abstract: Street-view image attribute classification is a vital downstream task of image classification, enabling applications such as autonomous driving, urban analytics, and high-definition map construction. It remains computationally demanding whether training from scratch, initialising from pre-trained weights, or fine-tuning large models. Although pre-trained vision-language models such as CLIP offer r…

    Submitted 18 February, 2026; originally announced February 2026.

  50. arXiv:2602.15831  [pdf, ps, other]

    cs.HC cs.MA

    A2H: Agent-to-Human Protocol for AI Agent

    Authors: Zhiyuan Liang, Enfang Cui, Qian Wei, Rui She, Tianzheng Li, Minxin Guo, Yujun Cheng

    Abstract: AI agents are increasingly deployed as autonomous systems capable of planning, tool use, and multi-agent collaboration across complex tasks. However, existing agent-related protocols focus on agent-to-agent interactions, leaving humans as external observers rather than integrated participants within the agent systems. This limitation arises from the lack of a standardized mechanism for agents to d…

    Submitted 31 December, 2025; originally announced February 2026.