Showing 1–50 of 288 results for author: Yuan, R

  1. arXiv:2604.10708  [pdf, ps, other]

    cs.SD cs.AI cs.CV cs.MM

    Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

    Authors: Zeyue Tian, Binxin Yang, Zhaoyang Liu, Jiexuan Zhang, Ruibin Yuan, Hubery Yin, Qifeng Chen, Chen Li, Jing Lv, Wei Xue, Yike Guo

    Abstract: Recent progress in multimodal models has spurred rapid advances in audio understanding, generation, and editing. However, these capabilities are typically addressed by specialized models, leaving the development of a truly unified framework that can seamlessly integrate all three tasks underexplored. While some pioneering works have explored unifying audio understanding and generation, they often…

    Submitted 12 April, 2026; originally announced April 2026.

  2. arXiv:2604.09023  [pdf, ps, other]

    cs.CV

    CAD 100K: A Comprehensive Multi-Task Dataset for Car Related Visual Anomaly Detection

    Authors: Jiahua Pang, Ying Li, Dongpu Cao, Jingcai Luo, Yanuo Zheng, Bao Yunfan, Yujie Lei, Rui Yuan, Yuxi Tian, Guojin Yuan, Hongchang Chen, Zhi Zheng, Yongchun Liu

    Abstract: Multi-task visual anomaly detection is critical for car-related manufacturing quality assessment. However, existing methods remain task-specific, hindered by the absence of a unified benchmark for multi-task evaluation. To fill this gap, we present the CAD Dataset, a large-scale and comprehensive benchmark designed for car-related multi-task visual anomaly detection. The dataset contains over 1…

    Submitted 10 April, 2026; originally announced April 2026.

  3. arXiv:2604.06970  [pdf, ps, other]

    cs.DC cs.OS cs.PF

    Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale

    Authors: Renzhong Yuan, Yijun Zeng, Xiaosong Gao, Linxi Yu, Haochun Liao, Han Wang

    Abstract: When output token counts can be predicted at submission time (Gan et al., 2026), client-side scheduling against a black-box LLM API becomes semi-clairvoyant: decisions condition on coarse token priors even though the provider's internals remain hidden. We decompose this boundary problem into three separable concerns: allocation (inter-class share via adaptive DRR), ordering (intra-class sequencing…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 10 pages, 8 figures. Code and reproduction artifacts available upon request

    ACM Class: C.2.4; D.4.4; I.2.11

  4. arXiv:2603.19957  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

    Authors: Ruicheng Yuan, Zhenxuan Zhang, Anbang Wang, Liwei Hu, Xiangqian Hua, Yaya Peng, Jiawei Luo, Guang Yang

    Abstract: Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured repor…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: 10 pages, 1 figure, 3 tables

  5. arXiv:2603.15482  [pdf, ps, other]

    physics.optics quant-ph

    Noise and dynamics in acoustoelectric waveguides

    Authors: Ryan O. Behunin, Andrew Shepherd, Ruoyu Yuan, Taylor Ray, Matthew J. Storey, Peter T. Rakich, Nils T. Otterstrom, Matt Eichenfield

    Abstract: We present a quantum field theoretic formulation of acoustoelectric interactions in waveguide-like systems of arbitrary cross-section. Building on an open quantum systems approach, we derive a unified description of plasmon-phonon coupling that incorporates dissipation, noise, and the influence of drift currents. Our analysis captures both bulk and surface plasmon modes, highlighting how drift cur…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: 16 pages, 4 figures

  6. arXiv:2603.15154  [pdf, ps, other]

    eess.IV cs.CV

    Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

    Authors: Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

    Abstract: Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volum…

    Submitted 16 March, 2026; originally announced March 2026.

  7. arXiv:2603.15143  [pdf, ps, other]

    eess.IV cs.CV

    Clinical Priors Guided Lung Disease Detection in 3D CT Scans

    Authors: Kejin Lu, Jianfa Bai, Qingqiu Li, Runtian Yuan, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

    Abstract: Accurate classification of lung diseases from chest CT scans plays an important role in computer-aided diagnosis systems. However, medical imaging datasets often suffer from severe class imbalance, which may significantly degrade the performance of deep learning models, especially for minority disease categories. To address this issue, we propose a gender-aware two-stage lung disease classificatio…

    Submitted 17 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  8. arXiv:2603.11325  [pdf, ps, other]

    cs.CV

    Towards Trustworthy Selective Generation: Reliability-Guided Diffusion for Ultra-Low-Field to High-Field MRI Synthesis

    Authors: Zhenxuan Zhang, Peiyuan Jing, Ruicheng Yuan, Liwei Hu, Anbang Wang, Fanwen Wang, Yinzhe Wu, Kh Tohidul Islam, Zhaolin Chen, Zi Wang, Peter Lally, Guang Yang

    Abstract: Low-field to high-field MRI synthesis has emerged as a cost-effective strategy to enhance image quality under hardware and acquisition constraints, particularly in scenarios where access to high-field scanners is limited or impractical. Despite recent progress in diffusion models, diffusion-based approaches often struggle to balance fine-detail recovery and structural fidelity. In particular, the…

    Submitted 11 March, 2026; originally announced March 2026.

  9. arXiv:2603.05591  [pdf, ps, other]

    cs.CV

    Thinking with Spatial Code for Physical-World Video Reasoning

    Authors: Jieneng Chen, Wenxin Ma, Ruisheng Yuan, Yunzhi Zhang, Jiajun Wu, Alan Yuille

    Abstract: We introduce Thinking with Spatial Code, a framework that transforms RGB video into explicit, temporally coherent 3D representations for physical-world visual question answering. We highlight the empirical finding that our proposed spatial encoder can parse videos into structured spatial code with explicit 3D oriented bounding boxes and semantic labels, enabling large language models (LLMs) to rea…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: Code at https://github.com/Beckschen/spatialcode

  10. arXiv:2603.04022  [pdf, ps, other]

    cs.CV

    Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation

    Authors: Zilin Lu, Ruifeng Yuan, Weiwei Cao, Wanxing Chang, Zhongyu Wei, Sinuo Wang, Yong Xia, Ling Zhang, Jianpeng Zhang

    Abstract: Radiologists highly desire fully automated AI for radiology report generation (R2G), yet existing approaches fall short in clinical utility. Reinforcement learning (RL) holds potential to address these shortcomings, but its adoption in this task remains underexplored. In this paper, we revisit RL in terms of data efficiency and optimization effectiveness for R2G tasks. First, we explore the impact…

    Submitted 4 March, 2026; originally announced March 2026.

  11. arXiv:2603.00610  [pdf, ps, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

    Authors: Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos

    Abstract: While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, a…

    Submitted 4 March, 2026; v1 submitted 28 February, 2026; originally announced March 2026.

  12. arXiv:2603.00533  [pdf, ps, other]

    cs.SD eess.AS

    Voices of Civilizations: A Multilingual QA Benchmark for Global Music Understanding

    Authors: Shangda Wu, Ziya Zhou, Yongyi Zang, Yutong Zheng, Dafang Liang, Ruibin Yuan, Qiuqiang Kong

    Abstract: We introduce Voices of Civilizations, the first multilingual QA benchmark for evaluating audio LLMs' cultural comprehension on full-length music recordings. Covering 380 tracks across 38 languages, our automated pipeline yields 1,190 multiple-choice questions through four stages - each followed by manual verification: 1) compiling a representative music list; 2) generating cultural-background docu…

    Submitted 28 February, 2026; originally announced March 2026.

    Comments: 2 pages, 2 figures, 1 table, accepted by ISMIR 2025 LBD

  13. arXiv:2602.19013  [pdf]

    quant-ph

    Co-Propagation of Quantum Time Synchronization and Optical Frequency Transfer over a 122 km Hollow-Core Fiber

    Authors: Huibo Hong, Xiao Xiang, Runai Quan, Rongduo Lu, Qian Zhou, Dawei Ge, Liuyan Han, Bo Liu, Ru Yuan, Dechao Zhang, Yuting Liu, Bingke Shi, ZhiGuang Xia, Xinghua Li, Mingtao Cao, Tao Liu, Ruifang Dong, Shougang Zhang

    Abstract: The co-propagation of quantum and classical signals through shared optical fibers is crucial for scalable quantum networks. However, this coexistence is fundamentally limited by spontaneous Raman scattering (SpRS) from the bright classical light, which generates overwhelming noise that disrupts the single-photon-level quantum signals. Here, we overcome this long-standing challenge by leveraging th…

    Submitted 21 February, 2026; originally announced February 2026.

  14. arXiv:2602.09621  [pdf, ps, other]

    cs.CL cs.LG

    AlignTune: Modular Toolkit for Post-Training Alignment of Large Language Models

    Authors: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth, Utsav Avaiya, Soham Bhattacharjee, Mykola Khandoga, Rui Yuan, Vinay Kumar Sankarapu

    Abstract: Post-training alignment is central to deploying large language models (LLMs), yet practical workflows remain split across backend-specific tools and ad-hoc glue code, making experiments hard to reproduce. We identify backend interference, reward fragmentation, and irreproducible pipelines as key obstacles in alignment research. We introduce AlignTune, a modular toolkit exposing a unified interface…

    Submitted 11 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

    Comments: Library is open source and available at https://github.com/Lexsi-Labs/aligntune

  15. arXiv:2602.09331  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization

    Authors: Mykola Khandoga, Rui Yuan, Vinay Kumar Sankarapu

    Abstract: Policy gradient methods for language model reasoning, such as GRPO and DAPO, assign uniform credit to all generated tokens - the filler phrase "Let me think" receives the same gradient update as the critical calculation "23 + 45 = 68." We propose counterfactual importance weighting: mask reasoning spans, measure the drop in answer probability, and upweight tokens accordingly during policy gradient…

    Submitted 9 February, 2026; originally announced February 2026.

    Comments: 12 pages, 1 figure

  16. arXiv:2602.04380  [pdf, ps, other]

    cs.LG cs.AI

    Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning

    Authors: Rui Yuan, Mykola Khandoga, Vinay Kumar Sankarapu

    Abstract: Policy optimization methods like Group Relative Policy Optimization (GRPO) and its variants have achieved strong results on mathematical reasoning and code generation tasks. Despite extensive exploration of reward processing strategies and training dynamics, all existing group-based methods exclusively use KL divergence for policy regularization, leaving the choice of divergence function unexplore…

    Submitted 4 February, 2026; originally announced February 2026.

  17. arXiv:2601.17761  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

    Authors: Dongjie Cheng, Ruifeng Yuan, Yongqi Li, Runyang You, Wenjie Wang, Liqiang Nie, Lei Zhang, Wenjie Li

    Abstract: Real-world perception and interaction are inherently multimodal, encompassing not only language but also vision and speech, which motivates the development of "Omni" MLLMs that support both multimodal inputs and multimodal outputs. While a sequence of omni MLLMs has emerged, most existing systems still rely on additional expert components to achieve multimodal generation, limiting the simplicity o…

    Submitted 25 January, 2026; originally announced January 2026.

  18. arXiv:2601.16526  [pdf, ps, other]

    cond-mat.mtrl-sci

    Mobile charges in MoS2/high-k oxide transistors: from abnormal instabilities to memory-like dynamics

    Authors: Shaokai Zhou, Haihui Cai, Yehao Wu, Yufeng Min, Renchen Yuan, Yezhu Lv, Jianming Huang, Yuanyuan Shi, Yury Yuryevich Illarionov

    Abstract: MoS$_2$ field-effect transistors (FETs) with high-\textit{k} oxides currently lag behind silicon standards in bias and temperature stability due to ubiquitous border oxide traps that cause clockwise (CW) hysteresis in gate transfer characteristics. While suppressing this effect is typically mandatory for logic FETs, here we explore an alternative strategy where the initial CW hysteresis can be dyn…

    Submitted 23 January, 2026; originally announced January 2026.

    Comments: 45 pages, 17 figures. The first 28 pages of the main text contain 7 figures, and the following 17 pages of supplementary information contain 10 figures

  19. arXiv:2601.13870  [pdf, ps, other]

    physics.comp-ph physics.flu-dyn

    An efficient treatment of heat-flux boundary conditions in GSIS for rarefied gas flows

    Authors: Yanbing Zhang, Ruifeng Yuan, Liyan Luo, Lei Wu

    Abstract: Heat-flux boundary conditions are challenging to implement efficiently in rarefied gas flow simulations because the wall-reflected gas temperature and density must be determined dynamically during the computation. This paper aims to tackle this problem within the general synthetic iterative scheme (GSIS), where the Boltzmann kinetic equation is solved deterministically in an outer loop and macrosc…

    Submitted 20 January, 2026; originally announced January 2026.

  20. arXiv:2601.13304  [pdf, ps, other]

    cs.CV

    CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning

    Authors: Wenxin Ma, Chenlong Wang, Ruisheng Yuan, Hao Chen, Nanru Dai, S. Kevin Zhou, Yijun Yang, Alan Yuille, Jieneng Chen

    Abstract: Humans can look at a static scene and instantly predict what happens next -- will moving this object cause a collision? We call this ability Causal Spatial Reasoning. However, current multimodal large language models (MLLMs) cannot do this, as they remain largely restricted to static spatial perception, struggling to answer "what-if" questions in a 3D scene. We introduce CausalSpatial, a diagnosti…

    Submitted 19 January, 2026; originally announced January 2026.

    Comments: Code is available: https://github.com/CausalSpatial/CausalSpatial

  21. SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

    Authors: Ziyang Ma, Guanrou Yang, Wenxi Chen, Zhifu Gao, Yexing Du, Xiquan Li, Zhisheng Zheng, Haina Zhu, Jianheng Zhuo, Zheshu Song, Ruiyang Xu, Tiranrui Wang, Yifan Yang, Yanqiao Zhu, Zhikang Niu, Liumeng Xue, Yinghao Ma, Ruibin Yuan, Shiliang Zhang, Kai Yu, Eng Siong Chng, Xie Chen

    Abstract: The recent surge in open-source Multimodal Large Language Models (MLLM) frameworks, such as LLaVA, provides a convenient kickoff for artificial intelligence developers and researchers. However, most of the MLLM frameworks take vision as the main input modality, and provide limited in-depth support for the modality of speech, audio, and music. This situation hinders the development of audio-languag…

    Submitted 14 January, 2026; originally announced January 2026.

    Comments: Published in IEEE Journal of Selected Topics in Signal Processing (JSTSP)

  22. arXiv:2512.12303  [pdf, ps, other]

    cs.CV

    OMUDA: Omni-level Masking for Unsupervised Domain Adaptation in Semantic Segmentation

    Authors: Yang Ou, Xiongwei Zhao, Xinye Yang, Yihan Wang, Yicheng Di, Rong Yuan, Xieyuanli Chen, Xu Zhu

    Abstract: Unsupervised domain adaptation (UDA) enables semantic segmentation models to generalize from a labeled source domain to an unlabeled target domain. However, existing UDA methods still struggle to bridge the domain gap due to cross-domain contextual ambiguity, inconsistent feature representations, and class-wise pseudo-label noise. To address these challenges, we propose Omni-level Masking for Unsu…

    Submitted 13 December, 2025; originally announced December 2025.

    Comments: Submitted to TMM

  23. arXiv:2512.12196  [pdf, ps, other]

    cs.MM cs.CV cs.SD eess.AS

    AutoMV: An Automatic Multi-Agent System for Music Video Generation

    Authors: Xiaoxuan Tang, Xinping Lei, Chaoran Zhu, Shiyun Chen, Ruibin Yuan, Yizhi Li, Changjae Oh, Ge Zhang, Wenhao Huang, Emmanouil Benetos, Yang Liu, Jiaheng Liu, Yinghao Ma

    Abstract: Music-to-Video (M2V) generation for full-length songs faces significant challenges. Existing methods produce short, disjointed clips, failing to align visuals with musical structure, beats, or lyrics, and lack temporal consistency. We propose AutoMV, a multi-agent system that generates full music videos (MVs) directly from a song. AutoMV first applies music processing tools to extract musical attr…

    Submitted 13 December, 2025; originally announced December 2025.

  24. arXiv:2512.09302  [pdf, ps, other]

    physics.acc-ph

    Real-Time-Capable Betatron Tune Measurement from Schottky Spectra Using Deep Learning and Uncertainty-Aware Kalman Filtering

    Authors: Peihan Sun, Manzhou Zhang, Renxian Yuan, Deming Li, Jian Dong, Ying Shi

    Abstract: Betatron tune measurement is essential for beam control in compact proton-therapy synchrotrons, yet conventional peak-detection techniques are not robust under the low signal-to-noise ratio (SNR) conditions typical of these machines. This work presents a lightweight convolutional neural network that performs real-time tune extraction from Schottky spectra with sub-millisecond inference latency and…

    Submitted 9 December, 2025; originally announced December 2025.

  25. arXiv:2512.07874  [pdf, ps, other]

    cs.LG

    Controllable risk scenario generation from human crash data for autonomous vehicle testing

    Authors: Qiujing Lu, Xuanhan Wang, Runze Yuan, Wei Lu, Xinyi Gong, Shuo Feng

    Abstract: Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment agents, including background vehicles (BVs) and vulnerable road users (VRUs), that behave realistically in nominal traffic while also exhibiting risk-prone behaviors consistent with real-world accidents. We introd…

    Submitted 26 November, 2025; originally announced December 2025.

  26. arXiv:2512.07093  [pdf, ps, other]

    physics.flu-dyn physics.comp-ph

    Surrogate-assisted airfoil optimization in rarefied gas flows

    Authors: Xiaoda Li, Ruifeng Yuan, Yanbing Zhang, Lei Wu

    Abstract: With growing interest in space exploration, optimized airfoil design has become increasingly important. However, airfoil design in rarefied gas flows remains underexplored because solving the Boltzmann equation formulated in a six dimensional phase space is time consuming. To address this problem, a solver-in-the-loop Bayesian optimization framework for symmetric, thickness-only airfoils is develo…

    Submitted 7 December, 2025; originally announced December 2025.

  27. Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition

    Authors: Zihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang

    Abstract: Singing accent research is underexplored compared to speech accent studies, primarily due to the scarcity of suitable datasets. Existing singing datasets often suffer from detail loss, frequently resulting from the vocal-instrumental separation process. Additionally, they often lack regional accent annotations. To address this, we introduce the Multi-Accent Mandarin Dry-Vocal Singing Dataset (MADV…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: Accepted by ACMMM 2025

    Journal ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12714-12721, October 27, 2025. Dublin, Ireland

  28. Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model

    Authors: Zihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang

    Abstract: Automated singing assessment is crucial for education and entertainment. However, existing systems face two fundamental limitations: reliance on reference tracks, which stifles creative expression, and the simplification of complex performances into non-diagnostic scores based solely on pitch and rhythm. We advocate for a shift from discriminative to descriptive evaluation, creating a complete eco…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: Accepted to ACMMM 2025 oral

    ACM Class: H.5.5; I.2.7

    Journal ref: Proceedings of the 33rd ACM International Conference on Multimedia (ACMMM 2025), Pages 12227-12236

  29. arXiv:2512.04268  [pdf, ps, other]

    cs.LG cs.AI

    The Initialization Determines Whether In-Context Learning Is Gradient Descent

    Authors: Shifeng Xie, Rui Yuan, Simone Rossi, Thomas Hannagan

    Abstract: In-context learning (ICL) in large language models (LLMs) is a striking phenomenon, yet its underlying mechanisms remain only partially understood. Previous work connects linear self-attention (LSA) to gradient descent (GD), but this connection has primarily been established under simplified conditions with zero-mean Gaussian priors and zero initialization for GD. However, subsequent studies have chal…

    Submitted 3 December, 2025; originally announced December 2025.

  30. arXiv:2512.02057  [pdf]

    cs.LG cs.AI

    Opening the Black Box: An Explainable, Few-shot AI4E Framework Informed by Physics and Expert Knowledge for Materials Engineering

    Authors: Haoxiang Zhang, Ruihao Yuan, Lihui Zhang, Yushi Luo, Qiang Zhang, Pan Ding, Xiaodong Ren, Weijie Xing, Niu Gao, Jishan Chen, Chubo Zhang

    Abstract: The industrial adoption of Artificial Intelligence for Engineering (AI4E) faces two fundamental bottlenecks: scarce high-quality data and the lack of interpretability in black-box models, which is particularly critical in safety-sensitive sectors like aerospace. We present an explainable, few-shot AI4E framework that is systematically informed by physics and expert knowledge throughout its architecture. Sta…

    Submitted 28 November, 2025; originally announced December 2025.

  31. arXiv:2512.00466  [pdf, ps, other]

    cs.CL cs.AI

    SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

    Authors: Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu

    Abstract: Test-time compute scaling has emerged as a powerful paradigm for enhancing mathematical reasoning in large language models (LLMs) by allocating additional computational resources during inference. However, current methods employ uniform resource distribution across all reasoning sub-problems, creating fundamental bottlenecks where challenging sub-problems receive insufficient attention while routi…

    Submitted 29 November, 2025; originally announced December 2025.

    Comments: Accepted by AAAI 2026

  32. arXiv:2511.21156  [pdf, ps, other]

    cs.NI

    Digital Twin-Driven Secure Access Strategy for SAGIN-Enabled IoT Networks

    Authors: Hui Liang, Zhihui Wu, Runqi Yuan, Guobin Zhang, Yanfeng Zhang, Jinkai Zheng, Tom H. Luan

    Abstract: In space-air-ground integrated networks (SAGIN)-enabled IoT networks, secure access has become a significant challenge due to the increasing risks of eavesdropping attacks. To address these threats to data confidentiality, this paper proposes a Digital Twin (DT)-driven secure access strategy. The strategy leverages a virtual replica of the physical SAGIN environment within the DT framework to cont…

    Submitted 26 November, 2025; originally announced November 2025.

  33. arXiv:2511.18870  [pdf, ps, other]

    cs.CV

    HunyuanVideo 1.5 Technical Report

    Authors: Bing Wu, Chang Zou, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Jack Peng, Jianbing Wu, Jiangfeng Xiong, Jie Jiang, Linus, Patrol, Peizhen Zhang, Peng Chen, Penghao Zhao, Qi Tian, Songtao Liu, Weijie Kong, Weiyan Wang, Xiao He, Xin Li, Xinchi Deng, Xuefei Zhe, Yang Li, Yanxin Long, et al. (56 additional authors not shown)

    Abstract: We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding til…

    Submitted 24 November, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

  34. arXiv:2511.18433  [pdf, ps, other]

    physics.comp-ph

    A fast-converging and asymptotic-preserving method for adjoint shape optimization of rarefied gas flows

    Authors: Yanbing Zhang, Ruifeng Yuan, Lei Wu

    Abstract: Adjoint based shape optimization is a powerful technique in fluid-dynamics optimization, capable of identifying an optimal shape within only dozens of design iterations. However, when extended to rarefied gas flows, the computational cost becomes enormous because both the six dimensional primal and adjoint Boltzmann equations must be solved for each candidate shape. Building on the general synthet…

    Submitted 23 November, 2025; originally announced November 2025.

  35. arXiv:2511.18368  [pdf, ps, other]

    cs.AI

    Wireless Power Transfer and Intent-Driven Network Optimization in AAVs-assisted IoT for 6G Sustainable Connectivity

    Authors: Xiaoming He, Gaofeng Wang, Huajun Cui, Rui Yuan, Haitao Zhao

    Abstract: Autonomous Aerial Vehicle (AAV)-assisted Internet of Things (IoT) represents a collaborative architecture in which AAVs allocate resources over 6G links to jointly enhance user-intent interpretation and overall network performance. Owing to this mutual dependence, improvements in intent inference and policy decisions on one component reinforce the efficiency of others, making highly reliable intent…

    Submitted 28 January, 2026; v1 submitted 23 November, 2025; originally announced November 2025.

  36. arXiv:2511.12152  [pdf]

    cs.AR eess.SP

    A Digital SRAM-Based Compute-In-Memory Macro for Weight-Stationary Dynamic Matrix Multiplication in Transformer Attention Score Computation

    Authors: Jianyi Yu, Tengxiao Wang, Yuxuan Wang, Xiang Fu, Fei Qiao, Ying Wang, Rui Yuan, Liyuan Liu, Cong Shi

    Abstract: Compute-in-memory (CIM) techniques are widely employed in energy-efficient artificial intelligence (AI) processors. They alleviate power and latency bottlenecks caused by extensive data movements between compute and storage units. To extend these benefits to Transformers, this brief proposes a digital CIM macro to compute attention scores. To eliminate dynamic matrix multiplication (MM), we reconstru…

    Submitted 12 December, 2025; v1 submitted 15 November, 2025; originally announced November 2025.

  37. arXiv:2511.02294  [pdf, ps, other]

    cs.RO

    SuckTac: Camera-based Tactile Sucker for Unstructured Surface Perception and Interaction

    Authors: Ruiyong Yuan, Jieji Ren, Zhanxuan Peng, Feifei Chen, Guoying Gu

    Abstract: Suckers are significant for robots in picking, transferring, manipulation and locomotion on diverse surfaces. However, most of the existing suckers lack high-fidelity perceptual and tactile sensing, which impedes them from resolving the fine-grained geometric features and interaction status of the target surface. This limits their robust performance with irregular objects and in complex, unstructu…

    Submitted 4 November, 2025; originally announced November 2025.

  38. arXiv:2510.24693  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

    Authors: Zihan Liu, Zhikang Niu, Qiuyang Xiao, Zhisheng Zheng, Ruoqi Yuan, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Jianze Liang, Xie Chen, Leilei Sun, Dahua Lin, Jiaqi Wang

    Abstract: Despite rapid progress in Multi-modal Large Language Models and Large Audio-Language Models, existing audio benchmarks largely test semantics that can be recovered from text captions, masking deficits in fine-grained perceptual reasoning. We formalize audio 4D intelligence that is defined as reasoning over sound dynamics in time and 3D space, and introduce STAR-Bench to measure it. STAR-Bench comb…

    Submitted 28 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: Homepage: https://internlm.github.io/StarBench/

  39. arXiv:2510.22431  [pdf, ps, other]

    cs.MA cs.CV

    Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration

    Authors: Zheng Wei, Mingchen Li, Zeqian Zhang, Ruibin Yuan, Pan Hui, Huamin Qu, James Evans, Maneesh Agrawala, Anyi Rao

    Abstract: Recent advancements in multi-agent systems have demonstrated significant potential for enhancing creative task performance, such as long video generation. This study introduces three innovations to improve multi-agent collaboration. First, we propose OmniAgent, a hierarchical, graph-based multi-agent framework for long video generation that leverages a film-production-inspired architecture to enab…

    Submitted 25 October, 2025; originally announced October 2025.

  40. arXiv:2510.17811  [pdf, ps, other]

    eess.SP physics.ao-ph

    Channel Modeling of Satellite-to-Underwater Laser Communication Links: An Analytical-Monte Carlo Hybrid Approach

    Authors: Zhixing Wang, Renzhi Yuan, Haifeng Yao, Chuang Yang, Mugen Peng

    Abstract: Channel modeling for satellite-to-underwater laser communication (StULC) links remains challenging due to long distances and the diversity of the channel constituents. The StULC channel is typically segmented into three isolated channels: the atmospheric channel, the air-water interface channel, and the underwater channel. Previous studies involving StULC channel modeling either focused on separat…

    Submitted 24 September, 2025; originally announced October 2025.

  41. arXiv:2510.17536  [pdf, ps, other]

    math.DG

    The existence of negatively curved metrics on locally conformally flat manifolds with boundary

    Authors: Rirong Yuan

    Abstract: We use certain Morse functions to construct conformal metrics with negative sectional curvature on locally conformally flat manifolds with boundary. Moreover, without the conformal flatness assumption, we also construct a conformal metric with positive Einstein tensor.

    Submitted 20 October, 2025; originally announced October 2025.

  42. arXiv:2510.17039  [pdf]

    cs.CV

    Click, Predict, Trust: Clinician-in-the-Loop AI Segmentation for Lung Cancer CT-Based Prognosis within the Knowledge-to-Action Framework

    Authors: Mohammad R. Salmanpour, Sonya Falahati, Amir Hossein Pouria, Amin Mousavi, Somayeh Sadat Mehrnia, Morteza Alizadeh, Arman Gorji, Zeinab Farsangi, Alireza Safarian, Mehdi Maghsudi, Carlos Uribe, Arman Rahmim, Ren Yuan

    Abstract: Lung cancer remains the leading cause of cancer mortality, with CT imaging central to screening, prognosis, and treatment. Manual segmentation is variable and time-intensive, while deep learning (DL) offers automation but faces barriers to clinical adoption. Guided by the Knowledge-to-Action framework, this study develops a clinician-in-the-loop DL pipeline to enhance reproducibility, prognostic a…

    Submitted 19 October, 2025; originally announced October 2025.

    Comments: 13 pages, 2 figures, and 2 tables

    ACM Class: F.2.2; I.2.7

  43. arXiv:2510.14378  [pdf, ps, other]

    physics.flu-dyn

    Wetted-Area Minimum and Inlet-Outlet Reciprocity in Optimal Manifolds of Rarefied Gas Flows

    Authors: Ruifeng Yuan, Lei Wu

    Abstract: While flow optimization has been extensively studied in the continuum regime, its extension to rarefied gas flows remains less explored. Here, based on the Boltzmann model equation, an adjoint topology optimization method is employed to design two-dimensional single-inlet, multi-outlet manifolds, aiming to maximize the total mass flow rate while maintaining outflow uniformity. Two key findings are…

    Submitted 16 October, 2025; originally announced October 2025.

  44. arXiv:2510.09038  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.CY cs.LG

    Auto-scaling Continuous Memory for GUI Agent

    Authors: Wenyi Wu, Kun Zhou, Ruoxin Yuan, Vivian Yu, Stephen Wang, Zhiting Hu, Biwei Huang

    Abstract: We study how to endow GUI agents with scalable memory that helps them generalize across unfamiliar interfaces and long-horizon tasks. Prior GUI agents compress past trajectories into text tokens, which balloons context length and misses decisive visual cues (e.g., exact widget size and position). We propose a continuous memory that encodes each GUI trajectory into a fixed-length sequence of continuous e…

    Submitted 10 October, 2025; originally announced October 2025.

  45. arXiv:2510.08334  [pdf, ps, other]

    cond-mat.mes-hall cond-mat.mtrl-sci physics.optics

    Topological surface magnon-polariton in an insulating canted antiferromagnet

    Authors: Weixin Li, Rundong Yuan, Fenglin Zhong, Bo Peng, Jean-Philippe Ansermet, Haiming Yu

    Abstract: Excitation and control of antiferromagnetic magnon modes lie at the heart of coherent antiferromagnetic spintronics. Here, we propose a topological surface magnon-polariton as a new approach in the prototypical magnonic material hematite. We show that in an insulating canted antiferromagnet, where strongly coupled magnon-photon modes can be achieved using electrical on-chip layouts, a surface magnon…

    Submitted 26 October, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

    Comments: 12 pages, 7 figures

  46. arXiv:2510.02797  [pdf, ps, other]

    eess.AS

    SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision

    Authors: Chunbo Hao, Ruibin Yuan, Jixun Yao, Qixin Deng, Xinyi Bai, Yanbo Wang, Wei Xue, Lei Xie

    Abstract: Music structure analysis (MSA) underpins music understanding and controllable generation, yet progress has been limited by small, inconsistent corpora. We present SongFormer, a scalable framework that learns from heterogeneous supervision. SongFormer (i) fuses short- and long-window self-supervised learning representations to capture both fine-grained and long-range dependencies, and (ii) introduc…

    Submitted 8 April, 2026; v1 submitted 3 October, 2025; originally announced October 2025.

  47. arXiv:2509.21144  [pdf, ps, other]

    cs.SD cs.AI

    UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice

    Authors: Sitong Cheng, Weizhen Bian, Xinsheng Wang, Ruibin Yuan, Jianyi Chen, Shunshun Yin, Yike Guo, Wei Xue

    Abstract: The ultimate goal of expressive speech-to-speech translation (S2ST) is to accurately translate spoken content while preserving the speaker identity and emotional style. However, progress in this field is largely hindered by three key challenges: the scarcity of paired speech data that retains expressive styles, the complexity of multi-stage processing pipelines, and the limited transfer of transla…

    Submitted 25 September, 2025; originally announced September 2025.

  48. arXiv:2509.20030  [pdf, ps, other]

    eess.SP

    Multi-Stage CD-Kennedy Receiver for QPSK Modulated CV-QKD in Turbulent Channels

    Authors: Renzhi Yuan, Zhixing Wang, Shouye Miao, Mufei Zhao, Haifeng Yao, Bin Cao, Mugen Peng

    Abstract: Continuous-variable quantum key distribution (CV-QKD) protocols have attracted increasing attention in recent years because they offer a high secret key rate (SKR) and good compatibility with existing optical communication infrastructure. Classical coherent receivers are widely employed in coherent-state-based CV-QKD protocols, whose detection performance is bounded by the standard quantum limit (SQL). R…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 25 pages, 7 figures

  49. arXiv:2509.03959  [pdf, ps, other]

    cs.SD

    WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

    Authors: Longhao Li, Zhao Guo, Hongjie Chen, Yuhang Dai, Ziyu Zhang, Hongfei Xue, Tianlun Zuo, Chengyou Wang, Shuiyuan Wang, Jie Li, Jian Kang, Xin Xu, Hui Bu, Binbin Zhang, Ruibin Yuan, Ziya Zhou, Wei Xue, Lei Xie

    Abstract: The development of speech understanding and generation has been significantly accelerated by the availability of large-scale, high-quality speech datasets. Among these, ASR and TTS are regarded as the most established and fundamental tasks. However, for Cantonese (Yue Chinese), spoken by approximately 84.9 million native speakers worldwide, limited annotated resources have hindered progress and re…

    Submitted 5 September, 2025; v1 submitted 4 September, 2025; originally announced September 2025.

  50. arXiv:2508.21056  [pdf, ps, other]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.atm-clus physics.chem-ph physics.comp-ph

    Altermagnetic Shastry-Sutherland fullerene networks

    Authors: Jiaqi Wu, Alaric Sanders, Rundong Yuan, Bo Peng

    Abstract: The interplay between quantum magnetism and many-body physics is of fundamental importance in condensed matter physics. Molecular building blocks provide a versatile platform for exploring the exotic quantum phases arising from complex orderings in frustrated lattices. Here we demonstrate a showcase system bas…

    Submitted 11 September, 2025; v1 submitted 28 August, 2025; originally announced August 2025.

    Comments: 8 pages, 3 figures