
Showing 1–50 of 470 results for author: Yao, C

  1. arXiv:2604.13598  [pdf, ps, other]

    cs.LG stat.ME

    Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

    Authors: Qin Zhou, Guoyan Liang, Qianyi Yang, Jingyuan Chen, Sai Wu, Chang Yao, Zhe Wang

    Abstract: Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism to align with clinical preference. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: 13 pages, 4 figures, ACL 2026 main

  2. arXiv:2604.02324  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

    Authors: Daiwei Chen, Zhoutong Fu, Chengming Jiang, Haichao Zhang, Ran Zhou, Tan Wang, Chunnan Yao, Guoyao Li, Rui Cai, Yihan Cao, Ruijie Jiang, Fedor Borisyuk, Jianqiang Shen, Jingwei Wu, Ramya Korlakai Vinayak

    Abstract: Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their representations. We present a systematic analysis of this strategy: through spec…

    Submitted 2 April, 2026; originally announced April 2026.

  3. arXiv:2604.01498  [pdf, ps, other]

    cs.MM

    Semantic Compensation via Adversarial Removal for Robust Zero-Shot ECG Diagnosis

    Authors: Hongjun Liu, Rujun Han, Leyu Zhou, Chao Yao

    Abstract: Recent ECG-language pretraining methods enable zero-shot diagnosis by aligning cardiac signals with clinical text, but they do not explicitly model robustness to partial observation and are typically studied under fully observed ECG settings. In practice, diagnostically critical leads or temporal segments may be missing due to electrode detachment, motion artifacts, or signal corruption, causing…

    Submitted 1 April, 2026; originally announced April 2026.

  4. arXiv:2604.00609  [pdf, ps, other]

    cs.CV

    TALENT: Target-aware Efficient Tuning for Referring Image Segmentation

    Authors: Shuo Jin, Siyue Yu, Bingfeng Zhang, Chao Yao, Meiqin Liu, Jimin Xiao

    Abstract: Referring image segmentation aims to segment specific targets based on a natural text expression. Recently, parameter-efficient tuning (PET) has emerged as a promising paradigm. However, existing PET-based methods often suffer from the fact that visual features can't emphasize the text-referred target instance but activate co-category yet unrelated objects. We analyze and quantify this problem, te…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPR26 Findings

  5. arXiv:2603.25187  [pdf, ps, other]

    cs.CL cs.AI

    Probing the Lack of Stable Internal Beliefs in LLMs

    Authors: Yifan Luo, Kangping Xu, Yanzhen Lu, Yang Yuan, Andrew Chi-Chih Yao

    Abstract: Persona-driven large language models (LLMs) require consistent behavioral tendencies across interactions to simulate human-like personality traits, such as persistence or reliability. However, current LLMs often lack stable internal representations that anchor their responses over extended dialogues. This work explores whether LLMs can maintain "implicit consistency", defined as persistent adheren…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: Accepted by NeurIPS 2025 Workshop Mexico City PersonaNLP

  6. arXiv:2603.18273  [pdf, ps, other]

    cs.AI

    EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research

    Authors: Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao

    Abstract: In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first inst…

    Submitted 18 March, 2026; originally announced March 2026.

  7. arXiv:2603.14951  [pdf, ps, other]

    cs.CV

    GT-PCQA: Geometry-Texture Decoupled Point Cloud Quality Assessment with MLLM

    Authors: Guohua Zhang, Jian Jin, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao

    Abstract: With the rapid advancement of Multi-modal Large Language Models (MLLMs), MLLM-based Image Quality Assessment (IQA) methods have shown promising generalization. However, directly extending these MLLM-based IQA methods to PCQA remains challenging. On the one hand, existing PCQA datasets are limited in scale, which hinders stable and effective instruction tuning of MLLMs. On the other hand, due to la…

    Submitted 16 March, 2026; originally announced March 2026.

  8. arXiv:2603.14728  [pdf, ps, other]

    quant-ph physics.app-ph physics.comp-ph physics.data-an

    A Deep-Learning-Boosted Framework for Quantum Sensing with Nitrogen-Vacancy Centers in Diamond

    Authors: Changyu Yao, Haochen Shen, Zhongyuan Liu, Ruotian Gong, Md Shakil Bin Kashem, Stella Varnum, Liangyu Li, Hangyue Li, Yue Yu, Yizhou Wang, Xiaoshui Lin, Jonathan Brestoff, Chenyang Lu, Shankar Mukherji, Chuanwei Zhang, Chong Zu

    Abstract: Nitrogen-vacancy (NV) centers in diamond are a versatile quantum sensing platform for high sensitivity measurements of magnetic fields, temperature and strain with nanoscale spatial resolution. A common bottleneck is the analysis of optically detected magnetic resonance (ODMR) spectra, where target quantities are encoded in resonance features. Conventional nonlinear fitting is often computationall…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: Main text contains: 9 pages, 4 figures. Includes supplementary material

  9. arXiv:2603.13781  [pdf, ps, other]

    cs.RO

    KoopmanFlow: Spectrally Decoupled Generative Control Policy via Koopman Structural Bias

    Authors: Chengsi Yao, Ge Wang, Kai Kang, Shenhao Yan, Jiahao Yang, Fan Feng, Honghao Cai, Xianxian Zeng, Rongjun Chen, Yiming Zhao, Yatong Han, Xi Li

    Abstract: Generative Control Policies (GCPs) show immense promise in robotic manipulation but struggle to simultaneously model stable global motions and high-frequency local corrections. While modern architectures extract multi-scale spatial features, their underlying Probability Flow ODEs apply a uniform temporal integration schedule. Compressed to a single step for real-time Receding Horizon Control (RHC)…

    Submitted 14 March, 2026; originally announced March 2026.

  10. arXiv:2603.11563  [pdf, ps, other]

    cs.CV cs.RO

    SVLL: Staged Vision-Language Learning for Physically Grounded Embodied Task Planning

    Authors: Yuyuan Yang, Junkun Hong, Hongrong Wang, Honghao Cai, Xunpeng Ren, Ge Wang, Mingcong Lei, Shenhao Yan, Jiahao Yang, Chengsi Yao, Xi Li, Yiming Zhao, Yatong Han, Jinke Ren

    Abstract: Embodied task planning demands vision-language models to generate action sequences that are both visually grounded and causally coherent over time. However, existing training paradigms face a critical trade-off: joint end-to-end training often leads to premature temporal binding, while standard reinforcement learning methods suffer from optimization instability. To bridge this gap, we present Stag…

    Submitted 12 March, 2026; originally announced March 2026.

  11. arXiv:2603.03726  [pdf, ps, other]

    cs.CV

    QD-PCQA: Quality-Aware Domain Adaptation for Point Cloud Quality Assessment

    Authors: Guohua Zhang, Jian Jin, Meiqin Liu, Chao Yao, Weisi Lin

    Abstract: No-Reference Point Cloud Quality Assessment (NR-PCQA) still struggles with generalization, primarily due to the scarcity of annotated point cloud datasets. Since the Human Visual System (HVS) drives perceptual quality assessment independently of media types, prior knowledge on quality learned from images can be repurposed for point clouds. This insight motivates adopting Unsupervised Domain Adapta…

    Submitted 16 March, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  12. arXiv:2603.01488  [pdf, ps, other]

    cs.AI

    LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

    Authors: Chang Yao, Jinghui Qin, Kebing Jin, Hankz Hankui Zhuo

    Abstract: Despite its remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, learned policies that generate actions based on states are sensitive to environmental changes and struggle to guarantee behavioral…

    Submitted 7 March, 2026; v1 submitted 2 March, 2026; originally announced March 2026.

  13. arXiv:2602.22701  [pdf, ps, other]

    cs.GR

    BRepMAE: Self-Supervised Masked BRep Autoencoders for Machining Feature Recognition

    Authors: Can Yao, Kang Wu, Zuheng Zheng, Siyuan Xing, Xiao-Ming Fu

    Abstract: We propose a masked self-supervised learning framework, called BRepMAE, for automatically extracting a valuable representation of the input computer-aided design (CAD) model to recognize its machining features. Representation learning is conducted on a large-scale, unlabeled CAD model dataset using the geometric Attributed Adjacency Graph (gAAG) representation, derived from the boundary representa…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: 16 pages

  14. arXiv:2602.21188  [pdf, ps, other]

    cs.CV

    Human Video Generation from a Single Image with 3D Pose and View Control

    Authors: Tiantian Wang, Chun-Han Yao, Tao Hu, Mallikarjun Byrasandra Ramalinga Reddy, Ming-Hsuan Yang, Varun Jampani

    Abstract: Recent diffusion methods have made significant progress in generating videos from single images due to their powerful visual generation capabilities. However, challenges persist in image-to-video synthesis, particularly in human video generation, where inferring view-consistent, motion-dependent clothing wrinkles from a single image remains a formidable problem. In this paper, we present Human Vid…

    Submitted 24 February, 2026; originally announced February 2026.

  15. arXiv:2602.18329  [pdf, ps, other]

    cs.CV math.AT

    G-LoG Bi-filtration for Medical Image Classification

    Authors: Qingsong Wang, Jiaxing He, Bingzhe Hou, Tieru Wu, Yang Cao, Cailing Yao

    Abstract: Building practical filtrations on objects to detect topological and geometric features is an important task in the field of Topological Data Analysis (TDA). In this paper, leveraging the ability of the Laplacian of Gaussian operator to enhance the boundaries of medical images, we define the G-LoG (Gaussian-Laplacian of Gaussian) bi-filtration to generate the features more suitable for multi-parame…

    Submitted 20 February, 2026; originally announced February 2026.

    MSC Class: 55N31; 68T09

  16. arXiv:2602.17936  [pdf, ps, other]

    math.NA

    Optimal error estimate of an isoparametric upwind discontinuous Galerkin method for radiation transport equation on curved domains

    Authors: Changhui Yao, Yunpan Ma, Lingxiao Li

    Abstract: This work investigates the isoparametric upwind discontinuous Galerkin method for solving the radiation transport equation defined on a bounded domain $D$ with a piecewise $C^{k+1}$ smooth curved boundary. An auxiliary mapping is constructed to approximate the original curved domain. The analysis delineates a high-order optimal convergence rate under the DG norm, which comprehensively balances the…

    Submitted 19 February, 2026; originally announced February 2026.

  17. arXiv:2602.17011  [pdf, ps, other]

    cs.MM

    CAFE: Channel-Autoregressive Factorized Encoding for Robust Biosignal Spatial Super-Resolution

    Authors: Hongjun Liu, Leyu Zhou, Zijianghao Yang, Rujun Han, Shitong Duan, Kuanjian Tang, Chao Yao

    Abstract: High-density biosignal recordings are critical for neural decoding and clinical monitoring, yet real-world deployments often rely on low-density (LD) montages due to hardware and operational constraints. This motivates spatial super-resolution from LD observations, but heterogeneous dependencies under sparse and noisy measurements often lead to artifact propagation and false non-local correlations…

    Submitted 18 February, 2026; originally announced February 2026.

  18. arXiv:2602.13933  [pdf, ps, other]

    cs.AI

    HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

    Authors: Xiaochen Zhao, Kaikai Wang, Xiaowen Zhang, Chen Yao, Aili Wang

    Abstract: Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational…

    Submitted 14 February, 2026; originally announced February 2026.

  19. arXiv:2602.11570  [pdf, ps, other]

    cs.CL

    PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering

    Authors: Xiangfeng Wang, Hangyu Guo, Yanlin Lai, Mitt Huang, Liang Zhao, Chengyuan Yao, Yinmin Zhang, Qi Han, Xiaoxiao Ren, Chun Yuan, Tong Xu, Zheng Ge, Xiangyu Zhang, Daxin Jiang

    Abstract: While model-based verifiers are essential for scaling Reinforcement Learning with Verifiable Rewards (RLVR), current outcome-centric verification paradigms primarily focus on the consistency between the final result and the ground truth, often neglecting potential errors in the derivation process. This leads to assigning positive rewards to correct answers produced from incorrect derivations. To b…

    Submitted 11 February, 2026; originally announced February 2026.

  20. arXiv:2602.10604  [pdf, ps, other]

    cs.CL cs.AI

    Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

    Authors: Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi, Daxin Jiang, Dehua Ma, Deshan Sun, et al. (191 additional authors not shown)

    Abstract: We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/f…

    Submitted 23 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

    Comments: Technical report for Step 3.5 Flash

  21. arXiv:2602.06763  [pdf, ps, other]

    cs.CL

    R-Align: Enhancing Generative Reward Models through Rationale-Centric Meta-Judging

    Authors: Yanlin Lai, Mitt Huang, Hangyu Guo, Xiangfeng Wang, Haodong Li, Shaoxiong Zhan, Liang Zhao, Chengyuan Yao, Yinmin Zhang, Qi Han, Chun Yuan, Zheng Ge, Xiangyu Zhang, Daxin Jiang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) remains indispensable for aligning large language models (LLMs) in subjective domains. To enhance robustness, recent work shifts toward Generative Reward Models (GenRMs) that generate rationales before predicting preferences. Yet in GenRM training and evaluation, practice remains outcome-label-only, leaving reasoning quality unchecked. We show that…

    Submitted 6 February, 2026; originally announced February 2026.

    Comments: GitHub: https://github.com/lyn22333/R-Align Hugging Face: https://huggingface.co/collections/lyn22333/r-align

  22. arXiv:2601.22045  [pdf, ps, other]

    cs.CV

    Urban Neural Surface Reconstruction from Constrained Sparse Aerial Imagery with 3D SAR Fusion

    Authors: Da Li, Chen Yao, Tong Mao, Jiacheng Bao, Houjun Sun

    Abstract: Neural surface reconstruction (NSR) has recently shown strong potential for urban 3D reconstruction from multi-view aerial imagery. However, existing NSR methods often suffer from geometric ambiguity and instability, particularly under sparse-view conditions. This issue is critical in large-scale urban remote sensing, where aerial image acquisition is limited by flight paths, terrain, and cost. To…

    Submitted 29 January, 2026; originally announced January 2026.

  23. arXiv:2601.18489  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Electrostatic Screening Modulation of Graphene's Electronic Structure and the Helical Wavefunction Dominated Topological Properties

    Authors: Yaorui Tan, Xiang Chen, Yunhu Zhu, Xiaowu Yang, Zhongkai Huang, Chuang Yao, Maolin Bo

    Abstract: This study examines electrostatic screening effects in graphene using tight-binding calculations based on the Binding energy and Bond Charge (BBC) model and a modified version of it. The results indicate that the modified BBC potential decays exponentially with distance, which suppresses electron-electron interactions. The hopping integrals exhibit a pronounced decrease over distance and shif…

    Submitted 11 February, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

  24. arXiv:2601.13747  [pdf, ps, other]

    math.DG

    Closed $\mathrm{G}_2$-structures with $\mathbb{T}^3$-symmetry and hypersymplectic structures

    Authors: Chengjian Yao, Ziyi Zhou

    Abstract: Closed $\mathrm{G}_2$-structures $\varphi$ with an effective $\mathbb{T}^3$-symmetry on connected manifolds are roughly classified into three types according to the evaluation of $\varphi$ on the principal orbits. Type 1: if there is neither associative nor isotropic orbit, then the action is free and $\varphi$ reduces to a hypersymplectic structure on the quotient manifold admitting three linearl…

    Submitted 20 January, 2026; originally announced January 2026.

    Comments: 17 pages

  25. arXiv:2601.09668  [pdf, ps, other]

    cs.CV

    STEP3-VL-10B Technical Report

    Authors: Ailin Huang, Chengyuan Yao, Chunrui Han, Fanqi Wan, Hangyu Guo, Haoran Lv, Hongyu Zhou, Jia Wang, Jian Zhou, Jianjian Sun, Jingcheng Hu, Kangheng Lin, Liang Zhao, Mitt Huang, Song Yuan, Wenwen Qu, Xiangfeng Wang, Yanlin Lai, Yingxiu Zhao, Yinmin Zhang, Yukang Shi, Yuyang Chen, Zejia Weng, Ziyang Meng, Ang Li , et al. (68 additional authors not shown)

    Abstract: We present STEP3-VL-10B, a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. STEP3-VL-10B is realized through two strategic shifts: first, a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens that integrates a language-aligned Perception Encoder with a Qwen3-8B decoder to establish…

    Submitted 15 January, 2026; v1 submitted 14 January, 2026; originally announced January 2026.

    Comments: 50 pages

  26. arXiv:2601.08545  [pdf, ps, other]

    cs.AI cs.CL cs.SE

    Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement

    Authors: Zhenlong Dai, Zhuoluo Zhao, Hengning Wang, Xiu Tang, Sai Wu, Chang Yao, Zhipeng Gao, Jingyuan Chen

    Abstract: With the development of large language models (LLMs) in the field of programming, intelligent programming coaching systems have gained widespread attention. However, most research focuses on repairing the buggy code of programming learners without providing the underlying causes of the bugs. To address this gap, we introduce a novel task, namely LRP (Learner-Tailored Program Repair). We then propo…

    Submitted 18 January, 2026; v1 submitted 13 January, 2026; originally announced January 2026.

    Comments: Accepted by AAAI2026 main track

  27. arXiv:2601.05593  [pdf, ps, other]

    cs.LG

    PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

    Authors: Jingcheng Hu, Yinmin Zhang, Shijie Shang, Xiaobo Yang, Yue Peng, Zhewei Huang, Hebin Zhou, Xin Wu, Jie Cheng, Fanqi Wan, Xiangwen Kong, Chengyuan Yao, Kaiwen Yan, Ailin Huang, Hongyu Zhou, Qi Han, Zheng Ge, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum

    Abstract: We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a…

    Submitted 9 January, 2026; originally announced January 2026.

  28. arXiv:2512.22431  [pdf, ps, other]

    cs.AI cs.CL cs.FL

    Monadic Context Engineering

    Authors: Yifan Zhang, Yang Yuan, Mengdi Wang, Andrew Chi-Chih Yao

    Abstract: The proliferation of Large Language Models (LLMs) has catalyzed a shift towards autonomous agents capable of complex reasoning and tool use. However, current agent architectures are frequently constructed using imperative, ad hoc patterns. This results in brittle systems plagued by difficulties in state management, error handling, and concurrency. This paper introduces Monadic Context Engineering…

    Submitted 21 January, 2026; v1 submitted 26 December, 2025; originally announced December 2025.

  29. arXiv:2512.11334  [pdf, ps, other]

    cs.LG

    Spectral entropy prior-guided deep feature fusion architecture for magnetic core loss

    Authors: Cong Yao, Chunye Gong, Jin Zhang

    Abstract: Accurate core loss modeling is critical for the design of high-efficiency power electronic systems. Traditional core loss modeling methods have limitations in prediction accuracy. To advance this field, the IEEE Power Electronics Society launched the MagNet Challenge in 2023, the first international competition focused on data-driven power electronics design methods, aiming to uncover complex loss…

    Submitted 12 December, 2025; originally announced December 2025.

  30. arXiv:2512.10360  [pdf, ps, other]

    cs.RO

    CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Jinlong Li, Ruihao Xia, Mengxian Hu, Chenpeng Yao, Chengju Liu, Yang Tang, Qijun Chen

    Abstract: Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent vision-language large models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that…

    Submitted 23 January, 2026; v1 submitted 11 December, 2025; originally announced December 2025.

  31. arXiv:2512.07805  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Group Representational Position Encoding

    Authors: Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew Chi-Chih Yao

    Abstract: We present GRAPE (Group Representational Position Encoding), a unified framework for positional encoding based on group actions. GRAPE unifies two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a po…

    Submitted 1 April, 2026; v1 submitted 8 December, 2025; originally announced December 2025.

    Comments: Published in ICLR 2026; Project Page: https://github.com/model-architectures/GRAPE

  32. arXiv:2512.01084  [pdf]

    cond-mat.mtrl-sci

    Aging-driven in situ polymerization of FEC additive boosts the calendar-life of silicon anodes via surface passivation enhancement

    Authors: Sattajit Barua, Rownak J. Mou, Koffi P. C. Yao

    Abstract: The role of additives such as FEC in extending the calendar life of silicon anodes beyond the cycling benefits is still not fully understood. Herein, the calendar life of high-loading Si (80 wt%) using baseline 1.2 M LiPF6 in EC-EMC electrolyte versus adding 10 wt% FEC is investigated over months. Over 8 days of aging, FEC leads to a 13-fold reduction in irreversible capacity loss in Si-LiFePO4 fu…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 7 main-manuscript and 9 Supporting Information figures. Work funded by the U.S. Department of Energy. Li-ion battery material research. Silicon anodes. Calendar aging of silicon anodes in the presence of electrolyte additives. Novel insights into the calendar aging of silicon due to internal passivation via polymerization of fluoroethylene carbonate during open-circuit aging.

  33. arXiv:2511.18761  [pdf, ps, other]

    cs.MA

    Think How Your Teammates Think: Active Inference Can Benefit Decentralized Execution

    Authors: Hao Wu, Shoucheng Song, Chang Yao, Sheng Han, Huaiyu Wan, Youfang Lin, Kai Lv

    Abstract: In multi-agent systems, explicit cognition of teammates' decision logic serves as a critical factor in facilitating coordination. Communication (i.e., "Tell") can assist in the cognitive development process by information dissemination, yet it is inevitably subject to real-world constraints such as noise, latency, and attacks. Therefore, building the understanding of teammates' decision…

    Submitted 23 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

  34. arXiv:2511.15848  [pdf, ps, other]

    cs.AI cs.CL cs.SD

    Step-Audio-R1 Technical Report

    Authors: Fei Tian, Xiangyu Tony Zhang, Yuxin Zhang, Haoyang Zhang, Yuxin Li, Daijiao Liu, Yayue Deng, Donghang Wu, Jun Chen, Liang Zhao, Chengyuan Yao, Hexin Liu, Eng Siong Chng, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Gang Yu

    Abstract: Recent advances in reasoning models have demonstrated remarkable success in text and vision domains through extended chain-of-thought deliberation. However, a perplexing phenomenon persists in audio language models: they consistently perform better with minimal or no reasoning, raising a fundamental question: can audio intelligence truly benefit from deliberate thinking? We introduce Step-Audio-R…

    Submitted 26 November, 2025; v1 submitted 19 November, 2025; originally announced November 2025.

    Comments: 22 pages, 5 figures. Technical Report

    ACM Class: I.2.7; I.2.6; H.5.5

  35. arXiv:2511.11245  [pdf, ps, other]

    cs.LG

    Heterogeneous Attributed Graph Learning via Neighborhood-Aware Star Kernels

    Authors: Hong Huang, Chengyu Yao, Haiming Chen, Hang Gao

    Abstract: Attributed graphs, typically characterized by irregular topologies and a mix of numerical and categorical attributes, are ubiquitous in diverse domains such as social networks, bioinformatics, and cheminformatics. While graph kernels provide a principled framework for measuring graph similarity, existing kernel methods often struggle to simultaneously capture heterogeneous attribute semantics and…

    Submitted 14 November, 2025; originally announced November 2025.

  36. arXiv:2510.21571  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG

    Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

    Authors: Qixiu Li, Yu Deng, Yaobo Liang, Lin Luo, Lei Zhou, Chengtang Yao, Lingqi Zeng, Zhiyuan Feng, Huizhi Liang, Sicheng Xu, Yizhong Zhang, Xi Chen, Hao Chen, Lily Sun, Dong Chen, Jiaolong Yang, Baining Guo

    Abstract: This paper presents a novel approach for pretraining robotic manipulation Vision-Language-Action (VLA) models using a large corpus of unscripted real-life video recordings of human hand activities. Treating the human hand as a dexterous robot end-effector, we show that "in-the-wild" egocentric human videos without any annotations can be transformed into data formats fully aligned with existing robotic V…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Project page: https://microsoft.github.io/VITRA/

  37. arXiv:2510.19375  [pdf, ps, other]

    math.CV

    Diffeomorphic solutions of Ahlfors-Hopf equations

    Authors: Gaven Martin, Cong Yao

    Abstract: Here we advance the study of the boundary value problem for extremal functions of mean distortion and the associated Teichmüller spaces interpolating between the classical examples of extremal quasiconformal mappings, and the more recent approach through harmonic mappings (of extreme Dirichlet energy). In this paper we focus on the Ahlfors-Hopf differential \[ Φ=\mathcal{A}(\mathbb{K}(w,h))h_w\,\o…

    Submitted 8 January, 2026; v1 submitted 22 October, 2025; originally announced October 2025.

    Comments: 22 pages, 5 figures

    MSC Class: 30C62 31A05 49J10

  38. arXiv:2510.19166  [pdf, ps, other]

    cs.MM

    Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution

    Authors: Hongjun Liu, Leyu Zhou, Zijianghao Yang, Chao Yao

    Abstract: For real-world BCI applications, lightweight Electroencephalography (EEG) systems offer the best cost-deployment balance. However, such spatial sparsity of EEG limits spatial fidelity, hurting learning and introducing bias. EEG spatial super-resolution methods aim to recover high-density EEG signals from sparse measurements, yet they are often hindered by distribution shift and signal distortion and thu…

    Submitted 22 February, 2026; v1 submitted 21 October, 2025; originally announced October 2025.

    Comments: ICLR 2026 Conference Paper

    MSC Class: 68T07

    ACM Class: I.2.6

  39. arXiv:2510.15347  [pdf, ps, other]

    eess.IV cs.MM

    Symmetric Entropy-Constrained Video Coding for Machines

    Authors: Yuxiao Sun, Meiqin Liu, Chao Yao, Qi Tang, Jian Jin, Weisi Lin, Frederic Dufaux, Yao Zhao

    Abstract: As video transmission increasingly serves machine vision systems (MVS) instead of human vision systems (HVS), video coding for machines (VCM) has become a critical research topic. Existing VCM methods often bind codecs to specific downstream models, requiring retraining or supervised data, thus limiting generalization in multi-task scenarios. Recently, unified VCM frameworks have employed visual b…

    Submitted 31 October, 2025; v1 submitted 17 October, 2025; originally announced October 2025.

    Comments: This paper is submitted to the IEEE Transactions

  40. arXiv:2510.13131  [pdf, ps, other]

    cs.CV cs.MM

    OS-HGAdapter: Open Semantic Hypergraph Adapter for Large Language Models Assisted Entropy-Enhanced Image-Text Alignment

    Authors: Rongjun Chen, Chengsi Yao, Jinchang Ren, Xianxian Zeng, Peixian Wang, Jun Yuan, Jiawen Li, Huimin Zhao, Xu Lu

    Abstract: Text-image alignment constitutes a foundational challenge in multimedia content understanding, where effective modeling of cross-modal semantic correspondences critically enhances retrieval system performance through joint embedding space optimization. Given the inherent difference in information entropy between texts and images, conventional approaches often show an imbalance in the mutual retrie…

    Submitted 15 October, 2025; originally announced October 2025.

  41. arXiv:2510.08271  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    SViM3D: Stable Video Material Diffusion for Single Image 3D Generation

    Authors: Andreas Engelhardt, Mark Boss, Vikram Voleti, Chun-Han Yao, Hendrik P. A. Lensch, Varun Jampani

    Abstract: We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional steps to enable…

    Submitted 1 November, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

    Comments: Accepted by International Conference on Computer Vision (ICCV 2025). Project page: http://svim3d.aengelhardt.com

  42. arXiv:2510.03889  [pdf, ps, other]

    cond-mat.str-el

    Lattice Translation Modulated Symmetries and TFTs

    Authors: Ching-Yu Yao

    Abstract: Modulated symmetries are internal symmetries that are not invariant under spacetime symmetry actions. We propose a general way to describe the lattice translation modulated symmetries in 1+1D, including the non-invertible ones, via the tensor network language. We demonstrate that the modulations can be described by some autoequivalences of the categories. Although the topological behaviors are bro…

    Submitted 8 December, 2025; v1 submitted 4 October, 2025; originally announced October 2025.

    Comments: 26 pages, 6 figures

  43. arXiv:2509.24872  [pdf]

    physics.geo-ph

    U-SWIFT: A Unified Surface Wave Inversion Framework with Transformer via Normalization of Dispersion Curves

    Authors: Tianjian Cheng, Hongrui Xu, Jiayu Feng, Xiongyu Hu, Chaofan Yao

    Abstract: Deep learning is an increasingly popular approach for inverting surface wave dispersion curves to obtain Vs profiles. However, its generalizability is constrained by the depth and velocity scales of training data. We propose a unified deep learning framework that overcomes this limitation via normalization of dispersion curves. By leveraging the scaling properties of dispersion curves, our approac…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 27 pages, 10 figures, 4 tables. Under review at a peer-reviewed journal

  44. arXiv:2509.24203  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends

    Authors: Chaorui Yao, Yanxi Chen, Yuchang Sun, Yushuo Chen, Wenhao Zhang, Xuchen Pan, Yaliang Li, Bolin Ding

    Abstract: Off-policy reinforcement learning (RL) for large language models (LLMs) is attracting growing interest, driven by practical constraints in real-world applications, the complexity of LLM-RL infrastructure, and the need for further innovations in RL methodologies. While classic REINFORCE and its modern variants like Group Relative Policy Optimization (GRPO) are typically regarded as on-policy algori…

    Submitted 1 March, 2026; v1 submitted 28 September, 2025; originally announced September 2025.

    Comments: Accepted to ICLR 2026. arXiv v2 update: add references and experiments

  45. arXiv:2509.19336  [pdf, ps, other]

    cs.CL cs.AI

    Cognitive-Level Adaptive Generation via Capability-Aware Retrieval and Style Adaptation

    Authors: Qingsong Wang, Tao Wu, Wang Lin, Yueying Feng, Gongsheng Yuan, Chang Yao, Jingyuan Chen

    Abstract: Large Language Models (LLMs) have demonstrated strong performance in open-ended generation tasks. However, they often struggle to adapt content to users with differing cognitive capacities, leading to a phenomenon we term cognitive misalignment. This issue arises in two forms: knowledge-level misalignment, where content is too complex or too simplistic relative to user understanding, and presentat…

    Submitted 15 September, 2025; originally announced September 2025.

    Comments: Accepted to Findings of EMNLP 2025

  46. arXiv:2509.16136  [pdf, ps, other]

    cs.RO

    Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning

    Authors: Changwei Yao, Xinzi Liu, Chen Li, Marios Savvides

    Abstract: Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges in handling complex, multi-step tasks. In this work, w…

    Submitted 24 March, 2026; v1 submitted 19 September, 2025; originally announced September 2025.

  47. arXiv:2509.10687  [pdf, ps, other]

    cs.CV

    Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

    Authors: Hao Zhang, Chun-Han Yao, Simon Donné, Narendra Ahuja, Varun Jampani

    Abstract: We present Stable Part Diffusion 4D (SP4D), a framework for generating paired RGB and kinematic part videos from monocular inputs. Unlike conventional part segmentation methods that rely on appearance-based semantic cues, SP4D learns to produce kinematic parts: structural components aligned with object articulation and consistent across views and time. SP4D adopts a dual-branch diffusion model th…

    Submitted 4 November, 2025; v1 submitted 12 September, 2025; originally announced September 2025.

    Comments: Page: https://stablepartdiffusion4d.github.io/

  48. Powering Job Search at Scale: LLM-Enhanced Query Understanding in Job Matching Systems

    Authors: Ping Liu, Jianqiang Shen, Qianqi Shen, Chunnan Yao, Kevin Kao, Dan Xu, Rajat Arora, Baofen Zheng, Caleb Johnson, Liangjie Hong, Jingwei Wu, Wenjing Zhang

    Abstract: Query understanding is essential in modern relevance systems, where user queries are often short, ambiguous, and highly context-dependent. Traditional approaches often rely on multiple task-specific Named Entity Recognition models to extract structured facets as seen in job search applications. However, this fragmented architecture is brittle, expensive to maintain, and slow to adapt to evolving t…

    Submitted 19 August, 2025; originally announced September 2025.

    Comments: CIKM 2025

  49. arXiv:2509.09066  [pdf]

    cs.AI

    Instructional Prompt Optimization for Few-Shot LLM-Based Recommendations on Cold-Start Users

    Authors: Haowei Yang, Yushang Zhao, Sitao Min, Bo Su, Chao Yao, Wei Xu

    Abstract: The cold-start user problem compromises the effectiveness of recommender systems by limiting access to historical behavioral information. Optimizing instructional prompts for a few-shot large language model (LLM) used in recommendation tasks is an effective pipeline for addressing it. We introduce a context-conditioned prompt formulation method $P(u, D_s) \rightarrow \hat{R}$, where $u$ is a cold-start us…

    Submitted 10 September, 2025; originally announced September 2025.

  50. arXiv:2509.06561  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el

    Silicon-Compatible Ionic Control over Multi-State Magnetoelectric Phase Transformations in Correlated Oxide System

    Authors: Xuanchi Zhou, Jiahui Ji, Wentian Lu, Huihui Ji, Chunwei Yao, Xiaohui Yao, Xiaomei Qiao, Guowei Zhou, Xiaohong Xu

    Abstract: Realizing room-temperature ferromagnetic insulators, critical enablers for low-power spintronics, is fundamentally challenged by the long-standing trade-off between ferromagnetic ordering and indirect exchange interactions in insulators. Ionic evolution offers tempting opportunities for accessing exotic magnetoelectric states and physical functionality beyond the conventional doping paradigm via tailo…

    Submitted 8 September, 2025; originally announced September 2025.