Showing 1–50 of 3,087 results for author: Zhang, R

Searching in archive cs.
  1. arXiv:2512.19433

    cs.CV

    dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models

    Authors: Yi Xin, Siqi Luo, Qi Qin, Haoxing Chen, Kaiwen Zhu, Zhiwei Zhang, Yangfan He, Rongchao Zhang, Jinbin Bai, Shuo Cao, Bin Fu, Junjun He, Yihao Liu, Yuewen Cao, Xiaohong Liu

    Abstract: Diffusion Multi-modal Large Language Models (dMLLMs) have recently emerged as a novel architecture unifying image generation and understanding. However, developing effective and efficient Test-Time Scaling (TTS) methods to unlock their full generative potential remains an underexplored challenge. To address this, we propose dMLLM-TTS, a novel framework operating on two complementary scaling axes:…

    Submitted 22 December, 2025; originally announced December 2025.

    Comments: Project page: https://github.com/Alpha-VLLM/Lumina-DiMOO

  2. arXiv:2512.18595

    cs.LG

    Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows

    Authors: Runze Mao, Rui Zhang, Xuan Bai, Tianhao Wu, Teng Zhang, Zhenyi Chen, Minqi Lin, Bocheng Zeng, Yangchen Xu, Yingxuan Xiang, Haoze Zhang, Shubham Goswami, Pierre A. Dawe, Yifan Xu, Zhenhua An, Mengtao Yan, Xiaoyi Lu, Yi Wang, Rongbo Bai, Haobu Gao, Xiaohang Fang, Han Li, Hao Sun, Zhi X. Chen

    Abstract: Predicting multiphysics dynamics is computationally expensive and challenging due to the severe coupling of multi-scale, heterogeneous physical processes. While neural surrogates promise a paradigm shift, the field currently suffers from an "illusion of mastery", as repeatedly emphasized in top-tier commentaries: existing evaluations overly rely on simplified, low-dimensional proxies, which fail t…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: 52 pages, 20 figures. Code and data available at https://github.com/deepflame-ai/REALM. Companion website and leaderboard at https://realm-bench.org

  3. arXiv:2512.18582

    cs.NI

    Wireless Copilot: An AI-Powered Partner for Navigating Next-Generation Wireless Complexity

    Authors: Haoxiang Luo, Ruichen Zhang, Yinqiu Liu, Gang Sun, Hongfang Yu, Dusit Niyato, Shiwen Mao, Dong In Kim

    Abstract: The sixth-generation (6G) of wireless networks introduces a level of operational complexity that exceeds the limits of traditional automation and manual oversight. This paper introduces the "Wireless Copilot", an AI-powered technical assistant designed to function as a collaborative partner for human network designers, engineers, and operators. We posit that by integrating Large Language Models (L…

    Submitted 20 December, 2025; originally announced December 2025.

  4. arXiv:2512.18189

    cs.AI cs.SC

    NL2CA: Auto-formalizing Cognitive Decision-Making from Natural Language Using an Unsupervised CriticNL2LTL Framework

    Authors: Zihao Deng, Yijia Li, Renrui Zhang, Peijun Ye

    Abstract: Cognitive computing models offer a formal and interpretable way to characterize human's deliberation and decision-making, yet their development remains labor-intensive. In this paper, we propose NL2CA, a novel method for auto-formalizing cognitive decision-making rules from natural language descriptions of human experience. Different from most related work that exploits either pure manual or human…

    Submitted 19 December, 2025; originally announced December 2025.

  5. arXiv:2512.18133

    cs.LG cs.AI cs.CR

    Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection

    Authors: Jie Yang, Rui Zhang, Ziyang Cheng, Dawei Cheng, Guang Yang, Bo Wang

    Abstract: Nowadays, Graph Fraud Detection (GFD) in financial scenarios has become an urgent research topic to protect online payment security. However, as organized crime groups are becoming more professional in real-world scenarios, fraudsters are employing more sophisticated camouflage strategies. Specifically, fraudsters disguise themselves by mimicking the behavioral data collected by platforms, ensurin…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: Accepted by The Web Conference 2025 (WWW'25). 12 pages, includes implementation details. Code: https://github.com/AI4Risk/antifraud and https://github.com/Muyiiiiii/WWW25-Grad

    ACM Class: H.2.8; G.2.2

    Journal ref: Proceedings of the ACM Web Conference 2025 (WWW '25), April 28-May 2, 2025, Sydney, NSW, Australia

  6. arXiv:2512.17149

    cs.HC

    Transformer-Based Modeling of User Interaction Sequences for Dwell Time Prediction in Human-Computer Interfaces

    Authors: Rui Liu, Runsheng Zhang, Shixiao Wang

    Abstract: This study investigates the task of dwell time prediction and proposes a Transformer framework based on interaction behavior modeling. The method first represents user interaction sequences on the interface by integrating dwell duration, click frequency, scrolling behavior, and contextual features, which are mapped into a unified latent space through embedding and positional encoding. On this basi…

    Submitted 18 December, 2025; originally announced December 2025.

  7. arXiv:2512.15624

    cs.CE physics.comp-ph physics.data-an stat.ME

    Nonparametric Stochastic Subspaces via the Bootstrap for Characterizing Model Error

    Authors: Akash Yadav, Ruda Zhang

    Abstract: Reliable forward uncertainty quantification in engineering requires methods that account for aleatory and epistemic uncertainties. In many applications, epistemic effects arising from uncertain parameters and model form dominate prediction error and strongly influence engineering decisions. Because distinguishing and representing each source separately is often infeasible, their combined effect is…

    Submitted 17 December, 2025; originally announced December 2025.

  8. arXiv:2512.15069

    cs.CV cs.AI

    PMMD: A pose-guided multi-view multi-modal diffusion for person generation

    Authors: Ziyu Shang, Haoran Liu, Rongchao Zhang, Zhiqian Wei, Tongtong Feng

    Abstract: Generating consistent human images with controllable pose and appearance is essential for applications in virtual try on, image editing, and digital human creation. Current methods often suffer from occlusions, garment style drift, and pose misalignment. We propose Pose-guided Multi-view Multimodal Diffusion (PMMD), a diffusion framework that synthesizes photorealistic person images conditioned on…

    Submitted 16 December, 2025; originally announced December 2025.

  9. Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank

    Authors: Chenxiao Zhang, Runshi Zhang, Junchen Wang

    Abstract: Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key component of the computer-assisted surgery workflow. The low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundary, which may lead to small object losses and increase bounda…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Chenxiao Zhang and Runshi Zhang contributed equally to this work. 14 pages, 11 figures

    Journal ref: Medical Image Analysis 2026

  10. arXiv:2512.15044

    cs.AI cs.NI

    Agentic AI for Integrated Sensing and Communication: Analysis, Framework, and Case Study

    Authors: Wenwen Xie, Geng Sun, Ruichen Zhang, Xuejie Liu, Yinqiu Liu, Jiacheng Wang, Dusit Niyato, Ping Zhang

    Abstract: Integrated sensing and communication (ISAC) has emerged as a key development direction in the sixth-generation (6G) era, which provides essential support for the collaborative sensing and communication of future intelligent networks. However, as wireless environments become increasingly dynamic and complex, ISAC systems require more intelligent processing and more autonomous operation to maintain…

    Submitted 16 December, 2025; originally announced December 2025.

  11. arXiv:2512.15000

    cs.LG cs.AI cs.CL

    DreamPRM-Code: Function-as-Step Process Reward Model with Label Correction for LLM Coding

    Authors: Ruiyi Zhang, Peijia Qin, Qi Cao, Pengtao Xie

    Abstract: Process Reward Models (PRMs) have become essential for improving Large Language Models (LLMs) via test-time scaling, yet their effectiveness in coding remains limited due to the lack of meaningful step decompositions in code and the noise of Monte-Carlo-generated partial labels. We propose DreamPRM-Code, a coding-focused PRM that treats functions as reasoning steps using a Chain-of-Function prompt…

    Submitted 16 December, 2025; originally announced December 2025.

  12. arXiv:2512.14946

    cs.OS cs.AI cs.LG

    EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving

    Authors: Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang

    Abstract: Reusing KV cache is essential for high efficiency of Large Language Model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior work has proposed to either evict KV cache to lower-tier storage devices, or compress KV cache so that more KV cache can be fit in the fast memory. However, prior work misses an important opportunity: jointly…

    Submitted 16 December, 2025; originally announced December 2025.

  13. arXiv:2512.14944

    cs.CV

    Puzzle Curriculum GRPO for Vision-Centric Reasoning

    Authors: Ahmadreza Jeddi, Hakki Can Karaimer, Hue Nguyen, Zhongling Wang, Ke Zhao, Javad Rajabi, Ran Zhang, Raghav Goyal, Babak Taati, Radek Grzeszczuk

    Abstract: Recent reinforcement learning (RL) approaches like outcome-supervised GRPO have advanced chain-of-thought reasoning in Vision Language Models (VLMs), yet key issues linger: (i) reliance on costly and noisy hand-curated annotations or external verifiers; (ii) flat and sparse reward schemes in GRPO; and (iii) logical inconsistency between a chain's reasoning and its final answer. We present Puzzle C…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: Project page: https://pcgrpo.github.io

  14. arXiv:2512.14329

    cs.CE cs.AI

    A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data

    Authors: Yanning Dai, Chenyu Tang, Ruizhi Zhang, Wenyu Yang, Yilan Zhang, Yuhui Wang, Junliang Chen, Xuhang Chen, Ruimou Xie, Yangyue Cao, Qiaoying Li, Jin Cao, Tao Li, Hubin Zhao, Yu Pan, Arokia Nathan, Xin Gao, Peter Smielewski, Shuo Gao

    Abstract: Dynamic prediction of locomotor capacity after stroke is crucial for tailoring rehabilitation, yet current assessments provide only static impairment scores and do not indicate whether patients can safely perform specific tasks such as slope walking or stair climbing. Here, we develop a data-physics hybrid generative framework that reconstructs an individual stroke survivor's neuromuscular control…

    Submitted 16 December, 2025; originally announced December 2025.

    Comments: 26 pages, 6 figures

  15. Legitimizing, Developing, and Sustaining Feminist HCI in East Asia: Challenges and Opportunities

    Authors: Runhua Zhang, Ruyuan Wan, Jiaqi Li, Daye Kang, Yigang Qin, Yijia Wang, Ziqi Pan, Tiffany Knearem, Huamin Qu, Xiaojuan Ma

    Abstract: Feminist HCI has been rapidly developing in East Asian contexts in recent years. The region's unique cultural and political backgrounds have contributed valuable, situated knowledge, revealing topics such as localized digital feminism practices, or women's complex navigation among social expectations. However, the very factors that ground these perspectives also create significant survival challen…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: The proposal was accepted by CHI2026 meet-up track; and will be published in Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26)

  16. arXiv:2512.12817

    cs.HC cs.AI

    Decoding Human and AI Persuasion in National College Debate: Analyzing Prepared Arguments Through Aristotle's Rhetorical Principles

    Authors: Mengqian Wu, Jiayi Zhang, Raymond Z. Zhang

    Abstract: Debate has been widely adopted as a strategy to enhance critical thinking skills in English Language Arts (ELA). One important skill in debate is forming effective argumentation, which requires debaters to select supportive evidence from literature and construct compelling claims. However, the training of this skill largely depends on human coaching, which is labor-intensive and difficult to scale…

    Submitted 14 December, 2025; originally announced December 2025.

  17. arXiv:2512.12578

    quant-ph cs.CC cs.LG

    Scalable Quantum Error Mitigation with Neighbor-Informed Learning

    Authors: Zhenyu Chen, Bin Cheng, Minbo Gao, Xiaodie Lin, Ruiqi Zhang, Zhaohui Wei, Zhengfeng Ji

    Abstract: Noise in quantum hardware is the primary obstacle to realizing the transformative potential of quantum computing. Quantum error mitigation (QEM) offers a promising pathway to enhance computational accuracy on near-term devices, yet existing methods face a difficult trade-off between performance, resource overhead, and theoretical guarantees. In this work, we introduce neighbor-informed learning (N…

    Submitted 14 December, 2025; originally announced December 2025.

  18. arXiv:2512.12487

    cs.CV

    More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

    Authors: Hoang Anh Just, Yifei Fan, Handong Zhao, Jiuxiang Gu, Ruiyi Zhang, Simon Jenni, Kushal Kafle, Ruoxi Jia, Jing Shi

    Abstract: Reinforcement learning from verifiable rewards (RLVR) has recently been extended from text-only LLMs to vision-language models (VLMs) to elicit long-chain multimodal reasoning. However, RLVR-trained VLMs still exhibit two persistent failure modes: inaccurate visual extraction (missing or hallucinating details) and logically inconsistent chains-of-thought, largely because verifiable signals supervi…

    Submitted 13 December, 2025; originally announced December 2025.

  19. arXiv:2512.12175

    cs.AI

    Rethinking Label Consistency of In-Context Learning: An Implicit Transductive Label Propagation Perspective

    Authors: Haoyang Chen, Richong Zhang, Junfan Chen

    Abstract: Large language models (LLMs) perform in-context learning (ICL) with minimal supervised examples, which benefits various natural language processing (NLP) tasks. One critical research focus is the selection of prompt demonstrations. Current approaches typically employ retrieval models to select the top-K most semantically similar examples as demonstrations. However, we argue that existing me…

    Submitted 12 December, 2025; originally announced December 2025.

  20. arXiv:2512.12131

    cs.LG cs.DC

    BOOST: BOttleneck-Optimized Scalable Training Framework for Low-Rank Large Language Models

    Authors: Zhengyang Wang, Ziyue Liu, Ruijie Zhang, Avinash Maurya, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang

    Abstract: The scale of transformer model pre-training is constrained by the increasing computation and communication cost. Low-rank bottleneck architectures offer a promising solution to significantly reduce the training time and memory footprint with minimum impact on accuracy. Despite algorithmic efficiency, bottleneck architectures scale poorly under standard tensor parallelism. Simply applying 3D parall…

    Submitted 12 December, 2025; originally announced December 2025.

  21. arXiv:2512.11686

    physics.comp-ph cs.LG

    Stable spectral neural operator for learning stiff PDE systems from limited data

    Authors: Rui Zhang, Han Wan, Yang Liu, Hao Sun

    Abstract: Accurate modeling of spatiotemporal dynamics is crucial to understanding complex phenomena across science and engineering. However, this task faces a fundamental challenge when the governing equations are unknown and observational data are sparse. System stiffness, the coupling of multiple time-scales, further exacerbates this problem and hinders long-term prediction. Existing methods fall short:…

    Submitted 12 December, 2025; originally announced December 2025.

  22. arXiv:2512.10954

    cs.CV

    Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

    Authors: Sicheng Mo, Thao Nguyen, Richard Zhang, Nick Kolkin, Siddharth Srinivasan Iyer, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, Yuheng Li

    Abstract: In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask if samples can be generated collaboratively. We propose Group Diffusion, unlocking the attention mechanism to be shared across images, rather than limited to just the patches within an image. This enables images to be jointly denoised at in…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Project Page: https://sichengmo.github.io/GroupDiff/

  23. arXiv:2512.10949

    cs.CV cs.AI cs.CL

    Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation

    Authors: Yiwen Tang, Zoey Guo, Kaixin Zhu, Ray Zhang, Qizhi Chen, Dongzhi Jiang, Junli Liu, Bohan Zeng, Haoming Song, Delin Qu, Tianyi Bai, Dan Xu, Wentao Zhang, Bin Zhao

    Abstract: Reinforcement learning (RL), earlier proven to be effective in large language and multi-modal models, has been successfully extended to enhance 2D image generation recently. However, applying RL to 3D generation remains largely unexplored due to the higher spatial complexity of 3D objects, which require globally consistent geometry and fine-grained local textures. This makes 3D generation signific…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Code is released at https://github.com/Ivan-Tang-3D/3DGen-R1

  24. arXiv:2512.10794

    cs.CV cs.AI cs.GR cs.LG stat.ML

    What matters for Representation Alignment: Global Information or Spatial Structure?

    Authors: Jaskirat Singh, Xingjian Leng, Zongze Wu, Liang Zheng, Richard Zhang, Eli Shechtman, Saining Xie

    Abstract: Representation alignment (REPA) guides generative training by distilling representations from a strong, pretrained vision encoder to intermediate diffusion features. We investigate a fundamental question: what aspect of the target representation matters for generation, its global semantic information (e.g., measured by ImageNet-1K accuracy) or its spatial structure (i.e. pairwi…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Project page: https://end2end-diffusion.github.io/irepa

  25. arXiv:2512.10046

    cs.AI

    SimWorld-Robotics: Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration

    Authors: Yan Zhuang, Jiawei Ren, Xiaokang Ye, Jianzhi Shen, Ruixuan Zhang, Tianai Yue, Muhammad Faayez, Xuhong He, Ziqiao Ma, Lianhui Qin, Zhiting Hu, Tianmin Shu

    Abstract: Recent advances in foundation models have shown promising results in developing generalist robotics that can perform diverse tasks in open-ended scenarios given multimodal inputs. However, current work has been mainly focused on indoor, household scenarios. In this work, we present SimWorld-Robotics (SWR), a simulation platform for embodied AI in large-scale, photorealistic urban environments. Bui…

    Submitted 10 December, 2025; originally announced December 2025.

    Comments: Conference: NeurIPS 2025 (main)

  26. arXiv:2512.08896

    cs.LG

    Open Polymer Challenge: Post-Competition Report

    Authors: Gang Liu, Sobin Alosious, Subhamoy Mahajan, Eric Inae, Yihan Zhu, Yuhan Liu, Renzheng Zhang, Jiaxin Xu, Addison Howard, Ying Li, Tengfei Luo, Meng Jiang

    Abstract: Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal condu…

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: The report for the competition: "NeurIPS - Open Polymer Prediction 2025". Kaggle Page: https://www.kaggle.com/competitions/neurips-open-polymer-prediction-2025. Website: https://open-polymer-challenge.github.io

  27. arXiv:2512.08754

    cs.RO

    A Multi-Robot Platform for Robotic Triage Combining Onboard Sensing and Foundation Models

    Authors: Jason Hughes, Marcel Hussing, Edward Zhang, Shenbagaraj Kannapiran, Joshua Caswell, Kenneth Chaney, Ruichen Deng, Michaela Feehery, Agelos Kratimenos, Yi Fan Li, Britny Major, Ethan Sanchez, Sumukh Shrote, Youkang Wang, Jeremy Wang, Daudi Zein, Luying Zhang, Ruijun Zhang, Alex Zhou, Tenzi Zhouga, Jeremy Cannon, Zaffir Qasim, Jay Yelon, Fernando Cladera, Kostas Daniilidis, et al. (2 additional authors not shown)

    Abstract: This report presents a heterogeneous robotic system designed for remote primary triage in mass-casualty incidents (MCIs). The system employs a coordinated air-ground team of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to locate victims, assess their injuries, and prioritize medical assistance without risking the lives of first responders. The UAVs identify and provide overhe…

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: Technical Report for the DARPA Triage Challenge PRONTO team

  28. arXiv:2512.08674

    cs.AI cs.MA

    Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology

    Authors: Rongzhao Zhang, Junqiao Wang, Shuyun Yang, Mouxiao Bian, Chao Ding, Yuwei Bai, Chihao Zhang, Yuguang Shen, Lei Wang, Lei Zheng, Qiujuan Yan, Yun Zhong, Meiling Liu, Jiwei Yu, Zheng Wang, Jie Xu, Meng Luo

    Abstract: Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous me…

    Submitted 9 December, 2025; originally announced December 2025.

  29. arXiv:2512.08564

    cs.CV

    Modular Neural Image Signal Processing

    Authors: Mahmoud Afifi, Zhongling Wang, Ran Zhang, Michael S. Brown

    Abstract: This paper presents a modular neural image signal processing (ISP) framework that processes raw inputs and renders high-quality display-referred images. Unlike prior neural ISP designs, our method introduces a high degree of modularity, providing full control over multiple intermediate stages of the rendering process. This modular design not only achieves high rendering accuracy but also improves…

    Submitted 9 December, 2025; originally announced December 2025.

  30. arXiv:2512.07410

    cs.CV

    InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs

    Authors: Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Lan Xu, Jingyi Yu, Jingya Wang

    Abstract: Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the physically plausible interplay essential for multi-agent interactions. To bridge this gap, we propose InterAgent, the first end-to-end framework for text-driven physics-based multi-agent humanoid control. At i…

    Submitted 12 December, 2025; v1 submitted 8 December, 2025; originally announced December 2025.

    Comments: Project page: https://binlee26.github.io/InterAgent-Page

  31. arXiv:2512.07226

    eess.AS cs.SD

    Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

    Authors: Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai

    Abstract: Single-channel audio separation aims to separate individual sources from a single-channel mixture. Most existing methods rely on supervised learning with synthetically generated paired data. However, obtaining high-quality paired data in real-world scenarios is often difficult. This data scarcity can degrade model performance under unseen conditions and limit generalization ability. To this end, i…

    Submitted 8 December, 2025; originally announced December 2025.

    Comments: 15 pages, 31 figures, accepted by The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)

  32. arXiv:2512.06628

    cs.RO cs.CV

    MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

    Authors: Ruicheng Zhang, Mingyang Zhang, Jun Zhou, Zhangrui Guo, Xiaofan Liu, Zunnan Xu, Zhizhou Zhong, Puxin Yan, Haocheng Luo, Xiu Li

    Abstract: Embodied imitation learning is constrained by the scarcity of diverse, long-horizon robotic manipulation data. Existing video generation models for this domain are limited to synthesizing short clips of simple actions and often rely on manually defined trajectories. To this end, we introduce MIND-V, a hierarchical framework designed to synthesize physically plausible and logically coherent videos…

    Submitted 6 December, 2025; originally announced December 2025.

  33. arXiv:2512.05251

    stat.ML cs.LG

    One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow

    Authors: Pascal Jutras-Dube, Jiaru Zhang, Ziran Wang, Ruqi Zhang

    Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs. We introduce one-step diffusion samplers which learn a step-conditioned ODE so that one large step reproduces the trajectory of many small o…

    Submitted 4 December, 2025; originally announced December 2025.

  34. arXiv:2512.05119

    cs.IR cs.AI cs.CL

    RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering

    Authors: Rongyang Zhang, Yuqing Huang, Chengqiang Lu, Qimeng Wang, Yan Gao, Yi Wu, Yao Hu, Yin Xu, Wei Wang, Hao Wang, Enhong Chen

    Abstract: In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text generation. Despite recent progress, like the visual autoregressive model that unifies text and image processing in a single transformer architecture, generating high-quality interleaved content remains challengin…

    Submitted 10 October, 2025; originally announced December 2025.

    Comments: 26 pages, 6 figures, NeurIPS 2025 D&B Track poster

  35. arXiv:2512.05112

    cs.CV cs.AI cs.CL cs.LG

    DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

    Authors: Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li

    Abstract: Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning p…

    Submitted 4 December, 2025; originally announced December 2025.

    Comments: Project Page: https://github.com/CaraJ7/DraCo

  36. arXiv:2512.04751

    cs.CE

    NAWOA-XGBoost: A Novel Model for Early Prediction of Academic Potential in Computer Science Students

    Authors: Junhao Wei, Yanzhao Gu, Ran Zhang, Mingjing Huang, Jinhong Song, Yanxiao Li, Wenxuan Zhu, Yapeng Wang, Zikun Li, Zhiwen Wang, Xu Yang, Ngai Cheong

    Abstract: Whale Optimization Algorithm (WOA) suffers from limited global search ability, slow convergence, and tendency to fall into local optima, restricting its effectiveness in hyperparameter optimization for machine learning models. To address these issues, this study proposes a Nonlinear Adaptive Whale Optimization Algorithm (NAWOA), which integrates strategies such as Good Nodes Set initialization, Le…

    Submitted 5 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.

  37. arXiv:2512.04742

    cs.IT

    Rotatable Antenna-Enhanced Cell-Free Communication

    Authors: Kecheng Pan, Beixiong Zheng, Yanhua Tan, Emil Björnson, Robert Schober, Rui Zhang

    Abstract: Rotatable antenna (RA) is a promising technology that can exploit new spatial degrees-of-freedom (DoFs) by flexibly adjusting the three-dimensional (3D) boresight direction of antennas. In this letter, we investigate an RA-enhanced cell-free system for downlink transmission, where multiple RA-equipped access points (APs) cooperatively serve multiple single-antenna users over the same time-frequenc…

    Submitted 12 December, 2025; v1 submitted 4 December, 2025; originally announced December 2025.

  38. arXiv:2512.03722

    cs.NI

    Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks

    Authors: Lingyi Cai, Wenjie Fu, Yuxi Huang, Ruichen Zhang, Yinqiu Liu, Jiawen Kang, Zehui Xiong, Tao Jiang, Dusit Niyato, Xianbin Wang, Shiwen Mao, Xuemin Shen

    Abstract: Reinforcement Learning (RL) has shown remarkable success in enabling adaptive and data-driven optimization for various applications in wireless networks. However, classical RL suffers from limitations in generalization, learning feedback, interpretability, and sample efficiency in dynamic wireless environments. Large Language Models (LLMs) have emerged as a transformative Artificial Intelligence (…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: 30 pages, 12 figures, survey paper

  39. arXiv:2512.03672

    cs.CL

    Evaluating Hydro-Science and Engineering Knowledge of Large Language Models

    Authors: Shiruo Hu, Wenbo Shan, Yingjia Li, Zhiqi Wan, Xinpeng Yu, Yunjia Qi, Haotian Xia, Yang Xiao, Dingxiao Liu, Jiaru Wang, Chenxu Gong, Ruixi Zhang, Shuyue Wu, Shibo Cui, Chee Hui Lai, Wei Luo, Yubin He, Bin Xu, Jianshi Zhao

    Abstract: Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters. Featuring multiple engineering objectives, Hydro-SE is an inherently interdisciplinary domain that integrates scientific knowledge with engineering expertise. This integration necessitates extensive expert col…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: Hydro-SE Bench sets a new benchmark for the evaluation of LLMs in the Hydro-Science and Engineering domain, with its code and data available at https://github.com/sheishijun/Hydro-SE-Bench

  40. CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation

    Authors: Ruoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Cooking is a sequential and visually grounded activity, where each step such as chopping, mixing, or frying carries both procedural logic and visual semantics. While recent diffusion models have shown strong capabilities in text-to-image generation, they struggle to handle structured multi-step scenarios like recipe illustration. Additionally, current recipe illustration methods are unable to adju…

    Submitted 5 December, 2025; v1 submitted 3 December, 2025; originally announced December 2025.

    Comments: Accepted by ACM Multimedia 2025

  41. arXiv:2512.02920  [pdf, ps, other]

    cs.LG cs.CV cs.SI

    Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation

    Authors: Ziniu Zhang, Minxuan Duan, Haris N. Koutsopoulos, Hongyang R. Zhang

    Abstract: We consider analyzing traffic accident patterns using both road network data and satellite images aligned to road graph nodes. Previous work for predicting accident occurrences relies primarily on road network structural features while overlooking physical and environmental information from the road surface and its surroundings. In this work, we construct a large multimodal dataset across six U.S.…

    Submitted 17 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: 17 pages. To appear in KDD'26 Datasets

  42. arXiv:2512.02556  [pdf, ps, other]

    cs.CL

    DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

    Authors: DeepSeek-AI, Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenhao Xu, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Erhang Li, Fangqi Zhou, Fangyun Lin, Fucong Dai, Guangbo Hao , et al. (239 additional authors not shown)

    Abstract: We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2)…

    Submitted 2 December, 2025; originally announced December 2025.

  43. arXiv:2512.02471  [pdf, ps, other]

    q-bio.GN cs.AI

    scCluBench: Comprehensive Benchmarking of Clustering Algorithms for Single-Cell RNA Sequencing

    Authors: Ping Xu, Zaitian Wang, Zhirui Wang, Pengjiang Li, Jiajia Wang, Ran Zhang, Pengfei Wang, Yuanchun Zhou

    Abstract: Cell clustering is crucial for uncovering cellular heterogeneity in single-cell RNA sequencing (scRNA-seq) data by identifying cell types and marker genes. Despite its importance, benchmarks for scRNA-seq clustering methods remain fragmented, often lacking standardized protocols and failing to incorporate recent advances in artificial intelligence. To fill these gaps, we present scCluBench, a comp…

    Submitted 2 December, 2025; originally announced December 2025.

  44. arXiv:2512.02358  [pdf, ps, other]

    cs.AI

    Beyond Playtesting: A Generative Multi-Agent Simulation System for Massively Multiplayer Online Games

    Authors: Ran Zhang, Kun Ouyang, Tiancheng Ma, Yida Yang, Dong Fang

    Abstract: Optimizing numerical systems and mechanism design is crucial for enhancing player experience in Massively Multiplayer Online (MMO) games. Traditional optimization approaches rely on large-scale online experiments or parameter tuning over predefined statistical models, which are costly, time-consuming, and may disrupt player experience. Although simplified offline simulation systems are often adopt…

    Submitted 1 December, 2025; originally announced December 2025.

  45. arXiv:2512.02284  [pdf, ps, other]

    quant-ph cs.ET

    Quantum-Classical Separation in Bounded-Resource Tasks Arising from Measurement Contextuality

    Authors: Shashwat Kumar, Eliott Rosenberg, Alejandro Grajales Dau, Rodrigo Cortinas, Dmitri Maslov, Richard Oliver, Adam Zalcman, Matthew Neeley, Alice Pagano, Aaron Szasz, Ilya Drozdov, Zlatko Minev, Craig Gidney, Noureldin Yosri, Stijn J. de Graaf, Aniket Maiti, Dmitry Abanin, Rajeev Acharya, Laleh Aghababaie Beni, Georg Aigeldinger, Ross Alcaraz, Sayra Alcaraz, Trond I. Andersen, Markus Ansmann, Frank Arute , et al. (258 additional authors not shown)

    Abstract: The prevailing view is that quantum phenomena can be harnessed to tackle certain problems beyond the reach of classical approaches. Quantifying this capability as a quantum-classical separation and demonstrating it on current quantum processors has remained elusive. Using a superconducting qubit processor, we show that quantum contextuality enables certain tasks to be performed with success probab…

    Submitted 1 December, 2025; originally announced December 2025.

  46. arXiv:2512.02013  [pdf, ps, other]

    cs.RO

    ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation

    Authors: Chenyang Gu, Jiaming Liu, Hao Chen, Runzhong Huang, Qingpo Wuwu, Zhuoyang Liu, Xiaoqi Li, Ying Li, Renrui Zhang, Peng Jia, Pheng-Ann Heng, Shanghang Zhang

    Abstract: Vision-Language-Action (VLA) models have recently emerged, demonstrating strong generalization in robotic scene understanding and manipulation. However, when confronted with long-horizon tasks that require defined goal states, such as LEGO assembly or object rearrangement, existing VLA models still face challenges in coordinating high-level planning with precise manipulation. Therefore, we aim to…

    Submitted 1 December, 2025; originally announced December 2025.

  47. arXiv:2512.01113  [pdf, ps, other]

    cs.LG cs.AI cs.DS

    Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning

    Authors: Dongyue Li, Zhenshuo Zhang, Minxuan Duan, Edgar Dobriban, Hongyang R. Zhang

    Abstract: Algorithmic reasoning -- the ability to perform step-by-step logical inference -- has become a core benchmark for evaluating reasoning in graph neural networks (GNNs) and large language models (LLMs). Ideally, one would like to design a single model capable of performing well on multiple algorithmic reasoning tasks simultaneously. However, this is challenging when the execution steps of algorithms…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 31 pages. Preprint, to appear in KDD'26

  48. arXiv:2512.01078  [pdf, ps, other]

    cs.AI

    SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds

    Authors: Jiawei Ren, Yan Zhuang, Xiaokang Ye, Lingjun Mao, Xuhong He, Jianzhi Shen, Mrinaal Dogra, Yiming Liang, Ruixuan Zhang, Tianai Yue, Yiqing Yang, Eric Liu, Ryan Wu, Kevin Benavente, Rajiv Mandya Nagaraju, Muhammad Faayez, Xiyan Zhang, Dhruv Vivek Sharma, Xianrui Zhong, Ziqiao Ma, Tianmin Shu, Zhiting Hu, Lianhui Qin

    Abstract: While LLM/VLM-powered AI agents have advanced rapidly in math, coding, and computer use, their applications in complex physical and social environments remain challenging. Building agents that can survive and thrive in the real world (for example, by autonomously earning income or running a business) requires massive-scale interaction, reasoning, training, and evaluation across diverse embodied sc…

    Submitted 30 November, 2025; originally announced December 2025.

  49. arXiv:2512.00262  [pdf, ps, other]

    cs.RO cs.HC

    "Why the face?": Exploring Robot Error Detection Using Instrumented Bystander Reactions

    Authors: Maria Teresa Parreira, Ruidong Zhang, Sukruth Gowdru Lingaraju, Alexandra Bremers, Xuanyu Fang, Adolfo Ramirez-Aristizabal, Manaswi Saha, Michael Kuniavsky, Cheng Zhang, Wendy Ju

    Abstract: How do humans recognize and rectify social missteps? We achieve social competence by looking around at our peers, decoding subtle cues from bystanders - a raised eyebrow, a laugh - to evaluate the environment and our actions. Robots, however, struggle to perceive and make use of these nuanced reactions. By employing a novel neck-mounted device that records facial expressions from the chin region,…

    Submitted 28 November, 2025; originally announced December 2025.

  50. arXiv:2512.00079  [pdf, ps, other]

    cs.AR cs.AI cs.LG

    InF-ATPG: Intelligent FFR-Driven ATPG with Advanced Circuit Representation Guided Reinforcement Learning

    Authors: Bin Sun, Rengang Zhang, Zhiteng Chao, Zizhen Liu, Jianan Mu, Jing Ye, Huawei Li

    Abstract: Automatic test pattern generation (ATPG) is a crucial process in integrated circuit (IC) design and testing, responsible for efficiently generating test patterns. As semiconductor technology progresses, traditional ATPG struggles with long execution times to achieve the expected fault coverage, which impacts the time-to-market of chips. Recent machine learning techniques, like reinforcement learni…

    Submitted 25 November, 2025; originally announced December 2025.

    Comments: 9 pages, 6 figures