Showing 1–50 of 381 results for author: Dai, Z

Searching in archive cs.
  1. arXiv:2604.03198  [pdf, ps, other]

    cs.CV

    The Eleventh NTIRE 2026 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Hang Guo, Yan Shu, Jiaqi Ma, Ziteng Cui, Shuhong Liu, Guofeng Mei, Lei Sun, Zongwei Wu, Fahad Shahbaz Khan, Salman Khan, Radu Timofte, Yawei Li, Hongyuan Yu, Pufan Xu, Chen Wu, Long Peng, Jiaojiao Yi, Siyang Yi, Yuning Cui, Jingyuan Xia, Xing Mou, Keji He, Jinlin Wu, Zongang Gao, et al. (38 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2026 challenge on efficient single-image super-resolution with a focus on the proposed solutions and results. The aim of this challenge is to devise a network that reduces one or several aspects, such as runtime, parameters, and FLOPs, while maintaining PSNR of around 26.90 dB on the DIV2K_LSDIR_valid dataset, and 26.99 dB on the DIV2K_LSDIR_test dataset. The challenge…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: CVPR 2026 NTIRE Workshop Paper, Efficient Super Resolution Technical Report

  2. arXiv:2603.29227  [pdf, ps, other]

    cs.RO

    Kernel-SDF: An Open-Source Library for Real-Time Signed Distance Function Estimation using Kernel Regression

    Authors: Zhirui Dai, Tianxing Fan, Mani Amani, Jaemin Seo, Ki Myung Brian Lee, Hyondong Oh, Nikolay Atanasov

    Abstract: Accurate and efficient environment representation is crucial for robotic applications such as motion planning, manipulation, and navigation. Signed distance functions (SDFs) have emerged as a powerful representation for encoding distance to obstacle boundaries, enabling efficient collision-checking and trajectory optimization techniques. However, existing SDF reconstruction methods have limitation…

    Submitted 30 March, 2026; originally announced March 2026.

  3. arXiv:2603.19224  [pdf, ps, other]

    cs.CV

    EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

    Authors: Yang Fu, Yike Zheng, Ziyun Dai, Henghui Ding

    Abstract: Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: CVPR 2026, Project Page: https://henghuiding.com/EffectErase/

  4. arXiv:2603.13956  [pdf, ps, other]

    cs.AI

    EviAgent: Evidence-Driven Agent for Radiology Report Generation

    Authors: Tuoshi Qi, Shenshen Bu, Yingfei Xiang, Zhiming Dai

    Abstract: Automated radiology report generation holds immense potential to alleviate the heavy workload of radiologists. Despite the formidable vision-language capabilities of recent Multimodal Large Language Models (MLLMs), their clinical deployment is severely constrained by inherent limitations: their "black-box" decision-making renders the generated reports untraceable due to the lack of explicit visual…

    Submitted 14 March, 2026; originally announced March 2026.

  5. arXiv:2603.09478  [pdf, ps, other]

    cs.MM

    MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning

    Authors: Xiang Yuan, Xu Chu, Xinrong Chen, Haochen Li, Zonghong Dai, Hongcheng Fan, Xiaoyue Yuan, Weiping Li, Tong Mo

    Abstract: Multimodal Object-Entity Relation Extraction (MORE) is a challenging task in information extraction research. It aims to identify relations between visual objects and textual entities, requiring complex multimodal understanding and cross-modal reasoning abilities. Existing methods, mainly classification-based or generation-based without reasoning, struggle to handle complex extraction scenarios in…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: Accepted by the 31st International Conference on Database Systems for Advanced Applications. This is the Accepted Manuscript (AM) version

  6. arXiv:2603.06739  [pdf, ps, other]

    cs.SE cs.AI

    ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution

    Authors: Yubang Wang, Chenxi Zhang, Bowen Chen, Zezheng Huai, Zihao Dai, Xinchi Chen, Yuxin Wang, Yining Zheng, Jingjing Gong, Xipeng Qiu

    Abstract: Autonomous agents are increasingly expected to support scientific research, and recent benchmarks report progress in code repair and autonomous experimentation. However, these evaluations typically assume a pre-configured execution environment, which requires resolving complex software dependencies, aligning hardware and framework versions, and configuring distributed execution, yet this capabilit…

    Submitted 11 March, 2026; v1 submitted 6 March, 2026; originally announced March 2026.

  7. arXiv:2603.06256  [pdf, ps, other]

    cs.CV cs.AI

    GazeMoE: Perception of Gaze Target with Mixture-of-Experts

    Authors: Zhuangzhuang Dai, Zhongxi Lu, Vincent G. Zakka, Luis J. Manso, Jose M Alcaraz Calero, Chen Li

    Abstract: Estimating human gaze target from visible images is a critical task for robots to understand human attention, yet the development of generalizable neural architectures and training paradigms remains challenging. While recent advances in pre-trained vision foundation models offer promising avenues for locating gaze targets, the integration of multi-modal cues -- including eyes, head poses, gestures…

    Submitted 6 March, 2026; originally announced March 2026.

    Comments: 8 pages, 3 figures, ICRA 2026

  8. arXiv:2603.06228  [pdf, ps, other]

    cs.CV

    Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention

    Authors: Haiqing Hao, Zhipeng Sui, Rong Zou, Zijia Dai, Nikola Zubić, Davide Scaramuzza, Wenhui Wang

    Abstract: Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, making them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event-by-event, but still suffer from two bottlenecks: recurrent architectures are difficult to train efficiently on long sequences, and…

    Submitted 6 March, 2026; originally announced March 2026.

  9. arXiv:2603.02630  [pdf, ps, other]

    cs.LG cs.AI

    MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

    Authors: Zhi Hong, Qian Zhang, Jiahang Sun, Zhiwei Shang, Mingze Kong, Xiangyi Wang, Yao Shu, Zhongxiang Dai

    Abstract: Large Language Models (LLMs) have achieved great success in many real-world applications, especially the one serving as the cognitive backbone of Multi-Agent Systems (MAS) to orchestrate complex workflows in practice. Since many deployment scenarios preclude MAS workflow modifications and its performance is highly sensitive to the input prompts, prompt optimization emerges as a more natural approa…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: Preprint

  10. arXiv:2603.01375  [pdf, ps, other]

    cs.AI cs.LG

    Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

    Authors: Chenxing Wei, Hong Wang, Ying He, Zhongxiang Dai, Bo Jiang, F. Richard Yu, Yao Shu

    Abstract: Test-time policy adaptation for multi-turn interactions (T2PAM) is essential for aligning Large Language Models (LLMs) with dynamic user needs during inference time. However, existing paradigms commonly treat test-time adaptation as a single-axis problem, either purely refining instructions (Prompt Engineering) or only adjusting weights (Test-Time Training), ignoring that interaction failures stem…

    Submitted 1 March, 2026; originally announced March 2026.

  11. arXiv:2603.00569  [pdf, ps, other]

    cs.SE cs.LG cs.NI

    TopoEdge: Topology-Grounded Agentic Framework for Edge Networking Code Generation and Repair

    Authors: Haomin Qi, Bohan Liu, Zihan Dai, Yunkai Gao

    Abstract: TopoEdge is a topology-grounded, edge-deployable framework for end-to-end software-defined networking (SDN) configuration generation and repair, motivated by the brittleness of configuration artefacts under topology variation and by strict operational constraints on latency, privacy, and on-site execution. TopoEdge represents each target topology as a router-level graph and embeds it using a contr…

    Submitted 28 February, 2026; originally announced March 2026.

    Comments: 6 pages, 4 figures, 3 tables

    Journal ref: IEEE Wireless Communications and Networking Conference (WCNC) 2026

  12. arXiv:2602.22441  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?

    Authors: Yingqian Cui, Zhenwei Dai, Bing He, Zhan Shi, Hui Liu, Rui Sun, Zhiji Liu, Yue Xing, Jiliang Tang, Benoit Dumoulin

    Abstract: Latent reasoning has been recently proposed as a reasoning paradigm and performs multi-step reasoning through generating steps in the latent space instead of the textual space. This paradigm enables reasoning beyond discrete language tokens by performing multi-step computation in continuous latent spaces. Although there have been numerous studies focusing on improving the performance of latent rea…

    Submitted 25 February, 2026; originally announced February 2026.

  13. arXiv:2602.19248  [pdf, ps, other]

    cs.CV cs.AI

    No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection

    Authors: Zunkai Dai, Ke Li, Jiajia Liu, Jie Yang, Yuanyuan Qiao

    Abstract: The collection and detection of video anomaly data has long been a challenging problem due to its rare occurrence and spatio-temporal scarcity. Existing video anomaly detection (VAD) methods underperform in open-world scenarios. Key contributing factors include limited dataset diversity and inadequate understanding of context-dependent anomalous semantics. To address these issues, i) we propose…

    Submitted 23 March, 2026; v1 submitted 22 February, 2026; originally announced February 2026.

    Comments: Accepted by CVPR 2026

  14. arXiv:2602.18535  [pdf, ps, other]

    cs.SD cs.AI

    Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS

    Authors: Arianna Francesconi, Zhixiang Dai, Arthur Stefano Moscheni, Himesh Morgan Perera Kanattage, Donato Cappetta, Fabio Rebecchi, Paolo Soda, Valerio Guarrasi, Rosa Sicilia, Mary-Anne Hartley

    Abstract: Voice-based digital biomarkers can enable scalable, non-invasive screening and monitoring of Parkinson's disease (PD) and Amyotrophic Lateral Sclerosis (ALS). However, models trained on one cohort or device often fail on new acquisition settings due to cross-device and cross-cohort domain shift. This challenge is amplified in real-world scenarios with partial-label mismatch, where datasets may con…

    Submitted 20 February, 2026; originally announced February 2026.

    Comments: 7 pages, 1 figure. Submitted to Pattern Recognition Letters

  15. arXiv:2602.05633  [pdf, ps, other]

    cs.CL

    CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

    Authors: Rui Jia, Ruiyi Lan, Fengrui Liu, Zhongxiang Dai, Bo Jiang, Jing Shao, Jingyuan Chen, Guandong Xu, Fei Wu, Min Zhang

    Abstract: Large language models (LLMs) have advanced the development of personalized learning in education. However, their inherent generation mechanisms often produce homogeneous responses to identical prompts. This one-size-fits-all mechanism overlooks the substantial cognitive and psychological heterogeneity among students, thereby posing potential safety risks to vulnerable groups. Existing safety evaluati…

    Submitted 5 February, 2026; originally announced February 2026.

  16. arXiv:2602.05207  [pdf, ps, other]

    eess.AS cs.AI

    ARCHI-TTS: A flow-matching-based Text-to-Speech Model with Self-supervised Semantic Aligner and Accelerated Inference

    Authors: Chunyat Wu, Jiajun Deng, Zhengxi Liu, Zheqi Dai, Haolin He, Qiuqiang Kong

    Abstract: Although diffusion-based, non-autoregressive text-to-speech (TTS) systems have demonstrated impressive zero-shot synthesis capabilities, their efficacy is still hindered by two key challenges: the difficulty of text-speech alignment modeling and the high computational overhead of the iterative denoising process. To address these limitations, we propose ARCHI-TTS that features a dedicated semantic…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: Accepted by ICASSP 2026

  17. arXiv:2602.01202  [pdf, ps, other]

    cs.AI

    Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction

    Authors: Mingze Kong, Zikun Qu, Zhongquan Zhou, Pengyu Liang, Xiang Li, Zhiwei Shang, Zhi Hong, Kaiyu Huang, Zhiyong Wang, Zhongxiang Dai

    Abstract: The rapid evolution of agentic workflows has demonstrated strong performance of LLM-based agents in addressing complex reasoning tasks. However, existing workflow optimization methods typically formulate workflow synthesis as a static, one-shot code-centric generation problem. This paradigm imposes excessive constraints on the model's coding capabilities and restricts the flexibility required for…

    Submitted 1 February, 2026; originally announced February 2026.

  18. arXiv:2602.00528  [pdf, ps, other]

    cs.AI

    How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use

    Authors: Minhua Lin, Enyan Dai, Hui Liu, Xianfeng Tang, Yuliang Yan, Zhenwei Dai, Jingying Zeng, Zhiwei Zhang, Fali Wang, Hongcheng Gao, Chen Luo, Xiang Zhang, Qi He, Suhang Wang

    Abstract: As Large Language Models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes critical. Poker provides a rigorous testbed, requiring not only strong actions but also principled, game-theoretic reasoning. In this paper, we conduct a systematic study of LLMs in multiple realistic poker tasks, evaluating both gameplay outcomes and rea…

    Submitted 31 January, 2026; originally announced February 2026.

    Comments: Accepted by ICLR 2026

  19. arXiv:2602.00359  [pdf, ps, other]

    cs.AI

    Position: Agentic Evolution is the Path to Evolving LLMs

    Authors: Minhua Lin, Hanqing Lu, Zhan Shi, Bing He, Rui Mao, Zhiwei Zhang, Zongyu Wu, Xianfeng Tang, Hui Liu, Zhenwei Dai, Xiang Zhang, Suhang Wang, Benoit Dumoulin, Jian Pei

    Abstract: As Large Language Models (LLMs) move from curated training sets into open-ended real-world environments, a fundamental limitation emerges: static training cannot keep pace with continual deployment environment change. Scaling training-time and inference-time compute improves static capability but does not close this train-deploy gap. We argue that addressing this limitation requires a new scaling…

    Submitted 13 March, 2026; v1 submitted 30 January, 2026; originally announced February 2026.

    Comments: Update code link

  20. arXiv:2601.22664  [pdf, ps, other]

    cs.AI

    Real-Time Aligned Reward Model beyond Semantics

    Authors: Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuefeng Xiao, Hongyan Xie, Li Huaqiu, Songshi Liang, Zhongxiang Dai, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for aligning large language models (LLMs) with human preferences, yet it is susceptible to reward overoptimization, in which policy models overfit to the reward model and exploit spurious reward patterns instead of faithfully capturing human intent. Prior mitigations primarily rely on surface semantic information and fail to…

    Submitted 9 March, 2026; v1 submitted 30 January, 2026; originally announced January 2026.

  21. arXiv:2601.21402  [pdf, ps, other]

    eess.AS cs.SD

    SemanticAudio: Audio Generation and Editing in Semantic Space

    Authors: Zheqi Dai, Guangyan Zhang, Haolin He, Xiquan Li, Jingyu Li, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

    Abstract: In recent years, Text-to-Audio Generation has achieved remarkable progress, offering sound creators powerful tools to transform textual inspirations into vivid audio. However, existing models predominantly operate directly in the acoustic latent space of a Variational Autoencoder (VAE), often leading to suboptimal alignment between generated audio and textual descriptions. In this paper, we introd…

    Submitted 29 January, 2026; originally announced January 2026.

  22. arXiv:2601.14115  [pdf, ps, other]

    cs.LG cs.AI

    Riemannian Liquid Spatio-Temporal Graph Network

    Authors: Liangsi Lu, Jingchao Wang, Zhaorong Dai, Hanqian Liu, Yang Shi

    Abstract: Liquid Time-Constant networks (LTCs), a type of continuous-time graph neural network, excel at modeling irregularly-sampled dynamics but are fundamentally confined to Euclidean space. This limitation introduces significant geometric distortion when representing real-world graphs with inherent non-Euclidean structures (e.g., hierarchies and cycles), degrading representation quality. To overcome thi…

    Submitted 20 January, 2026; originally announced January 2026.

    Comments: This paper has been accepted to The Web Conference 2026

  23. arXiv:2601.11611  [pdf, ps, other]

    cs.LG

    Integrating Temporal Context into Streaming Data for Human Activity Recognition in Smart Home

    Authors: Marina Vicini, Martin Rudorfer, Zhuangzhuang Dai, Luis J. Manso

    Abstract: With the global population ageing, it is crucial to enable individuals to live independently and safely in their homes. Using ubiquitous sensors such as Passive InfraRed sensors (PIR) and door sensors is drawing increasing interest for monitoring daily activities and facilitating preventative healthcare interventions for the elderly. Human Activity Recognition (HAR) from passive sensors mostly rel…

    Submitted 9 January, 2026; originally announced January 2026.

    Comments: Accepted to International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI) 2024

  24. arXiv:2601.10457  [pdf, ps, other]

    cs.AI

    NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models

    Authors: Ziming Dai, Dabiao Ma, Jinle Tong, Mengyuan Han, Jian Yang, Hongtao Liu, Haojun Fei, Qing Yang

    Abstract: Although Gradient Boosted Decision Trees (GBDTs) dominate industrial tabular applications, upgrading legacy models in high-concurrency production environments still faces prohibitive retraining costs and systemic risks. To address this problem, we present NSR-Boost, a neuro-symbolic residual boosting framework designed specifically for industrial scenarios. Its core advantage lies in being "no…

    Submitted 31 January, 2026; v1 submitted 15 January, 2026; originally announced January 2026.

    Comments: 14 pages, 12 figures

  25. arXiv:2601.08545  [pdf, ps, other]

    cs.AI cs.CL cs.SE

    Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement

    Authors: Zhenlong Dai, Zhuoluo Zhao, Hengning Wang, Xiu Tang, Sai Wu, Chang Yao, Zhipeng Gao, Jingyuan Chen

    Abstract: With the development of large language models (LLMs) in the field of programming, intelligent programming coaching systems have gained widespread attention. However, most research focuses on repairing the buggy code of programming learners without providing the underlying causes of the bugs. To address this gap, we introduce a novel task, namely LRP (Learner-Tailored Program Repair). We then propo…

    Submitted 18 January, 2026; v1 submitted 13 January, 2026; originally announced January 2026.

    Comments: Accepted by AAAI2026 main track

  26. arXiv:2601.04343  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Summary of The Inaugural Music Source Restoration Challenge

    Authors: Yongyi Zang, Jiarui Hai, Wanying Ge, Qiuqiang Kong, Zheqi Dai, Helin Wang, Yuki Mitsufuji, Mark D. Plumbley

    Abstract: Music Source Restoration (MSR) aims to recover original, unprocessed instrument stems from professionally mixed and degraded audio, requiring the reversal of both production effects and real-world degradations. We present the inaugural MSR Challenge, which features objective evaluation on studio-produced mixtures using Multi-Mel-SNR, Zimtohrli, and FAD-CLAP, alongside subjective evaluation on real…

    Submitted 7 January, 2026; originally announced January 2026.

  27. arXiv:2512.22035  [pdf, ps, other]

    cs.DC

    Robust Federated Fine-Tuning in Heterogeneous Networks with Unreliable Connections: An Aggregation View

    Authors: Yanmeng Wang, Zhiwen Dai, Shuai Wang, Jian Zhou, Fu Xiao, Tony Q. S. Quek, Tsung-Hui Chang

    Abstract: Federated Fine-Tuning (FFT) has attracted growing interest as it leverages both server- and client-side data to enhance global model generalization while preserving privacy, and significantly reduces the computational burden on edge devices by avoiding training from scratch. Despite these advantages, FFT performance is often degraded by unreliable server-client connections and heterogeneous client…

    Submitted 26 December, 2025; originally announced December 2025.

  28. arXiv:2512.21887  [pdf, ps, other]

    cs.RO cs.AI

    Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space

    Authors: Weichen Zhang, Peizhi Tang, Xin Zeng, Fanhang Man, Shiquan Yu, Zichao Dai, Baining Zhao, Hongjin Chen, Yu Shang, Wei Wu, Chen Gao, Xinlei Chen, Xin Wang, Yong Li, Wenwu Zhu

    Abstract: Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigation policies, however, are typically optimized for low-level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high-level semantics into planning. To bridge this…

    Submitted 2 January, 2026; v1 submitted 26 December, 2025; originally announced December 2025.

  29. arXiv:2512.17344  [pdf, ps, other]

    cs.CL

    Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models

    Authors: Haomin Qi, Chengbo Huang, Zihan Dai, Yunkai Gao

    Abstract: We present a governance-aware hybrid fine-tuning framework for multilingual, low-resource adaptation of large language models. The core algorithm combines gradient-aligned low-rank updates with structured orthogonal transformations through layer-wise mixing and introduces unitary constraints in selected sub-layers to stabilize deep optimization. In tandem with lightweight, label-free data governan…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: 11 pages, 4 figures, 6 tables. arXiv admin note: substantial text overlap with arXiv:2507.18076

    Journal ref: 2025 IEEE International Conference on Big Data

  30. arXiv:2512.05377  [pdf, ps, other]

    cs.LG cs.AI physics.ao-ph

    China Regional 3km Downscaling Based on Residual Corrective Diffusion Model

    Authors: Honglu Sun, Hao Jing, Zhixiang Dai, Sa Xiao, Wei Xue, Jian Sun, Qifeng Lu

    Abstract: A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downscaling and statistical downscaling, to the outputs of global models. This work focuses on statistical downscaling, which establishes statistical relationships between low-resolution and high-resolution historica…

    Submitted 9 February, 2026; v1 submitted 4 December, 2025; originally announced December 2025.

  31. arXiv:2512.02942  [pdf, ps, other]

    cs.CV cs.AI

    Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench

    Authors: Lanxiang Hu, Abhilash Shankarampeta, Yixin Huang, Zilin Dai, Haoyang Yu, Yujie Zhao, Haoqiang Kang, Daniel Zhao, Tajana Rosing, Hao Zhang

    Abstract: The next frontier for video generation lies in developing models capable of zero-shot reasoning, where understanding real-world scientific laws is crucial for accurate physical outcome modeling under diverse conditions. However, existing video benchmarks are physical commonsense-based, offering limited insight into video models' scientific reasoning capability. We introduce VideoScience-Bench, a b…

    Submitted 2 December, 2025; originally announced December 2025.

  32. arXiv:2512.02486  [pdf, ps, other]

    cs.LG

    Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts

    Authors: Zhongjian Qiao, Rui Yang, Jiafei Lyu, Xiu Li, Zhongxiang Dai, Zhuoran Yang, Siyang Gao, Shuang Qiu

    Abstract: Single-domain offline reinforcement learning (RL) often suffers from limited data coverage, while cross-domain offline RL handles this issue by leveraging additional data from other domains with dynamics shifts. However, existing studies primarily focus on train-time robustness (handling dynamics shifts from training data), neglecting the test-time robustness against dynamics perturbations when de…

    Submitted 9 March, 2026; v1 submitted 2 December, 2025; originally announced December 2025.

    Comments: Accepted at ICLR 2026

  33. arXiv:2511.18870  [pdf, ps, other]

    cs.CV

    HunyuanVideo 1.5 Technical Report

    Authors: Bing Wu, Chang Zou, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Jack Peng, Jianbing Wu, Jiangfeng Xiong, Jie Jiang, Linus, Patrol, Peizhen Zhang, Peng Chen, Penghao Zhao, Qi Tian, Songtao Liu, Weijie Kong, Weiyan Wang, Xiao He, Xin Li, Xinchi Deng, Xuefei Zhe, Yang Li, Yanxin Long, et al. (56 additional authors not shown)

    Abstract: We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding til…

    Submitted 24 November, 2025; v1 submitted 24 November, 2025; originally announced November 2025.

  34. arXiv:2511.18399  [pdf, ps, other]

    cs.CV

    ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering

    Authors: Yuxiang Nie, Han Wang, Yongjie Ye, Haiyang Yu, Weitao Jia, Tao Zeng, Hao Feng, Xiang Fei, Yang Li, Xiaohui Lv, Guozhi Tang, Jingqun Tang, Jinghui Lu, Zehui Dai, Jiacong Wang, Dingkang Yang, An-Lan Wang, Can Huang

    Abstract: This paper introduces ChineseVideoBench, a pioneering benchmark specifically designed for evaluating Multimodal Large Language Models (MLLMs) in Chinese Video Question Answering. The growing demand for sophisticated video analysis capabilities highlights the critical need for comprehensive, culturally-aware evaluation frameworks. ChineseVideoBench addresses this gap by providing a robust dataset a…

    Submitted 23 November, 2025; originally announced November 2025.

  35. arXiv:2511.16883  [pdf, ps, other]

    cs.LG

    PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling

    Authors: Zhongjie Dai, Tao Feng, Jiaxuan You

    Abstract: The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, which presents challenges in selecting appropriate LLMs, as user preferences vary in terms of performance, cost, and response style. Current LLM selection methods typically optimize for a single fixed objective, such as performance, cost, or a trade-off bet…

    Submitted 20 November, 2025; originally announced November 2025.

  36. arXiv:2511.16147  [pdf, ps, other]

    cs.CL cs.AI

    TS-PEFT: Unveiling Token-Level Redundancy in Parameter-Efficient Fine-Tuning

    Authors: Dabiao Ma, Ziming Dai, Zhimin Xin, Shu Wang, Jian Yang, Haojun Fei

    Abstract: Current Parameter-Efficient Fine-Tuning (PEFT) methods typically operate under an implicit assumption: Once a target module is selected, every token passing through it contributes equally to the downstream task and requires a parameter update. In this paper, we challenge this convention by revealing a pervasive token-level redundancy in the fine-tuning of large models (LMs). We propose TS-PEFT, a…

    Submitted 29 January, 2026; v1 submitted 20 November, 2025; originally announced November 2025.

    Comments: 11 pages, 3 figures

  37. arXiv:2511.14422  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection

    Authors: Zhengchunmin Dai, Jiaxiong Tang, Peng Sun, Honglong Chen, Liantao Wu

    Abstract: In decentralized machine learning paradigms such as Split Federated Learning (SFL) and its variant U-shaped SFL, the server's capabilities are severely restricted. Although this enhances client-side privacy, it also leaves the server highly vulnerable to model theft by malicious clients. Ensuring intellectual property protection for such capability-limited servers presents a dual challenge: waterm…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 18 pages, 8 figures

  38. arXiv:2511.13598  [pdf, ps, other]

    cs.CR cs.AI

    Robust Client-Server Watermarking for Split Federated Learning

    Authors: Jiaxiong Tang, Zhengchunmin Dai, Liantao Wu, Peng Sun, Honglong Chen, Zhenfu Cao

    Abstract: Split Federated Learning (SFL) is renowned for its privacy-preserving nature and low computational overhead among decentralized machine learning paradigms. In this framework, clients employ lightweight models to process private data locally and transmit intermediate outputs to a powerful server for further computation. However, SFL is a double-edged sword: while it enables edge computing and enhan…

    Submitted 17 November, 2025; originally announced November 2025.

  39. arXiv:2511.11635  [pdf, ps, other]

    cs.CY cs.AI cs.CL

    EduAgentQG: A Multi-Agent Workflow Framework for Personalized Question Generation

    Authors: Rui Jia, Min Zhang, Fengrui Liu, Bo Jiang, Kun Kuang, Zhongxiang Dai

    Abstract: High-quality personalized question banks are crucial for supporting adaptive learning and individualized assessment. Manually designing questions is time-consuming and often fails to meet diverse learning needs, making automated question generation a crucial approach to reduce teachers' workload and improve the scalability of educational resources. However, most existing question generation method…

    Submitted 8 November, 2025; originally announced November 2025.

  40. arXiv:2511.09924  [pdf, ps, other]

    cs.LG cs.AI

    MDMLP-EIA: Multi-domain Dynamic MLPs with Energy Invariant Attention for Time Series Forecasting

    Authors: Hu Zhang, Zhien Dai, Zhaohui Tang, Yongfang Xie

    Abstract: Time series forecasting is essential across diverse domains. While MLP-based methods have gained attention for achieving Transformer-comparable performance with fewer parameters and better robustness, they face critical limitations including loss of weak seasonal signals, capacity constraints in weight-sharing MLPs, and insufficient channel fusion in channel-independent strategies. To address thes…

    Submitted 12 November, 2025; originally announced November 2025.

  41. arXiv:2511.08873  [pdf, ps, other]

    cs.AI

    UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

    Authors: Shouang Wei, Min Zhang, Xin Lin, Bo Jiang, Kun Kuang, Zhongxiang Dai

    Abstract: Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings, yet current supervised fine-tuning methods only learn surface teaching patterns without dynamic adaptation capabilities. Recent reinforcement learning approaches address this limitation but face two critical challenges. First, they evaluate teaching effectiveness solely based on whether s…

    Submitted 5 January, 2026; v1 submitted 11 November, 2025; originally announced November 2025.

  42. arXiv:2510.21829  [pdf, ps, other]

    cs.CV

    A Flow Model with Low-Rank Transformers for Incomplete Multimodal Survival Analysis

    Authors: Yi Yin, Yuntao Shou, Zao Dai, Yun Peng, Tao Meng, Wei Ai, Keqin Li

    Abstract: In recent years, multimodal medical data-based survival analysis has attracted much attention. However, real-world datasets often suffer from the problem of incomplete modality, where some patient modality information is missing due to acquisition limitations or system failures. Existing methods typically infer missing modalities directly from observed ones using deep neural networks, but they oft…

    Submitted 21 October, 2025; originally announced October 2025.

    Comments: 12 pages, 4 figures

  43. arXiv:2510.18999  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    $\nabla$-SDF: Learning Euclidean Signed Distance Functions Online with Gradient-Augmented Octree Interpolation and Neural Residual

    Authors: Zhirui Dai, Qihao Qian, Tianxing Fan, Nikolay Atanasov

    Abstract: Estimation of signed distance functions (SDFs) from point cloud data has been shown to benefit many robot autonomy capabilities, including localization, mapping, motion planning, and control. Methods that support online and large-scale SDF reconstruction tend to rely on discrete volumetric data structures, which affect the continuity and differentiability of the SDF estimates. Recently, using impl…

    Submitted 21 October, 2025; originally announced October 2025.

  44. arXiv:2510.17771  [pdf, ps, other]

    cs.AI cs.CV

    Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs

    Authors: Zhining Liu, Ziyi Chen, Hui Liu, Chen Luo, Xianfeng Tang, Suhang Wang, Joy Zeng, Zhenwei Dai, Zhan Shi, Tianxin Wei, Benoit Dumoulin, Hanghang Tong

    Abstract: Vision-Language Models (VLMs) achieve strong results on multimodal tasks such as visual question answering, yet they can still fail even when the correct visual evidence is present. In this work, we systematically investigate whether these failures arise from not perceiving the evidence or from not leveraging it effectively. By examining layer-wise attention dynamics, we find that shallow layers f…

    Submitted 20 October, 2025; originally announced October 2025.

    Comments: 21 pages, 10 figures, 6 tables

  45. arXiv:2510.15992  [pdf, ps, other]

    cs.LG cs.AI

    Stratos: An End-to-End Distillation Pipeline for Customized LLMs under Distributed Cloud Environments

    Authors: Ziming Dai, Tuo Zhang, Fei Gao, Xingyi Cai, Xiaofei Wang, Cheng Zhang, Wenyu Wang, Chengjie Zang

    Abstract: The growing industrial demand for customized and cost-efficient large language models (LLMs) is fueled by the rise of vertical, domain-specific tasks and the need to optimize performance under constraints such as latency and budget. Knowledge distillation, as an efficient model compression and transfer technique, offers a feasible solution. However, existing distillation frameworks often require m…

    Submitted 13 October, 2025; originally announced October 2025.

  46. arXiv:2510.14824  [pdf, ps, other]

    cs.CL cs.CV cs.IR

    Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking

    Authors: Ziqi Dai, Xin Zhang, Mingxin Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

    Abstract: In information retrieval, training reranking models mainly focuses on two types of objectives: metric learning (e.g. contrastive loss to increase the predicted scores on relevant query-document pairs) and classification (binary label prediction of relevance vs. irrelevance). For BERT-style encoders, various studies have shown that contrastive learning (CL) can be more effective than discriminative…

    Submitted 16 October, 2025; originally announced October 2025.

  47. arXiv:2510.13626  [pdf, ps, other]

    cs.RO cs.CL cs.CV

    LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

    Authors: Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong, Xipeng Qiu

    Abstract: Visual-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: objects layout, camera viewpoints, robot initial states, language instructions, light conditions, background textures a…

    Submitted 26 December, 2025; v1 submitted 15 October, 2025; originally announced October 2025.

  48. arXiv:2510.12899  [pdf, ps, other]

    cs.CL

    EduDial: Constructing a Large-scale Multi-turn Teacher-Student Dialogue Corpus

    Authors: Shouang Wei, Min Zhang, Xin Lin, Bo Jiang, Zhongxiang Dai, Kun Kuang

    Abstract: Recently, several multi-turn dialogue benchmarks have been proposed to evaluate the conversational abilities of large language models (LLMs). As LLMs are increasingly recognized as a key technology for advancing intelligent education, owing to their ability to deeply understand instructional contexts and provide personalized guidance, the construction of dedicated teacher-student dialogue benchmar…

    Submitted 14 October, 2025; originally announced October 2025.

  49. arXiv:2510.10995  [pdf, ps, other]

    cs.SD

    MSRBench: A Benchmarking Dataset for Music Source Restoration

    Authors: Yongyi Zang, Jiarui Hai, Wanying Ge, Qiuqiang Kong, Zheqi Dai, Helin Wang, Yuki Mitsufuji, Mark D. Plumbley

    Abstract: Music Source Restoration (MSR) extends source separation to realistic settings where signals undergo production effects (equalization, compression, reverb) and real-world degradations, with the goal of recovering the original unprocessed sources. Existing benchmarks cannot measure restoration fidelity: synthetic datasets use unprocessed stems but unrealistic mixtures, while real production dataset…

    Submitted 13 October, 2025; originally announced October 2025.

  50. arXiv:2510.09388  [pdf, ps, other]

    cs.LG cs.CL

    HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness

    Authors: Xinyi Wang, Jinyi Han, Zishang Jiang, Tingyun Li, Jiaqing Liang, Sihang Jiang, Zhaoqian Dai, Shuguang Ma, Fei Yu, Yanghua Xiao

    Abstract: Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. While prior work attempts to mitigate this using off-policy data, such as mixing RL with Super…

    Submitted 10 October, 2025; originally announced October 2025.