Showing 1–50 of 724 results for author: Cheng, Z

Searching in archive cs.
  1. arXiv:2512.19320

    cs.LG cs.AI cs.CL cs.CV

    MAGIC: Achieving Superior Model Merging via Magnitude Calibration

    Authors: Yayuan Li, Jian Zhang, Jintao Guo, Zihan Cheng, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: The proliferation of pre-trained models has given rise to a wide array of specialised, fine-tuned models. Model merging aims to merge the distinct capabilities of these specialised models into a unified model, requiring minimal or even no additional training. A core objective of model merging is to ensure the merged model retains the behavioural characteristics of the specialised models, typically…

    Submitted 22 December, 2025; originally announced December 2025.

  2. arXiv:2512.18246

    cs.LG cs.AI

    Offline Behavioral Data Selection

    Authors: Shiye Lei, Zhihao Cheng, Dacheng Tao

    Abstract: Behavioral cloning is a widely adopted approach for offline policy learning from expert demonstrations. However, the large scale of offline behavioral datasets often results in computationally intensive training when used in downstream tasks. In this paper, we uncover the striking data saturation in offline behavioral data: policy performance rapidly saturates when trained on a small fraction of t…

    Submitted 20 December, 2025; originally announced December 2025.

    Comments: Accepted by KDD 2026

  3. arXiv:2512.18133

    cs.LG cs.AI cs.CR

    Grad: Guided Relation Diffusion Generation for Graph Augmentation in Graph Fraud Detection

    Authors: Jie Yang, Rui Zhang, Ziyang Cheng, Dawei Cheng, Guang Yang, Bo Wang

    Abstract: Nowadays, Graph Fraud Detection (GFD) in financial scenarios has become an urgent research topic to protect online payment security. However, as organized crime groups are becoming more professional in real-world scenarios, fraudsters are employing more sophisticated camouflage strategies. Specifically, fraudsters disguise themselves by mimicking the behavioral data collected by platforms, ensurin…

    Submitted 19 December, 2025; originally announced December 2025.

    Comments: Accepted by The Web Conference 2025 (WWW'25). 12 pages, includes implementation details. Code: https://github.com/AI4Risk/antifraud and https://github.com/Muyiiiiii/WWW25-Grad

    ACM Class: H.2.8; G.2.2

    Journal ref: Proceedings of the ACM Web Conference 2025 (WWW '25), April 28-May 2, 2025, Sydney, NSW, Australia

  4. arXiv:2512.16071

    cs.ET cs.AI eess.SP

    Feasibility of Radio Frequency Based Wireless Sensing of Lead Contamination in Soil

    Authors: Yixuan Gao, Tanvir Ahmed, Mikhail Mohammed, Zhongqi Cheng, Rajalakshmi Nandakumar

    Abstract: Widespread Pb (lead) contamination of urban soil significantly impacts food safety and public health and hinders city greening efforts. However, most existing technologies for measuring Pb are labor-intensive and costly. In this study, we propose SoilScanner, a radio frequency-based wireless system that can detect Pb in soils. This is based on our discovery that the propagation of different freque…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: 12 pages, 12 Figures, International Conference on Embedded Wireless Systems and Networks, https://ewsn.org/file-repository/ewsn2024/ewsn24-final99.pdf, Best Paper Award of EWSN2024

  5. arXiv:2512.16041

    cs.CL cs.AI

    Are We on the Right Way to Assessing LLM-as-a-Judge?

    Authors: Yuanning Feng, Sinan Wang, Zhengxiang Cheng, Yao Wan, Dongping Chen

    Abstract: LLM-as-a-Judge has been widely adopted as an evaluation method and served as supervised rewards in model training. However, existing benchmarks for LLM-as-a-Judge are mainly relying on human-annotated ground truth, which introduces human bias that undermines the assessment of reliability and imposes scalability constraints. To overcome these limitations, we introduce Sage, a novel evaluation suite…

    Submitted 17 December, 2025; originally announced December 2025.

  6. arXiv:2512.15699

    cs.LG cs.SE

    FrontierCS: Evolving Challenges for Evolving Intelligence

    Authors: Qiuyang Mang, Wenhao Chai, Zhifei Li, Huanzhi Mao, Shang Zhou, Alexander Du, Hanchen Li, Shu Liu, Edwin Chen, Yichuan Wang, Xieting Chu, Zerui Cheng, Yuan Xu, Tian Xia, Zirui Wang, Tianneng Shi, Jianzhu Yao, Yilong Zhao, Qizheng Zhang, Charlie Ruan, Zeyu Shen, Kaiyuan Liu, Runyuan He, Dong Xing, Zerui Li , et al. (26 additional authors not shown)

    Abstract: We introduce FrontierCS, a benchmark of 156 open-ended problems across diverse areas of computer science, designed and reviewed by experts, including CS PhDs and top-tier competitive programming participants and problem setters. Unlike existing benchmarks that focus on tasks with known optimal solutions, FrontierCS targets problems where the optimal solution is unknown, but the quality of a soluti…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: Code with instruction: https://github.com/FrontierCS/Frontier-CS

  7. arXiv:2512.15176

    cs.LG cs.AI

    DEER: Draft with Diffusion, Verify with Autoregressive Models

    Authors: Zicong Cheng, Guo-Wei Yang, Jia Li, Zhijie Deng, Meng-Hao Guo, Shi-Min Hu

    Abstract: Efficiency, as a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative decoding mitigates this cost through a draft-verify scheme, yet existing approaches rely on AR draft models (a.k.a., drafters), which introduce two fundamental issues: (1) step-wise uncertainty accumulation lead…

    Submitted 17 December, 2025; originally announced December 2025.

    Comments: Homepage : https://czc726.github.io/DEER/

  8. arXiv:2512.13564

    cs.CL cs.AI

    Memory in the Age of AI Agents

    Authors: Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, Zhenrong Cheng, Xuanbo Fan, Jiaxin Guo, Xinlei Yu, Zhenhong Zhou, Zewen Hu , et al. (22 additional authors not shown)

    Abstract: Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the prol…

    Submitted 15 December, 2025; originally announced December 2025.

  9. MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction

    Authors: Bate Li, Houqiang Zhong, Zhengxue Cheng, Qiang Hu, Qiang Wang, Li Song, Wenjun Zhang

    Abstract: Multi-view egocentric dynamic scene reconstruction holds significant research value for applications in holographic documentation of social interactions. However, existing reconstruction datasets focus on static multi-view or single-egocentric view setups, lacking multi-view egocentric datasets for dynamic scene reconstruction. Therefore, we present MultiEgo, the first multi-view egocentric datase…

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: ACM MM 2025 Dataset Track

  10. arXiv:2512.10956

    cs.CV

    Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision

    Authors: Wentao Zhou, Xuweiyi Chen, Vignesh Rajagopal, Jeffrey Chen, Rohan Chandra, Zezhou Cheng

    Abstract: The success of foundation models in language and vision motivated research in fully end-to-end robot navigation foundation models (NFMs). NFMs directly map monocular visual input to control actions and ignore mid-level vision modules (tracking, depth estimation, etc) entirely. While the assumption that vision capabilities will emerge implicitly is compelling, it requires large amounts of pixel-to-…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Project Page: https://www.cs.virginia.edu/~tsx4zn/stereowalk/

  11. arXiv:2512.06692

    cs.LG

    State Diversity Matters in Offline Behavior Distillation

    Authors: Shiye Lei, Zhihao Cheng, Dacheng Tao

    Abstract: Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superio…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: 12 pages, 5 figures, 5 tables

  12. arXiv:2512.06201

    cs.LG

    K2-V2: A 360-Open, Reasoning-Enhanced LLM

    Authors: K2 Team, Zhengzhong Liu, Liping Tang, Linghao Jin, Haonan Li, Nikhil Ranjan, Desai Fan, Shaurya Rohatgi, Richard Fan, Omkar Pangarkar, Huijuan Wang, Zhoujun Cheng, Suqi Sun, Seungwook Han, Bowen Tan, Gurpreet Gosal, Xudong Han, Varad Pimpalkhute, Shibo Hao, Ming Shan Hee, Joel Hestness, Haolong Jia, Liqun Ma, Aaryamonvikram Singh, Daria Soboleva , et al. (14 additional authors not shown)

    Abstract: We introduce K2-V2, a 360-open LLM built from scratch as a superior base for reasoning adaptation, in addition to functions such as conversation and knowledge retrieval from general LLMs. It stands as the strongest fully open model, rivals open-weight leaders in its size class, outperforms Qwen2.5-72B and approaches the performance of Qwen3-235B. We actively infuse domain knowledge, reasoning, lon…

    Submitted 5 December, 2025; originally announced December 2025.

  13. arXiv:2512.05150

    cs.CV

    TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

    Authors: Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin

    Abstract: Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 Number of Function Evaluations (NFEs)). While various few-step methods aim…

    Submitted 3 December, 2025; originally announced December 2025.

    Comments: arxiv v0

  14. arXiv:2512.04699

    cs.CV

    OmniScaleSR: Unleashing Scale-Controlled Diffusion Prior for Faithful and Realistic Arbitrary-Scale Image Super-Resolution

    Authors: Xinning Chai, Zhengxue Cheng, Yuhong Zhang, Hengsheng Zhang, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song

    Abstract: Arbitrary-scale super-resolution (ASSR) overcomes the limitation of traditional super-resolution (SR) methods that operate only at fixed scales (e.g., 4x), enabling a single model to handle arbitrary magnification. Most existing ASSR approaches rely on implicit neural representation (INR), but its regression-driven feature extraction and aggregation intrinsically limit the ability to synthesize fi…

    Submitted 4 December, 2025; originally announced December 2025.

    Comments: Accepted by TCSVT, 15 pages

  15. arXiv:2512.02924

    cs.CL

    AutoNeural: Co-Designing Vision-Language Models for NPU Inference

    Authors: Wei Chen, Liangmin Wu, Yunhai Hu, Zhiyuan Li, Zhiyuan Cheng, Yicheng Qian, Lingyue Zhu, Zhipeng Hu, Luoyi Liang, Qiang Tang, Zhen Liu, Han Yang

    Abstract: While Neural Processing Units (NPUs) offer high theoretical efficiency for edge AI, state-of-the-art Vision-Language Models (VLMs) tailored for GPUs often falter on these substrates. We attribute this hardware-model mismatch to two primary factors: the quantization brittleness of Vision Transformers (ViTs) and the I/O-bound nature of autoregressive attention mechanisms, which fail to utilize the…

    Submitted 7 December, 2025; v1 submitted 2 December, 2025; originally announced December 2025.

  16. arXiv:2512.01672

    cs.LG cs.AI

    ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models

    Authors: Zhongyuan Wu, Jingyuan Wang, Zexuan Cheng, Yilong Zhou, Weizhi Wang, Juhua Pu, Chao Li, Changqing Ma

    Abstract: Anomaly detection (AD) is a fundamental task of critical importance across numerous domains. Current systems increasingly operate in rapidly evolving environments that generate diverse yet interconnected data modalities -- such as time series, system logs, and tabular records -- as exemplified by modern IT systems. Effective AD methods in such environments must therefore possess two critical capab…

    Submitted 1 December, 2025; originally announced December 2025.

  17. arXiv:2512.01274

    cs.CL cs.AI cs.LG

    SUPERChem: A Multimodal Reasoning Benchmark in Chemistry

    Authors: Zehua Zhao, Zhixian Huang, Junren Li, Siyu Lin, Junting Zhou, Fengqi Cao, Kun Zhou, Rui Ge, Tingting Long, Yuexiang Zhu, Yan Liu, Jie Zheng, Junnian Wei, Rong Zhu, Peng Zou, Wenyu Li, Zekai Cheng, Tian Ding, Yaxuan Wang, Yizhao Yan, Tingru Wei, Haowei Ming, Weijie Mao, Chen Sun, Yiming Liu , et al. (6 additional authors not shown)

    Abstract: Current benchmarks for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) are limited by oversimplified tasks, lack of process-level evaluation, and misalignment with expert-level chemistry skills. To address these issues, we introduce SUPERChem, a benchmark of 500 expert-curated reasoning-intensive chemistry problems, covering diverse subfields and provided in both mul…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 35 pages, 11 figures, 5 tables

  18. arXiv:2511.22859

    eess.IV cs.CR

    TokCom-UEP: Semantic Importance-Matched Unequal Error Protection for Resilient Image Transmission

    Authors: Kaizheng Zhang, Zuolin Jin, Zhihang Cheng, Ming Zeng, Li Qiao, Zesong Fei

    Abstract: Token communication (TokCom), an emerging semantic communication framework powered by Large Multimodal Model (LMM), has…

    Submitted 27 November, 2025; originally announced November 2025.

  19. arXiv:2511.21631

    cs.CV cs.AI

    Qwen3-VL Technical Report

    Authors: Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu , et al. (39 additional authors not shown)

    Abstract: We introduce Qwen3-VL, the most capable vision-language model in the Qwen series to date, achieving superior performance across a broad range of multimodal benchmarks. It natively supports interleaved contexts of up to 256K tokens, seamlessly integrating text, images, and video. The model family includes both dense (2B/4B/8B/32B) and mixture-of-experts (30B-A3B/235B-A22B) variants to accommodate d…

    Submitted 27 November, 2025; v1 submitted 26 November, 2025; originally announced November 2025.

    Comments: 42 pages

  20. arXiv:2511.20857

    cs.CL cs.AI

    Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    Authors: Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H. Chi, Chi Wang, Shuo Chen, Fernando Pereira, Wang-Cheng Kang, Derek Zhiyuan Cheng

    Abstract: Statefulness is essential for large language model (LLM) agents to perform long-term planning and problem-solving. This makes memory a critical component, yet its management and evolution remain largely underexplored. Existing evaluations mostly focus on static conversational settings, where memory is passively retrieved from dialogue to answer queries, overlooking the dynamic ability to accumulat…

    Submitted 25 November, 2025; originally announced November 2025.

  21. arXiv:2511.20258

    cs.CV cs.LG

    Modality-Balanced Collaborative Distillation for Multi-Modal Domain Generalization

    Authors: Xiaohan Wang, Zhangtao Cheng, Ting Zhong, Leiting Chen, Fan Zhou

    Abstract: Weight Averaging (WA) has emerged as a powerful technique for enhancing generalization by promoting convergence to a flat loss landscape, which correlates with stronger out-of-distribution performance. However, applying WA directly to multi-modal domain generalization (MMDG) is challenging: differences in optimization speed across modalities lead WA to overfit to faster-converging ones in early st…

    Submitted 25 November, 2025; originally announced November 2025.

  22. arXiv:2511.17623

    cs.LG cs.AI

    M$^2$OE$^2$-GL: A Family of Probabilistic Load Forecasters That Scales to Massive Customers

    Authors: Haoran Li, Zhe Cheng, Muhao Guo, Yang Weng, Yannan Sun, Victor Tran, John Chainaranont

    Abstract: Probabilistic load forecasting is widely studied and underpins power system planning, operation, and risk-aware decision making. Deep learning forecasters have shown strong ability to capture complex temporal and contextual patterns, achieving substantial accuracy gains. However, at the scale of thousands or even hundreds of thousands of loads in large distribution feeders, a deployment dilemma em…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 5 pages

  23. arXiv:2511.17282

    cs.CV cs.AI cs.CY

    Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation

    Authors: Chuancheng Shi, Shangze Li, Shiming Guo, Simiao Xie, Wenhua Wu, Jingtong Dou, Chao Wu, Canran Xiao, Cong Wang, Zifeng Cheng, Fei Shen, Tat-Seng Chua

    Abstract: Multilingual text-to-image (T2I) models have advanced rapidly in terms of visual realism and semantic alignment, and are now widely utilized. Yet outputs vary across cultural contexts: because language carries cultural connotations, images synthesized from multilingual prompts should preserve cross-lingual cultural consistency. We conduct a comprehensive analysis showing that current T2I models of…

    Submitted 21 November, 2025; originally announced November 2025.

  24. arXiv:2511.14107

    cs.CV

    RTS-Mono: A Real-Time Self-Supervised Monocular Depth Estimation Method for Real-World Deployment

    Authors: Zeyu Cheng, Tongfei Liu, Tao Lei, Xiang Hua, Yi Zhang, Chengkai Tang

    Abstract: Depth information is crucial for autonomous driving and intelligent robot navigation. The simplicity and flexibility of self-supervised monocular depth estimation are conducive to its role in these fields. However, most existing monocular depth estimation models consume many computing resources. Although some methods have reduced the model's size and improved computing efficiency, the performance…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 14 pages, 10 figures

  25. arXiv:2511.13329

    cs.CL cs.CR

    RegionMarker: A Region-Triggered Semantic Watermarking Framework for Embedding-as-a-Service Copyright Protection

    Authors: Shufan Yang, Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Cong Wang, Shiping Ge, Yuchen Fu, Qing Gu

    Abstract: Embedding-as-a-Service (EaaS) is an effective and convenient deployment solution for addressing various NLP tasks. Nevertheless, recent research has shown that EaaS is vulnerable to model extraction attacks, which could lead to significant economic losses for model providers. For copyright protection, existing methods inject watermark embeddings into text embeddings and use them to detect copyrigh…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: AAAI 2026

  26. arXiv:2511.11719

    cs.DC cs.AI

    ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation

    Authors: Mohammad Mahdi Kamani, Zhongwei Cheng, Lin Chen

    Abstract: The massive growth in the utilization of edge AI has made the applications of machine learning models ubiquitous in different domains. Despite the computation and communication efficiency of these systems, due to limited computation resources on edge devices, relying on more computationally rich systems on the cloud side is inevitable in most cases. Cloud inference systems can achieve the best per…

    Submitted 12 November, 2025; originally announced November 2025.

  27. arXiv:2511.10232

    cs.CL cs.AI cs.SD

    VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction

    Authors: Yuhao Wang, Ziyang Cheng, Heyang Liu, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang

    Abstract: Current end-to-end spoken language models (SLMs) have made notable progress, yet they still encounter considerable response latency. This delay primarily arises from the autoregressive generation of speech tokens and the reliance on complex flow-matching models for speech synthesis. To overcome this, we introduce VocalNet-M2, a novel low-latency SLM that integrates a multi-codebook tokenizer and a…

    Submitted 13 November, 2025; originally announced November 2025.

  28. arXiv:2511.09212

    cs.SE

    Leveraging Self-Paced Learning for Software Vulnerability Detection

    Authors: Zeru Cheng, Yanjing Yang, He Zhang, Lanxin Yang, Jinghao Hu, Jinwei Xu, Bohan Liu, Haifeng Shen

    Abstract: Software vulnerabilities are major risks to software systems. Recently, researchers have proposed many deep learning approaches to detect software vulnerabilities. However, their accuracy is limited in practice. One of the main causes is low-quality training data (i.e., source code). To this end, we propose a new approach: SPLVD (Self-Paced Learning for Software Vulnerability Detection). SPLVD dyn…

    Submitted 12 November, 2025; originally announced November 2025.

  29. arXiv:2511.09119

    cs.RO

    Data Assessment for Embodied Intelligence

    Authors: Jiahao Xiao, Bowen Yan, Jianbo Zhang, Jia Wang, Chunyi Li, Zhengxue Cheng, Guangtao Zhai

    Abstract: In embodied intelligence, datasets play a pivotal role, serving as both a knowledge repository and a conduit for information transfer. The two most critical attributes of a dataset are the amount of information it provides and how easily this information can be learned by models. However, the multimodal nature of embodied data makes evaluating these properties particularly challenging. Prior work…

    Submitted 12 November, 2025; originally announced November 2025.

  30. arXiv:2511.08887   

    cs.LG cs.AI

    FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis

    Authors: Tianming Sha, Zechuan Chen, Zhan Cheng, Haotian Zhai, Xuwei Ding, Keze Wang

    Abstract: Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient survival. However, existing automated diagnosis methods suffer from fairness issues across demographic groups, potentially exacerbating healthcare disparities. In this work we propose FAST-CAD, a theoretically grounded framework that combines domain-adversarial training (DAT) with group distributionally…

    Submitted 4 December, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: This paper has been withdrawn by the submitting author while the authorship and institutional ethics approval are being clarified and re-evaluated. A substantially revised version may be posted in the future

  31. arXiv:2511.08230   

    cs.CL

    VocalBench-zh: Decomposing and Benchmarking the Speech Conversational Abilities in Mandarin Context

    Authors: Heyang Liu, Ziyang Cheng, Yuhao Wang, Hongcheng Liu, Yiqi Li, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang

    Abstract: The development of multi-modal large language models (LLMs) leads to intelligent approaches capable of speech interactions. As one of the most widely spoken languages globally, Mandarin is supported by most models to enhance their applicability and reach. However, the scarcity of comprehensive speech-to-speech (S2S) benchmarks in Mandarin contexts impedes systematic evaluation for developers and h…

    Submitted 16 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: This article will serve as an extension of the preceding work, "VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models" (arXiv:2505.15727). Therefore, we have chosen to withdraw to avoid potential duplicate publication. We will update the previously open-sourced paper of VocalBench in several weeks to include the content of VocalBench-zh

  32. arXiv:2511.07148

    cs.CL

    TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine

    Authors: Zihao Cheng, Yuheng Lu, Huaiqian Ye, Zeming Liu, Minqi Wang, Jingjing Liu, Zihan Li, Wei Fan, Yuanfang Guo, Ruiji Fu, Shifeng She, Gang Wang, Yunhong Wang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in modern medicine, yet their application in Traditional Chinese Medicine (TCM) remains severely limited by the absence of standardized benchmarks and the scarcity of high-quality training data. To address these challenges, we introduce TCM-Eval, the first dynamic and extensible benchmark for TCM, meticulously curated from nati…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: Work in Progress

  33. arXiv:2511.06897

    cs.CV

    Adaptive Morph-Patch Transformer for Aortic Vessel Segmentation

    Authors: Zhenxi Zhang, Fuchen Zheng, Adnan Iltaf, Yifei Han, Zhenyu Cheng, Yue Du, Bin Li, Tianyong Liu, Shoujun Zhou

    Abstract: Accurate segmentation of aortic vascular structures is critical for diagnosing and treating cardiovascular diseases. Traditional Transformer-based models have shown promise in this domain by capturing long-range dependencies between vascular features. However, their reliance on fixed-size rectangular patches often influences the integrity of complex vascular structures, leading to suboptimal segmen…

    Submitted 11 November, 2025; v1 submitted 10 November, 2025; originally announced November 2025.

    Comments: This is the preprint version of a paper accepted by AAAI 2026. The final version will appear in the AAAI Proceedings

  34. arXiv:2511.05965

    cs.CV cs.AI

    Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration

    Authors: Zhixin Cheng, Xiaotian Yin, Jiacheng Deng, Bohao Liao, Yujia Chen, Xu Zhou, Baoqun Yin, Tianzhu Zhang

    Abstract: Typical detection-free methods for image-to-point cloud registration leverage transformer-based architectures to aggregate cross-modal features and establish correspondences. However, they often struggle under challenging conditions, where noise disrupts similarity computation and leads to incorrect correspondences. Moreover, without dedicated designs, it remains difficult to effectively select in…

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI2026

  35. arXiv:2511.04976

    cs.RO

    iFlyBot-VLM Technical Report

    Authors: Xin Nie, Zhiyuan Cheng, Yuan Zhang, Chao Ji, Jiajia Wu, Yuhan Zhang, Jia Pan

    Abstract: We introduce iFlyBot-VLM, a general-purpose Vision-Language Model (VLM) used to improve the domain of Embodied Intelligence. The central objective of iFlyBot-VLM is to bridge the cross-modal semantic gap between high-dimensional environmental perception and low-level robotic motion control. To this end, the model abstracts complex visual and spatial information into a body-agnostic and transferabl…

    Submitted 6 November, 2025; originally announced November 2025.

  36. arXiv:2511.03929

    cs.LG cs.AI cs.CV

    NVIDIA Nemotron Nano V2 VL

    Authors: NVIDIA, :, Amala Sanjay Deshmukh, Kateryna Chumachenko, Tuomas Rintamaki, Matthieu Le, Tyler Poon, Danial Mohseni Taheri, Ilia Karmanov, Guilin Liu, Jarno Seppanen, Guo Chen, Karan Sapra, Zhiding Yu, Adi Renduchintala, Charles Wang, Peter Jin, Arushi Goel, Mike Ranzinger, Lukas Voegtle, Philipp Fischer, Timo Roman, Wei Ping, Boxin Wang, Zhuolin Yang , et al. (99 additional authors not shown)

    Abstract: We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and…

    Submitted 6 November, 2025; v1 submitted 5 November, 2025; originally announced November 2025.

  37. arXiv:2511.03285

    cs.LG

    Graph Neural AI with Temporal Dynamics for Comprehensive Anomaly Detection in Microservices

    Authors: Qingyuan Zhang, Ning Lyu, Le Liu, Yuxi Wang, Ziyu Cheng, Cancan Hua

    Abstract: This study addresses the problem of anomaly detection and root cause tracing in microservice architectures and proposes a unified framework that combines graph neural networks with temporal modeling. The microservice call chain is abstracted as a directed graph, where multidimensional features of nodes and edges are used to construct a service topology representation, and graph convolution is appl…

    Submitted 5 November, 2025; originally announced November 2025.

  38. arXiv:2511.03279

    cs.LG

    Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning

    Authors: Ning Lyu, Yuxi Wang, Ziyu Cheng, Qingyuan Zhang, Feng Chen

    Abstract: As cloud computing and microservice architectures become increasingly prevalent, API rate limiting has emerged as a critical mechanism for ensuring system stability and service quality. Traditional rate limiting algorithms, such as token bucket and sliding window, while widely adopted, struggle to adapt to dynamic traffic patterns and varying system loads. This paper proposes an adaptive rate limi…

    Submitted 5 November, 2025; originally announced November 2025.

  39. arXiv:2511.01170

    cs.AI

    DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models

    Authors: Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng

    Abstract: Adaptive reasoning is essential for aligning the computational effort of large language models (LLMs) with the intrinsic difficulty of problems. Current chain-of-thought methods boost reasoning ability but indiscriminately generate long explanations, leading to evident inefficiency. However, existing reinforcement learning approaches to adaptive thinking remain unstable and heavily reward-dependen…

    Submitted 16 December, 2025; v1 submitted 2 November, 2025; originally announced November 2025.

  40. arXiv:2510.25025  [pdf, ps, other]

    cs.CR cs.IR cs.LG

    Secure Retrieval-Augmented Generation against Poisoning Attacks

    Authors: Zirui Cheng, Jikai Sun, Anjun Gao, Yueyang Quan, Zhuqing Liu, Xiaohua Hu, Minghong Fang

    Abstract: Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support. Retrieval-Augmented Generation (RAG) improves LLMs by incorporating external knowledge but also introduces security risks, particularly from data poisoning, where the attacker injects poisoned texts into the knowledge database to manipulate system outp…

    Submitted 9 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: To appear in IEEE BigData 2025

  41. arXiv:2510.21668  [pdf, ps, other]

    cs.GT cs.IT

    Privacy Guarantee for Nash Equilibrium Computation of Aggregative Games Based on Pointwise Maximal Leakage

    Authors: Zhaoyang Cheng, Guanpu Chen, Tobias J. Oechtering, Mikael Skoglund

    Abstract: Privacy preservation has served as a key metric in designing Nash equilibrium (NE) computation algorithms. Although differential privacy (DP) has been widely employed for privacy guarantees, it does not exploit prior distributional knowledge of datasets and is ineffective in assessing information leakage for correlated datasets. To address these concerns, we establish a pointwise maximal leakage (…

    Submitted 24 October, 2025; originally announced October 2025.

  42. arXiv:2510.19338  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning

    Authors: Ling Team, Bin Han, Caizhi Tang, Chen Liang, Donghao Zhang, Fan Yuan, Feng Zhu, Jie Gao, Jingyu Hu, Longfei Li, Meng Li, Mingyang Zhang, Peijie Jiang, Peng Jiao, Qian Zhao, Qingyuan Yang, Wenbo Shen, Xinxing Yang, Yalin Zhang, Yankun Ren, Yao Zhao, Yibo Cao, Yixuan Sun, Yue Zhang, Yuchen Fang, et al. (3 additional authors not shown)

    Abstract: In this technical report, we present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Ring-mini-linear-2.0 comprises 16B parameters and 957M activations, while Ring-flash-linear-2.0 contains 104B parameters and 6.1B activations. Both models adopt a hybrid architecture that effectively integrates linear attention and softmax attention, significant…

    Submitted 23 October, 2025; v1 submitted 22 October, 2025; originally announced October 2025.

    Comments: 20 pages, 13 figures

  43. arXiv:2510.16028  [pdf, ps, other]

    cs.CR cs.AI cs.LG eess.SY

    Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks

    Authors: Jianzhu Yao, Hongxu Su, Taobo Liao, Zerui Cheng, Huan Zhang, Xuechao Wang, Pramod Viswanath

    Abstract: Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard…

    Submitted 21 October, 2025; v1 submitted 15 October, 2025; originally announced October 2025.

    Comments: 17 pages, 7 figures

  44. arXiv:2510.15215  [pdf]

    cs.DC

    Spatiotemporal Traffic Prediction in Distributed Backend Systems via Graph Neural Networks

    Authors: Zhimin Qiu, Feng Liu, Yuxiao Wang, Chenrui Hu, Ziyu Cheng, Di Wu

    Abstract: This paper addresses the problem of traffic prediction in distributed backend systems and proposes a graph neural network based modeling approach to overcome the limitations of traditional models in capturing complex dependencies and dynamic features. The system is abstracted as a graph with nodes and edges, where node features represent traffic and resource states, and adjacency relations describ…

    Submitted 16 October, 2025; originally announced October 2025.

  45. arXiv:2510.15210  [pdf]

    cs.NI

    Structural Generalization for Microservice Routing Using Graph Neural Networks

    Authors: Chenrui Hu, Ziyu Cheng, Di Wu, Yuxiao Wang, Feng Liu, Zhimin Qiu

    Abstract: This paper focuses on intelligent routing in microservice systems and proposes an end-to-end optimization framework based on graph neural networks. The goal is to improve routing decision efficiency and overall system performance under complex topologies. The method models invocation relationships among microservices as a graph. In this graph, service nodes and communication links are treated as g…

    Submitted 16 October, 2025; originally announced October 2025.

  46. arXiv:2510.12803  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.PL

    AutoCode: LLMs as Problem Setters for Competitive Programming

    Authors: Shang Zhou, Zihan Zheng, Kaiyuan Liu, Zeyu Shen, Zerui Cheng, Zexing Chen, Hansen He, Jianzhu Yao, Huanzhi Mao, Qiuyang Mang, Tianfu Fu, Beichen Li, Dongruixuan Li, Wenhao Chai, Zhuang Liu, Aleksandra Korolova, Peter Henderson, Natasha Jaques, Pramod Viswanath, Saining Xie, Jingbo Shang

    Abstract: Writing competitive programming problems is exacting. Authors must: set constraints, input distributions, and edge cases that rule out shortcuts; target specific algorithms (e.g., max-flow, dynamic programming, data structures); and calibrate complexity beyond the reach of most competitors. We argue that this makes for an ideal test of general large language model capabilities and study whether th…

    Submitted 29 September, 2025; originally announced October 2025.

    Comments: Project page: https://livecodebenchpro.com/projects/autocode/overview

  47. arXiv:2510.11877  [pdf, ps, other]

    cs.LG cs.GT

    Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling

    Authors: Xiaohang Tang, Zhuowen Cheng, Satyabrat Kumar

    Abstract: The Transformer, a highly expressive architecture for sequence modeling, has recently been adapted to solve sequential decision-making, most notably through the Decision Transformer (DT), which learns policies by conditioning on desired returns. Yet, the adversarial robustness of reinforcement learning methods based on sequence modeling remains largely unexplored. Here we introduce the Conservativ…

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: Accepted by Reliable ML Workshop @ NeurIPS 2025

  48. arXiv:2510.11639  [pdf, ps, other]

    cs.IR

    OneRec-Think: In-Text Reasoning for Generative Recommendation

    Authors: Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, et al. (1 additional author not shown)

    Abstract: The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, re…

    Submitted 11 November, 2025; v1 submitted 13 October, 2025; originally announced October 2025.

  49. arXiv:2510.10168  [pdf, ps, other]

    cs.AI

    Concise Reasoning in the Lens of Lagrangian Optimization

    Authors: Chengqian Gao, Haonan Li, Taylor W. Killian, Jianshu She, Renxi Wang, Liqun Ma, Zhoujun Cheng, Shibo Hao, Zhiqiang Xu

    Abstract: Concise reasoning in large language models seeks to generate only essential intermediate steps needed to arrive at a final answer, thereby alleviating issues of overthinking. Most proposed approaches hinge on carefully hand-crafted heuristics, struggling to balance concision with performance, often failing to adapt across domains and model scales. In this work, we address these challenges by intro…

    Submitted 14 October, 2025; v1 submitted 11 October, 2025; originally announced October 2025.

  50. arXiv:2510.09997  [pdf, ps, other]

    cs.GR cs.CV

    CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting

    Authors: Zhigang Cheng, Mingchao Sun, Yu Liu, Zengye Ge, Luyang Tang, Mu Xu, Yangyan Li, Peng Pan

    Abstract: Level of Detail (LoD) is a fundamental technique in real-time computer graphics for managing the rendering costs of complex scenes while preserving visual fidelity. Traditionally, LoD is implemented using discrete levels (DLoD), where multiple, distinct versions of a model are swapped out at different distances. This long-standing paradigm, however, suffers from two major drawbacks: it requires si…

    Submitted 10 October, 2025; originally announced October 2025.