
Showing 1–50 of 314 results for author: Mao, D

Searching in archive cs.
  1. arXiv:2604.12365

    cs.NE

    Adaptive Spiking Neurons for Vision and Language Modeling

    Authors: Chenlin Zhou, Sihang Guo, Jiaqi Wang, Dongyang Ma, Jin Cheng, Qingyan Meng, Zhengyu Ma, Yonghong Tian

    Abstract: Regarded as the third generation of neural networks, Spiking Neural Networks (SNNs) have garnered significant traction due to their biological plausibility and energy efficiency. Recent advancements in large models necessitate spiking neurons capable of high performance, adaptability, and training efficiency. In this work, we first propose a novel functional perspective that provides general guida…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: 10 pages
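The spiking neurons this abstract refers to are usually variants of the leaky integrate-and-fire (LIF) model. As background, here is a minimal discrete-time LIF sketch; all names and constants (`tau`, `v_threshold`, `v_reset`) are illustrative defaults, not taken from the paper:

```python
def lif_step(v, x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """One discrete-time step of a leaky integrate-and-fire neuron.

    v: membrane potential carried over from the previous step
    x: input current at this step
    Returns (spike, new_potential); spike is 0 or 1.
    """
    # Leaky integration: the potential decays toward the input
    v = v + (x - v) / tau
    if v >= v_threshold:
        return 1, v_reset  # fire and hard-reset the membrane potential
    return 0, v

def run(inputs):
    """Feed a sequence of input currents through a single LIF neuron."""
    v, spikes = 0.0, []
    for x in inputs:
        s, v = lif_step(v, x)
        spikes.append(s)
    return spikes
```

A strong constant input fires on the first step (the potential jumps straight to threshold), while a weak input only charges the membrane partway.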

  2. arXiv:2604.11778

    cs.CL cs.AI

    General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

    Authors: Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai

    Abstract: Contemporary large language models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in specialized domains like mathematics and physics. However, their ability to generalize these reasoning skills to more general and broader contexts--often termed general reasoning--remains under-explored. Unlike domain-specific reasoning, general reasoning relies less on expert knowledge b…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: 17 pages, 9 figures

  3. arXiv:2604.11321

    cs.NE

    Winner-Take-All Spiking Transformer for Language Modeling

    Authors: Chenlin Zhou, Sihang Guo, Jiaqi Wang, Dongyang Ma, Kaiwei Che, Baiyu Chen, Qingyan Meng, Zhengyu Ma, Yonghong Tian

    Abstract: Spiking Transformers, which combine the scalability of Transformers with the sparse, energy-efficient property of Spiking Neural Networks (SNNs), have achieved impressive results in neuromorphic and vision tasks and attracted increasing attention. However, existing directly trained spiking transformers primarily focus on vision tasks. For language modeling with spiking transformer, convergence rel…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: 15 pages

  4. arXiv:2604.11198

    cs.LG

    Towards Situation-aware State Modeling for Air Traffic Flow Prediction

    Authors: Anqi Liu, Bin Wang, Jiangtao Zhao, Dechuan Ma, Guiyuan Jiang, Feng Hong, Yanwei Yu, Tianrui Li

    Abstract: Accurate air traffic prediction in the terminal airspace (TA) is pivotal for proactive air traffic management (ATM). However, existing data-driven approaches predominantly rely on time series-based forecasting paradigms, which inherently overlook critical aircraft state information, such as real-time kinematics and proximity to airspace boundaries. To address this limitation, we propose Ae…

    Submitted 14 April, 2026; v1 submitted 13 April, 2026; originally announced April 2026.

    Comments: There are issues with the authors of the paper I submitted, as well as problems with the content of the article, so it needs to be withdrawn. Thank you for your understanding.

  5. arXiv:2604.11095

    cs.LG cs.AI

    Bottleneck Tokens for Unified Multimodal Retrieval

    Authors: Siyu Sun, Jing Ren, Zhaohe Liao, Dongxiao Mao, Xiangyuan Ren, Yiyi Zhang, Haohua Zhao, Weixiong Lin, Jiang Shaohua, Liqing Zhang, Yuchao Zheng

    Abstract: Adapting decoder-only multimodal large language models (MLLMs) for unified multimodal retrieval faces two structural gaps. First, existing methods rely on implicit pooling, which overloads the hidden state of a standard vocabulary token (e.g., <EOS>) as the sequence-level representation, a mechanism never designed for information aggregation. Second, contrastive fine-tuning specifies what the embe…

    Submitted 13 April, 2026; originally announced April 2026.

  6. arXiv:2604.07988

    cs.DC cs.AI

    LogAct: Enabling Agentic Reliability via Shared Logs

    Authors: Mahesh Balakrishnan, Ashwin Bharambe, Davide Testuggine, David Geraghty, David Mao, Vidhya Venkat, Ilya Mironov, Rithesh Baradi, Gayathri Aiyer, Victoria Dudin

    Abstract: Agents are LLM-driven components that can mutate environments in powerful, arbitrary ways. Extracting guarantees for the execution of agents in production environments can be challenging due to asynchrony and failures. In this paper, we propose a new abstraction called LogAct, where each agent is a deconstructed state machine playing a shared log. In LogAct, agentic actions are visible in the shar…

    Submitted 9 April, 2026; originally announced April 2026.

  7. arXiv:2604.04771

    cs.CV cs.CL

    MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

    Authors: Bin Wang, Tianyao He, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Tao Chu, Yuan Qu, Zhenjiang Jin, Weijun Zeng, Ziyang Miao, Bangrui Xu, Junbo Niu, Mengzhang Cai, Jiantao Qiu, Qintong Zhang, Dongsheng Ma, Yuefeng Sun, Hejun Dong, Wenzheng Zhang, Jutao Xiao, Jiayong Shi, Pengyu Liao, Xiaomeng Zhao, Huaping Zhong, Liqun Wei , et al. (18 additional authors not shown)

    Abstract: Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training…

    Submitted 9 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

    Comments: Technical Report

  8. arXiv:2604.02794

    cs.AI

    CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

    Authors: Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu

    Abstract: Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source da…

    Submitted 3 April, 2026; originally announced April 2026.

  9. arXiv:2604.01526

    cs.LG

    Learning ECG Image Representations via Dual Physiological-Aware Alignments

    Authors: Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma, Bin Zhu, Pan Zhou

    Abstract: Electrocardiograms (ECGs) are among the most widely used diagnostic tools for cardiovascular diseases, and a large amount of ECG data worldwide appears only in image form. However, most existing automated ECG analysis methods rely on access to raw signal recordings, limiting their applicability in real-world and resource-constrained settings. In this paper, we present ECG-Scan, a self-supervised f…

    Submitted 1 April, 2026; originally announced April 2026.

  10. arXiv:2603.23884

    cs.GL

    POSIM: A Multi-Agent Simulation Framework for Social Media Public Opinion Evolution and Governance

    Authors: Yongmao Zhang, Kai Qiao, Zhengyan Wang, Ningning Liang, Dekui Ma, Wenyao Sun, Jian Chen, Bin Yan

    Abstract: Modeling social media public opinion evolution is essential for governance decision-making. Traditional epidemic models and rule-based agent-based models (ABMs) fail to capture the cognitive processes and adaptive behaviors of real users. Recent large language model (LLM)-based social simulations can reproduce group-level phenomena like polarization and conformity, yet remain unable to recreate th…

    Submitted 24 March, 2026; originally announced March 2026.

  11. arXiv:2603.19564

    cs.LG

    Wearable Foundation Models Should Go Beyond Static Encoders

    Authors: Yu Yvonne Wu, Yuwei Zhang, Hyungjun Yoon, Ting Dang, Dimitris Spathis, Tong Xia, Qiang Yang, Jing Han, Dong Ma, Sung-Ju Lee, Cecilia Mascolo

    Abstract: Wearable foundation models (WFMs), trained on large volumes of data collected by affordable, always-on devices, have demonstrated strong performance on short-term, well-defined health monitoring tasks, including activity recognition, fitness tracking, and cardiovascular signal assessment. However, most existing WFMs primarily map short temporal windows to predefined labels via static encoders, emp…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 13 pages

  12. arXiv:2603.12792

    cs.IT

    Upward Spatial Coverage Recovery via Movable Antenna in Low-Altitude Communications

    Authors: Kan Yu, Kaixuan Li, Yujia Zhao, Dingyou Ma, Qixun Zhang, Zhiyong Feng

    Abstract: The rapid proliferation of unmanned aerial vehicle (UAV) applications imposes stringent requirements on continuous and reliable communication coverage in low-altitude airspace. Conventional cellular systems built upon fixed-position antennas (FPAs) are inherently constrained by static array geometries and limited mechanical degrees of freedom, which severely restrict their ability to adapt to high…

    Submitted 13 March, 2026; originally announced March 2026.

  13. arXiv:2603.11408

    q-fin.ST cs.CL

    Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction

    Authors: Dehao Dai, Ding Ma, Dou Liu, Kerui Geng, Yiqing Wang

    Abstract: Forecasting crude oil prices remains challenging because market-relevant information is embedded in large volumes of unstructured news and is not fully captured by traditional polarity-based sentiment measures. This paper examines whether multi-dimensional sentiment signals extracted by large language models improve the prediction of weekly WTI crude oil futures returns. Using energy-sector news a…

    Submitted 16 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

    Comments: 28 pages, 4 figures, 4 tables

  14. arXiv:2603.10468

    eess.AS cs.AI cs.HC cs.MM cs.SD

    G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

    Authors: Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, Shuai Wang

    Abstract: We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-stamped, speaker-labeled transcripts. Previous Speech-LLM systems tend to prioritize either local diarization or global labeling, but often lack the ability to capture fine-grained temporal boundaries or…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: submitted to Interspeech 2026

  15. arXiv:2603.06956

    cs.CV

    Virtual Intraoperative CT (viCT): Sequential Anatomic Updates for Modeling Tissue Resection Throughout Endoscopic Sinus Surgery

    Authors: Nicole M. Gunderson, Graham J. Harris, Jeremy S. Ruthberg, Pengcheng Chen, Di Mao, Randall A. Bly, Waleed M. Abuzeid, Eric J. Seibel

    Abstract: Purpose: Incomplete dissection is a common cause of persistent disease and revision endoscopic sinus surgery (ESS) in chronic rhinosinusitis. Current image-guided surgery systems typically reference static preoperative CT (pCT), and do not model evolving resection boundaries. We present Virtual Intraoperative CT (viCT), a method for sequentially updating pCT throughout ESS using intraoperative 3D…

    Submitted 6 March, 2026; originally announced March 2026.

  16. arXiv:2603.03331

    cs.CL cs.AI

    PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning

    Authors: Hung Manh Pham, Jinyang Wu, Xiao Ma, Yiming Zhang, Yixin Xu, Aaqib Saeed, Bin Zhu, Zhou Pan, Dong Ma

    Abstract: Photoplethysmography (PPG) is a widely used non-invasive sensing modality for continuous cardiovascular and physiological monitoring across clinical, laboratory, and wearable settings. While existing PPG datasets support a broad range of downstream tasks, they typically provide supervision in the form of numerical measurements or task-specific labels, limiting their suitability for language-based…

    Submitted 10 February, 2026; originally announced March 2026.

    Comments: PulseLM v1

  17. arXiv:2603.02680

    cs.AI

    LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

    Authors: Yang Zhao, Zihao Li, Zhiyu Jiang, Dandan Ma, Ganchao Liu, Wenzhe Zhao

    Abstract: While Large Language Models (LLMs) form the cornerstone of sequential decision-making agent development, they have inherent limitations in high-frequency decision tasks. Existing research mainly focuses on discrete embodied decision scenarios with low-frequency and significant semantic differences in state space (e.g., household planning). These methods suffer from limited performance in high-freq…

    Submitted 3 March, 2026; originally announced March 2026.

  18. arXiv:2602.24134

    cs.CV cs.CL

    AgenticOCR: Parsing Only What You Need for Efficient Retrieval-Augmented Generation

    Authors: Zhengren Wang, Dongsheng Ma, Huaping Zhong, Jiayu Li, Wentao Zhang, Bin Wang, Conghui He

    Abstract: The expansion of retrieval-augmented generation (RAG) into multimodal domains has intensified the challenge for processing complex visual documents, such as financial reports. While page-level chunking and retrieval is a natural starting point, it creates a critical bottleneck: delivering entire pages to the generator introduces excessive extraneous context. This not only overloads the generator's…

    Submitted 27 February, 2026; originally announced February 2026.

  19. arXiv:2602.23719

    cs.RO cs.AI

    SAGE-LLM: Towards Safe and Generalizable LLM Controller with Fuzzy-CBF Verification and Graph-Structured Knowledge Retrieval for UAV Decision

    Authors: Wenzhe Zhao, Yang Zhao, Ganchao Liu, Zhiyu Jiang, Dandan Ma, Zihao Li, Xuelong Li

    Abstract: In UAV dynamic decision, complex and variable hazardous factors pose severe challenges to the generalization capability of algorithms. Despite offering semantic understanding and scene generalization, Large Language Models (LLM) lack domain-specific UAV control knowledge and formal safety assurances, restricting their direct applicability. To bridge this gap, this paper proposes a train-free two-l…

    Submitted 27 February, 2026; originally announced February 2026.

  20. arXiv:2602.19840

    cs.CL

    SAMAS: A Spectrum-Guided Multi-Agent System for Achieving Style Fidelity in Literary Translation

    Authors: Jingzhuo Wu, Jiajun Zhang, Keyan Jin, Dehua Ma, Junbo Wang

    Abstract: Modern large language models (LLMs) excel at generating fluent and faithful translations. However, they struggle to preserve an author's unique literary style, often producing semantically correct but generic outputs. This limitation stems from the inability of current single-model and static multi-agent systems to perceive and adapt to stylistic variations. To address this, we introduce the Style…

    Submitted 23 February, 2026; originally announced February 2026.

  21. arXiv:2602.12108

    cs.AI

    The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

    Authors: Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang

    Abstract: In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve (mature databases and retrieval systems), our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This…

    Submitted 12 February, 2026; originally announced February 2026.

  22. arXiv:2602.10604

    cs.CL cs.AI

    Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

    Authors: Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi, Daxin Jiang, Dehua Ma, Deshan Sun , et al. (191 additional authors not shown)

    Abstract: We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/f…

    Submitted 23 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

    Comments: Technical report for Step 3.5 Flash
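A sparse MoE layer is what lets a 196B-parameter foundation run with only 11B active parameters: a router picks a few experts per token and the rest stay idle. A toy top-k router illustrating the general mechanism (not Step 3.5 Flash's actual design; expert count and k are arbitrary here):

```python
import math

def top_k_route(logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

def moe_layer(x, experts, router_logits, k=2):
    """Mix the outputs of only the routed experts; unrouted experts do no work."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())
```

With k experts active out of E total, per-token compute scales with k rather than E, which is the active-parameter saving the abstract refers to.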

  23. arXiv:2602.08030

    cs.AI cs.CL

    Free(): Learning to Forget in Malloc-Only Reasoning Models

    Authors: Yilun Zheng, Dongyang Ma, Tian Liang, Jiahao Xu, Xinting Huang, Lihui Chen, Haitao Mi, Yan Wang

    Abstract: Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To brea…

    Submitted 10 February, 2026; v1 submitted 8 February, 2026; originally announced February 2026.

  24. arXiv:2602.05085

    cs.CL

    Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

    Authors: Sidi Lu, Zhenwen Liang, Dongyang Ma, Yan Wang, Haitao Mi, Dong Yu

    Abstract: In this paper, we aim to bridge test-time-training with a new type of parametric memory that can be flexibly offloaded from or merged into model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly permanentized into the model parameters while supporting efficient continual learning. We discuss t…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: Tencent AI Lab Technical Report

  25. arXiv:2602.01274

    cs.CL cs.AI

    PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length

    Authors: Situo Zhang, Yifan Zhang, Zichen Zhu, Hankun Wang, Da Ma, Danyang Zhang, Lu Chen, Kai Yu

    Abstract: Speculative decoding (SD) is a powerful technique for accelerating the inference process of large language models (LLMs) without sacrificing accuracy. Typically, SD employs a small draft model to generate a fixed number of draft tokens, which are then verified in parallel by the target model. However, our experiments reveal that the optimal draft length varies significantly across different decodi…

    Submitted 1 February, 2026; originally announced February 2026.
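The draft-then-verify loop the abstract describes can be sketched as follows for greedy decoding; `draft_model` and `target_model` are stand-in next-token functions, not PACER's actual interface, and real systems check all draft tokens in one batched target forward pass:

```python
def speculative_step(prefix, draft_model, target_model, draft_len=4):
    """One round of speculative decoding with greedy token generators.

    The draft model proposes draft_len tokens; the target model checks them
    and keeps the longest agreeing prefix, plus its own correction token at
    the first mismatch.
    """
    # Draft phase: cheap model proposes a fixed-length continuation.
    draft, ctx = [], list(prefix)
    for _ in range(draft_len):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)

    # Verify phase: target accepts matching tokens, corrects the first miss.
    accepted, ctx = [], list(prefix)
    for t in draft:
        target_t = target_model(ctx)  # in practice: one parallel pass
        if target_t != t:
            accepted.append(target_t)  # target's token replaces the miss
            break
        accepted.append(t)
        ctx.append(t)
    return accepted
```

When the models agree, one round emits `draft_len` tokens for a single target pass; the fixed `draft_len` is exactly the knob the abstract argues should adapt per decoding step.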

  26. arXiv:2602.00846

    cs.CL

    Omni-RRM: Advancing Omni Reward Modeling via Automatic Rubric-Grounded Preference Synthesis

    Authors: Zicheng Kong, Dehua Ma, Zhenbo Xu, Alven Yang, Yiwei Ru, Haoran Wang, Zixuan Zhou, Fuqing Bie, Liuyu Xiang, Huijia Wu, Jian Zhao, Zhaofeng He

    Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities, yet their performance is often capped by the coarse nature of existing alignment techniques. A critical bottleneck remains the lack of effective reward models (RMs): existing RMs are predominantly vision-centric, return opaque scalar scores, and rely on costly human annotations. We introduce Omni-RRM, the first o…

    Submitted 31 January, 2026; originally announced February 2026.

  27. arXiv:2601.20906

    cs.LG

    TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins

    Authors: Nikita Makarov, Maria Bordukova, Lena Voith von Voithenberg, Estrella Pivel-Villanueva, Sabrina Mielke, Jonathan Wickes, Hanchen Wang, Mingyu Derek Ma, Keunwoo Choi, Kyunghyun Cho, Stephen Ra, Raul Rodriguez-Esteban, Fabian Schmich, Michael Menden

    Abstract: Precision oncology requires forecasting clinical events and trajectories, yet modeling sparse, multi-modal clinical time series remains a critical challenge. We introduce TwinWeaver, an open-source framework that serializes longitudinal patient histories into text, enabling unified event prediction as well as forecasting with large language models, and use it to build Genie Digital Twin (GDT) on 9…

    Submitted 9 February, 2026; v1 submitted 28 January, 2026; originally announced January 2026.

  28. arXiv:2601.20239

    cs.RO

    TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

    Authors: Zhemeng Zhang, Jiahua Ma, Xincheng Yang, Xin Wen, Yuzhi Zhang, Boyan Li, Yiran Qin, Jin Liu, Can Zhao, Li Kang, Haoqin Hong, Zhenfei Yin, Philip Torr, Hao Su, Ruimao Zhang, Daolin Ma

    Abstract: Fine-grained and contact-rich manipulation remains challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy…

    Submitted 24 February, 2026; v1 submitted 27 January, 2026; originally announced January 2026.

  29. arXiv:2601.18497

    cs.HC

    BAIT: Visual-illusion-inspired Privacy Preservation for Mobile Data Visualization

    Authors: Sizhe Cheng, Songheng Zhang, Dong Ma, Yong Wang

    Abstract: With the prevalence of mobile data visualizations, there have been growing concerns about their privacy risks, especially shoulder surfing attacks. Inspired by prior research on visual illusion, we propose BAIT, a novel approach to automatically generate privacy-preserving visualizations by stacking a decoy visualization over a given visualization. It allows visualization owners at proximity to cl…

    Submitted 3 February, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

    Comments: Accepted by CHI'26

  30. arXiv:2601.18292

    cs.LG cs.AI

    TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment

    Authors: Zhewen Tan, Wenhan Yu, Jianfeng Si, Tongxin Liu, Kaiqi Guan, Huiyan Jin, Jiawen Tao, Xiaokun Yuan, Duohe Ma, Xiangzheng Zhang, Tong Yang, Lin Sun

    Abstract: In recent years, safety risks associated with large language models have become increasingly prominent, highlighting the urgent need to mitigate the generation of toxic and harmful content. The mainstream paradigm for LLM safety alignment typically adopts a collaborative framework involving three roles: an attacker for adversarial prompt generation, a defender for safety defense, and an evaluator…

    Submitted 30 January, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

  31. arXiv:2601.15867

    cs.CV

    Out-of-Distribution Detection Based on Total Variation Estimation

    Authors: Dabiao Ma, Zhiba Su, Jian Yang, Haojun Fei

    Abstract: This paper introduces a novel approach to securing machine learning model deployments against potential distribution shifts in practical applications, the Total Variation Out-of-Distribution (TV-OOD) detection method. Existing methods have produced satisfactory results, but TV-OOD improves upon these by leveraging the Total Variation Network Estimator to calculate each input's contribution to the…

    Submitted 22 January, 2026; originally announced January 2026.

  32. arXiv:2601.12988

    cs.LG

    PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient

    Authors: Zijian Wang, Tiancheng Huang, Hanqi Li, Da Ma, Lu Chen, Kai Yu

    Abstract: The accelerating growth of the scientific literature makes it increasingly difficult for researchers to track new advances through manual reading alone. Recent progress in large language models (LLMs) has therefore spurred interest in autonomous agents that can read scientific papers and extract task-relevant information. However, most existing approaches rely either on heavily engineered promptin…

    Submitted 19 January, 2026; originally announced January 2026.

    Comments: 35 pages, 9 figures, 7 tables

  33. arXiv:2601.10718

    cs.AI cs.CL cs.IR cs.LG

    Japanese AI Agent System on Human Papillomavirus Vaccination: System Design

    Authors: Junyu Liu, Siwen Yang, Dexiu Ma, Qian Niu, Zequn Zhang, Momoko Nagai-Tanima, Tomoki Aoyama

    Abstract: Human papillomavirus (HPV) vaccine hesitancy poses significant public health challenges, particularly in Japan where proactive vaccination recommendations were suspended from 2013 to 2021. The resulting information gap is exacerbated by misinformation on social media, and traditional ways cannot simultaneously address individual queries while monitoring population-level discourse. This study aimed…

    Submitted 15 December, 2025; originally announced January 2026.

  34. arXiv:2601.10457

    cs.AI

    NSR-Boost: A Neuro-Symbolic Residual Boosting Framework for Industrial Legacy Models

    Authors: Ziming Dai, Dabiao Ma, Jinle Tong, Mengyuan Han, Jian Yang, Hongtao Liu, Haojun Fei, Qing Yang

    Abstract: Although the Gradient Boosted Decision Trees (GBDTs) dominate industrial tabular applications, upgrading legacy models in high-concurrency production environments still faces prohibitive retraining costs and systemic risks. To address this problem, we present NSR-Boost, a neuro-symbolic residual boosting framework designed specifically for industrial scenarios. Its core advantage lies in being "no…

    Submitted 31 January, 2026; v1 submitted 15 January, 2026; originally announced January 2026.

    Comments: 14 pages, 12 figures

  35. arXiv:2512.20491

    cs.CL

    Step-DeepResearch Technical Report

    Authors: Chen Hu, Haikuo Du, Heng Wang, Lin Lin, Mingrui Chen, Peng Liu, Ruihang Miao, Tianchi Yue, Wang You, Wei Ji, Wei Yuan, Wenjin Deng, Xiaojian Yuan, Xiaoyun Zhang, Xiangyu Liu, Xikai Liu, Yanming Xu, Yicheng Cao, Yifei Zhang, Yongyao Wang, Yubo Shu, Yurong Zhang, Yuxiang Zhang, Zheng Gong, Zhichao Chang , et al. (42 additional authors not shown)

    Abstract: As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent…

    Submitted 29 December, 2025; v1 submitted 23 December, 2025; originally announced December 2025.

  36. arXiv:2512.16270

    cs.CV cs.AI

    TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering

    Authors: Rui Gui, Yang Wan, Haochen Han, Dongxing Mao, Fangming Liu, Min Li, Alex Jinpeng Wang

    Abstract: Text rendering has recently emerged as one of the most challenging frontiers in visual generation, drawing significant attention from large-scale diffusion and multimodal models. However, text editing within images remains largely unexplored, as it requires generating legible characters while preserving semantic, geometric, and contextual coherence. To fill this gap, we introduce TextEditBench, a…

    Submitted 18 December, 2025; originally announced December 2025.

  37. arXiv:2512.00020

    cs.AR cs.AI

    Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead

    Authors: Guang Yang, Wei Zheng, Xiang Chen, Dong Liang, Peng Hu, Yukui Yang, Shaohang Peng, Zhenghan Li, Jiahui Feng, Xiao Wei, Kexin Sun, Deyuan Ma, Haotian Cheng, Yiheng Shen, Xing Hu, Terry Yue Zhuo, David Lo

    Abstract: Code generation has emerged as a critical research area at the intersection of Software Engineering (SE) and Artificial Intelligence (AI), attracting significant attention from both academia and industry. Within this broader landscape, Verilog, as a representative hardware description language (HDL), plays a fundamental role in digital circuit design and verification, making its automated generati…

    Submitted 24 December, 2025; v1 submitted 29 October, 2025; originally announced December 2025.

    Comments: Under Review

  38. arXiv:2511.17136

    cs.SD cs.AI

    Device-Guided Music Transfer

    Authors: Manh Pham Hung, Changshuo Hu, Ting Dang, Dong Ma

    Abstract: Device-guided music transfer adapts playback across unseen devices for users who lack them. Existing methods mainly focus on modifying the timbre, rhythm, harmony, or instrumentation to mimic genres or artists, overlooking the diverse hardware properties of the playback device (i.e., speaker). Therefore, we propose DeMT, which processes a speaker's frequency response curve as a line graph using a…

    Submitted 21 November, 2025; originally announced November 2025.

  39. arXiv:2511.16147

    cs.CL cs.AI

    TS-PEFT: Unveiling Token-Level Redundancy in Parameter-Efficient Fine-Tuning

    Authors: Dabiao Ma, Ziming Dai, Zhimin Xin, Shu Wang, Jian Yang, Haojun Fei

    Abstract: Current Parameter-Efficient Fine-Tuning (PEFT) methods typically operate under an implicit assumption: Once a target module is selected, every token passing through it contributes equally to the downstream task and requires a parameter update. In this paper, we challenge this convention by revealing a pervasive token-level redundancy in the fine-tuning of large models (LMs). We propose TS-PEFT, a…

    Submitted 29 January, 2026; v1 submitted 20 November, 2025; originally announced November 2025.

    Comments: 11 pages, 3 figures

  40. iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

    Authors: Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang

    Abstract: Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they ov…

    Submitted 19 March, 2026; v1 submitted 11 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026

    Journal ref: Proc. AAAI Conf. Artif. Intell. 40(42): 23984-23992, 2026

  41. arXiv:2511.08866

    cs.CL

    BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation

    Authors: Fuyi Yang, Chenchen Ye, Mingyu Derek Ma, Yijia Xiao, Matthew Yang, Wei Wang

    Abstract: Hypothesis generation in biomedical research has traditionally centered on uncovering hidden relationships within vast scientific literature, often using methods like Literature-Based Discovery (LBD). Despite progress, current approaches typically depend on single data types or predefined extraction patterns, which restricts the discovery of novel and complex connections. Recent advances in Large…

    Submitted 11 November, 2025; originally announced November 2025.

  42. arXiv:2511.02778  [pdf, ps, other]

    cs.CV cs.CL

    VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

    Authors: Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran, Dantong Zhu, Dongxing Mao, Linjie Li, Philip Torr, Alex Jinpeng Wang

    Abstract: Code has emerged as a precise and executable medium for reasoning and action in the agent era. Yet, progress has largely focused on language-centric tasks such as program synthesis and debugging, leaving visual-centric coding underexplored. Inspired by how humans reason over sketches, we advocate SVG code as a compact, interpretable, and executable visual representation. We introduce VCode, a benc…

    Submitted 4 November, 2025; originally announced November 2025.

    Comments: Project page: https://csu-jpg.github.io/VCode Github: https://github.com/CSU-JPG/VCode

  43. arXiv:2511.02228  [pdf, ps, other]

    cs.CV cs.AI

    Collaborative Attention and Consistent-Guided Fusion of MRI and PET for Alzheimer's Disease Diagnosis

    Authors: Delin Ma, Menghui Zhou, Jun Qi, Yun Yang, Po Yang

    Abstract: Alzheimer's disease (AD) is the most prevalent form of dementia, and its early diagnosis is essential for slowing disease progression. Recent studies on multimodal neuroimaging fusion using MRI and PET have achieved promising results by integrating multi-scale complementary features. However, most existing approaches primarily emphasize cross-modal complementarity while overlooking the diagnostic…

    Submitted 3 November, 2025; originally announced November 2025.

  44. arXiv:2510.26768  [pdf, ps, other]

    cs.CL cs.AI

    AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

    Authors: Shengnan An, Xunliang Cai, Xuezhi Cao, Xiaoyu Li, Yehao Lin, Junlin Liu, Xinxuan Lv, Dan Ma, Xuanlin Wang, Ziwen Wang, Shuang Zhou

    Abstract: We present AMO-Bench, an Advanced Mathematical reasoning benchmark with Olympiad level or even higher difficulty, comprising 50 human-crafted problems. Existing benchmarks have widely leveraged high school math competitions for evaluating mathematical reasoning capabilities of large language models (LLMs). However, many existing math competitions are becoming less effective for assessing top-tier…

    Submitted 30 October, 2025; originally announced October 2025.

    Comments: 14 pages, 9 figures

  45. arXiv:2510.26697  [pdf, ps, other]

    cs.CL cs.AI

    The End of Manual Decoding: Towards Truly End-to-End Language Models

    Authors: Zhichao Wang, Dongyang Ma, Xinting Huang, Deng Cai, Tian Lan, Jiahao Xu, Haitao Mi, Xiaoying Tang, Yan Wang

    Abstract: The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight head…

    Submitted 31 October, 2025; v1 submitted 30 October, 2025; originally announced October 2025.
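    The per-step mechanism the AutoDeco abstract describes, lightweight heads that predict the decoding hyperparameters from the model state, can be sketched roughly as follows. The names `temp_head` and `top_p_head`, and the pure-Python sampling loop, are illustrative stand-ins under assumed shapes, not the paper's implementation.

```python
import math
import random

def softmax(logits, temperature):
    # Temperature-scaled softmax over a list of logits.
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalize over the kept set.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def autodeco_step(hidden, logits, temp_head, top_p_head):
    # Hypothetical per-step heads map the hidden state to decoding
    # parameters, so temperature and top-p are predicted, not hand-tuned.
    temperature = temp_head(hidden)
    top_p = top_p_head(hidden)
    probs = softmax(logits, temperature)
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    token = random.choices(tokens, weights=weights)[0]
    return token, temperature, top_p
```

    The point of the sketch is the control flow: the heads run once per decoding step, so the sampling hyperparameters can vary token by token instead of being fixed for the whole generation.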

  46. arXiv:2510.22126  [pdf, ps, other]

    cs.RO

    EasyUUV: An LLM-Enhanced Universal and Lightweight Sim-to-Real Reinforcement Learning Framework for UUV Attitude Control

    Authors: Guanwen Xie, Jingzehua Xu, Jiwei Tang, Yubo Huang, Zixi Wang, Shuai Zhang, Dongfang Ma, Juntian Qu, Xiaofan Li

    Abstract: Despite recent advances in Unmanned Underwater Vehicle (UUV) attitude control, existing methods still struggle with generalizability, robustness to real-world disturbances, and efficient deployment. To address the above challenges, this paper presents EasyUUV, a Large Language Model (LLM)-enhanced, universal, and lightweight simulation-to-reality reinforcement learning (RL) framework for robust at…

    Submitted 12 February, 2026; v1 submitted 24 October, 2025; originally announced October 2025.

    Comments: 10 pages, 13 figures

  47. arXiv:2510.21551  [pdf, ps, other]

    cs.LG

    Interpretable Multimodal Zero-Shot ECG Diagnosis via Structured Clinical Knowledge Alignment

    Authors: Jialu Tang, Hung Manh Pham, Ignace De Lathauwer, Henk S. Schipper, Yuan Lu, Dong Ma, Aaqib Saeed

    Abstract: Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive…

    Submitted 24 October, 2025; originally announced October 2025.
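    Zero-shot diagnosis by comparing a signal embedding against structured clinical descriptors, as the ZETA abstract suggests, can be sketched generically as follows. The embedding shapes, the positive/negative descriptor pairing, and the difference-of-similarities score are illustrative assumptions, not ZETA's actual method.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def zero_shot_diagnose(ecg_emb, condition_descriptors):
    """Hypothetical zero-shot scorer.

    condition_descriptors maps each condition to a (pos_emb, neg_emb) pair:
    embeddings of structured positive and negative clinical criteria. The
    score favors conditions whose positive criteria the ECG embedding
    matches and whose negative criteria it does not.
    """
    scores = {
        cond: cosine(ecg_emb, pos) - cosine(ecg_emb, neg)
        for cond, (pos, neg) in condition_descriptors.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

    Because the condition set is just a dictionary of descriptor embeddings, unseen conditions can be added at inference time without retraining, which is the appeal of the zero-shot setup.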

  48. arXiv:2510.17932  [pdf, ps, other]

    cs.SE cs.AI

    From Charts to Code: A Hierarchical Benchmark for Multimodal Models

    Authors: Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang

    Abstract: We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse real-world scenarios and progressively increasing task difficulty. It consists of three levels: Level 1 (Chart Reproduction) reproduces charts from a reference figure a…

    Submitted 21 January, 2026; v1 submitted 20 October, 2025; originally announced October 2025.

  49. arXiv:2510.12171  [pdf, ps, other]

    cs.AI

    MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science

    Authors: Junkai Zhang, Jingru Gan, Xiaoxuan Wang, Zian Jia, Changquan Gu, Jianpeng Chen, Yanqiao Zhu, Mingyu Derek Ma, Dawei Zhou, Ling Li, Wei Wang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities in scientific reasoning, yet their reasoning capabilities in materials science remain underexplored. To fill this gap, we introduce MatSciBench, a comprehensive college-level benchmark comprising 1,340 problems that span the essential subdisciplines of materials science. MatSciBench features a structured and fine-grained taxonomy…

    Submitted 14 October, 2025; originally announced October 2025.

  50. arXiv:2509.25540  [pdf, ps, other]

    cs.AI

    RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale

    Authors: Jason Holmes, Yuexing Hao, Mariana Borras-Osorio, Federico Mastroleo, Santiago Romero Brufau, Valentina Carducci, Katie M Van Abel, David M Routman, Andrew Y. K. Foong, Liv M Muller, Satomi Shiraishi, Daniel K Ebner, Daniel J Ma, Sameer R Keole, Samir H Patel, Mirek Fatyga, Martin Bues, Brad J Stish, Yolanda I Garces, Michelle A Neben Wittich, Robert L Foote, Sujay A Vora, Nadia N Laack, Mark R Waddle, Wei Liu

    Abstract: Manual labeling limits the scale, accuracy, and timeliness of patient outcomes research in radiation oncology. We present RadOnc-GPT, an autonomous large language model (LLM)-based agent capable of independently retrieving patient-specific information, iteratively assessing evidence, and returning structured outcomes. Our evaluation explicitly validates RadOnc-GPT across two clearly defined tiers…

    Submitted 12 December, 2025; v1 submitted 29 September, 2025; originally announced September 2025.