Showing 1–50 of 316 results for author: Lu, K

Searching in archive cs.
  1. arXiv:2603.27138

    cs.LG

    ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference

    Authors: Qiuyang Zhang, Kai Zhou, Ding Tang, Kai Lu, Cheng Li, Zhenyu Yang, Peng Xu, Jiguang Wan

    Abstract: Large language models encounter critical GPU memory capacity constraints during long-context inference, where KV cache memory consumption severely limits decode batch sizes. While existing research has explored offloading KV cache to DRAM, these approaches either demand frequent GPU-CPU data transfers or impose extensive CPU computation requirements, resulting in poor GPU utilization as the system…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: Accepted at the 63rd Design Automation Conference (DAC 2026)

  2. arXiv:2603.19724

    cs.CG

    Locality Sensitive Hashing in Hyperbolic Space

    Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

    Abstract: For a metric space $(X, d)$, a family $\mathcal{H}$ of locality sensitive hash functions is called $(r, cr, p_1, p_2)$ sensitive if a randomly chosen function $h\in \mathcal{H}$ has probability at least $p_1$ (at most $p_2$) to map any $a, b\in X$ in the same hash bucket if $d(a, b)\leq r$ (or $d(a, b)\geq cr$). Locality Sensitive Hashing (LSH) is one of the most popular techniques for approximate…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: 22 pages, 8 figures, SoCG 2026 paper

  3. arXiv:2603.19598

    cs.CV

    FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

    Authors: Zhifei Yang, Guangyao Zhai, Keyang Lu, YuYang Yin, Chao Zhang, Zhen Xiao, Jieyi Long, Nassir Navab, Yikai Wang

    Abstract: Scene generation has extensive industrial applications, demanding both high realism and precise control over geometry and appearance. Language-driven retrieval methods compose plausible scenes from a large object database, but overlook object-level control and often fail to enforce scene-level style coherence. Graph-based formulations offer higher controllability over objects and inform holistic c…

    Submitted 19 March, 2026; originally announced March 2026.

  4. arXiv:2603.19195

    eess.AS cs.CL cs.SD

    How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation

    Authors: Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang, Zhehuai Chen, Sung-Feng Huang, Chih-Kai Yang, Yi-Cheng Lin, Chi-Yuan Hsiao, Wenze Ren, En-Pei Hu, Yu-Han Huang, An-Yu Cheng, Cheng-Han Chiang, Yu Tsao, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how this affects downstream performance remains unclear. We study this gap by comparing different LLMs under two text-only and one audio-grounded setting: (1) direct probing on AKB-2000, a curated benchmark…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Project website: https://kehanlu.github.io/AKB

  5. arXiv:2603.15154

    eess.IV cs.CV

    Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

    Authors: Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

    Abstract: Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volum…

    Submitted 16 March, 2026; originally announced March 2026.

  6. arXiv:2603.15143

    eess.IV cs.CV

    Clinical Priors Guided Lung Disease Detection in 3D CT Scans

    Authors: Kejin Lu, Jianfa Bai, Qingqiu Li, Runtian Yuan, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

    Abstract: Accurate classification of lung diseases from chest CT scans plays an important role in computer-aided diagnosis systems. However, medical imaging datasets often suffer from severe class imbalance, which may significantly degrade the performance of deep learning models, especially for minority disease categories. To address this issue, we propose a gender-aware two-stage lung disease classificatio…

    Submitted 17 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  7. arXiv:2603.10163

    cs.CR cs.AI

    Compatibility at a Cost: Systematic Discovery and Exploitation of MCP Clause-Compliance Vulnerabilities

    Authors: Nanzi Yang, Weiheng Bai, Kangjie Lu

    Abstract: The Model Context Protocol (MCP) is a recently proposed interoperability standard that unifies how AI agents connect with external tools and data sources. By defining a set of common client-server message exchange clauses, MCP replaces fragmented integrations with a standardized, plug-and-play framework. However, to be compatible with diverse AI agents, the MCP specification relaxes many behaviora…

    Submitted 10 March, 2026; originally announced March 2026.

  8. arXiv:2603.09714

    cs.SD cs.AI cs.CL eess.AS

    MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

    Authors: Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi Lee

    Abstract: While multi-audio understanding is critical for large audio-language models (LALMs), it remains underexplored. We introduce MUGEN, a comprehensive benchmark evaluating this capability across speech, general audio, and music. Our experiments reveal consistent weaknesses in multi-audio settings, and performance degrades sharply as the number of concurrent audio inputs increases, identifying input sc…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: 6 pages, 3 figures, 3 tables. Dataset: https://huggingface.co/Multi-Audio-Grounding

  9. arXiv:2603.06660

    cs.IR cs.DB cs.LG

    Approximate Nearest Neighbor Search for Modern AI: A Projection-Augmented Graph Approach

    Authors: Kejing Lu, Zhenpeng Pan, Jianbin Qin, Yoshiharu Ishikawa, Chuan Xiao

    Abstract: Approximate Nearest Neighbor Search (ANNS) is fundamental to modern AI applications. Most existing solutions optimize query efficiency but fail to align with the practical requirements of modern workloads. In this paper, we outline six critical demands of modern AI applications: high query efficiency, fast indexing, low memory footprint, scalability to high dimensionality, robustness across varyin…

    Submitted 1 March, 2026; originally announced March 2026.

    Comments: Source code is available at https://github.com/KejingLu-810/PAG/

  10. arXiv:2603.05094   

    cs.SD

    TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling

    Authors: Hao-Hui Xie, Ho-Lam Chung, Yi-Cheng Lin, Ke-Han Lu, Wenze Ren, Xie Chen, Hung-yi Lee

    Abstract: Large Audio-Language Models (LALMs) typically struggle with localized dialectal prosody due to the scarcity of specialized corpora. We present TW-Sound580K, a Taiwanese audio-text instruction dataset developed through a Verify-Generate-Critique (VGC) protocol. This pipeline leverages Dual-ASR validation to filter 522K raw clips, subsequently expanding them into 580,000 high-fidelity instruction pa…

    Submitted 27 March, 2026; v1 submitted 5 March, 2026; originally announced March 2026.

    Comments: The authors have decided to withdraw this submission as the work is no longer intended for public dissemination at this time

  11. arXiv:2603.03781

    cs.AI

    LifeBench: A Benchmark for Long-Horizon Multi-Source Memory

    Authors: Zihao Cheng, Weixin Wang, Yu Zhao, Ziyang Ren, Jiaxuan Chen, Ruiyang Xu, Shuai Huang, Yang Chen, Guowei Li, Mengshi Wang, Yi Xie, Ren Zhu, Zeren Jiang, Keda Lu, Yihong Li, Xiaoliang Wang, Liwei Liu, Cam-Tu Nguyen

    Abstract: Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily target declarative memory, specifically semantic and episodic types, where all information is explicitly presented in dialogues. In contrast, real-world actions are also governed by non-declarative memory…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: A total of 28 pages, 8 pages of main text, and 15 figures and tables

  12. arXiv:2602.17769

    cs.MM cs.SD eess.AS

    MusicSem: A Semantically Rich Language-Audio Dataset of Natural Music Descriptions

    Authors: Rebecca Salganik, Teng Tu, Fei-Yueh Chen, Xiaohao Liu, Keifeng Lu, Ethan Luvisia, Zhiyao Duan, Guillaume Salha-Galvan, Anson Kahng, Yunshan Ma, Jian Kang

    Abstract: Music representation learning is central to music information retrieval and generation. While recent advances in multimodal learning have improved alignment between text and audio for tasks such as cross-modal music retrieval, text-to-music generation, and music-to-text generation, existing models often struggle to capture users' expressed intent in natural language descriptions of music. This obs…

    Submitted 19 February, 2026; originally announced February 2026.

  13. arXiv:2602.15617

    cs.LG cs.NI

    DNN-Enabled Multi-User Beamforming for Throughput Maximization under Adjustable Fairness

    Authors: Kaifeng Lu, Markus Rupp, Stefan Schwarz

    Abstract: Ensuring user fairness in wireless communications is a fundamental challenge, as balancing the trade-off between fairness and sum rate leads to a non-convex, multi-objective optimization whose complexity grows with network scale. To alleviate this conflict, we propose an optimization-based unsupervised learning approach based on the wireless transformer (WiT) architecture that learns from channel…

    Submitted 17 February, 2026; originally announced February 2026.

  14. arXiv:2602.10045

    cs.CV cs.LG stat.ME stat.ML

    Conformal Prediction Sets for Instance Segmentation

    Authors: Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang

    Abstract: Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image a…

    Submitted 10 February, 2026; originally announced February 2026.

  15. arXiv:2602.06020

    cs.LG q-bio.BM

    Mechanisms of AI Protein Folding in ESMFold

    Authors: Kevin Lu, Jannik Brinkmann, Stefan Huber, Aaron Mueller, Yonatan Belinkov, David Bau, Chris Wendler

    Abstract: How do protein structure prediction models fold proteins? We investigate this question by tracing how ESMFold folds a beta hairpin, a prevalent structural motif. Through counterfactual interventions on model latents, we identify two computational stages in the folding trunk. In the first stage, early blocks initialize pairwise biochemical signals: residue identities and associated biochemical feat…

    Submitted 8 February, 2026; v1 submitted 5 February, 2026; originally announced February 2026.

    Comments: Our code, data, and results are available at https://folding.baulab.info

  16. Decoupled Hierarchical Distillation for Multimodal Emotion Recognition

    Authors: Yong Li, Yuanzhi Wang, Yi Ding, Shiqing Zhang, Ke Lu, Cuntai Guan

    Abstract: Human multimodal emotion recognition (MER) seeks to infer human emotions by integrating information from language, visual, and acoustic modalities. Although existing MER approaches have achieved promising results, they still struggle with inherent multimodal heterogeneities and varying contributions from different modalities. To address these challenges, we propose a novel framework, Decoupled Hie…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: arXiv admin note: text overlap with arXiv:2303.13802

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2026

  17. arXiv:2601.20833

    cs.CE

    Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

    Authors: Tengyue Xu, Zhuoyang Qian, Gaoge Liu, Li Ling, Zhentao Zhang, Biao Wu, Shuo Zhang, Ke Lu, Wei Shi, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Harry Wang, Kris Chen

    Abstract: Autonomous scientific discovery with large language model (LLM)-based agents has recently made substantial progress, demonstrating the ability to automate end-to-end research workflows. However, existing systems largely rely on runtime-centric execution paradigms, repeatedly reading, summarizing, and reasoning over large volumes of scientific literature online. This on-the-spot computation strateg…

    Submitted 28 January, 2026; originally announced January 2026.

    Comments: 11 pages, 3 figures

    ACM Class: F.2.2; I.2.7

  18. arXiv:2601.18113

    cs.CR cs.AI

    MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs

    Authors: Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han

    Abstract: LLM-based web agents have become increasingly popular for their utility in daily life and work. However, they exhibit critical vulnerabilities when processing malicious URLs: accepting a disguised malicious URL enables subsequent access to unsafe webpages, which can cause severe damage to service providers and users. Despite this risk, no benchmark currently targets this emerging threat. To addres…

    Submitted 13 March, 2026; v1 submitted 25 January, 2026; originally announced January 2026.

  19. arXiv:2601.16007

    cs.CV cs.AI

    PhysicsMind: Sim and Real Mechanics Benchmarking for Physical Reasoning and Prediction in Foundational VLMs and World Models

    Authors: Chak-Wing Mak, Guanyu Zhu, Boyi Zhang, Hongji Li, Xiaowei Chi, Kevin Zhang, Yichen Wu, Yangfan He, Chun-Kai Fan, Wentao Lu, Kuangzhi Ge, Xinyu Fang, Hongyang He, Kuan Lu, Tianxiang Xu, Li Zhang, Yongxin Ni, Youhua Li, Shanghang Zhang

    Abstract: Modern foundational Multimodal Large Language Models (MLLMs) and video world models have advanced significantly in mathematical, common-sense, and visual reasoning, but their grasp of the underlying physics remains underexplored. Existing benchmarks attempting to measure this matter rely on synthetic, Visual Question Answer templates or focus on perceptual video quality that is tangential to measu…

    Submitted 22 January, 2026; originally announced January 2026.

  20. arXiv:2601.09952

    cs.CV cs.RO

    OT-Drive: Out-of-Distribution Off-Road Traversable Area Segmentation via Optimal Transport

    Authors: Zhihua Zhao, Guoqiang Li, Chen Min, Kangping Lu

    Abstract: Reliable traversable area segmentation in unstructured environments is critical for planning and decision-making in autonomous driving. However, existing data-driven approaches often suffer from degraded segmentation performance in out-of-distribution (OOD) scenarios, consequently impairing downstream driving tasks. To address this issue, we propose OT-Drive, an Optimal Transport-driven multi-mod…

    Submitted 14 January, 2026; originally announced January 2026.

    Comments: 9 pages, 8 figures, 6 tables. This work has been submitted to the IEEE for possible publication. Code will be released upon acceptance

  21. arXiv:2601.02712

    eess.IV cs.MM

    Transform and Entropy Coding in AV2

    Authors: Alican Nalci, Hilmi E. Egilmez, Madhu P. Krishnan, Keng-Shih Lu, Joe Young, Debargha Mukherjee, Lin Zheng, Jingning Han, Joel Sole, Xiaoqing Zhu, Xin Zhao, Tianqi Liu, Liang Zhao, Todd Nguyen, Urvang Joshi, Kruthika Koratti Sivakumar, Luhang Xu, Zhijun Lei, Van Luong Pham, Yue Yu, Aki Kuusela, Minhua Zhou, Andrey Norkin, Adrian Grange

    Abstract: AV2 is the successor to the AV1 video coding standard developed by the Alliance for Open Media (AOMedia). Its primary objective is to deliver substantial compression gains and subjective quality improvements while maintaining low-complexity encoder and decoder operations. This paper describes the transform, quantization and entropy coding design in AV2, including redesigned transform kernels and d…

    Submitted 7 February, 2026; v1 submitted 5 January, 2026; originally announced January 2026.

  22. arXiv:2601.02295

    cs.RO

    CycleVLA: Proactive Self-Correcting Vision-Language-Action Models via Subtask Backtracking and Minimum Bayes Risk Decoding

    Authors: Chenyang Ma, Guangyu Yang, Kai Lu, Shitong Xu, Bill Byrne, Niki Trigoni, Andrew Markham

    Abstract: Current work on robot failure detection and correction typically operates in a post hoc manner, analyzing errors and applying corrections only after failures occur. This work introduces CycleVLA, a system that equips Vision-Language-Action models (VLAs) with proactive self-correction, the capability to anticipate incipient failures and recover before they fully manifest during execution. CycleVLA a…

    Submitted 5 January, 2026; originally announced January 2026.

    Comments: Project Page: https://dannymcy.github.io/cyclevla/

  23. arXiv:2512.23262

    cs.LG

    PFed-Signal: An ADR Prediction Model based on Federated Learning

    Authors: Tao Li, Peilin Li, Kui Lu, Yilei Wang, Junliang Shang, Guangshun Li, Huiyu Zhou

    Abstract: The adverse drug reactions (ADRs) predicted based on the biased records in FAERS (U.S. Food and Drug Administration Adverse Event Reporting System) may mislead diagnosis online. Generally, such problems are solved by optimizing reporting odds ratio (ROR) or proportional reporting ratio (PRR). However, these methods that rely on statistical methods cannot eliminate the biased data, leading to inacc…

    Submitted 29 December, 2025; originally announced December 2025.

    Comments: IEEE International Conference on Bioinformatics and Biomedicine

  24. arXiv:2512.20677

    cs.CR cs.CL

    Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models

    Authors: Zhang Wei, Peilu Hu, Zhenyuan Wei, Chenwei Liang, Jing Luo, Ziyi Ni, Hao Yan, Li Mei, Shengning Lang, Kuan Lu, Xi Xiao, Zhimo Han, Yijin Wang, Yichao Zhang, Chen Yang, Junfeng Hao, Jiayi Gu, Riyang Bao, Mu-Jiang-Shan Wang

    Abstract: The increasing deployment of large language models (LLMs) in safety-critical applications raises fundamental challenges in systematically evaluating robustness against adversarial behaviors. Existing red-teaming practices are largely manual and expert-driven, which limits scalability, reproducibility, and coverage in high-dimensional prompt spaces. We formulate automated LLM red-teaming as a struc…

    Submitted 13 February, 2026; v1 submitted 21 December, 2025; originally announced December 2025.

    Comments: Accepted by EACL

  25. arXiv:2512.15550

    cs.CL

    CTkvr: KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing

    Authors: Kuan Lu, Shuhang Lin, Sai Wu, Yichen Yao, Junhan Yang, Huan Li, Wei Chu, Xu Yinghui, Yuan Qi, Gang Chen

    Abstract: Large language models (LLMs) are increasingly applied in long-context scenarios such as multi-turn conversations. However, long contexts pose significant challenges for inference efficiency, including high memory overhead from Key-Value (KV) cache and increased latency due to excessive memory accesses. Recent methods for dynamic KV selection struggle with trade-offs: block-level indexing degrades…

    Submitted 17 December, 2025; originally announced December 2025.

  26. arXiv:2512.11280

    cs.CL

    AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference

    Authors: Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu

    Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft model to predict candidate tokens, which are then verified by a larger target model. However, existing approaches often require additional training, extensive h…

    Submitted 11 December, 2025; originally announced December 2025.

  27. arXiv:2512.06864

    cs.CV

    Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training

    Authors: Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos

    Abstract: Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: Accepted to WACV 2026. arXiv admin note: substantial text overlap with arXiv:2508.19808

  28. arXiv:2511.21202

    cs.CV

    Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition

    Authors: Baoli Sun, Yihan Wang, Xinzhu Ma, Zhihui Wang, Kun Lu, Zhiyong Wang

    Abstract: Fine-grained action recognition (FGAR) aims to identify subtle and distinctive differences among fine-grained action categories. However, current recognition methods often capture coarse-grained motion patterns but struggle to identify subtle details in local regions evolving over time. In this work, we introduce the Action-Region Tracking (ART) framework, a novel solution leveraging a query-respo…

    Submitted 26 November, 2025; originally announced November 2025.

  29. arXiv:2511.18734

    cs.CV cs.AI

    Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion

    Authors: Keyang Lu, Sifan Zhou, Hongbin Xu, Gang Xu, Zhifei Yang, Yikai Wang, Zhen Xiao, Jieyi Long, Ming Li

    Abstract: Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion model, which limits their ability to generate personalized and boundless city-scale scenes. In this paper, we present Yo'City, a novel agentic framework that enables user-customized and infinitely expandable 3D…

    Submitted 7 March, 2026; v1 submitted 23 November, 2025; originally announced November 2025.

    Comments: Accepted by CVPR 2026

  30. arXiv:2511.17726

    cs.CR cs.AR

    Pre-cache: A Microarchitectural Solution to prevent Meltdown and Spectre

    Authors: Subhash Sethumurugan, Hari Cherupalli, Kangjie Lu, John Sartori

    Abstract: Recent work has shown that out-of-order and speculative execution mechanisms used to increase performance in the majority of processors expose the processors to critical attacks. These attacks, called Meltdown and Spectre, exploit the side effects of performance-enhancing features in modern microprocessors to expose secret data through side channels in the microarchitecture. The well known impleme…

    Submitted 21 November, 2025; originally announced November 2025.

    Comments: 17 pages; 19 figures

  31. arXiv:2511.13626

    cs.AI

    CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product

    Authors: Kaiwen Xue, Chenglong Li, Zhonghong Ou, Guoxin Zhang, Kaoyan Lu, Shuai Lyu, Yifan Zhu, Ping Zong, Junpeng Ding, Xinyu Liu, Qunlin Chen, Weiwei Qin, Yiran Shen, Jiayi Cen

    Abstract: Human-defined creativity is highly abstract, posing a challenge for multimodal large language models (MLLMs) to comprehend and assess creativity that aligns with human judgments. The absence of an existing benchmark further exacerbates this dilemma. To this end, we propose CreBench, which consists of two key components: 1) an evaluation benchmark covering the multiple dimensions from creative idea…

    Submitted 17 November, 2025; originally announced November 2025.

    Comments: 13 pages, 3 figures, The 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026), Paper has been accepted for a poster presentation

  32. arXiv:2511.10828

    cs.CR cs.SE

    AFLGopher: Accelerating Directed Fuzzing via Feasibility-Aware Guidance

    Authors: Weiheng Bai, Kefu Wu, Qiushi Wu, Kangjie Lu

    Abstract: Directed fuzzing is a useful testing technique that aims to efficiently reach target code sites in a program. The core of directed fuzzing is the guiding mechanism that directs the fuzzing to the specified target. A general guiding mechanism adopted in existing directed fuzzers is to calculate the control-flow distance between the current progress and the target, and use that as feedback to guide…

    Submitted 13 November, 2025; originally announced November 2025.

  33. arXiv:2510.23495

    cs.RO

    COOPERA: Continual Open-Ended Human-Robot Assistance

    Authors: Chenyang Ma, Kai Lu, Ruta Desai, Xavier Puig, Andrew Markham, Niki Trigoni

    Abstract: To understand and collaborate with humans, robots must account for individual human traits, habits, and activities over time. However, most robotic assistants lack these abilities, as they primarily focus on predefined tasks in structured environments and lack a human model to learn from. This work introduces COOPERA, a novel framework for COntinual, OPen-Ended human-Robot Assistance, where simula…

    Submitted 27 October, 2025; originally announced October 2025.

    Comments: NeurIPS 2025 (Spotlight); Project Page: https://dannymcy.github.io/coopera/

  34. arXiv:2510.22401

    cs.DS

    Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

    Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

    Abstract: The Johnson-Lindenstrauss (JL) lemma is a cornerstone of dimensionality reduction in Euclidean space, but its applicability to non-Euclidean data has remained limited. This paper extends the JL lemma beyond Euclidean geometry to handle general dissimilarity matrices that are prevalent in real-world applications. We present two complementary approaches: First, we show the JL transform can be applie…

    Submitted 25 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025

  35. arXiv:2510.16917

    cs.SD cs.AI cs.CL eess.AS

    SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models

    Authors: Chih-Kai Yang, Yen-Ting Piao, Tzu-Wen Hsu, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee

    Abstract: Knowledge editing enables targeted updates without retraining, but prior work focuses on textual or visual facts, leaving abstract auditory perceptual knowledge underexplored. We introduce SAKE, the first benchmark for editing perceptual auditory attribute knowledge in large audio-language models (LALMs), which requires modifying acoustic generalization rather than isolated facts. We evaluate eigh…

    Submitted 15 March, 2026; v1 submitted 19 October, 2025; originally announced October 2025.

    Comments: Work in progress. Resources: https://github.com/ckyang1124/SAKE

  36. arXiv:2510.16893

    cs.SD cs.AI cs.CL eess.AS

    Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations

    Authors: Bo-Han Feng, Chien-Feng Liu, Yu-Hsuan Li Liang, Chih-Kai Yang, Szu-Wei Fu, Zhehuai Chen, Ke-Han Lu, Sung-Feng Huang, Chao-Han Huck Yang, Yu-Chiang Frank Wang, Yun-Nung Chen, Hung-yi Lee

    Abstract: Large audio-language models (LALMs) extend text-based LLMs with auditory understanding, offering new opportunities for multimodal applications. While their perception, reasoning, and task performance have been widely studied, their safety alignment under paralinguistic variation remains underexplored. This work systematically investigates the role of speaker emotion. We construct a dataset of mali…

    Submitted 19 October, 2025; originally announced October 2025.

    Comments: Submitted to ICASSP 2026

  37. arXiv:2510.16753

    cs.AI

    ELMM: Efficient Lightweight Multimodal Large Language Models for Multimodal Knowledge Graph Completion

    Authors: Wei Huang, Peining Li, Meiyu Liang, Xu Hou, Junping Du, Yingxia Shao, Guanhua Ye, Wu Liu, Kangkang Lu, Yang Yu

    Abstract: Multimodal Knowledge Graphs (MKGs) extend traditional knowledge graphs by incorporating visual and textual modalities, enabling richer and more expressive entity representations. However, existing MKGs often suffer from incompleteness, which hinders their effectiveness in downstream tasks. Therefore, the multimodal knowledge graph completion (MKGC) task is receiving increasing attention. While large la…

    Submitted 6 January, 2026; v1 submitted 19 October, 2025; originally announced October 2025.

    Comments: 14 pages, 5 figures

    MSC Class: 68T30 ACM Class: H.3.3

  38. arXiv:2510.05142

    cs.CL cond-mat.mtrl-sci

    Reliable End-to-End Material Information Extraction from the Literature with Source-Tracked Multi-Stage Large Language Models

    Authors: Xin Wang, Anshu Raj, Matthew Luebbe, Haiming Wen, Shuozhi Xu, Kun Lu

    Abstract: Data-driven materials discovery requires large-scale experimental datasets, yet most of the information remains trapped in unstructured literature. Existing extraction efforts often focus on a limited set of features and have not addressed the integrated composition-processing-microstructure-property relationships essential for understanding materials behavior, thereby posing challenges for buildi…

    Submitted 1 October, 2025; originally announced October 2025.

    Comments: 27 pages, 4 figures, 7 tables

  39. arXiv:2509.26329

    eess.AS cs.CL cs.LG cs.SD

    TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics

    Authors: Yi-Cheng Lin, Yu-Hua Chen, Jia-Kai Dong, Yueh-Hsuan Huang, Szu-Chi Chen, Yu-Chen Chen, Chih-Yao Chen, Yu-Jung Lin, Yu-Ling Chen, Zih-Yu Chen, I-Ning Tsai, Hsiu-Hsuan Wang, Ho-Lam Chung, Ke-Han Lu, Hung-yi Lee

    Abstract: Large audio-language models are advancing rapidly, yet most evaluations emphasize speech or globally sourced sounds, overlooking culturally distinctive cues. This gap raises a critical question: can current models generalize to localized, non-semantic audio that communities instantly recognize but outsiders do not? To address this, we present TAU (Taiwan Audio Understanding), a benchmark of everyd…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 5 pages; submitted to ICASSP 2026

  40. arXiv:2509.26092  [pdf, ps, other]

    cs.DC cs.LG

    Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training

    Authors: Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

    Abstract: Distributed machine learning is critical for training deep learning models on large datasets with numerous parameters. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate the training process. As a result, larger batch sizes are often employed to speed up training. However, training with large batch sizes can lead to lower accur…

    Submitted 31 October, 2025; v1 submitted 30 September, 2025; originally announced September 2025.

  41. arXiv:2509.22796  [pdf, ps, other]

    cs.CR cs.LG

    What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs

    Authors: Xingyu Li, Juefei Pu, Yifan Wu, Xiaochen Zou, Shitong Zhu, Qiushi Wu, Zheng Zhang, Joshua Hsu, Yue Dong, Zhiyun Qian, Kangjie Lu, Trent Jaeger, Michael De Lucia, Srikanth V. Krishnamurthy

    Abstract: Open-source software projects are foundational to modern software ecosystems, with the Linux kernel standing out as a critical exemplar due to its ubiquity and complexity. Although security patches are continuously integrated into the Linux mainline kernel, downstream maintainers often delay their adoption, creating windows of vulnerability. A key reason for this lag is the difficulty in identifyi…

    Submitted 26 September, 2025; originally announced September 2025.

  42. arXiv:2509.17354  [pdf]

    cs.AI cs.LG

    Multi-Scenario Highway Lane-Change Intention Prediction: A Physics-Informed AI Framework for Three-Class Classification

    Authors: Jiazhao Shi, Yichen Lin, Yiheng Hua, Ziyu Wang, Zijian Zhang, Wenjia Zheng, Yun Song, Kuan Lu, Shoufeng Lu

    Abstract: Lane-change maneuvers are a leading cause of highway accidents, underscoring the need for accurate intention prediction to improve the safety and decision-making of autonomous driving systems. While prior studies using machine learning and deep learning methods (e.g., SVM, CNN, LSTM, Transformers) have shown promise, most approaches remain limited by binary classification, lack of scenario diversi…

    Submitted 30 November, 2025; v1 submitted 22 September, 2025; originally announced September 2025.

  43. arXiv:2509.02333  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    DCPO: Dynamic Clipping Policy Optimization

    Authors: Shihui Yang, Chengfeng Dou, Peidong Guo, Kai Lu, Qiang Ju, Fei Deng, Rihui Xin

    Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a promising framework for enhancing the reasoning capabilities of large language models. However, existing approaches such as GRPO often suffer from zero gradients. This problem arises primarily due to fixed clipping bounds for token-level probability ratios and the standardization of identical rewards, which can lead to ineffect…

    Submitted 8 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

  44. arXiv:2509.02208  [pdf, ps, other]

    cs.LG cs.AI

    Baichuan-M2: Scaling Medical Capability with Large Verifier System

    Authors: Baichuan-M2 Team: Chengfeng Dou, Chong Liu, Fan Yang, Fei Li, Jiyuan Jia, Mingyang Chen, Qiang Ju, Shuai Wang, Shunya Dang, Tianpeng Li, Xiangrong Zeng, Yijie Zhou, Chenzheng Zhu, Da Pan, Fei Deng, Guangwei Ai, Guosheng Dong, Hongda Zhang, Jinyang Tai, Jixiang Hong, Kai Lu, Linzhuang Sun, Peidong Guo, et al. (10 additional authors not shown)

    Abstract: As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the…

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: Baichuan-M2 Technical Report

  45. arXiv:2508.20244  [pdf, ps, other]

    cs.AI

    Do Students Rely on AI? Analysis of Student-ChatGPT Conversations from a Field Study

    Authors: Jiayu Zheng, Lingxin Hao, Kelun Lu, Ashi Garg, Mike Reese, Melo-Jean Yap, I-Jeng Wang, Xingyun Wu, Wenrui Huang, Jenna Hoffman, Ariane Kelly, My Le, Ryan Zhang, Yanyu Lin, Muhammad Faayez, Anqi Liu

    Abstract: This study explores how college students interact with generative AI (ChatGPT-4) during educational quizzes, focusing on reliance and predictors of AI adoption. Conducted at the early stages of ChatGPT implementation, when students had limited familiarity with the tool, this field study analyzed 315 student-AI conversations during a brief, quiz-based scenario across various STEM courses. A novel f…

    Submitted 27 August, 2025; originally announced August 2025.

  46. arXiv:2508.19808  [pdf, ps, other]

    cs.CV

    AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment

    Authors: Kaixuan Lu, Mehmet Onurcan Kaya, Dim P. Papadopoulos

    Abstract: Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this…

    Submitted 27 August, 2025; originally announced August 2025.

    Comments: Accepted to ICCV 2025 Workshop LIMIT

  47. arXiv:2508.10925  [pdf, ps, other]

    cs.CL cs.AI

    gpt-oss-120b & gpt-oss-20b Model Card

    Authors: OpenAI: Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, et al. (102 additional authors not shown)

    Abstract: We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-expert transformer architecture and are trained using large-scale distillation and reinforcement learning. We optimize the models to have strong agentic capabilities (deep research browsing, python tool use, and support for develope…

    Submitted 8 August, 2025; originally announced August 2025.

  48. arXiv:2508.10019  [pdf, ps, other]

    cs.CL cs.AI

    Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

    Authors: Li Wang, Changhao Zhang, Zengqi Xiu, Kai Lu, Xin Yu, Kui Zhang, Wenjun Wu

    Abstract: Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems often appear in diverse surface forms, often obscured by redundant or distracting details. T…

    Submitted 15 December, 2025; v1 submitted 6 August, 2025; originally announced August 2025.

  49. arXiv:2508.00344  [pdf, ps, other]

    cs.CL

    PilotRL: Training Language Model Agents via Global Planning-Guided Progressive Reinforcement Learning

    Authors: Keer Lu, Chong Chen, Xili Wang, Bin Cui, Yunhuai Liu, Wentao Zhang

    Abstract: Large Language Models (LLMs) have shown remarkable advancements in tackling agent-oriented tasks. Despite their potential, existing work faces challenges when deploying LLMs in agent-based environments. The widely adopted agent paradigm ReAct centers on integrating single-step reasoning with immediate action execution, which limits its effectiveness in complex tasks requiring long-term strategic p…

    Submitted 7 January, 2026; v1 submitted 1 August, 2025; originally announced August 2025.

  50. arXiv:2507.23541  [pdf, ps, other]

    cs.CL

    Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning

    Authors: Keer Lu, Zheng Liang, Youquan Li, Jiejun Tan, Xili Wang, Da Pan, Shusen Zhang, Guosheng Dong, Bin Cui, Yunhuai Liu, Wentao Zhang

    Abstract: In medical scenarios, effectively retrieving external knowledge and leveraging it for rigorous logical reasoning is of significant importance. Despite their potential, existing work has predominantly focused on enhancing either retrieval or reasoning capabilities of the models in isolation, with little attention given to their joint optimization, which leads to limited coordination between the two…

    Submitted 19 January, 2026; v1 submitted 31 July, 2025; originally announced July 2025.