Showing 1–50 of 1,290 results for author: Lu, Z

Searching in archive cs.
  1. arXiv:2604.08455  [pdf, ps, other]

    cs.AI

    KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

    Authors: Tongbo Chen, Zhengxi Lu, Zhan Xu, Guocheng Shao, Shaohan Zhao, Fei Tang, Yong Du, Kaitao Song, Yizhou Liu, Yuchen Yan, Wenqi Zhang, Xu Tan, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

    Abstract: Personalized mobile agents that infer user preferences and calibrate proactive assistance hold great promise as everyday digital assistants, yet existing benchmarks fail to capture what this requires. Prior work evaluates preference recovery from static histories or intent prediction from fixed contexts. Neither tests whether an agent can elicit missing preferences through interaction, nor whether…

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2604.07121  [pdf, ps, other]

    cs.HC cs.AI

    Mixed-Initiative Context: Structuring and Managing Context for Human-AI Collaboration

    Authors: Haichang Li, Qinshi Zhang, Piaohong Wang, Zhicong Lu

    Abstract: In the human-AI collaboration area, the context formed naturally through multi-turn interactions is typically flattened into a chronological sequence and treated as a fixed whole in subsequent reasoning, with no mechanism for dynamic organization and management along the collaboration workflow. Yet these contexts differ substantially in lifecycle, structural hierarchy, and relevance. For instance,…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 19 pages, 3 figures, 1 table. Appendix on pages 13-19 (main text is self-contained)

    ACM Class: H.5.2

  3. arXiv:2604.05620  [pdf, ps, other]

    cs.CV cs.AI

    Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening

    Authors: Chenyu Xue, Yiran Liu, Mian Zhou, Jionglong Su, Zhixiang Lu

    Abstract: Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariab…

    Submitted 7 April, 2026; originally announced April 2026.

  4. arXiv:2604.05365  [pdf, ps, other]

    cs.IR

    From Clues to Generation: Language-Guided Conditional Diffusion for Cross-Domain Recommendation

    Authors: Ziang Lu, Lei Sang, Lin Mu, Yiwen Zhang

    Abstract: Cross-domain Recommendation (CDR) exploits multi-domain correlations to alleviate data sparsity. As a core task within this field, inter-domain recommendation focuses on predicting preferences for users who interact in a source domain but lack behavioral records in a target domain. Existing approaches predominantly rely on overlapping users as anchors for knowledge transfer. In real-world scenario…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: 11 pages, 6 figures

  5. arXiv:2604.04839  [pdf, ps, other]

    cs.CL

    MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation

    Authors: Zhixiang Lu, Chong Zhang, Chenyu Xue, Angelos Stefanidis, Chong Li, Jionglong Su, Zhengyong Jiang

    Abstract: Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap with high-resource directions, leaving millions of speakers of languages such a…

    Submitted 6 April, 2026; originally announced April 2026.

  6. arXiv:2604.03045  [pdf, ps, other]

    cs.CV cs.MM

    STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models

    Authors: Linfeng Fan, Yuan Tian, Ziwei Li, Zhiwu Lu

    Abstract: Video Large Language Models (Video-LLMs) remain prone to spatiotemporal hallucinations, often generating visually unsupported details or incorrect temporal relations. Existing mitigation methods typically treat hallucination as a uniform decoding failure, applying globally shared correction rules. We instead observe that decoder layers contribute differently to visual grounding and later linguisti…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: Preprint

  7. arXiv:2604.02268  [pdf, ps, other]

    cs.LG

    SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

    Authors: Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

    Abstract: Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the…

    Submitted 2 April, 2026; originally announced April 2026.

  8. arXiv:2604.00780  [pdf, ps, other]

    cs.AR

    RePart: Efficient Hypergraph Partitioning with Logic Replication Optimization for Multi-FPGA System

    Authors: Zizhuo Fu, Yifan Zhou, Zhaoxin Lu, Guangyu Sun, Runsheng Wang, Meng Li, Yibo Lin

    Abstract: Multi-FPGA systems (MFS) are widely adopted for VLSI emulation and rapid prototyping. In an MFS, FPGAs connect only to a limited number of neighbors through bandwidth-constrained links, so inter-FPGA communication cost depends on network topology. This setting exposes two fundamental limitations of existing MFS-aware partitioning methods: conventional hypergraph partitioners focus solely on cut si…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 2026 International Symposium of Electronics Design Automation (ISEDA)

  9. arXiv:2604.00590  [pdf, ps, other]

    cs.IR cs.AI

    UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

    Authors: Mingming Ha, Guanchen Wang, Linxun Chen, Xuan Rao, Yuexin Shi, Tianbao Ma, Zhaojie Liu, Yunqian Fan, Zilong Lu, Yanan Niu, Han Li, Kun Gai

    Abstract: In recent years, the scaling laws of recommendation models have attracted increasing attention, which govern the relationship between performance and parameters/FLOPs of recommenders. Currently, there are three mainstream architectures for achieving scaling in recommendation models, namely attention-based, TokenMixer-based, and factorization-machine-based methods, which exhibit fundamental differe…

    Submitted 1 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

  10. arXiv:2603.28651  [pdf, ps, other]

    cs.AI

    Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning

    Authors: Rongjin Li, Zichen Tang, Xianghe Wang, Xinyi Hu, Zhengyu Wang, Zhengyu Lu, Yiling Huang, Jiayuan Chen, Weisheng Tan, Jiacheng Liu, Zhongjun Yang, Haihong E

    Abstract: With the rapid progress of multimodal large language models (MLLMs), AI already performs well at literature retrieval and certain reasoning tasks, serving as a capable assistant to human researchers, yet it remains far from autonomous research. The fundamental reason is that current work on academic paper reasoning is largely confined to a search-oriented paradigm centered on pre-specified targets…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: Accepted to ICLR 2026

  11. arXiv:2603.27460  [pdf, ps, other]

    cs.CV cs.AI

    Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

    Authors: Zhongying Deng, Cheng Tang, Ziyan Huang, Jiashi Lin, Ying Chen, Junzhi Ning, Chenglong Ma, Jiyao Liu, Wei Li, Yinghao Zhu, Shujian Gao, Yanyan Huang, Sibo Ju, Yanzhou Su, Pengcheng Chen, Wenhao Tang, Tianbin Li, Haoyu Wang, Yuanfeng Ji, Hui Sun, Shaobo Min, Liang Peng, Feilong Tang, Haochen Xue, Rulin Zhou, et al. (102 additional authors not shown)

    Abstract: Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily due to the thriving of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, the curation and assembling of such medical datasets are highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of…

    Submitted 28 March, 2026; originally announced March 2026.

    Comments: 157 pages, 19 figures, 26 tables. Project repo: https://github.com/uni-medical/Project-Imaging-X

  12. arXiv:2603.27186  [pdf, ps, other]

    cs.LG

    Hybrid Deep Learning with Temporal Data Augmentation for Accurate Remaining Useful Life Prediction of Lithium-Ion Batteries

    Authors: Yun Tian, Guili Wang, Jian Bi, Kaixin Han, Chenglu Wu, Zhiyi Lu, Chenhao Li, Liangwang Sun, Minyu Zhou, Chenchen Xu

    Abstract: Accurate prediction of lithium-ion battery remaining useful life (RUL) is essential for reliable health monitoring and data-driven analysis of battery degradation. However, the robustness and generalization capabilities of existing RUL prediction models are significantly challenged by complex operating conditions and limited data availability. To address these limitations, this study proposes a hy…

    Submitted 28 March, 2026; originally announced March 2026.

  13. arXiv:2603.26420  [pdf, ps, other]

    cs.HC cs.CY cs.SI

    "Law at Your Fingertips": Understanding Legal Information Seeking on Video-Sharing Platforms in China

    Authors: Zhiyang Wu, Junliang Chen, Qian Wan, Qing Xiao, Piaohong Wang, Ge Gao, Zhicong Lu

    Abstract: Equipping laypeople with the capabilities to seek legal information has been an important goal for Legal Empowerment in modern society. However, unlike general information-seeking behaviors, legal information seeking is characterized by high stakes, urgency, and a critical need for emotional support, which traditional text-based searching platforms struggle to satisfy. In recent years, people have…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 25 pages, 1 figure; Accepted by ACM CSCW 2026. To appear in Proceedings of the ACM on Human-Computer Interaction (CSCW)

    ACM Class: H.5.m; K.4.0

  14. arXiv:2603.25498  [pdf, ps, other]

    cs.AI

    EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents

    Authors: Linxiao Li, Zhixiang Lu

    Abstract: As the Web transitions from static retrieval to generative interaction, the escalating environmental footprint of Large Language Models (LLMs) presents a critical sustainability challenge. Current paradigms indiscriminately apply computation-intensive strategies like Chain-of-Thought (CoT) to billions of daily queries, causing LLM overthinking, a redundancy that amplifies carbon emissions and oper…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: Accepted by WWW 2026

  15. arXiv:2603.24533  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

    Authors: Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye, Jie Jiang

    Abstract: Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the…

    Submitted 25 March, 2026; originally announced March 2026.

    Comments: Code and models are available at https://github.com/ui-voyager/UI-Voyager

  16. arXiv:2603.23953  [pdf, ps, other]

    cs.CV cs.ET

    VOLMO: Versatile and Open Large Models for Ophthalmology

    Authors: Zhenyue Qin, Younjoon Chung, Elijah Lee, Wanyue Feng, Xuguang Ai, Serina Applebaum, Minjie Zou, Yang Liu, Pan Xiao, Mac Singer, Amisha Dave, Aidan Gilson, Tiarnan D. L. Keenan, Emily Y. Chew, Zhiyong Lu, Yih-Chung Tham, Ron Adelman, Luciano V. Del Priore, Qingyu Chen

    Abstract: Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing gene…

    Submitted 26 March, 2026; v1 submitted 25 March, 2026; originally announced March 2026.

  17. arXiv:2603.23324  [pdf, ps, other]

    cs.CV

    Pose-Free Omnidirectional Gaussian Splatting for 360-Degree Videos with Consistent Depth Priors

    Authors: Chuanqing Zhuang, Xin Lu, Zehui Deng, Zhengda Lu, Yiqun Wang, Junqi Diao, Jun Xiao

    Abstract: Omnidirectional 3D Gaussian Splatting with panoramas is a key technique for 3D scene representation, and existing methods typically rely on slow SfM to provide camera poses and sparse points priors. In this work, we propose a pose-free omnidirectional 3DGS method, named PFGS360, that reconstructs 3D Gaussians from unposed omnidirectional videos. To achieve accurate camera pose estimation, we first…

    Submitted 26 March, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

  18. arXiv:2603.23007  [pdf, ps, other]

    cs.CR cs.AI

    AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents

    Authors: Yutao Luo, Haotian Zhu, Shuchao Pang, Zhigang Lu, Tian Dong, Yongbin Zhou, Minhui Xue

    Abstract: The rapid adoption of mobile graphical user interface (GUI) agents, which autonomously control applications and operating systems (OS), exposes new system-level attack surfaces. Existing backdoors against web GUI agents and general GenAI models rely on environmental injection or deceptive pop-ups to mislead the agent operation. However, these techniques do not work on screenshots-based mobile GUI…

    Submitted 24 March, 2026; originally announced March 2026.

  19. arXiv:2603.21022  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Knowledge Boundary Discovery for Large Language Models

    Authors: Ziquan Wang, Zhongqi Lu

    Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement learning based framework to explore the knowledge boundaries of the Large Language Models (LLMs). We define the knowledge boundary by automatically generating two types of questions: (i) those the LLM can confidently answer (within-knowledge boundary) and (ii) those it cannot (beyond-knowledge boundary). Iteratively exploring and explo…

    Submitted 13 January, 2026; originally announced March 2026.

    Comments: 9 pages, 4 figures

  20. arXiv:2603.21010  [pdf, ps, other]

    cs.CV

    SkinCLIP-VL: Consistency-Aware Vision-Language Learning for Multimodal Skin Cancer Diagnosis

    Authors: Zhixiang Lu, Shijie Xu, Kaicheng Yan, Xuyue Cai, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Jionglong Su

    Abstract: The deployment of vision-language models (VLMs) in dermatology is hindered by the trilemma of high computational costs, extreme data scarcity, and the black-box nature of deep learning. To address these challenges, we present SkinCLIP-VL, a resource-efficient framework that adapts foundation models for trustworthy skin cancer diagnosis. Adopting a frozen perception, adaptive reasoning paradigm, we…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: Accepted by 2026 IEEE International Conference on Multimedia and Expo (ICME 2026)

  21. arXiv:2603.20340  [pdf, ps, other]

    cs.SE cs.AI

    ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents

    Authors: Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Lianyong Qi, Shi Jin

    Abstract: Self-generated skills for web agents are often unstable and can even hurt performance relative to direct acting. We argue that the key bottleneck is not only skill generation quality, but the fact that web skills remain implicit and therefore cannot be checked or locally repaired. To address this, we present ContractSkill, a framework that converts a draft skill into an executable artifact with ex…

    Submitted 31 March, 2026; v1 submitted 20 March, 2026; originally announced March 2026.

    Comments: 10 pages, 4 figures, 6 tables

  22. arXiv:2603.19931  [pdf, ps, other]

    cs.CL

    SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia

    Authors: Zhixiang Lu, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Imran Razzak, Jionglong Su, Zhengyong Jiang

    Abstract: The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer a potential solution for translation, their deployment in data-poor contexts faces a dual challenge: the scarcity of high-quality, culturally relevant data and the prohibitive energy costs of training on…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: Accepted by WWW 2026

  23. arXiv:2603.19790  [pdf, ps, other]

    cs.CV

    From Plausibility to Verifiability: Risk-Controlled Generative OCR for Vision-Language Models

    Authors: Weile Gong, Yiping Zuo, Zijian Lu, Xin He, Weibei Fan, Lianyong Qi, Shi Jin

    Abstract: Modern vision-language models (VLMs) can act as generative OCR engines, yet open-ended decoding can expose rare but consequential failures. We identify a core deployment misalignment in generative OCR. Autoregressive decoding favors semantic plausibility, whereas OCR requires outputs that are visually grounded and geometrically verifiable. This mismatch produces severe errors, especially over-gene…

    Submitted 31 March, 2026; v1 submitted 20 March, 2026; originally announced March 2026.

    Comments: 10 pages, 5 figures, 5 tables

  24. arXiv:2603.19110  [pdf, ps, other]

    quant-ph cs.CR

    Post-Quantum Cryptography from Quantum Stabilizer Decoding

    Authors: Jonathan Z. Lu, Alexander Poremba, Yihui Quek, Akshar Ramkumar

    Abstract: Post-quantum cryptography currently rests on a small number of hardness assumptions, posing significant risks should any one of them be compromised. This vulnerability motivates the search for new and cryptographically versatile assumptions that make a convincing case for quantum hardness. In this work, we argue that decoding random quantum stabilizer codes -- a quantum analog of the well-studied…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 49 pages

  25. arXiv:2603.18683  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning

    Authors: Zhicong Lu, Zichuan Lin, Wei Jia, Changyuan Tian, Deheng Ye, Peiguang Li, Li Jin, Nayu Liu, Guangluan Xu, Wei Feng

    Abstract: While large language models excel in diverse domains, their performance on complex long-horizon agentic decision-making tasks remains limited. Most existing methods concentrate on designing effective reward models (RMs) to advance performance via multi-turn reinforcement learning. However, they suffer from delayed propagation in sparse outcome rewards and unreliable credit assignment with potential…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Submitted to ACL 2026 on Jan 5, 2026

  26. arXiv:2603.18623  [pdf, ps, other]

    cs.CV cs.AI

    OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data

    Authors: Bin Cao, Sipeng Zheng, Hao Luo, Boyuan Li, Jing Liu, Zongqing Lu

    Abstract: Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions due to the small scale and limited diversity of existing motion datasets. To address this problem, we introduce OpenT2M, a million-level, high-quality, and open-…

    Submitted 19 March, 2026; originally announced March 2026.

  27. arXiv:2603.16542  [pdf, ps, other]

    cs.RO

    Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

    Authors: Wanpeng Zhang, Hao Luo, Sipeng Zheng, Yicheng Feng, Haiweng Xu, Ziheng Xi, Chaoyi Xu, Haoqi Yuan, Zongqing Lu

    Abstract: Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to…

    Submitted 17 March, 2026; originally announced March 2026.

  28. arXiv:2603.16446  [pdf, ps, other]

    cs.CV

    Unified Removal of Raindrops and Reflections: A New Benchmark and A Novel Pipeline

    Authors: Xingyu Liu, Zewei He, Yu Chen, Chunyu Zhu, Zixuan Chen, Xing Luo, Zhe-Ming Lu

    Abstract: When capturing images through glass surfaces or windshields on rainy days, raindrops and reflections frequently co-occur to significantly reduce the visibility of captured images. This practical problem lacks attention and needs to be resolved urgently. Prior de-raindrop, de-reflection, and all-in-one models have failed to address this composite degradation. To this end, we first formally define t…

    Submitted 6 April, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

    Comments: 25 pages, 22 figures

  29. arXiv:2603.16301  [pdf, ps, other]

    cs.RO

    OGScene3D: Incremental Open-Vocabulary 3D Gaussian Scene Graph Mapping for Scene Understanding

    Authors: Siting Zhu, Ziyun Lu, Guangming Wang, Chenguang Huang, Yongbo Chen, I-Ming Chen, Wolfram Burgard, Hesheng Wang

    Abstract: Open-vocabulary scene understanding is crucial for robotic applications, enabling robots to comprehend complex 3D environmental contexts and supporting various downstream tasks such as navigation and manipulation. However, existing methods require pre-built complete 3D semantic maps to construct scene graphs for scene understanding, which limits their applicability in robotic scenarios where envir…

    Submitted 17 March, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

  30. arXiv:2603.15611  [pdf, ps, other]

    cs.CL

    Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

    Authors: Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

    Abstract: Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a single model, but face an inherent dilemma: white-box access leads to self-collusion where the model produ…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Project Page: https://zju-real.github.io/Code-A1 Code: https://github.com/ZJU-REAL/Code-A1

  31. arXiv:2603.15025  [pdf, ps, other]

    cs.CV

    One CT Unified Model Training Framework to Rule All Scanning Protocols

    Authors: Fengzhi Xu, Ziyuan Yang, Zexin Lu, Yingyu Chen, Fenglei Fan, Hongming Shan, Yi Zhang

    Abstract: Non-ideal measurement computed tomography (NICT), which lowers radiation at the cost of image quality, is expanding the clinical use of CT. Although unified models have shown promise in NICT enhancement, most methods require paired data, which is an impractical demand due to inevitable organ motion. Unsupervised approaches attempt to overcome this limitation, but their assumption of homogeneous no…

    Submitted 16 March, 2026; originally announced March 2026.

  32. arXiv:2603.14452  [pdf, ps, other]

    cs.CV

    Uni-MDTrack: Learning Decoupled Memory and Dynamic States for Parameter-Efficient Visual Tracking in All Modality

    Authors: Wenrui Cai, Zhenyi Lu, Yuzhe Li, Yongchao Feng, Jinqing Zhang, Qingjie Liu, Yunhong Wang

    Abstract: With the advent of Transformer-based one-stream trackers that possess strong capability in inter-frame relation modeling, recent research has increasingly focused on how to introduce spatio-temporal context. However, most existing methods rely on a limited number of historical frames, which not only leads to insufficient utilization of the context, but also inevitably increases the length of input…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: 15 pages, 9 figures, 16 tables

  33. arXiv:2603.14189  [pdf, ps, other]

    cs.CV cs.AI

    Walking Further: Semantic-aware Multimodal Gait Recognition Under Long-Range Conditions

    Authors: Zhiyang Lu, Wen Jiang, Tianren Wu, Zhichao Wang, Changwang Zhang, Siqi Shen, Ming Cheng

    Abstract: Gait recognition is an emerging biometric technology that enables non-intrusive and hard-to-spoof human identification. However, most existing methods are confined to short-range, unimodal settings and fail to generalize to long-range and cross-distance scenarios under real-world conditions. To address this gap, we present LRGait, the first LiDAR-Camera multimodal benchmark designed for r…

    Submitted 14 March, 2026; originally announced March 2026.

    Comments: Accepted by AAAI 2026

  34. arXiv:2603.13615  [pdf, ps, other]

    cs.CV cs.RO

    Egocentric World Model for Photorealistic Hand-Object Interaction Synthesis

    Authors: Dayou Li, Lulin Liu, Bangya Liu, Shijie Zhou, Jiu Feng, Ziqi Lu, Minghui Zheng, Chenyu You, Zhiwen Fan

    Abstract: To serve as a scalable data source for embodied AI, world models should act as true simulators that infer interaction dynamics strictly from user actions, rather than mere conditional video generators relying on privileged future object states. In this context, egocentric Human-Object Interaction (HOI) world models are critical for predicting physically grounded first-person rollouts. However, bui…

    Submitted 13 March, 2026; originally announced March 2026.

  35. arXiv:2603.12873  [pdf, ps, other]

    cs.CV

    TRACE: Structure-Aware Character Encoding for Robust and Generalizable Document Watermarking

    Authors: Jiale Meng, Jie Zhang, Runyi Hu, Zhe-Ming Lu, Tianwei Zhang, Yiming Li

    Abstract: We propose TRACE, a structure-aware framework leveraging diffusion models for localized character encoding to embed data. Unlike existing methods that rely on edge features or pre-defined codebooks, TRACE exploits character structures that provide inherent resistance to noise interference due to their stability and unified representation across diverse characters. Our framework comprises three key…

    Submitted 13 March, 2026; originally announced March 2026.

  36. arXiv:2603.12826  [pdf, ps, other]

    cs.CL

    Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design

    Authors: Xu Guo, Qiming Ge, Jian Tong, Kedi Chen, Jin Zhang, Xiaogui Yang, Xuan Gao, Haijun Lv, Zhihui Lu, Yicheng Zou, Qipeng Guo

    Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capabilities of Large Language Models. When applied to RLVR, Multiple-Choice Questions (MCQs) offer a scalable source of verifiable data but risk inducing reward hacking, where models shortcut reasoning via random guessing or simple elimination. Current approaches often mitigate this by converting MCQs to op…

    Submitted 13 March, 2026; originally announced March 2026.

  37. arXiv:2603.11219  [pdf, ps, other]

    cs.CV

    Senna-2: Aligning VLM and End-to-End Driving Policy for Consistent Decision Making and Planning

    Authors: Yuehao Song, Shaoyu Chen, Hao Gao, Yifan Zhu, Weixiang Yue, Jialv Zou, Bo Jiang, Zihao Lu, Yu Wang, Qian Zhang, Xinggang Wang

    Abstract: Vision-language models (VLMs) enhance the planning capability of end-to-end (E2E) driving policy by leveraging high-level semantic reasoning. However, existing approaches often overlook the dual-system consistency between VLM's high-level decision and E2E's low-level planning. As a result, the generated trajectories may misalign with the intended driving decisions, leading to weakened top-down gui…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 15 pages, 8 figures. Project page: https://ambitious-idiot.github.io/senna2-project

  38. arXiv:2603.10337  [pdf, ps, other]

    cs.GR

    Landmark Guided 4D Facial Expression Generation

    Authors: Xin Lu, Zhengda Lu, Yiqun Wang, Jun Xiao

    Abstract: In this paper, we proposed a generative model that learns to synthesize the 4D facial expression with the neutral landmark. Existing works mainly focus on the generation of sequences guided by expression labels, speech, etc, while they are not robust to the change of different identities. Our LM-4DGAN utilizes neutral landmarks to guide the facial expression generation while adding an identity dis…

    Submitted 10 March, 2026; originally announced March 2026.

  39. arXiv:2603.10326  [pdf, ps, other]

    cs.GR

    FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing

    Authors: Xin Lu, Chuanqing Zhuang, Zhengda Lu, Yiqun Wang, Jun Xiao

    Abstract: 4D facial expression synthesizing is a critical problem in the fields of computer vision and graphics. Current methods lack flexibility and smoothness when simulating the inter-frame motion of expression sequences. In this paper, we propose a frequency-controlled 4D facial expression synthesizing method, FC-4DFS. Specifically, we introduce a frequency-controlled LSTM network to generate 4D facial…

    Submitted 10 March, 2026; originally announced March 2026.

  40. arXiv:2603.09565  [pdf, ps, other]

    cs.RO

    ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly

    Authors: Minchi Ruan, LiangQing Zhou, Hongtong Li, Zongtao Wang, ZhaoMing Lu, Jianwei Zhang, Bin Fang

    Abstract: Precision assembly requires sub-millimeter corrections in contact-rich "last-millimeter" regions where visual feedback fails due to occlusion from the end-effector and workpiece. We present ReTac-ACT (Reconstruction-enhanced Tactile ACT), a vision-tactile imitation learning policy that addresses this challenge through three synergistic mechanisms: (i) bidirectional cross-attention enabling recipro…

    Submitted 18 March, 2026; v1 submitted 10 March, 2026; originally announced March 2026.

  41. arXiv:2603.09541  [pdf, ps, other]

    cs.CV cs.MM

    Memory-Guided View Refinement for Dynamic Human-in-the-loop EQA

    Authors: Xin Lu, Rui Li, Xun Huang, Weixin Li, Chuanqing Zhuang, Jiayuan Li, Zhengda Lu, Jun Xiao, Yunhong Wang

    Abstract: Embodied Question Answering (EQA) has traditionally been evaluated in temporally stable environments where visual evidence can be accumulated reliably. However, in dynamic, human-populated scenes, human activities and occlusions introduce significant perceptual non-stationarity: task-relevant cues are transient and view-dependent, while a store-then-retrieve strategy over-accumulates redundant evi…

    Submitted 10 March, 2026; originally announced March 2026.

  42. arXiv:2603.09358  [pdf, ps, other]

    cs.CR

    ProvAgent: Threat Detection Based on Identity-Behavior Binding and Multi-Agent Collaborative Attack Investigation

    Authors: Wenhao Yan, Ning An, Linxu Li, Bingsheng Bi, Bo Jiang, Zhigang Lu, Baoxu Liu, Junrong Liu, Cong Dong

    Abstract: Advanced Persistent Threats (APTs) pose critical challenges to modern cybersecurity due to their multi-stage and stealthy nature. While provenance-based detection approaches show promise in capturing causal attack semantics, current threat provenance practices face two paradoxical issues: (1) expert skepticism, where human analysts doubt the capability of traditional detection models to identify c…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: The code of ProvAgent is publicly available at https://github.com/Win7ery/ProvAgent

  43. arXiv:2603.08812  [pdf, ps, other]

    cs.CV

    VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model

    Authors: Jinxiang Lai, Wenzhe Zhao, Zexin Lu, Hualei Zhang, Qinyu Yang, Rongwei Quan, Zhimin Li, Shuai Shao, Song Guo, Qinglin Lu

    Abstract: Visual content generation has advanced from single-image to multi-image workflows, yet existing agents remain largely plan-driven and lack systematic reflection mechanisms to correct mid-trajectory visual errors. To address this limitation, we propose VisionCreator-R1, a native visual generation agent with explicit reflection, together with a Reflection-Plan Co-Optimization (RPCO) training methodo…

    Submitted 9 March, 2026; originally announced March 2026.

  44. arXiv:2603.06401  [pdf, ps, other]

    eess.SP cs.LG

    U6G XL-MIMO Radiomap Prediction: Multi-Config Dataset and Beam Map Approach

    Authors: Xiaojie Li, Yu Han, Zhizheng Lu, Shi Jin, Chao-Kai Wen

    Abstract: The upper 6 GHz (U6G) band with XL-MIMO is a key enabler for sixth-generation wireless systems, yet intelligent radiomap prediction for such systems remains challenging. Existing datasets support only small-scale arrays (up to 8x8) with predominantly isotropic antennas, far from the 1024-element directional arrays envisioned for 6G. Moreover, current methods encode array configurations as scalar p…

    Submitted 6 March, 2026; originally announced March 2026.

    Comments: This work has been submitted to the IEEE for possible publication

  45. arXiv:2603.06256  [pdf, ps, other]

    cs.CV cs.AI

    GazeMoE: Perception of Gaze Target with Mixture-of-Experts

    Authors: Zhuangzhuang Dai, Zhongxi Lu, Vincent G. Zakka, Luis J. Manso, Jose M Alcaraz Calero, Chen Li

    Abstract: Estimating human gaze target from visible images is a critical task for robots to understand human attention, yet the development of generalizable neural architectures and training paradigms remains challenging. While recent advances in pre-trained vision foundation models offer promising avenues for locating gaze targets, the integration of multi-modal cues -- including eyes, head poses, gestures…

    Submitted 6 March, 2026; originally announced March 2026.

    Comments: 8 pages, 3 figures, ICRA 2026

  46. arXiv:2603.05385  [pdf, ps, other]

    cs.RO eess.SY

    Accelerating Sampling-Based Control via Learned Linear Koopman Dynamics

    Authors: Wenjian Hao, Yuxuan Fang, Zehui Lu, Shaoshuai Mou

    Abstract: This paper presents an efficient model predictive path integral (MPPI) control framework for systems with complex nonlinear dynamics. To improve the computational efficiency of classic MPPI while preserving control performance, we replace the nonlinear dynamics used for trajectory propagation with a learned linear deep Koopman operator (DKO) model, enabling faster rollout and more efficient trajec…

    Submitted 5 March, 2026; originally announced March 2026.

  47. arXiv:2603.05308  [pdf, ps, other]

    cs.CL cs.AI

    Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

    Authors: Qiao Jin, Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan, Donald C. Comeau, Robert Leaman, Charalampos S. Floudas, Aidong Zhang, Michael F. Chiang, Yifan Peng, Zhiyong Lu

    Abstract: Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achieving strong performance requires frontier models such as GPT-5 that are prohibitively expensive to deploy at scale. To efficiently perform biomedical evidence attribution, we present Med-V1, a family of…

    Submitted 5 March, 2026; originally announced March 2026.

  48. arXiv:2603.04299  [pdf, ps, other]

    cs.CL

    The Company You Keep: How LLMs Respond to Dark Triad Traits

    Authors: Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov

    Abstract: Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI-sycophancy. Although this behavior is encouraged, it may become problematic when interacting with user prompts that reflect negative social tendencies. Such responses risk amplifying harmful behavior rather than mitigating it. In this study, we examine how LLMs respond to user prompt…

    Submitted 7 April, 2026; v1 submitted 4 March, 2026; originally announced March 2026.

    ACM Class: J.4; K.4.1; K.4.2; I.2.7; I.2.0

  49. arXiv:2603.04022  [pdf, ps, other]

    cs.CV

    Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation

    Authors: Zilin Lu, Ruifeng Yuan, Weiwei Cao, Wanxing Chang, Zhongyu Wei, Sinuo Wang, Yong Xia, Ling Zhang, Jianpeng Zhang

    Abstract: Radiologists highly desire fully automated AI for radiology report generation (R2G), yet existing approaches fall short in clinical utility. Reinforcement learning (RL) holds potential to address these shortcomings, but its adoption in this task remains underexplored. In this paper, we revisit RL in terms of data efficiency and optimization effectiveness for R2G tasks. First, we explore the impact…

    Submitted 4 March, 2026; originally announced March 2026.

  50. arXiv:2603.02964  [pdf, ps, other]

    cs.CV

    Improving Anomaly Detection with Foundation-Model Synthesis and Wavelet-Domain Attention

    Authors: Wensheng Wu, Zheming Lu, Ziqian Lu, Zewei He, Xuecheng Sun, Zhao Wang, Jungong Han, Yunlong Yu

    Abstract: Industrial anomaly detection faces significant challenges due to the scarcity of anomalous samples and the complexity of real-world anomalies. In this paper, we propose a foundation model-based anomaly synthesis pipeline (FMAS) that generates highly realistic anomalous samples without fine-tuning or class-specific training. Motivated by the distinct frequency-domain characteristics of anomalies, w…

    Submitted 3 March, 2026; originally announced March 2026.