
Showing 1–50 of 1,383 results for author: Du, J

  1. arXiv:2604.07884  [pdf, ps, other]

    cs.CV cs.AI

    Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition

    Authors: Xuemei Jia, Jiawei Du, Hui Wei, Jun Chen, Joey Tianyi Zhou, Zheng Wang

    Abstract: High-fidelity generative models are increasingly needed in privacy-sensitive scenarios, where access to data is severely restricted due to regulatory and copyright constraints. This scarcity hampers model development--ironically, in settings where generative models are most needed to compensate for the lack of data. This creates a self-reinforcing challenge: limited data leads to poor generative m…

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2604.07413  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

    Authors: Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue, Juan Du, Tianshu Yu, Garth Tarr, Linqi Song, Qiuzhuang Sun, Dacheng Tao

    Abstract: The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world manufacturing environments. Progress is hindered by data scarcity and a lack of fine-grained domain semantics in existing datasets. To bridge this gap, we introduce FORGE.…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: Project Page: https://ai4manufacturing.github.io/forge-web

  3. arXiv:2604.06747  [pdf]

    cs.AI

    TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design

    Authors: Juan Du, Yueteng Wu, Pan Zhao, Yuze Liu, Min Zhang, Xiaobin Xu, Xinglong Zhang

    Abstract: The aerodynamic design of turbomachinery is a complex and tightly coupled multi-stage process involving geometry generation, performance prediction, optimization, and high-fidelity physical validation. Existing intelligent design approaches typically focus on individual stages or rely on loosely coupled pipelines, making fully autonomous end-to-end design challenging. To address this issue, this s…

    Submitted 8 April, 2026; v1 submitted 8 April, 2026; originally announced April 2026.

  4. arXiv:2604.05632  [pdf, ps, other]

    cs.CV

    SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

    Authors: Letian Bai, Chengyu Tao, Juan Du

    Abstract: Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimo…

    Submitted 7 April, 2026; originally announced April 2026.

  5. arXiv:2604.04451  [pdf, ps, other]

    cs.CV

    Beyond Few-Step Inference: Accelerating Video Diffusion Transformer Model Serving with Inter-Request Caching Reuse

    Authors: Hao Liu, Ye Huang, Chenghuan Huang, Zhenyi Zheng, Jiangsu Du, Ziyang Ma, Jing Lyu, Yutong Lu

    Abstract: Video Diffusion Transformer (DiT) models are a dominant approach for high-quality video generation but suffer from high inference cost due to iterative denoising. Existing caching approaches primarily exploit similarity within the diffusion process of a single request to skip redundant denoising steps. In this paper, we introduce Chorus, a caching approach that leverages similarity across requests…

    Submitted 6 April, 2026; originally announced April 2026.

  6. arXiv:2604.03014  [pdf, ps, other]

    cs.IR cs.AI

    User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation

    Authors: Jing Du, Zesheng Ye, Congbo Ma, Feng Liu, Flora. D. Salim

    Abstract: Multi-modal recommendation (MMR) enriches item representations by introducing item content, e.g., visual and textual descriptions, to improve upon interaction-only recommenders. The success of MMR hinges on aligning these content modalities with user preferences derived from interaction data, yet dominant practices based on disentangling modality-invariant preference-driving signals from modality-…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 11 pages, 7 figures, 3 tables

  7. arXiv:2604.01618  [pdf, ps, other]

    cs.CV cs.AI

    Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models

    Authors: Jiawei Chen, Simin Huang, Jiawei Du, Shuaihang Chen, Yu Tian, Mingjie Wei, Chao Yu, Zhaoxia Yin

    Abstract: Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adver…

    Submitted 2 April, 2026; originally announced April 2026.

  8. arXiv:2604.01146  [pdf, ps, other]

    math.NA

    A deterministic multiple-shift lattice algorithm for function approximation in Korobov and half-period Cosine spaces

    Authors: Jiarui Du, Josef Dick

    Abstract: Approximating multivariate periodic functions in weighted Korobov spaces via rank-1 lattices is fundamentally limited by frequency aliasing. Existing optimal-rate methods rely on randomized constructions or large pre-computations. We propose a fully deterministic multiple-shift lattice algorithm without pre-computation. First, we develop a simplified multiple shift framework for aliased frequency…

    Submitted 3 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

    Comments: 32 pages, 12 figures

    MSC Class: 65D15; 65D40; 65M70; 65T50

  9. arXiv:2603.29254  [pdf, ps, other]

    cs.RO

    SuperGrasp: Single-View Object Grasping via Superquadric Similarity Matching, Evaluation, and Refinement

    Authors: Lijingze Xiao, Jinhong Du, Yang Cong, Supeng Diao, Yu Ren

    Abstract: Robotic grasping from single-view observations remains a critical challenge in manipulation. Existing methods still struggle to generate stable and valid grasp poses when confronted with incomplete geometric information. To address these limitations, we propose SuperGrasp, a novel two-stage framework for single-view grasping with parallel-jaw grippers that decomposes the grasping process into init…

    Submitted 31 March, 2026; originally announced March 2026.

  10. arXiv:2603.27703  [pdf, ps, other]

    cs.CL cs.LG

    KAT-Coder-V2 Technical Report

    Authors: Fengxiang Li, Han Zhang, Haoyang Huang, Jinghui Wang, Jinhua Hao, Kun Yuan, Mengtong Li, Minglei Zhang, Pengcheng Xu, Wenhao Zhuang, Yizhen Shao, Zongxian Feng, Can Tang, Chao Wang, Chengxiao Tong, Fan Yang, Gang Xiong, Haixuan Gao, Han Gao, Hao Wang, Haochen Liu, Hongliang Sun, Jiabao Li, Jingwen Chang, Jun Du, et al. (21 additional authors not shown)

    Abstract: We present KAT-Coder-V2, an agentic coding model developed by the KwaiKAT team at Kuaishou. KAT-Coder-V2 adopts a "Specialize-then-Unify" paradigm that decomposes agentic coding into five expert domains - SWE, WebCoding, Terminal, WebSearch, and General - each undergoing independent supervised fine-tuning and reinforcement learning, before being consolidated into a single model via on-policy disti…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: 22 pages, 7 figures

  11. arXiv:2603.26864  [pdf, ps, other]

    cond-mat.mtrl-sci

    Giant Magnetostriction by Design: A First-Principles Screening of Co-based Heusler Alloys

    Authors: Pengju Wu, Jie Du, Liang Yao, Hang Li, Xiaodong Zhou, Tao Zhu, Wenhong Wang

    Abstract: The pursuit of high-performance, rare-earth-free magnetostrictive materials is crucial for advancing technologies in sensing, actuation, and microelectromechanical systems. Heusler alloys represent a promising, yet underexplored, class of materials for this purpose. In this work, we perform a systematic first-principles investigation of the magnetostrictive properties of 25 Co-based full Heusler a…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 10 pages, 6 figures

    Journal ref: Phys. Rev. B 112 214446 (2025)

  12. arXiv:2603.25226  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.MA

    WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing

    Authors: Fanheng Kong, Jingyuan Zhang, Yang Yue, Chenxi Sun, Yang Tian, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Jun Du, Wenchong Zeng, Han Li, Kun Gai

    Abstract: The emergence of Large Language Models (LLMs) has catalyzed a paradigm shift in programming, giving rise to "vibe coding", where users can build complete projects and even control computers using natural language instructions. This paradigm has driven automated webpage development, but it introduces a new requirement about how to automatically verify whether the web functionalities are reliably im…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 24 pages, code: https://github.com/friedrichor/WebTestBench

  13. arXiv:2603.23170  [pdf, ps, other]

    astro-ph.GA

    Tentative Detection of the Glycine Isomer Glycolamide in Hot Molecular Core

    Authors: Chunguo Duan, Fengwei Xu, Qian Gou, Xuefang Xu, Donghui Quan, Laurent Pagani, Xi Chen, Jun Kang, Jiaxin Du

    Abstract: Understanding whether prebiotic molecules can endure and reform through the energetic stages of star formation is essential for tracing the continuity of interstellar chemistry toward life. Glycolamide, an isomer of glycine, was recently detected in the molecular cloud G+0.693-0.027. However, establishing its presence in warm, high-density environments is crucial to evaluate the chemical continuit…

    Submitted 6 April, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

    Comments: 9 pages, 5 figures, 2 tables, accepted for publication in A&A

  14. arXiv:2603.22819  [pdf, ps, other]

    cs.CV cs.AI

    TDATR: Improving End-to-End Table Recognition via Table Detail-Aware Learning and Cell-Level Visual Alignment

    Authors: Chunxia Qin, Chenyu Liu, Pengcheng Xia, Jun Du, Baocai Yin, Bing Yin, Cong Liu

    Abstract: Tables are pervasive in diverse documents, making table recognition (TR) a fundamental task in document analysis. Existing modular TR pipelines separately model table structure and content, leading to suboptimal integration and complex workflows. End-to-end approaches rely heavily on large-scale TR data and struggle in data-constrained scenarios. To address these issues, we propose TDATR (Table De…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026. Project Page: https://github.com/Chunchunwumu/TDATR.git

  15. arXiv:2603.21957  [pdf, ps, other]

    cs.CV

    Unified Spatiotemporal Token Compression for Video-LLMs at Ultra-Low Retention

    Authors: Junhao Du, Jialong Xue, Anqi Li, Jincheng Dai, Guo Lu

    Abstract: Video large language models (Video-LLMs) face high computational costs due to large volumes of visual tokens. Existing token compression methods typically adopt a two-stage spatiotemporal compression strategy, relying on stage-specific metrics and an implicit assumption of spatiotemporal separability. Under extremely low retention ratios, however, such approaches often result in unbalanced allocat…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  16. arXiv:2603.21195  [pdf, ps, other]

    cs.RO

    GAPG: Geometry Aware Push-Grasping Synergy for Goal-Oriented Manipulation in Clutter

    Authors: Lijingze Xiao, Jinhong Du, Yang Cong, Supeng Diao, Yu Ren

    Abstract: Grasping target objects is a fundamental skill for robotic manipulation, but in cluttered environments with stacked or occluded objects, a single-step grasp is often insufficient. To address this, previous work has introduced pushing as an auxiliary action to create graspable space. However, these methods often struggle with both stability and efficiency because they neglect the scene's geometric…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Accepted to ICRA 2026

  17. arXiv:2603.20731  [pdf, ps, other]

    cs.CV

    VSD-MOT: End-to-End Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Distillation

    Authors: Jun Du

    Abstract: Existing multi-object tracking algorithms typically fail to adequately address the issues in low-quality videos, resulting in a significant decline in tracking performance when image quality deteriorates in real-world scenarios. This performance degradation is primarily due to the algorithms' inability to effectively tackle the problems caused by information loss in low-quality images. To address…

    Submitted 21 March, 2026; originally announced March 2026.

  18. arXiv:2603.20307  [pdf, ps, other]

    cs.CV cs.AI cs.MM cs.SD

    EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control

    Authors: Yuzhe Weng, Haotian Wang, Yuanhong Yu, Jun Du, Shan He, Xiaoyan Wu, Haoran Xu

    Abstract: Audio-driven talking head generation aims to create vivid and realistic videos from a static portrait and speech. Existing AR-based methods rely on intermediate facial representations, which limit their expressiveness and realism. Meanwhile, diffusion-based methods generate clip-by-clip, lacking fine-grained control and causing inherent latency due to overall denoising across the window. To addres…

    Submitted 19 March, 2026; originally announced March 2026.

  19. arXiv:2603.20238  [pdf, ps, other]

    eess.SY cs.IT

    Joint Trajectory, RIS, and Computation Offloading Optimization via Decentralized Model-Based PPO in Urban Multi-UAV Mobile Edge Computing

    Authors: Liangshun Wu, Jianbo Du, Junsuo Qu

    Abstract: Efficient computation offloading in multi-UAV edge networks becomes particularly challenging in dense urban areas, where line-of-sight (LoS) links are frequently blocked and user demand varies rapidly. Reconfigurable intelligent surfaces (RISs) can mitigate blockage by creating controllable reflected links, but realizing their potential requires tightly coupled decisions on UAV trajectories, offlo…

    Submitted 9 March, 2026; originally announced March 2026.

  20. arXiv:2603.19005  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

    Authors: An Luo, Jin Du, Xun Xian, Robert Specht, Fangqiao Tian, Ganghua Wang, Xuan Bi, Charles Fleming, Ashish Kundu, Jayanth Srinivasa, Mingyi Hong, Rui Zhang, Tianxi Li, Galin Jones, Jie Ding

    Abstract: Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflow. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in…

    Submitted 19 March, 2026; originally announced March 2026.

    MSC Class: 62-07; 62-08; 68T05; 68T07; 68T01; 68T50 ACM Class: I.2.0; I.2.6; I.2.7; I.5.1; I.5.4; H.2.8; G.3

  21. arXiv:2603.15701  [pdf]

    physics.plasm-ph

    The properties of plasma sheath containing the primary electrons with a Cairns-distribution

    Authors: Yida Zhang, Jiulin Du

    Abstract: We study the properties of plasma sheath containing the cold positive ions, the secondary electrons, and the primary electrons with a Cairns-distribution (a non-thermal velocity-distribution). We derive the generalized Bohm criterion and Bohm speed, the new floating potential at the wall, and the new critical secondary electron emission coefficient. We show that these properties of the plasma shea…

    Submitted 18 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

    Comments: 13 pages, 7 figures, 25 references

    Journal ref: Entropy 28 (2026) 237

  22. arXiv:2603.15386  [pdf, ps, other]

    cs.CV cs.AI

    RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

    Authors: Fernando Ropero, Erkin Turkoz, Daniel Matos, Junqing Du, Antonio Ruiz, Yanfeng Zhang, Lu Liu, Mingwei Sun, Yongliang Wang

    Abstract: Visual Language Models (VLMs) have increasingly become the main paradigm for understanding indoor scenes, but they still struggle with metric and spatial reasoning. Current approaches rely on end-to-end video understanding or large-scale spatial question answering fine-tuning, inherently coupling perception and reasoning. In this paper, we investigate whether decoupling perception and reasoning le…

    Submitted 16 March, 2026; originally announced March 2026.

  23. arXiv:2603.15169  [pdf, ps, other]

    cs.RO

    ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation

    Authors: Yang Li, Zhaxizhuoma, Hongru Jiang, Junjie Xia, Hongquan Zhang, Jinda Du, Yunsong Zhou, Jia Zeng, Ce Hao, Jieji Ren, Qiaojun Yu, Cewu Lu, Yu Qiao, Jiangmiao Pang

    Abstract: Embodied intelligence for contact-rich manipulation has predominantly relied on position control, while explicit awareness and regulation of interaction forces remain under-explored, limiting stability, precision, and robustness in real-world tasks. We propose ForceVLA2, an end-to-end vision-language-action framework that equips robots with hybrid force-position control and explicit force awarenes…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  24. arXiv:2603.15060  [pdf]

    astro-ph.SR cond-mat.stat-mech

    The Chandrasekhar's Conditions as Equilibrium and Stability of Stars in a Universal Three-Parameter Non-Maxwell Distribution

    Authors: Wei Hu, Jiulin Du

    Abstract: The idea of the Chandrasekhar's conditions as equilibrium and stability of stars is revisited with a new universal three-parameter non-Maxwell distribution. We derive the maximum radiation pressures in the non-Maxwell distribution for a gas star and a centrally-condensed star, respectively, and thus we generalize the Chandrasekhar's conditions in a Maxwellian sense. By numerical analyses, we find…

    Submitted 16 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

    Comments: 12 pages, 3 tables, 3 figures, 31 references

    Journal ref: Entropy 27 (2025) 470

  25. arXiv:2603.14354  [pdf, ps, other]

    cs.LG cs.AI cs.RO

    Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

    Authors: Jiayuan Du, Yuebing Song, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu, Qijun Chen

    Abstract: End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded Lifelong Learning framework that integrates a Dirichlet process mixture model (…

    Submitted 30 March, 2026; v1 submitted 15 March, 2026; originally announced March 2026.

  26. arXiv:2603.14317  [pdf, ps, other]

    eess.SP

    AI/ML for mobile networks: Current status in Rel. 19 and challenges ahead

    Authors: Yuan Gao, Xinyi Wu, Jun Jiang, Bintao Hu, Jianbo Du, Qiang Ye, Shunqing Zhang, F. Richard Yu, Shugong Xu

    Abstract: The transformative power of artificial intelligence (AI) and machine learning (ML) is recognized as a key enabler for sixth generation (6G) mobile networks by both academia and industry. Research on AI/ML in mobile networks has been ongoing for years, and the 3rd generation partnership project (3GPP) launched standardization efforts to integrate AI into mobile networks. However, a comprehensive re…

    Submitted 15 March, 2026; originally announced March 2026.

  27. arXiv:2603.14265  [pdf, ps, other]

    cs.CL cs.MA

    MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering

    Authors: Shaowei Guan, Yu Zhai, Hin Chi Kwok, Jiawei Du, Xinyu Feng, Jing Li, Harry Qin, Vivian Hui

    Abstract: Recent advances in Retrieval-Augmented Generation (RAG) have enabled large language models (LLMs) to ground outputs in clinical evidence. However, connecting LLMs with external databases introduces the risk of contextual leakage: a subtle privacy threat where unique combinations of medical details enable patient re-identification even without explicit identifiers. Current benchmarks in healthcare…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: 17 pages, 5 figures

  28. arXiv:2603.13839  [pdf, ps, other]

    cs.CE

    NetSpatial: Spatially Conditional Traffic Generation for Cellular Planning and Operations

    Authors: Shiyuan Zhang, Jiale Du, Yuanwei Liu, Kaibin Huang, Hongyang Du

    Abstract: Base station (BS) deployment and operation are fundamental to network performance, yet they require accurate demand understanding, which remains difficult for operators. Cellular traffic in dense urban regions is well measured but highly dynamic, which undermines prediction-based management, whereas the scarcity of traffic measurements in emerging regions limits informed deployment decisions. Exis…

    Submitted 14 March, 2026; originally announced March 2026.

    Comments: 15 pages, 12 figures

  29. arXiv:2603.09063  [pdf, ps, other]

    physics.app-ph physics.comp-ph

    A Stable, High-Order Time-Stepping Scheme for the Drift-Diffusion Model in Modern Solar Cell Simulation

    Authors: Jun Du, Jun Yan

    Abstract: This paper presents a one-dimensional transient drift–diffusion simulator for advanced solar cells, integrating a structure-preserving finite-volume spatial discretization with Scharfetter–Gummel-type fluxes and a high-order, L-stable implicit Runge–Kutta (Radau IIA) temporal integrator. The scheme ensures local charge conservation, handles sharp material interfaces, and achieves second-order…

    Submitted 9 March, 2026; originally announced March 2026.

    Comments: 16 pages, 12 figures, 5 tables

  30. arXiv:2603.07734  [pdf, ps, other]

    math.CO

    The orthogonal connectedness of polyhedral surfaces

    Authors: Julia Q. Du, Xuemei He, Xiaotian Song, Daniela Stiller, Liping Yuan, Tudor Zamfirescu

    Abstract: Using the orthogonal connectedness, we introduce the notion of orthogonal decomposability of convex polytopes and study it in the case of Platonic and Archimedean solids. While doing so, we also encounter polytopes which are not orthogonally decomposable.

    Submitted 8 March, 2026; originally announced March 2026.

    Comments: 16 pages, 12 figures

    MSC Class: 52B10; 53A05; 54B15

  31. arXiv:2603.07327  [pdf, ps, other]

    physics.med-ph

    Extending gPET for Multi-Layer PET Simulation

    Authors: Satzhan Sitmukhambetov, Junwei Du, Mingwu Jin, Yujie Chi

    Abstract: Depth-of-interaction (DOI) encoding is an effective strategy for reducing parallax error and preserving spatial resolution in positron emission tomography (PET), particularly in compact small-animal scanners. To enable efficient simulation-driven design of DOI-capable systems, we extend the GPU-accelerated Monte Carlo toolkit gPET to support flexible multi-layer detector geometries. The original t…

    Submitted 7 March, 2026; originally announced March 2026.

  32. arXiv:2603.07131  [pdf, ps, other]

    cs.CV cs.AI

    Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

    Authors: Shuai Lu, Meng Wang, Jia Guo, Jiawei Du, Bo Liu, Shengzhu Yang, Weihang Zhang, Huazhu Fu, Huiqi Li

    Abstract: Large Vision Language Models (LVLMs) show immense potential for automated ophthalmic diagnosis. However, their clinical deployment is severely hindered by lacking domain-specific knowledge. In this work, we identify two structural deficiencies hindering reliable medical reasoning: 1) the Perception Gap, where general-purpose visual encoders fail to resolve fine-grained pathological cues (e.g., mic…

    Submitted 19 March, 2026; v1 submitted 7 March, 2026; originally announced March 2026.

  33. arXiv:2603.05786  [pdf, ps, other]

    cs.CR cs.AI cs.CL

    Proof-of-Guardrail in AI Agents and What (Not) to Trust from It

    Authors: Xisen Jin, Michael Duan, Qin Lin, Aaron Chan, Zhenglun Chen, Junyi Du, Xiang Ren

    Abstract: As AI agents become widely deployed as online services, users often rely on an agent developer's claim about how safety is enforced, which introduces a threat where safety measures are falsely advertised. To address the threat, we propose proof-of-guardrail, a system that enables developers to provide cryptographic proof that a response is generated after a specific open-source guardrail. To gener…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: 8 pages

  34. arXiv:2603.05552  [pdf, ps, other]

    cs.RO

    TEGA: A Tactile-Enhanced Grasping Assistant for Assistive Robotics via Sensor Fusion and Closed-Loop Haptic Feedback

    Authors: Hengxu You, Tianyu Zhou, Fang Xu, Kaleb Smith, Eric Jing Du

    Abstract: Recent advances in teleoperation have enabled sophisticated manipulation of dexterous robotic hands, with most systems concentrating on guiding finger positions to achieve desired grasp configurations. However, while accurate finger positioning is essential, it often overlooks the equally critical task of grasp force modulation, vital for handling objects of diverse hardness, texture, and shape. T…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: Accepted for inclusion in ICRA 2026

  35. arXiv:2603.01415  [pdf, ps, other]

    eess.AS

    The USTC-NERCSLIP Systems for the CHiME-9 MCoRec Challenge

    Authors: Ya Jiang, Ruoyu Wang, Jingxuan Zhang, Jun Du, Yi Han, Zihao Quan, Hang Chen, Yeran Yang, Kongzhi Zheng, Zhuo Chen, Yanhui Tu, Shutong Niu, Changfeng Xi, Mengzhi Wang, Zhongbin Wu, Jieru Chen, Henghui Zhi, Weiyi Shi, Shuhang Wu, Genshun Wan, Jia Pan, Jianqing Gao

    Abstract: This report details our submission to the CHiME-9 MCoRec Challenge on recognizing and clustering multiple concurrent natural conversations within indoor social settings. Unlike conventional meetings centered on a single shared topic, this scenario contains multiple parallel dialogues--up to eight speakers across up to four simultaneous conversations--with a speech overlap rate exceeding 90%. To ta…

    Submitted 1 March, 2026; originally announced March 2026.

  36. arXiv:2603.00691  [pdf, ps, other]

    cs.AI

    AIoT-based Continuous, Contextualized, and Explainable Driving Assessment for Older Adults

    Authors: Yimeng Liu, Fangwei Zhang, Maolin Gan, Jialuo Du, Jingkai Lin, Yawen Wang, Fei Sun, Honglei Chen, Linda Hill, Ruofeng Liu, Tianxing Li, Zhichao Cao

    Abstract: The world is undergoing a major demographic shift as older adults become a rapidly growing share of the population, creating new challenges for driving safety. In car-dependent regions such as the United States, driving remains essential for independence, access to services, and social participation. At the same time, aging can introduce gradual changes in vision, attention, reaction time, and dri…

    Submitted 28 February, 2026; originally announced March 2026.

  37. arXiv:2602.22520  [pdf, ps, other]

    cs.LG

    TEFL: Prediction-Residual-Guided Rolling Forecasting for Multi-Horizon Time Series

    Authors: Xiannan Huang, Shen Fang, Shuhan Qiu, Chengcheng Yu, Jiayuan Du, Chao Yang

    Abstract: Time series forecasting plays a critical role in domains such as transportation, energy, and meteorology. Despite their success, modern deep forecasting models are typically trained to minimize point-wise prediction loss without leveraging the rich information contained in past prediction residuals from rolling forecasts - residuals that reflect persistent biases, unmodeled patterns, or evolving d…

    Submitted 25 February, 2026; originally announced February 2026.

  38. arXiv:2602.21625  [pdf, ps, other]

    cs.RO

    Tacmap: Bridging the Tactile Sim-to-Real Gap via Geometry-Consistent Penetration Depth Map

    Authors: Lei Su, Zhijie Peng, Renyuan Ren, Shengping Mao, Juan Du, Kaifeng Zhang, Xuezhou Zhu

    Abstract: Vision-Based Tactile Sensors (VBTS) are essential for achieving dexterous robotic manipulation, yet the tactile sim-to-real gap remains a fundamental bottleneck. Current tactile simulations suffer from a persistent dilemma: simplified geometric projections lack physical authenticity, while high-fidelity Finite Element Methods (FEM) are too computationally prohibitive for large-scale reinforcement…

    Submitted 25 February, 2026; originally announced February 2026.

    Comments: 8 pages

  39. arXiv:2602.19392  [pdf, ps, other]

    cs.LG cs.SI

    Spiking Graph Predictive Coding for Reliable OOD Generalization

    Authors: Jing Ren, Jiapeng Du, Bowen Li, Ziqi Xu, Xin Zheng, Hong Jia, Suyu Ma, Xiwei Xu, Feng Xia

    Abstract: Graphs provide a powerful basis for modeling Web-based relational data, with expressive GNNs to support the effective learning in dynamic web environments. However, real-world deployment is hindered by pervasive out-of-distribution (OOD) shifts, where evolving user activity and changing content semantics alter feature distributions and labeling criteria. These shifts often lead to unstable or over…

    Submitted 22 February, 2026; originally announced February 2026.

    Comments: 12 pages, 6 figures, WWW26, Dubai, United Arab Emirates

  40. arXiv:2602.17607  [pdf, ps, other]

    cs.AI cs.LG math.NA

    AutoNumerics: An Autonomous, PDE-Agnostic Multi-Agent Pipeline for Scientific Computing

    Authors: Jianda Du, Youran Sun, Haizhao Yang

    Abstract: PDEs are central to scientific and engineering modeling, yet designing accurate numerical solvers typically requires substantial mathematical expertise and manual tuning. Recent neural network-based approaches improve flexibility but often demand high computational cost and suffer from limited interpretability. We introduce AutoNumerics, a multi-agent framework that autonomously designs,…

    Submitted 19 February, 2026; originally announced February 2026.

  41. arXiv:2602.16158  [pdf]

    cond-mat.mtrl-sci

    Dislocation-ledge coupling drives non-conservative migration of semicoherent precipitate interfaces

    Authors: Jin-Yu Zhang, Juan Du, Lin Yang, Frédéric Mompiou, Shigenobu Ogata, Wen-Zheng Zhang

    Abstract: Precipitate shape and size control the strength and stability of many structural alloys, yet the microscopic mechanism by which semicoherent precipitate interfaces migrate remains unclear. In particular, how dense interfacial dislocation networks move while accommodating transformation strain has resisted direct, time-resolved characterization. Here, we show that non-conservative motion of interfa…

    Submitted 17 February, 2026; originally announced February 2026.

  42. arXiv:2602.15763  [pdf, ps, other]

    cs.LG cs.CL

    GLM-5: from Vibe Coding to Agentic Engineering

    Authors: GLM-5-Team: Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, Chenzheng Zhu, Congfeng Yin, Cunxiang Wang, Gengzheng Pan, Hao Zeng, Haoke Zhang, Haoran Wang, Huilong Chen, Jiajie Zhang, Jian Jiao, Jiaqi Guo, Jingsen Wang, Jingzhao Du, et al. (162 additional authors not shown)

    Abstract: We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous…

    Submitted 24 February, 2026; v1 submitted 17 February, 2026; originally announced February 2026.

  43. arXiv:2602.14920  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Sub-1-Angstrom-Resolution Imaging Reveals Phase Contrast Transition in Ice Ih Caused by Basal Stacking Faults

    Authors: Jingshan S. Du, Suvo Banik, Lehan Yao, Shuai Zhang, Subramanian K. R. S. Sankaranarayanan, James J. De Yoreo

    Abstract: Phase-contrast transmission electron microscopy (TEM) of hexagonal ice (Ih) along [0001] sometimes shows a honeycomb-like pattern, often interpreted as individual oxygen columns in single crystals. Here, we show that this pattern commonly arises from intrinsic basal stacking faults instead. A translational boundary separating domains of comparable thickness, with an in-plane offset of…

    Submitted 23 February, 2026; v1 submitted 16 February, 2026; originally announced February 2026.

    Comments: 17 pages, 4 figures, and 2 appendices. Including a Supplemental Material with 2 figures

  44. arXiv:2602.14483  [pdf, ps, other]

    math.CO math.NT

    Modular Nahm sums for symmetrizable matrices of indices $({2,\ldots, 2},1)$ and $({1,\ldots, 1},2)$

    Authors: Julia Q. D. Du, Kathy Q. Ji, Erin Y. Y. Shen, Clara X. Y. Xu

    Abstract: In this paper, we present three families of modular Nahm sums for symmetrizable matrices with arbitrary rank $r\geq 2$ of indices $({2,\ldots, 2},1)$ and $({1,\ldots, 1},2)$. Specifically, the cases corresponding to $r = 2$ and $r = 3$ of these families have been previously demonstrated by Mizuno, Warnaar, and B. Wang-L. Wang. Building upon these three families, we construct two vector-valued auto…

    Submitted 6 March, 2026; v1 submitted 16 February, 2026; originally announced February 2026.

    Comments: 59 pages

  45. arXiv:2602.12068  [pdf, ps, other]

    cond-mat.mtrl-sci

    Stacking theory for bilayer two-dimensional magnets

    Authors: Jun-Xi Du, Sike Zeng, Yu-Jun Zhao

    Abstract: Two-dimensional unconventional magnetism has recently attracted growing interest due to its intriguing physical properties and promising applications in spintronics. However, existing studies on stacking-induced unconventional magnetism mainly focus on specific materials and stacking configurations. Here, we develop a general symmetry-based stacking theory for two-dimensional magnets. We first int…

    Submitted 12 February, 2026; originally announced February 2026.

  46. arXiv:2602.11003  [pdf, ps, other]

    physics.chem-ph

    Eliminating Delocalization Error through Localized Orbital Scaling Correction with Orbital Relaxation from Linear Response

    Authors: Yichen Fan, Jincheng Yu, Jiayi Du, Weitao Yang

    Abstract: Despite the great success Kohn-Sham density functional theory (KS-DFT) has achieved, the delocalization error remains a major challenge for commonly used density functional approximations (DFAs), resulting in systematic errors in ionization energies, electron affinities, band structures, and charge distributions. A recently developed localized orbital scaling correction (LOSC) method, namely linea…

    Submitted 11 February, 2026; originally announced February 2026.

  47. arXiv:2602.09685  [pdf, ps, other]

    eess.SP

    Generalizable and Robust Beam Prediction for 6G Networks: An Deep-Learning Framework with Positioning Feature Fusion

    Authors: Yanliang Jin, Yunfan Li, Jiang Jun, Yuan Gao, Shengli Liu, Jianbo Du, Zhaohui Yang, Shugong Xu

    Abstract: Beamforming (BF) is essential for enhancing system capacity in fifth generation (5G) and beyond wireless networks, yet exhaustive beam training in ultra-massive multiple-input multiple-output (MIMO) systems incurs substantial overhead. To address this challenge, we propose a deep learning based framework that leverages position-aware features to improve beam prediction accuracy while reducing trai…

    Submitted 10 February, 2026; originally announced February 2026.

  48. arXiv:2602.05391  [pdf, ps, other]

    cs.CV

    Dataset Distillation via Relative Distribution Matching and Cognitive Heritage

    Authors: Qianxin Xia, Jiawei Du, Yuhan Zhang, Jielei Wang, Guoming Lu

    Abstract: Dataset distillation seeks to synthesize a highly compact dataset that achieves performance comparable to the original dataset on downstream tasks. For classification tasks that use pre-trained self-supervised models as backbones, previous linear gradient matching optimizes synthetic images by encouraging them to mimic the gradient updates induced by real images on the linear classifier. Howeve…

    Submitted 5 February, 2026; originally announced February 2026.

  49. arXiv:2602.04705  [pdf, ps, other]

    cs.CL

    ERNIE 5.0 Technical Report

    Authors: Haifeng Wang, Hua Wu, Tian Wu, Yu Sun, Jing Liu, Dianhai Yu, Yanjun Ma, Jingzhou He, Zhongjun He, Dou Hong, Qiwen Liu, Shuohuan Wang, Junyuan Shang, Zhenyu Zhang, Yuchen Ding, Jinle Zeng, Jiabin Yang, Liang Shen, Ruibiao Chen, Weichong Yin, Siyu Ding, Dai Dai, Shikun Feng, Siqi Bao, Bolei He, et al. (413 additional authors not shown)

    Abstract: In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practi…

    Submitted 4 February, 2026; originally announced February 2026.

  50. arXiv:2601.22776  [pdf, ps, other]

    cs.AI

    TSPO: Breaking the Double Homogenization Dilemma in Multi-turn Search Policy Optimization

    Authors: Shichao Ma, Zhiyuan Ma, Ming Yang, Xiaofan Li, Xing Wu, Jintao Du, Yu Cheng, Weiqiang Wang, Qiliang Liu, Zhengyang Zhou, Yang Wang

    Abstract: Multi-turn tool-integrated reasoning enables Large Language Models (LLMs) to solve complex tasks through iterative information retrieval. However, current reinforcement learning (RL) frameworks for search-augmented reasoning predominantly rely on sparse outcome-level rewards, leading to a "Double Homogenization Dilemma." This manifests as (1) Process homogenization, where the thinking, reasoning,…

    Submitted 6 April, 2026; v1 submitted 30 January, 2026; originally announced January 2026.