Showing 1–50 of 1,141 results for author: Xu, B

Searching in archive cs.

  1. arXiv:2604.07888  [pdf, ps, other]

    cs.LG

    Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs

    Authors: Binxing Xu, Hao Gu, Lujun Li, Hao Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li, Sirui Han, Yike Guo

    Abstract: Training LLMs at ultra-low precision remains a formidable challenge. Direct low-bit QAT often suffers from convergence instability and substantial training costs, exacerbated by quantization noise from heavy-tailed outlier channels and error accumulation across layers. To address these issues, we present Bit-by-Bit, a progressive QAT framework with outlier channel splitting. Our approach integrate…

    Submitted 9 April, 2026; originally announced April 2026.

  2. arXiv:2604.07853  [pdf, ps, other]

    cs.LG cs.AI

    QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch

    Authors: Hao Gu, Hao Wang, Jiacheng Liu, Lujun Li, Qiyuan Zhu, Bei Liu, Binxing Xu, Lei Wang, Xintong Yang, Sida Lin, Sirui Han, Yike Guo

    Abstract: Large language model (LLM) reinforcement learning (RL) pipelines are often bottlenecked by rollout generation, making end-to-end training slow. Recent work mitigates this by running rollouts with quantization to accelerate decoding, which is the most expensive stage of the RL loop. However, these setups destabilize optimization by amplifying the training-inference gap: rollouts are operated at low…

    Submitted 9 April, 2026; originally announced April 2026.

  3. arXiv:2604.07720  [pdf, ps, other]

    cs.AI

    Towards Knowledgeable Deep Research: Framework and Benchmark

    Authors: Wenxuan Liu, Zixuan Li, Bai Long, Chunmao Zhang, Fenghui Zhang, Zhuo Chen, Wei Li, Yuxin Zuo, Fei Wang, Bingbing Xu, Xuhui Jiang, Jin Zhang, Xiaolong Jin, Jiafeng Guo, Tat-Seng Chua, Xueqi Cheng

    Abstract: Deep Research (DR) requires LLM agents to autonomously perform multi-step information seeking, processing, and reasoning to generate comprehensive reports. In contrast to existing studies that mainly focus on unstructured web content, a more challenging DR task should additionally utilize structured knowledge to provide a solid data foundation, facilitate quantitative computation, and lead to in-d…

    Submitted 8 April, 2026; originally announced April 2026.

  4. arXiv:2604.06685  [pdf, ps, other]

    cs.CL cs.AI

    ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding

    Authors: Xuanle Zhao, Xinyuan Cai, Xiang Cheng, Xiuyi Chen, Bo Xu

    Abstract: While Vision-Language Models (VLMs) have demonstrated significant potential in chemical visual understanding, current models are predominantly optimized for direct visual question-answering tasks. This paradigm often results in "black-box" systems that fail to utilize the inherent capability of Large Language Models (LLMs) to infer underlying reaction mechanisms. In this work, we introduce ChemVLR…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026 Findings, Preprint Version

  5. arXiv:2604.06633  [pdf, ps, other]

    cs.CR cs.CL cs.SE

    Argus: Reorchestrating Static Analysis via a Multi-Agent Ensemble for Full-Chain Security Vulnerability Detection

    Authors: Zi Liang, Qipeng Xie, Jun He, Bohuan Xue, Weizheng Wang, Yuandao Cai, Fei Luo, Boxian Zhang, Haibo Hu, Kaishun Wu

    Abstract: Recent advancements in Large Language Models (LLMs) have sparked interest in their application to Static Application Security Testing (SAST), primarily due to their superior contextual reasoning capabilities compared to traditional symbolic or rule-based methods. However, existing LLM-based approaches typically attempt to replace human experts directly without integrating effectively with existing…

    Submitted 7 April, 2026; originally announced April 2026.

  6. arXiv:2604.05751  [pdf]

    eess.SP cs.LG cs.SD

    Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction

    Authors: Mohammed Salah Al-Radhi, Géza Németh, Andon Tchechmedjiev, Binbin Xu

    Abstract: This chapter presents a novel approach to brain-to-speech (BTS) synthesis from intracranial electroencephalography (iEEG) data, emphasizing prosody-aware feature engineering and advanced transformer-based models for high-fidelity speech reconstruction. Driven by the increasing interest in decoding speech directly from brain activity, this work integrates neuroscience, artificial intelligence, and…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: OpenAccess chapter: 10.1007/978-3-032-10561-5_16. In: Curry, E., et al. Artificial Intelligence, Data and Robotics (2026)

  7. arXiv:2604.04771  [pdf, ps, other]

    cs.CV cs.CL

    MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

    Authors: Bin Wang, Tianyao He, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Tao Chu, Yuan Qu, Zhenjiang Jin, Weijun Zeng, Ziyang Miao, Bangrui Xu, Junbo Niu, Mengzhang Cai, Jiantao Qiu, Qintong Zhang, Dongsheng Ma, Yuefeng Sun, Hejun Dong, Wenzheng Zhang, Jutao Xiao, Jiayong Shi, Pengyu Liao, Xiaomeng Zhao, Huaping Zhong, Liqun Wei, et al. (18 additional authors not shown)

    Abstract: Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training…

    Submitted 9 April, 2026; v1 submitted 6 April, 2026; originally announced April 2026.

    Comments: Technical Report

  8. arXiv:2604.04399  [pdf, ps, other]

    cs.AI

    GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

    Authors: Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

    Abstract: Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply a single holistic judgment over the entire action-observation sequence, a strategy that proves unreliable on long-horizon tasks and yields binary verdicts offering no insight into where or why an ag…

    Submitted 5 April, 2026; originally announced April 2026.

  9. arXiv:2604.03610  [pdf, ps, other]

    cs.SE

    DebugHarness: Emulating Human Dynamic Debugging for Autonomous Program Repair

    Authors: Maolin Sun, Yibiao Yang, Xuanlin Liu, Yuming Zhou, Baowen Xu

    Abstract: Patching severe security flaws in complex software remains a major challenge. While automated tools like fuzzers efficiently discover bugs, fixing deep-rooted low-level faults (e.g., use-after-free and memory corruption) still requires labor-intensive manual analysis by experts. Emerging Large Language Model (LLM) agents attempt to automate this pipeline, but they typically treat bug fixing as a p…

    Submitted 4 April, 2026; originally announced April 2026.

    Comments: 15 pages, 6 figures

  10. arXiv:2604.03572  [pdf, ps, other]

    cs.CV physics.optics

    Physics-Informed Untrained Learning for RGB-Guided Superresolution Single-Pixel Hyperspectral Imaging

    Authors: Hao Zhang, Bilige Xu, Lichen Wei, Xu Ma, Wenyi Ren

    Abstract: Single-pixel imaging (SPI) offers a cost-effective route to hyperspectral acquisition but struggles to recover high-fidelity spatial and spectral details under extremely low sampling rates, a severely ill-posed inverse problem. While deep learning has shown potential, existing data-driven methods demand large-scale pretraining datasets that are often impractical in hyperspectral imaging. To overco…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 9 pages, 13 figures, 5 tables

  11. arXiv:2604.03559  [pdf, ps, other]

    cs.GT eess.SY

    Fair Aggregation in Virtual Power Plants

    Authors: Liudong Chen, Hyemi Kim, Adam N. Elmachtoub, Bolun Xu

    Abstract: A virtual power plant (VPP) is operated by an aggregator that acts as a market intermediary, aggregating consumers to participate in wholesale power markets. By setting incentive prices, the aggregator induces consumers to sell energy and profits by providing this aggregated energy to the market. This supply is enabled by consumers' flexibility to adjust electricity consumption in response to mark…

    Submitted 3 April, 2026; originally announced April 2026.

  12. arXiv:2604.02765  [pdf, ps, other]

    cs.LG

    Towards Realistic Class-Incremental Learning with Free-Flow Increments

    Authors: Zhiming Xu, Baile Xu, Jian Zhao, Furao Shen, Suorong Yang

    Abstract: Class-incremental learning (CIL) is typically evaluated under predefined schedules with equal-sized tasks, leaving more realistic and complex cases unexplored. However, a practical CIL system should learn immediately when any number of new classes arrive, without forcing fixed-size tasks. We formalize this setting as Free-Flow Class-Incremental Learning (FFCIL), where data arrives as a more reali…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 15 pages, 5 figures, 3 tables

  13. arXiv:2604.02349  [pdf, ps, other]

    cs.LG cs.AI

    OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

    Authors: Yiqin Yang, Hao Hu, Yihuan Mao, Jin Zhang, Chengjie Wu, Yuhua Jiang, Xu Yang, Runpeng Xie, Yi Fan, Bo Liu, Yang Gao, Bo Xu, Chongjie Zhang

    Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in various real-world applications. However, obtaining human feedback for preferences can be expensive and time-consuming, which forms a strong barrier for PbRL. In this work, we address the problem of low query efficiency in offline PbRL, pinpoin…

    Submitted 18 February, 2026; originally announced April 2026.

    Journal ref: ICLR-2026

  14. arXiv:2604.01608  [pdf, ps, other]

    cs.AI

    From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?

    Authors: Binyan Xu, Dong Fang, Haitao Li, Kehuan Zhang

    Abstract: Multi-agent systems (MAS) tackle complex tasks by distributing expertise, though this often comes at the cost of heavy coordination overhead, context fragmentation, and brittle phase ordering. Distilling a MAS into a single-agent skill can bypass these costs, but this conversion lacks a principled answer for when and what to distill. Instead, the empirical outcome is surprisingly inconsistent: ski…

    Submitted 2 April, 2026; originally announced April 2026.

    Comments: 20 pages, 7 figures, 11 tables

  15. arXiv:2603.28686  [pdf, ps, other]

    cs.SE

    C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

    Authors: Yanyan Yan, Yang Feng, Jiangshan Liu, Di Liu, Zixi Liu, Hao Teng, Baowen Xu

    Abstract: The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and practical adoption. Moreover, manual post-processing of such outputs is labor-inte…

    Submitted 30 March, 2026; originally announced March 2026.

  16. arXiv:2603.28414  [pdf, ps, other]

    cs.CV

    Unified Restoration-Perception Learning: Maritime Infrared-Visible Image Fusion and Segmentation

    Authors: Weichao Cai, Weiliang Huang, Biao Xue, Chao Huang, Fei Yuan, Bob Zhang

    Abstract: Marine scene understanding and segmentation play a vital role in maritime monitoring and navigation safety. However, prevalent factors like fog and strong reflections in maritime environments cause severe image degradation, significantly compromising the stability of semantic perception. Existing restoration and enhancement methods typically target specific degradations or focus solely on visual…

    Submitted 30 March, 2026; originally announced March 2026.

  17. arXiv:2603.26250  [pdf, ps, other]

    cs.CV

    Real-Time Branch-to-Tool Distance Estimation for Autonomous UAV Pruning: Benchmarking Five DEFOM-Stereo Variants from Simulation to Jetson Deployment

    Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

    Abstract: Autonomous tree pruning with unmanned aerial vehicles (UAVs) is a safety-critical real-world task: the onboard perception system must estimate the metric distance from a cutting tool to thin tree branches in real time so that the UAV can approach, align, and actuate the pruner without collision. We address this problem by training five variants of DEFOM-Stereo, a recent foundation-model-based ste…

    Submitted 27 March, 2026; originally announced March 2026.

  18. arXiv:2603.25133  [pdf, ps, other]

    cs.AI

    RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following

    Authors: Tianjun Pan, Xuan Lin, Wenyan Yang, Qianyu He, Shisong Chen, Licai Qi, Wanqing Xu, Hongwei Feng, Bo Xu, Yanghua Xiao

    Abstract: Rubric-based evaluation has become a prevailing paradigm for evaluating instruction following in large language models (LLMs). Despite its widespread use, the reliability of these rubric-level evaluations remains unclear, calling for meta-evaluation. However, prior meta-evaluation efforts largely focus on the response level, failing to assess the fine-grained judgment accuracy that rubric-based ev…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 9 pages, 5 figures

  19. arXiv:2603.24517  [pdf, ps, other]

    cs.LG

    AVO: Agentic Variation Operators for Autonomous Evolutionary Search

    Authors: Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

    Abstract: Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the curre…

    Submitted 25 March, 2026; originally announced March 2026.

  20. Trends in Equal-Contribution Authorship: A Large-Scale Bibliometric Analysis of Biomedical Literature

    Authors: Binbin Xu

    Abstract: Equal-contribution authorship, in which two or more authors are designated as having contributed equally, is increasingly common in scientific publishing. Using approximately 480,000 tagged records from PubMed and PMC (2010-2024), we examine temporal trends, journal-level patterns, geographic distributions, and byline positions of equal-contributing authors. Results show a sharp rise after 2017, w…

    Submitted 24 March, 2026; originally announced March 2026.

    Journal ref: Quantitative Science Studies, 1-15 (2026)

  21. arXiv:2603.22844  [pdf, ps, other]

    cs.AI

    PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal

    Authors: Zining Fang, Cheng Xue, Chunhui Liu, Bin Xu, Ming Chen, Xiaowei Hu

    Abstract: Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surgical perception. Existing learning-based desmoking approaches rely on scarce paired supervision and deterministic restoration pipelines, making it difficult to perform exploration or reinforcement-driven refinement under real surgical conditions. We propose PhySe-RPO, a diffusion restora…

    Submitted 7 April, 2026; v1 submitted 24 March, 2026; originally announced March 2026.

    Comments: 12 pages, 7 figures, published to CVPR

  22. arXiv:2603.21475  [pdf, ps, other]

    cs.AI

    Unified-MAS: Universally Generating Domain-Specific Nodes for Empowering Automatic Multi-Agent Systems

    Authors: Hehai Lin, Yu Yan, Zixuan Wang, Bo Xu, Sudong Wang, Weiquan Huang, Ruochen Zhao, Minzhi Li, Chengwei Qin

    Abstract: Automatic Multi-Agent Systems (MAS) generation has emerged as a promising paradigm for solving complex reasoning tasks. However, existing frameworks are fundamentally bottlenecked when applied to knowledge-intensive domains (e.g., healthcare and law). They either rely on a static library of general nodes like Chain-of-Thought, which lack specialized expertise, or attempt to generate nodes on the f…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Code is available at https://github.com/linhh29/Unified-MAS

  23. arXiv:2603.21046  [pdf, ps, other]

    cs.CV cs.AI

    SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments

    Authors: Wen Jiang, Kangyao Huang, Li Wang, Wang Xu, Wei Fan, Jinyuan Liu, Shaoyu Liu, Hanfang Liang, Hongwei Duan, Bin Xu, Xiangyang Ji

    Abstract: UAVs play an important role in applications such as autonomous exploration, disaster response, and infrastructure inspection. However, UAV VLN in complex 3D environments remains challenging. A key difficulty is the structural representation mismatch between 2D visual perception and the 3D trajectory decision space, which limits spatial reasoning. To this end, we propose SpatialFly, a geometry-guid…

    Submitted 21 March, 2026; originally announced March 2026.

  24. arXiv:2603.20637  [pdf, ps, other]

    cs.SE cs.AI cs.CR

    AEGIS: From Clues to Verdicts – Graph-Guided Deep Vulnerability Reasoning via Dialectics and Meta-Auditing

    Authors: Sen Fang, Weiyuan Ding, Zhezhen Cao, Zhou Yang, Bowen Xu

    Abstract: Large Language Models (LLMs) are increasingly adopted for vulnerability detection, yet their reasoning remains fundamentally unsound. We identify a root cause shared by both major mitigation paradigms (agent-based debate and retrieval augmentation): reasoning in an ungrounded deliberative space that lacks a bounded, hypothesis-specific evidence base. Without such grounding, agents fabricate cross-…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 29 pages, 6 figures, 3 tables

  25. arXiv:2603.19639  [pdf, ps, other]

    cs.AI

    HyEvo: Self-Evolving Hybrid Agentic Workflows for Efficient Reasoning

    Authors: Beibei Xu, Yutong Ye, Chuyun Shen, Yingbo Zhou, Cheng Chen, Mingsong Chen

    Abstract: Although agentic workflows have demonstrated strong potential for solving complex tasks, existing automated generation methods remain inefficient and underperform, as they rely on predefined operator libraries and homogeneous LLM-only workflows in which all task-level computation is performed through probabilistic inference. To address these limitations, we propose HyEvo, an automated workflow-gen…

    Submitted 20 March, 2026; originally announced March 2026.

  26. It Depends: Re-Authoring Play Through Clinical Reasoning in Wearable AR Rehab Games

    Authors: Binyan Xu, Wei Wu, Soonhyeon Kweon, Casper Harteveld, Leanne Chukoskie

    Abstract: Augmented reality games hold promise for rehabilitation, yet most remain confined to laboratory studies with limited clinical uptake. Recent advances in spatial computing, especially lightweight, glasses-form-factor AR, create a timely opportunity to embed rehabilitative play into clinical practice and daily contexts. To investigate this potential, we systematically reviewed 132 applications and c…

    Submitted 13 March, 2026; originally announced March 2026.

    Journal ref: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), April 13–17, 2026, Barcelona, Spain

  27. arXiv:2603.18815  [pdf, ps, other]

    cs.AI

    ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

    Authors: Hao Zhang, Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu, Jan Kautz, Yi Dong

    Abstract: Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and mai…

    Submitted 19 March, 2026; originally announced March 2026.

  28. arXiv:2603.18697  [pdf, ps, other]

    cs.LG

    OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation

    Authors: Chen Sun, Beilin Xu, Boheng Tan, Jiacheng Wang, Yuefeng Sun, Rite Bo, Ying He, Yaqiang Zang, Pinghua Gong

    Abstract: In industrial commodity recommendation systems, the representation quality of Item-Id vocabularies directly impacts the scalability and generalization ability of recommendation models. A key challenge is that traditional Item-Id vocabularies, when subjected to sparse scaling, suffer from low-frequency information interference, which restricts their expressive power for massive item sets and leads…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 5 pages, 4 figures

  29. arXiv:2603.18539  [pdf, ps, other]

    cs.NI cs.LG

    iSatCR: Graph-Empowered Joint Onboard Computing and Routing for LEO Data Delivery

    Authors: Jiangtao Luo, Bingbing Xu, Shaohua Xia, Yongyi Ran

    Abstract: Sending massive Earth observation data produced by low Earth orbit (LEO) satellites back to the ground for processing consumes a large amount of on-orbit bandwidth and exacerbates the space-to-ground link bottleneck. Most prior work has concentrated on optimizing the routing of raw data within the constellation, yet cannot cope with the surge in data volume. Recently, advances in onboard computing…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 14 pages, 9 figures

  30. arXiv:2603.16458  [pdf]

    cs.NI eess.SY

    Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization

    Authors: Linghao Zhang, Haitao Zhao, Bo Xu, Hongbo Zhu, Xianbin Wang

    Abstract: Space-air-ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringent quality-of-service (QoS) requirements. Conventional model-driven approaches struggle with scalability and adaptability in such complex environments. This paper presents an agentic artificial intell…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 7 pages, 6 figures

  31. arXiv:2603.16450  [pdf, ps, other]

    cs.DB

    MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning

    Authors: Beicheng Xu, Lingching Tung, Yuchen Wang, Yupeng Lu, Bin Cui

    Abstract: Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods predominantly rely on full-fidelity evaluations, which are extremely time-consuming, often leading to suboptimal performance within practical budgets. While multi-fi…

    Submitted 17 March, 2026; originally announced March 2026.

  32. arXiv:2603.16209  [pdf, ps, other]

    cs.CE

    Physics-guided diffusion models for inverse design of disordered metamaterials

    Authors: Ziyuan Xie, Weipeng Xu, Dazhi Zhao, Wenchang Zhang, Daoyang Dong, Bingbing Xu, Ning Liu, Sheng Mao, Tianju Xue

    Abstract: Disordered metamaterials are promising for programming physical properties across diverse applications, yet their inverse design remains challenging due to the non-intuitive structure-property relationships and large design spaces. Recent generative approaches, particularly diffusion models, have shown potential in high-dimensional inverse design tasks. However, existing methods typically rely on…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: 30 pages, 13 figures

  33. arXiv:2603.15304  [pdf, ps, other]

    cs.CV

    UE5-Forest: A Photorealistic Synthetic Stereo Dataset for UAV Forestry Depth Estimation

    Authors: Yida Lin, Bing Xue, Mengjie Zhang, Sam Schofield, Richard Green

    Abstract: Dense ground-truth disparity maps are practically unobtainable in forestry environments, where thin overlapping branches and complex canopy geometry defeat conventional depth sensors, a critical bottleneck for training supervised stereo matching networks for autonomous UAV-based pruning. We present UE5-Forest, a photorealistic synthetic stereo dataset built entirely in Unreal Engine 5 (UE5). One…

    Submitted 27 March, 2026; v1 submitted 13 March, 2026; originally announced March 2026.

  34. arXiv:2603.14523  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning

    Authors: Chaoyang Wang, Wenrui Bao, Sicheng Gao, Bingxin Xu, Yu Tian, Yogesh S. Rawat, Yunhao Ge, Yuzhang Shang

    Abstract: Vision-Language-Action (VLA) models have shown promising capabilities for embodied intelligence, but most existing approaches rely on text-based chain-of-thought reasoning where visual inputs are treated as static context. This limits the ability of the model to actively revisit the environment and resolve ambiguities during long-horizon tasks. We propose VLA-Thinker, a thinking-with-image reasoni…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: We introduce VLA-Thinker, the first VLA model capable of thinking-with-image reasoning, which models visual perception as a dynamically invocable reasoning action, enabling Multimodal Embodied Chain-of-Thought

  35. arXiv:2603.13787  [pdf, ps, other]

    cs.CV

    Advancing Cancer Prognosis with Hierarchical Fusion of Genomic, Proteomic and Pathology Imaging Data from a Systems Biology Perspective

    Authors: Junjie Zhou, Bao Xue, Meiling Wang, Wei Shao, Daoqiang Zhang

    Abstract: To enhance the precision of cancer prognosis, recent research has increasingly focused on multimodal survival methods by integrating genomic data and histology images. However, current approaches overlook the fact that the proteome serves as an intermediate layer bridging genomic alterations and histopathological features while providing complementary biological information essential for survival…

    Submitted 14 March, 2026; originally announced March 2026.

  36. Reimagining Wearable AR Gesture Design: Physical Therapy Reasoning in Everyday Contexts

    Authors: Wei Wu, Binyan Xu, Soonhyeon Kweon, Yujie Wang, Leanne Chukoskie, Casper Harteveld

    Abstract: Lightweight augmented reality (AR) glasses are increasingly entering everyday use, extending interaction design beyond short, isolated sessions. However, most existing gesture vocabularies are inherited from VR headsets or early AR goggles. These systems tend to prioritize recognizer accuracy while overlooking fatigue, sustainability, and social legibility in daily contexts. To address this gap, w…

    Submitted 13 March, 2026; originally announced March 2026.

    Journal ref: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26), April 13–17, 2026, Barcelona, Spain

  37. arXiv:2603.12963  [pdf, ps, other]

    cs.CL

    Long-form RewardBench: Evaluating Reward Models for Long-form Generation

    Authors: Hui Huang, Yancheng He, Wei Liu, Muyun Yang, Jiaheng Liu, Kehai Chen, Bing Xu, Conghui Zhu, Hailong Cao, Tiejun Zhao

    Abstract: The widespread adoption of reinforcement learning-based alignment highlights the growing importance of reward models. Various benchmarks have been built to evaluate reward models in various domains and scenarios. However, a significant gap remains in assessing reward models for long-form generation, despite its critical role in real-world applications. To bridge this, we introduce Long-form Reward…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: Accepted by AAAI 2026

  38. arXiv:2603.08091  [pdf, ps, other]

    cs.CL

    Toward Robust LLM-Based Judges: Taxonomic Bias Evaluation and Debiasing Optimization

    Authors: Hongli Zhou, Hui Huang, Rui Zhang, Kehai Chen, Bing Xu, Conghui Zhu, Tiejun Zhao, Muyun Yang

    Abstract: Large language model (LLM)-based judges are widely adopted for automated evaluation and reward modeling, yet their judgments are often affected by judgment biases. Accurately evaluating these biases is essential for ensuring the reliability of LLM-based judges. However, existing studies typically investigate limited biases under a single judge formulation, either generative or discriminative, lack…

    Submitted 9 March, 2026; originally announced March 2026.

  39. arXiv:2603.07462  [pdf, ps, other]

    cs.AI

    Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

    Authors: Binxia Xu, Xiaoliang Luo, Luke Dickens, Robert M. Mok

    Abstract: Determining whether AI systems process information similarly to humans is central to cognitive science and trustworthy AI. While modern AI models match human accuracy on standard tasks, such parity does not guarantee that their underlying decision-making strategies are aligned with human information processing. Assessing performance using i) error alignment metrics to compare how humans and models…

    Submitted 7 March, 2026; originally announced March 2026.

  40. arXiv:2603.07433  [pdf, ps, other]

    cs.LG cs.CV

    Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

    Authors: Suorong Yang, Fangjian Su, Hai Gan, Ziqi Ye, Jie Li, Baile Xu, Furao Shen, Soujanya Poria

    Abstract: Dynamic data selection aims to accelerate training by prioritizing informative samples during online training. However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate sample importance, limiting scalability across learning paradigms and making it difficult to capture the evolving utility of data throughout training. To address thi…

    Submitted 7 March, 2026; originally announced March 2026.

  41. Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL

    Authors: Bingfeng Chen, Shaobin Shi, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao

    Abstract: Generative language models have shown significant potential in single-turn Text-to-SQL. However, their performance does not extend equivalently to multi-turn Text-to-SQL. This is primarily due to generative language models' inadequacy in handling the complexities of context information and dynamic schema linking in multi-turn interactions. In this paper, we propose a framework named Track-SQL, whi…

    Submitted 6 March, 2026; originally announced March 2026.

    Comments: Accepted at the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), Long Paper, 19 pages

    Journal ref: Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 10690-10708. Association for Computational Linguistics, 2025

  42. arXiv:2603.04800  [pdf, ps, other

    cs.CV

    MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

    Authors: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao

    Abstract: Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To addre…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026

  43. arXiv:2603.02945  [pdf, ps, other

    cs.CL

    ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation

    Authors: Bo Xu, Haotian Wu, Hehai Lin, Weiquan Huang, Beier Zhu, Yao Shu, Chengwei Qin

    Abstract: Model merging aims to combine multiple task-specific expert models into a single model while preserving generalization across diverse tasks. However, interference among experts, especially when they are trained on different objectives, often leads to significant performance degradation. Despite recent progress, resolving this interference without data access, retraining, or architectural modificat…

    Submitted 8 April, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026 (Main Track)

  44. arXiv:2603.02267  [pdf, ps, other

    cs.LG cs.AI

    Boosting Meta-Learning for Few-Shot Text Classification via Label-guided Distance Scaling

    Authors: Yunlong Gao, Xinyue Liu, Yingbo Wang, Linlin Zong, Bo Xu

    Abstract: Few-shot text classification aims to recognize unseen classes with limited labeled text samples. Existing approaches focus on boosting meta-learners by developing complex algorithms in the training stage. However, the labeled samples are randomly selected during the testing stage, so they may not provide effective supervision signals, leading to misclassification. To address this issue, we propose…

    Submitted 28 February, 2026; originally announced March 2026.

  45. arXiv:2602.24275  [pdf, ps, other

    cs.CV

    Hierarchical Action Learning for Weakly-Supervised Action Segmentation

    Authors: Junxian Huang, Ruichu Cai, Hao Zhu, Juntao Fang, Boyan Xu, Weilin Chen, Zijian Li, Shenghua Gao

    Abstract: Humans perceive actions through key transitions that structure actions across multiple abstraction levels, whereas machines, relying on visual features, tend to over-segment. This highlights the difficulty of enabling hierarchical reasoning in video understanding. Interestingly, we observe that lower-level visual and high-level action latent variables evolve at different rates, with low-level visu…

    Submitted 27 February, 2026; originally announced February 2026.

    Journal ref: CVPR 2026

  46. arXiv:2602.23557  [pdf

    eess.IV cs.AI cs.CV

    Hierarchical Multi-Scale Graph Learning with Knowledge-Guided Attention for Whole-Slide Image Survival Analysis

    Authors: Bin Xu, Yufei Zhou, Boling Song, Jingwen Sun, Yang Bian, Cheng Lu, Ye Wu, Jianfei Tu, Xiangxue Wang

    Abstract: We propose a Hierarchical Multi-scale Knowledge-aware Graph Network (HMKGN) that models multi-scale interactions and spatially hierarchical relationships within whole-slide images (WSIs) for cancer prognostication. Unlike conventional attention-based MIL, which ignores spatial organization, or graph-based MIL, which relies on static handcrafted graphs, HMKGN enforces a hierarchical structure with…

    Submitted 2 March, 2026; v1 submitted 26 February, 2026; originally announced February 2026.

    Comments: 4 pages, 1 figure, 2 tables, ISBI 2026

    MSC Class: 68T01 ACM Class: I.5.1; I.5.2; I.5.4

  47. arXiv:2602.23224  [pdf, ps, other

    cs.CV cs.RO

    UniScale: Unified Scale-Aware 3D Reconstruction for Multi-View Understanding via Prior Injection for Robotic Perception

    Authors: Mohammad Mahdavian, Gordon Tan, Binbin Xu, Yuan Ren, Dongfeng Bai, Bingbing Liu

    Abstract: We present UniScale, a unified, scale-aware multi-view 3D reconstruction framework for robotic applications that flexibly integrates geometric priors through a modular, semantically informed design. In vision-based robotic navigation, the accurate extraction of environmental structure from raw image sequences is critical for downstream tasks. UniScale addresses this challenge with a single feed-fo…

    Submitted 26 February, 2026; originally announced February 2026.

  48. arXiv:2602.23061  [pdf, ps, other

    cs.IR cs.AI cs.CL cs.DB cs.LG

    MoDora: Tree-Based Semi-Structured Document Analysis System

    Authors: Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li, Conghui He, Fan Wu

    Abstract: Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various and often irregular layouts. These documents are widely observed across domains and account for a large portion of real-world data. However, existing methods struggle to support natural language question answering over these documents due to three main technical…

    Submitted 27 February, 2026; v1 submitted 26 February, 2026; originally announced February 2026.

    Comments: Extension of our SIGMOD 2026 paper. Please refer to source code available at https://github.com/weAIDB/MoDora

  49. arXiv:2602.22732  [pdf, ps, other

    cs.IR cs.LG

    Generative Recommendation for Large-Scale Advertising

    Authors: Ben Xue, Dan Liu, Lixiang Wang, Mingjie Sun, Peng Wang, Pengfei Zhang, Shaoyun Shi, Tianyu Xu, Yunhao Sha, Zhiqiang Liu, Bo Kong, Bo Wang, Hang Yang, Jieting Xue, Junhao Wang, Shengyu Wang, Shuping Hui, Wencai Ye, Xiao Lin, Yongzhi Li, Yuhang Chen, Zhihui Yin, Quan Chen, Shiyang Wen, Wenjin Wu, et al. (5 additional authors not shown)

    Abstract: Generative recommendation has recently attracted widespread attention in industry due to its potential for scaling and stronger model capacity. However, deploying real-time generative recommendation in large-scale advertising requires designs beyond large-language-model (LLM)-style training and serving recipes. We present a production-oriented generative recommender co-designed across architecture…

    Submitted 1 April, 2026; v1 submitted 26 February, 2026; originally announced February 2026.

    Comments: 13 pages, 6 figures, under review

  50. arXiv:2602.21951  [pdf, ps, other

    cs.CL

    RADAR: Reasoning as Discrimination with Aligned Representations for LLM-based Knowledge Graph Reasoning

    Authors: Bo Xue, Yuan Jin, Luoyi Fu, Jiaxin Ding, Xinbing Wang

    Abstract: Knowledge graph reasoning (KGR) infers missing facts, with recent advances increasingly harnessing the semantic priors and reasoning abilities of Large Language Models (LLMs). However, prevailing generative paradigms are prone to memorizing surface-level co-occurrences rather than learning genuine relational semantics, limiting out-of-distribution generalization. To address this, we propose RADAR,…

    Submitted 25 February, 2026; originally announced February 2026.