
Showing 1–50 of 1,502 results for author: Zhang, F

Searching in archive cs.
  1. arXiv:2604.13602

    cs.LG

    Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

    Authors: Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit imperfections in learned reward signals to maximize proxy objectives without fu…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: 42 pages, 5 figures, 2 tables

  2. MSGS: Multispectral 3D Gaussian Splatting

    Authors: Iris Zheng, Guojun Tang, Alexander Doronin, Paul Teal, Fang-Lue Zhang

    Abstract: We present a multispectral extension to 3D Gaussian Splatting (3DGS) for wavelength-aware view synthesis. Each Gaussian is augmented with spectral radiance, represented via per-band spherical harmonics, and optimized under a dual-loss supervision scheme combining RGB and multispectral signals. To improve rendering fidelity, we perform spectral-to-RGB conversion at the pixel level, allowing richer…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: Published in IEEE ISMAR 2025 Adjunct

    ACM Class: I.3.7; I.4.8; I.2.10

    Journal ref: Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR) Adjunct, 2025

  3. arXiv:2604.13333

    cs.CV cs.GR

    SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting

    Authors: Iris Zheng, Guojun Tang, Alexander Doronin, Paul Teal, Fang-Lue Zhang

    Abstract: We present SSD-GS, a physically-based relighting framework built upon 3D Gaussian Splatting (3DGS) that achieves high-quality reconstruction and photorealistic relighting under novel lighting conditions. In physically-based relighting, accurately modeling light-material interactions is essential for faithful appearance reproduction. However, existing 3DGS-based relighting methods adopt coarse shad…

    Submitted 14 April, 2026; originally announced April 2026.

    Comments: Accepted to ICLR 2026. Code available at: https://github.com/irisfreesiri/SSD-GS

    ACM Class: I.3.7; I.4.8; I.2.10

  4. arXiv:2604.12648

    cs.LG cs.AI

    TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

    Authors: Fan Zhang, Shiming Fan, Hua Wang

    Abstract: Despite the recent success of large language models (LLMs) in time-series forecasting, most existing methods still adopt a Deep Synchronous Fusion strategy, where dense interactions between textual and temporal features are enforced at every layer of the network. This design overlooks the inherent granularity mismatch between modalities and leads to what we term semantic perceptual dissonance: hig…

    Submitted 14 April, 2026; originally announced April 2026.

  5. arXiv:2604.12436

    cs.RO

    D-BDM: A Direct and Efficient Boundary-Based Occupancy Grid Mapping Framework for LiDARs

    Authors: Benxu Tang, Yixi Cai, Fanze Kong, Longji Yin, Fu Zhang

    Abstract: Efficient and scalable 3D occupancy mapping is essential for autonomous robot applications in unknown environments. However, traditional occupancy grid representations suffer from two fundamental limitations. First, explicitly storing all voxels in three-dimensional space leads to prohibitive memory consumption. Second, exhaustive ray casting incurs high update latency. A recent representation all…

    Submitted 14 April, 2026; originally announced April 2026.

  6. arXiv:2604.12374

    cs.LG cs.AI cs.CL

    Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

    Authors: NVIDIA: Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, et al. (522 additional authors not shown)

    Abstract: We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, a…

    Submitted 14 April, 2026; originally announced April 2026.

  7. arXiv:2604.11615

    cs.AR cs.AI cs.DC cs.LG

    CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

    Authors: Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang

    Abstract: Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design overhead. Tight coupling with the CPU pipeline complicates integration across diverse CPUs, while fine-grained synchronous instructions hinder the development of high-performance kernels. This paper pr…

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: Accepted to DAC 2026

  8. arXiv:2604.10551

    cs.CV

    NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models: Datasets, Methods and Results

    Authors: Xin Li, Jiachao Gong, Xijun Wang, Shiyao Xiong, Bingchen Li, Suhang Yao, Chao Zhou, Zhibo Chen, Radu Timofte, Yuxiang Chen, Shibo Yin, Yilian Zhong, Yushun Fang, Xilei Zhu, Yahui Wang, Chen Lu, Meisong Zheng, Xiaoxu Chen, Jing Yang, Zhaokun Hu, Jiahui Liu, Ying Chen, Haoran Bai, Sibin Deng, Shengxi Li, et al. (53 additional authors not shown)

    Abstract: This paper presents an overview of the NTIRE 2026 Challenge on Short-form UGC Video Restoration in the Wild with Generative Models. This challenge utilizes a new short-form UGC (S-UGC) video restoration benchmark, termed KwaiVIR, which is contributed by USTC and Kuaishou Technology. It contains both synthetically distorted videos and real-world short-form UGC videos in the wild. For this edition,…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted by CVPR 2026 workshop; NTIRE 2026

  9. arXiv:2604.10532

    cs.CV

    The Second Challenge on Real-World Face Restoration at NTIRE 2026: Methods and Results

    Authors: Jingkai Wang, Jue Gong, Zheng Chen, Kai Liu, Jiatong Li, Yulun Zhang, Radu Timofte, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Jiajia Liu, Yingsi Chen, Yijiao Liu, Hui Li, Yu Wang, Congchao Zhu, Alexandru-Gabriel Lefterache, Anamaria Radoi, Chuanyue Yan, Tao Lu, Yanduo Zhang, Kanghui Zhao, Jiaming Wang, Yuqi Li, et al. (28 additional authors not shown)

    Abstract: This paper provides a review of the NTIRE 2026 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural and realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources…

    Submitted 15 April, 2026; v1 submitted 12 April, 2026; originally announced April 2026.

    Comments: NTIRE 26: https://cvlai.net/ntire/2026 . NTIRE Real-World Face Restoration: https://ntire-face.github.io/2026/ . CVPR 2026 Workshop

  10. SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models

    Authors: Han Liu, Haotian Gao, Xiaotong Zhang, Changya Li, Feng Zhang, Wei Wang, Fenglong Ma, Hong Yu

    Abstract: Large language models (LLMs) have shown remarkable performance in various domains, but they are constrained by massive computational and storage costs. Quantization, an effective technique for compressing models to fit resource-limited devices while preserving generative quality, encompasses two primary methods: quantization aware training (QAT) and post-training quantization (PTQ). QAT involves a…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: Accepted to KDD 2025. 12 pages, 10 figures

    ACM Class: I.2.7

  11. arXiv:2604.07894

    cs.CL cs.AI

    TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

    Authors: Xinliang Frederick Zhang, Lu Wang

    Abstract: Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individuals' needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped by a quality-eff…

    Submitted 9 April, 2026; originally announced April 2026.

  12. arXiv:2604.07809

    cs.LG cs.AI

    PolicyLong: Towards On-Policy Context Extension

    Authors: Junlong Jia, Ziyang Chen, Xing Wu, Chaochen Gao, TingHao Yu, Feng Zhang, Songlin Hu

    Abstract: Extending LLM context windows is hindered by scarce high-quality long-context data. Recent methods synthesize data with genuine long-range dependencies via information-theoretic verification, selecting contexts that reduce a base model's predictive entropy. However, their single-pass offline construction with a fixed model creates a fundamental off-policy gap: the static screening landscape misali…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: Work in progress. Correspondence to ucaswu@tencent.com or wuxing@iie.ac.cn

  13. arXiv:2604.07720

    cs.AI

    Towards Knowledgeable Deep Research: Framework and Benchmark

    Authors: Wenxuan Liu, Zixuan Li, Long Bai, Chunmao Zhang, Fenghui Zhang, Zhuo Chen, Wei Li, Yuxin Zuo, Fei Wang, Bingbing Xu, Xuhui Jiang, Jin Zhang, Xiaolong Jin, Jiafeng Guo, Tat-Seng Chua, Xueqi Cheng

    Abstract: Deep Research (DR) requires LLM agents to autonomously perform multi-step information seeking, processing, and reasoning to generate comprehensive reports. In contrast to existing studies that mainly focus on unstructured web content, a more challenging DR task should additionally utilize structured knowledge to provide a solid data foundation, facilitate quantitative computation, and lead to in-d…

    Submitted 10 April, 2026; v1 submitted 8 April, 2026; originally announced April 2026.

  14. arXiv:2604.06829

    cs.CL cs.AI

    WRAP++: Web discoveRy Amplified Pretraining

    Authors: Jiang Zhou, Yunhao Wang, Xing Wu, Tinghao Yu, Feng Zhang

    Abstract: Synthetic data rephrasing has emerged as a powerful technique for enhancing knowledge acquisition during large language model (LLM) pretraining. However, existing approaches operate at the single-document level, rewriting individual web pages in isolation. This confines synthesized examples to intra-document knowledge, missing cross-document relationships and leaving facts with limited associative…

    Submitted 9 April, 2026; v1 submitted 8 April, 2026; originally announced April 2026.

    Comments: Work in progress. Correspondence to ucaswu@tencent.com or wuxing@iie.ac.cn

  15. arXiv:2604.06736

    cs.CL cs.DB

    SQLStructEval: Structural Evaluation of LLM Text-to-SQL Generation

    Authors: Yixi Zhou, Fan Zhang, Zhiqiao Guo, Yu Chen, Haipeng Zhang, Preslav Nakov, Zhuohan Xie

    Abstract: Despite strong performance on Text-to-SQL benchmarks, it remains unclear whether LLM-generated SQL programs are structurally reliable. In this work, we investigate the structural behavior of LLM-generated SQL queries and introduce SQLStructEval, a framework for analyzing program structures through canonical abstract syntax tree (AST) representations. Our experiments on the Spider benchmark show th…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 17 pages, including figures and tables

  16. arXiv:2604.06284

    cs.CR cs.AI

    ClawLess: A Security Model of AI Agents

    Authors: Hongyi Lu, Nian Liu, Shuai Wang, Fengwei Zhang

    Abstract: Autonomous AI agents powered by Large Language Models can reason, plan, and execute complex tasks, but their ability to autonomously retrieve information and run code introduces significant security risks. Existing approaches attempt to regulate agent behavior through training or prompting, which does not offer fundamental security guarantees. We present ClawLess, a security framework that enforce…

    Submitted 7 April, 2026; originally announced April 2026.

  17. arXiv:2604.06185

    cs.HC cs.AI cs.CL

    Benchmarking LLM Tool-Use in the Wild

    Authors: Peijie Yu, Wei Liu, Yifan Yang, Jinjian Li, Zelong Zhang, Xiao Feng, Feng Zhang

    Abstract: Fulfilling user needs through Large Language Model multi-turn, multi-step tool-use is rarely a straightforward process. Real user interactions are inherently wild, being intricate, messy, and flexible. We identify three key challenges from user behaviour: compositional tasks that demand efficient orchestration of tool-call topologies, implicit intent spread across dialogue turns that require conte…

    Submitted 13 February, 2026; originally announced April 2026.

    Comments: Accepted by ICLR 2026

  18. arXiv:2604.05966

    cs.CL

    FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures

    Authors: Fan Zhang, Mingzi Song, Rania Elbadry, Yankai Chen, Shaobo Wang, Yixi Zhou, Xunwen Zheng, Yueru He, Yuyang Dai, Georgi Georgiev, Ayesha Gull, Muhammad Usman Safder, Fan Wu, Liyuan Meng, Fengxian Ji, Junning Zhao, Xueqing Peng, Jimin Huang, Yu Chen, Xue, Liu, Preslav Nakov, Zhuohan Xie

    Abstract: Financial reporting systems increasingly use large language models (LLMs) to extract and summarize corporate disclosures. However, most assume a single-market setting and do not address structural differences across jurisdictions. Variations in accounting taxonomies, tagging infrastructures (e.g., XBRL vs. PDF), and aggregation conventions make cross-jurisdiction reporting a semantic alignment and…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: 9 pages, including figures and tables

  19. arXiv:2604.05212

    cs.CV

    Boxer: Robust Lifting of Open-World 2D Bounding Boxes to 3D

    Authors: Daniel DeTone, Tianwei Shen, Fan Zhang, Lingni Ma, Julian Straub, Richard Newcombe, Jakob Engel

    Abstract: Detecting and localizing objects in space is a fundamental computer vision problem. While much progress has been made to solve 2D object detection, 3D object localization is much less explored and far from solved, especially for open-world categories. To address this research challenge, we propose Boxer, an algorithm to estimate static 3D bounding boxes (3DBBs) from 2D open-vocabulary object detec…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: Project page: http://facebookresearch.github.io/boxer

  20. arXiv:2604.05195

    cs.LG

    Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem

    Authors: Shihong Huang, Shengjie Wang, Lei Gao, Hong Ma, Zhanluo Zhang, Feng Zhang, Weihua Zhou

    Abstract: Unlike traditional homogeneous routing problems, the Heterogeneous Fleet Vehicle Routing Problem (HFVRP) involves heterogeneous fixed costs, variable travel costs, and capacity constraints, rendering solution quality highly sensitive to vehicle selection. Furthermore, real-world logistics applications often impose additional complex constraints, markedly increasing computational complexity. Howeve…

    Submitted 6 April, 2026; originally announced April 2026.

  21. arXiv:2604.04193

    cs.CR cs.GT

    Perils of Parallelism: Transaction Fee Mechanisms under Execution Uncertainty

    Authors: Sarisht Wadhwa, Aviv Yaish, Fan Zhang, Kartik Nayak

    Abstract: Modern blockchains increasingly rely on parallel execution to improve throughput. We show several industry and academic transaction fee mechanisms (TFMs) struggle to simultaneously account for execution parallelism while remaining performant and fair. First, if parallelism affects fees, adversarial protocol manipulations that offset possible benefits to throughput by introducing fake transactions…

    Submitted 5 April, 2026; originally announced April 2026.

  22. RUQuant: Towards Refining Uniform Quantization for Large Language Models

    Authors: Han Liu, Haotian Gao, Changya Li, Feng Zhang, Xiaotong Zhang, Wei Wang, Hong Yu

    Abstract: The increasing size and complexity of large language models (LLMs) have raised significant challenges in deployment efficiency, particularly under resource constraints. Post-training quantization (PTQ) has emerged as a practical solution by compressing models without requiring retraining. While existing methods focus on uniform quantization schemes for both weights and activations, they often suff…

    Submitted 5 April, 2026; originally announced April 2026.

    Comments: Accepted to KDD 2026. 12 pages, 9 figures

    ACM Class: I.2.7

  23. arXiv:2604.02692

    cs.CV

    Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing

    Authors: Fuyuan Liu, Dianyu Yu, He Ren, Nayu Liu, Xiaomian Kang, Delai Qiu, Fa Zhang, Genpeng Zhen, Shengping Liu, Jiaen Liang, Wei Huang, Yining Wang, Junnan Zhu

    Abstract: Accurate document parsing requires both robust content recognition and a stable parser interface. In explicit Document Layout Analysis (DLA) pipelines, downstream parsers do not consume the full detector output. Instead, they operate on a retained and serialized set of layout instances. However, on dense pages with overlapping regions and ambiguous boundaries, unstable layout hypotheses can make t…

    Submitted 2 April, 2026; originally announced April 2026.

  24. arXiv:2604.02346

    cs.LG cs.AI cs.SE q-bio.BM

    DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

    Authors: Tianyu Liu, Sihan Jiang, Fan Zhang, Kunyang Sun, Teresa Head-Gordon, Hongyu Zhao

    Abstract: Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages…

    Submitted 11 February, 2026; originally announced April 2026.

    Comments: 29 pages, 6 figures

  25. arXiv:2604.02160

    cs.CV

    CoRegOVCD: Consistency-Regularized Open-Vocabulary Change Detection

    Authors: Weidong Tang, Hanbin Sun, Zihan Li, Yikai Wang, Feifan Zhang

    Abstract: Remote sensing change detection (CD) aims to identify where land-cover semantics change across time, but most existing methods still assume a fixed label space and therefore cannot answer arbitrary user-defined queries. Open-vocabulary change detection (OVCD) instead asks for the change mask of a queried concept. In the fully training-free setting, however, dense concept responses are difficult to…

    Submitted 2 April, 2026; originally announced April 2026.

  26. arXiv:2604.01707

    cs.CL cs.DB

    Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

    Authors: Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, Yixiang Fang

    Abstract: Memory emerges as the core module in the large language model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative reasoning and self-evolution. A number of memory methods have been proposed in the literature. However, these methods have not been systematically and comprehensivel…

    Submitted 2 April, 2026; originally announced April 2026.

  27. arXiv:2604.00411

    cs.CR cs.IT

    Efficient DPF-based Error-Detecting Information-Theoretic Private Information Retrieval Over Rings

    Authors: Pengzhen Ke, Liang Feng Zhang, Huaxiong Wang, Li-Ping Wang

    Abstract: Authenticated private information retrieval (APIR) is the state-of-the-art error-detecting private information retrieval (ED-PIR), using Distributed Point Functions (DPFs) for subpolynomial complexity and privacy. However, its finite field structure restricts it to prime-order DPFs, leading to prohibitively large key sizes under information-theoretic settings, while its dual-DPF-key design introdu…

    Submitted 31 March, 2026; originally announced April 2026.

    Comments: 15 pages, 4 figures, 2 tables. Accepted for publication in Cybersecurity, in press

  28. arXiv:2604.00234

    cs.GT

    Blockspace Under Pressure: An Analysis of Spam MEV on High-Throughput Blockchains

    Authors: Wenhao Wang, Aditya Saraf, Lioba Heimbach, Kushal Babel, Fan Zhang

    Abstract: On high-throughput, low-fee blockchains, a qualitatively new form of maximal extractable value (MEV) has emerged: searchers submit large volumes of speculative transactions, whose profitability is resolved only at execution time. We refer to this as spam MEV. On major rollups, it can at times consume more than half of block gas, even though only a small fraction of probes ultimately results in a t…

    Submitted 31 March, 2026; originally announced April 2026.

  29. arXiv:2603.29931

    cs.CV

    Gloria: Consistent Character Video Generation via Content Anchors

    Authors: Yuhang Yang, Fan Zhang, Huaijin Pi, Shuai Guo, Guowei Xu, Wei Zhai, Yang Cao, Zheng-Jun Zha

    Abstract: Digital characters are central to modern media, yet generating character videos with long-duration, consistent multi-view appearance and expressive identity remains challenging. Existing approaches either provide insufficient context to preserve identity or leverage non-character-centric information as the memory, leading to suboptimal consistency. Recognizing that character video generation inher…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR2026 Main, project: https://yyvhang.github.io/Gloria_Page/

  30. arXiv:2603.29828

    cs.AI cs.CL

    Owl-AuraID 1.0: An Intelligent System for Autonomous Scientific Instrumentation and Scientific Data Analysis

    Authors: Han Deng, Anqi Zou, Hanling Zhang, Ben Fei, Chengyu Zhang, Haobo Wang, Xinru Guo, Zhenyu Li, Xuzhu Wang, Peng Yang, Fujian Zhang, Weiyu Guo, Xiaohong Shao, Zhaoyang Liu, Shixiang Tang, Zhihui Wang, Wanli Ouyang

    Abstract: Scientific discovery increasingly depends on high-throughput characterization, yet automation is hindered by proprietary GUIs and the limited generalizability of existing API-based systems. We present Owl-AuraID, a software-hardware collaborative embodied agent system that adopts a GUI-native paradigm to operate instruments through the same interfaces as human experts. Its skill-centric framework…

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: 17 pages

  31. arXiv:2603.29224

    cs.LG cs.AI

    Derived Fields Preserve Fine-Scale Detail in Budgeted Neural Simulators

    Authors: Wenshuo Wang, Fan Zhang

    Abstract: Fine-scale-faithful neural simulation under fixed storage budgets remains challenging. Many existing methods reduce high-frequency error by improving architectures, training objectives, or rollout strategies. However, under budgeted coarsen-quantize-decode pipelines, fine detail can already be lost when the carried state is constructed. In the canonical periodic incompressible Navier-Stokes settin…

    Submitted 30 March, 2026; originally announced March 2026.

  32. Silent Guardians: Independent and Secure Decision Tree Evaluation Without Chatter

    Authors: Jinyuan Li, Liang Feng Zhang

    Abstract: As machine learning as a service (MLaaS) gains increasing popularity, it raises two critical challenges: privacy and verifiability. For privacy, clients are reluctant to disclose sensitive private information to access MLaaS, while model providers must safeguard their proprietary models. For verifiability, clients lack reliable mechanisms to ensure that cloud servers execute model inference correc…

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Accepted by IEEE TDSC

  33. arXiv:2603.27913

    cs.CV

    Spatial Orthogonal Refinement for Robust RGB-Event Visual Object Tracking

    Authors: Dexing Huang, Shiao Wang, Fan Zhang, Xiao Wang

    Abstract: Robust visual object tracking (VOT) remains challenging in high-speed motion scenarios, where conventional RGB sensors suffer from severe motion blur and performance degradation. Event cameras, with microsecond temporal resolution and high dynamic range, provide complementary structural cues that can potentially compensate for these limitations. However, existing RGB-Event fusion methods typically…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: Joint International Conference on Automation-Intelligence-Safety and International Symposium on Autonomous Systems 2026 (ICAIS and ISAS 2026)

  34. arXiv:2603.26125

    cs.IT eess.SP

    CL-SEC: Cross-Layer Semantic Error Correction Empowered by Language Models

    Authors: Yirun Wang, Yuyang Du, Soung Chang Liew, Yuchen Pan, Feifan Zhang, Lihao Zhang

    Abstract: Achieving reliable communication has long been a fundamental challenge in networked systems. Semantic Error Correction (SEC) leverages the semantic understanding capabilities of language models (LMs) to perform application-layer error correction, complementing conventional channel decoding. While promising, existing SEC approaches rely solely on context captured by LMs at the application layer, ig…

    Submitted 27 March, 2026; originally announced March 2026.

  35. arXiv:2603.25399

    cs.CV cs.RO

    LaMP: Learning Vision-Language-Action Policies with 3D Scene Flow as Latent Motion Prior

    Authors: Xinkai Wang, Chenyi Wang, Yifu Xu, Mingzhe Ye, Fu-Cheng Zhang, Jialin Tian, Xinyu Zhan, Lifeng Zhu, Cewu Lu, Lixin Yang

    Abstract: We introduce LaMP, a dual-expert Vision-Language-Action framework that embeds dense 3D scene flow as a latent motion prior for robotic manipulation. Existing VLA models regress actions directly from 2D semantic visual features, forcing them to learn complex 3D physical interactions implicitly. This implicit learning strategy degrades under unfamiliar spatial dynamics. LaMP addresses this…

    Submitted 26 March, 2026; originally announced March 2026.

  36. arXiv:2603.25025

    cs.AI

    System-Anchored Knee Estimation for Low-Cost Context Window Selection in PDE Forecasting

    Authors: Wenshuo Wang, Fan Zhang

    Abstract: Autoregressive neural PDE simulators predict the evolution of physical fields one step at a time from a finite history, but low-cost context-window selection for such simulators remains an unformalized problem. Existing approaches to context-window selection in time-series forecasting include exhaustive validation, direct low-cost search, and system-theoretic memory estimation, but they are either…

    Submitted 26 March, 2026; originally announced March 2026.

  37. arXiv:2603.23937

    cs.CL cs.LG

    Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development

    Authors: Zongliang Ji, Ziyang Zhang, Xincheng Tan, Matthew Thompson, Anna Goldenberg, Carl Yang, Rahul G. Krishnan, Fan Zhang

    Abstract: Evidence-based medicine (EBM) is central to high-quality care, but remains difficult to implement in fast-paced primary care settings. Physicians face short consultations, increasing patient loads, and lengthy guideline documents that are impractical to consult in real time. To address this gap, we investigate the feasibility of using large language models (LLMs) as ambient assistants that surface…

    Submitted 25 March, 2026; originally announced March 2026.

    Comments: 9 pages. To appear in Proceedings of Machine Learning Research (PMLR), Machine Learning for Health (ML4H) Symposium 2025

    Journal ref: Proceedings of Machine Learning Research 2025

  38. arXiv:2603.22641

    cs.CV

    Q-Tacit: Image Quality Assessment via Latent Visual Reasoning

    Authors: Yuxuan Jiang, Yixuan Li, Hanwei Zhu, Siyue Teng, Fan Zhang, David Bull

    Abstract: Vision-Language Model (VLM)-based image quality assessment (IQA) has been significantly advanced by incorporating Chain-of-Thought (CoT) reasoning. Recent work has refined image quality reasoning by applying reinforcement learning (RL) and leveraging active visual tools. However, such strategies are typically language-centric, with visual information being treated as static preconditions. Quality-…

    Submitted 23 March, 2026; originally announced March 2026.

  39. arXiv:2603.22531

    cs.CV

    UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images

    Authors: Kaizhen Tan, Fan Zhang

    Abstract: Sidewalk width is an important indicator of pedestrian accessibility, comfort, and network quality, yet large-scale width data remain scarce in most cities. Existing approaches typically rely on costly field surveys, high-resolution overhead imagery, or simplified geometric assumptions that limit scalability or introduce systematic error. To address this gap, we present UrbanVGGT, a measurement pi…

    Submitted 23 March, 2026; originally announced March 2026.

  40. arXiv:2603.22264

    cs.RO

    UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos

    Authors: Gu Zhang, Qicheng Xu, Haozhe Zhang, Jianhan Ma, Long He, Yiming Bao, Zeyu Ping, Zhecheng Yuan, Chenhao Lu, Chengbo Yuan, Tianhai Liang, Xiaoyu Tian, Maanping Shao, Feihong Zhang, Mingyu Ding, Yang Gao, Hao Zhao, Hang Zhao, Huazhe Xu

    Abstract: Dexterous manipulation remains challenging due to the cost of collecting real-robot teleoperation data, the heterogeneity of hand embodiments, and the high dimensionality of control. We present UniDex, a robot foundation suite that couples a large-scale robot-centric dataset with a unified vision-language-action (VLA) policy and a practical human-data capture setup for universal dexterous hand con…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  41. Memory-Efficient Boundary Map for Large-Scale Occupancy Grid Mapping

    Authors: Benxu Tang, Yunfan Ren, Yixi Cai, Fanze Kong, Wenyi Liu, Fangcheng Zhu, Longji Yin, Liuyu Shi, Fu Zhang

    Abstract: Determining the occupancy status of locations in the environment is a fundamental task for safety-critical robotic applications. Traditional occupancy grid mapping methods subdivide the environment into a grid of voxels, each associated with one of three occupancy states: free, occupied, or unknown. These methods explicitly maintain all voxels within the mapped volume and determine the occupancy s…

    Submitted 23 March, 2026; originally announced March 2026.

    Journal ref: Benxu Tang, et al. The International Journal of Robotics Research, published online 2026

  42. arXiv:2603.21530  [pdf, ps, other]

    cs.SE cs.AI

    LLM-Based Test Case Generation in DBMS through Monte Carlo Tree Search

    Authors: Yujia Chen, Yingli Zhou, Fangyuan Zhang, Cuiyun Gao

    Abstract: Database Management Systems (DBMSs) are fundamental infrastructure for modern data-driven applications, where thorough testing with high-quality SQL test cases is essential for ensuring system reliability. Traditional approaches such as fuzzing can be effective for specific DBMSs, but adapting them to different proprietary dialects requires substantial manual effort. Large Language Models (LLMs) p…

    Submitted 22 March, 2026; originally announced March 2026.

    Comments: Accepted to ICSE 2026 Industry Challenge Track

  43. arXiv:2603.18988  [pdf, ps, other]

    cs.RO

    MERGE: Guided Vision-Language Models for Multi-Actor Event Reasoning and Grounding in Human-Robot Interaction

    Authors: Joerg Deigmoeller, Nakul Agarwal, Stephan Hasler, Daniel Tanneberg, Anna Belardinelli, Reza Ghoddoosian, Chao Wang, Felix Ocker, Fan Zhang, Behzad Dariush, Michael Gienger

    Abstract: We introduce MERGE, a system for situational grounding of actors, objects, and events in dynamic human-robot group interactions. Effective collaboration in such settings requires consistent situational awareness, built on persistent representations of people and objects and an episodic abstraction of events. MERGE achieves this by uniquely identifying physical instances of actors (humans or robots…

    Submitted 19 March, 2026; originally announced March 2026.

  44. arXiv:2603.18606  [pdf, ps, other]

    cs.SE

    SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization

    Authors: Lei Yu, Peng Wang, Jingyuan Zhang, Xin Wang, Jia Xu, Li Yang, Changzhi Deng, Jiajia Ma, Fengjun Zhang

    Abstract: SQL query comprehension is a significant challenge due to complex syntax, diverse join types, and deep nesting. Many queries lack adequate comments, severely hindering code readability, maintainability, and knowledge transfer. Automated SQL comment generation faces two main challenges: limited datasets that inadequately represent complex real-world queries, and Large Language Models' (LLMs) insuff…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Accepted to ICPC 2026

  45. arXiv:2603.18555  [pdf, ps, other]

    cs.RO

    Inductance-Based Force Self-Sensing in Fiber-Reinforced Pneumatic Twisted-and-Coiled Actuators

    Authors: Yunsong Zhang, Tianlin Li, Mingyang Yang, Feitian Zhang

    Abstract: Fiber-reinforced pneumatic twisted-and-coiled actuators (FR-PTCAs) offer high power density and compliance but their strong hysteresis and lack of intrinsic proprioception limit effective closed-loop control. This paper presents a self-sensing FR-PTCA integrated with a conductive nickel wire that enables intrinsic force estimation and indirect displacement inference via inductance feedback. Experi…

    Submitted 19 March, 2026; originally announced March 2026.

  46. arXiv:2603.18237  [pdf, ps, other]

    cs.LG cs.AI

    Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training

    Authors: Wenshuo Wang, Fan Zhang

    Abstract: Researchers train neural simulators on uniformly sampled numerical simulation data. But under the same budget, does systematically sampled data provide the most effective information? A fundamental yet unformalized problem is how to sample training data for neural simulators so as to maximize rollout accuracy. Existing data sampling methods either tend to collapse into locally high-information-den…

    Submitted 18 March, 2026; originally announced March 2026.

  47. arXiv:2603.17944  [pdf, ps, other]

    cs.CV

    TransText: Alpha-as-RGB Representation for Transparent Text Animation

    Authors: Fei Zhang, Zijian Zhou, Bohao Tang, Sen He, Hang Li, Zhe Wang, Soubhik Sanyal, Pengfei Liu, Viktar Atliha, Tao Xiang, Frost Xu, Semih Gunel

    Abstract: We introduce the first method, to the best of our knowledge, for adapting image-to-video models to layer-aware text (glyph) animation, a capability critical for practical dynamic visual design. Existing approaches predominantly handle the transparency-encoding (alpha channel) as an extra latent dimension appended to the RGB space, necessitating the reconstruction of the underlying RGB-centric vari…

    Submitted 19 March, 2026; v1 submitted 18 March, 2026; originally announced March 2026.

    Comments: 19 pages, publication review

  48. arXiv:2603.15797  [pdf, ps, other]

    cs.LG cs.AI

    OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning

    Authors: Hao Wu, Yongheng Zhang, Yuan Gao, Fan Xu, Fan Zhang, Ruobing Xie, Ruijian Gou, Yuxuan Liang, Xiaomeng Huang, Xian Wu

    Abstract: Large Language Models (LLMs) have demonstrated exceptional logical reasoning capabilities but frequently struggle with the continuous spatiotemporal dynamics governed by Partial Differential Equations (PDEs), often resulting in non-physical hallucinations. Existing approaches typically resort to costly, domain-specific fine-tuning, which severely limits cross-domain generalization and interpretabi…

    Submitted 18 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  49. arXiv:2603.14297  [pdf, ps, other]

    cs.CV

    RL-ScanIQA: Reinforcement-Learned Scanpaths for Blind 360°Image Quality Assessment

    Authors: Yujia Wang, Yuyan Li, Jiuming Liu, Fang-Lue Zhang, Xinhu Zheng, Neil A. Dodgson

    Abstract: Blind 360° image quality assessment (IQA) aims to predict perceptual quality for panoramic images without a pristine reference. Unlike conventional planar images, 360° content in immersive environments restricts viewers to a limited viewport at any moment, making viewing behaviors critical to quality perception. Although existing scanpath-based approaches have attempted to model viewing behaviors by…

    Submitted 15 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  50. arXiv:2603.13961  [pdf, ps, other]

    cs.CV

    USIS-PGM: Photometric Gaussian Mixtures for Underwater Salient Instance Segmentation

    Authors: Lin Hong, Xiangtong Yao, Mürüvvet Bozkurt, Xin Wang, Fumin Zhang

    Abstract: Underwater salient instance segmentation (USIS) is crucial for marine robotic systems, as it enables both underwater salient object detection and instance-level mask prediction for visual scene understanding. Compared with its terrestrial counterpart, USIS is more challenging due to the underwater image degradation. To address this issue, this paper proposes USIS-PGM, a single-stage framework for…

    Submitted 16 March, 2026; v1 submitted 14 March, 2026; originally announced March 2026.