Showing 1–50 of 574 results for author: Xiao, H

Searching in archive cs. Search in all archives.
  1. arXiv:2604.08404  [pdf, ps, other]

    cs.LG stat.ML

    Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization

    Authors: Simon Zhang, Ryan P. DeMilt, Kun Jin, Cathy H. Xia

    Abstract: Out-of-distribution (OoD) generalization occurs when representation learning encounters a distribution shift. This occurs frequently in practice when training and testing data come from different environments. Covariate shift is a type of distribution shift that occurs only in the input data, while the concept distribution stays invariant. We propose RIA - Regularization for Invariance with Advers…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: 21 pages, 3 figures, accepted at ICML SCIS 2023

  2. arXiv:2604.08042  [pdf, ps, other]

    cs.CV cs.AI

    3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

    Authors: Hongcan Xiao, Xinyue Xiao, Yilin Wang, Yue Zhang, Yonggang Qi

    Abstract: Sketching in 3D space enables expressive reasoning about shape, structure, and spatial relationships, yet generating 3D sketches through natural language remains a major challenge. In this work, we introduce 3DrawAgent, a training-free, language-driven framework for 3D sketch generation that leverages large language models (LLMs) to sequentially draw 3D Bezier curves under geometric feedback. Unli…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: CVPR 2026 Highlight

  3. arXiv:2604.04729  [pdf, ps, other]

    cs.GT

    A Complete Characterization of Convexity in Flow Games

    Authors: Han Xiao, Luying Zhang, Qizhi Fang

    Abstract: We investigate the convexity of cooperative games arising from network flow problems. While it is well-known that flow games are totally balanced, a complete characterization of their convexity has remained an open problem. In this paper, we provide a necessary and sufficient characterization of the networks that induce convex flow games. We show that a flow game is convex if and only if the under…

    Submitted 6 April, 2026; originally announced April 2026.

    MSC Class: 05C57; 91A12; 91A43; 91A46

  4. arXiv:2604.04630  [pdf, ps, other]

    cs.CV

    Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

    Authors: Jiancheng Wang, Lidan Liang, Yong Wang, Zengzhen Su, Haifeng Xia, Yuanting Yan, Wei Wang

    Abstract: Visual language model (VLM) is rapidly being integrated into safety-critical systems such as autonomous driving, making it an important attack surface for potential backdoor attacks. Existing backdoor attacks mainly rely on unimodal, explicit, and easily detectable triggers, making it difficult to construct both covert and stable attack channels in autonomous driving scenarios. GLA introduces two…

    Submitted 6 April, 2026; originally announced April 2026.

    Comments: This is a submission to the "Pattern Analysis and Applications". The manuscript includes 14 pages and 6 figures. All authors have approved the submission, and there is no conflict of interest to declare

  5. arXiv:2604.03120  [pdf, ps, other]

    cs.CV cs.RO

    SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization

    Authors: Xiaoran Zhang, Yu Liu, Jinyu Liang, Kangqiushi Li, Zhiwei Huang, Huaxin Xiao

    Abstract: Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Sema…

    Submitted 3 April, 2026; originally announced April 2026.

    Comments: 15 pages, 4 figures. Submitted to IEEE J-STARS

  6. arXiv:2603.30016  [pdf, ps, other]

    cs.CR cs.AI

    Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

    Authors: Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

    Abstract: AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates a…

    Submitted 31 March, 2026; originally announced March 2026.

  7. arXiv:2603.26532  [pdf, ps, other]

    cs.IT

    Security-Spectral Efficiency Tradeoff in STAR-RIS RSMA: A Max-Min Fairness Framework

    Authors: Huiyun Xia, Yijie Mao, Sai Xu, Shuai Han, Hongbo Zhu

    Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) enable full-space coverage but also expose wireless transmissions to security from multiple spatial directions. This paper investigates a STAR-RIS-assisted secure RSMA system where both internal and external eavesdroppers may coexist in the transmission and reflection regions. In such a scenario, the RSMA co…

    Submitted 31 March, 2026; v1 submitted 27 March, 2026; originally announced March 2026.

  8. arXiv:2603.25373  [pdf, ps, other]

    cs.LG

    Hessian-informed machine learning interatomic potential towards bridging theory and experiments

    Authors: Bangchen Yin, Jian Ouyang, Zhen Fan, Kailai Lin, Hanshi Hu, Dingshun Lv, Weiluo Ren, Hai Xiao, Ji Chen, Changsu Cao

    Abstract: Local curvature of potential energy surfaces is critical for predicting certain experimental observables of molecules and materials from first principles, yet it remains far beyond reach for complex systems. In this work, we introduce a Hessian-informed Machine Learning Interatomic Potential (Hi-MLIP) that captures such curvature reliably, thereby enabling accurate analysis of associated thermodyn…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: 13 pages, 4 figures

  9. arXiv:2603.23491  [pdf, ps, other]

    cs.CV

    Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

    Authors: Brian Chao, Lior Yariv, Howard Xiao, Gordon Wetzstein

    Abstract: Diffusion and flow matching models have unlocked unprecedented capabilities for creative content creation, such as interactive image and streaming video generation. The growing demand for higher resolutions, frame rates, and context lengths, however, makes efficient generation increasingly challenging, as computational complexity grows quadratically with the number of generated tokens. Our work se…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: Project website at https://bchao1.github.io/foveated-diffusion

  10. Cross-Granularity Representations for Biological Sequences: Insights from ESM and BiGCARP

    Authors: Hanlin Xiao, Rainer Breitling, Eriko Takano, Mauricio A. Álvarez

    Abstract: Recent advances in general-purpose foundation models have stimulated the development of large biological sequence models. While natural language shows symbolic granularity (characters, words, sentences), biological sequences exhibit hierarchical granularity whose levels (nucleotides, amino acids, protein domains, genes) further encode biologically functional information. In this paper, we investig…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 9 pages, 4 figures, published in 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

    Journal ref: Proc. IEEE BIBM (2025) 6936-6943

  11. arXiv:2603.18062  [pdf, ps, other]

    cs.CV cs.AI

    S3T-Former: A Purely Spike-Driven State-Space Topology Transformer for Skeleton Action Recognition

    Authors: Naichuan Zheng, Hailun Xia, Zepeng Sun, Weiyi Li, Yujia Wang

    Abstract: Skeleton-based action recognition is crucial for multimedia applications but heavily relies on power-hungry Artificial Neural Networks (ANNs), limiting their deployment on resource-constrained edge devices. Spiking Neural Networks (SNNs) provide an energy-efficient alternative; however, existing spiking models for skeleton data often compromise the intrinsic sparsity of SNNs by resorting to dense…

    Submitted 19 March, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

  12. arXiv:2603.17354  [pdf, ps, other]

    cs.LG cs.CL

    Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

    Authors: Hengyuan Zhang, Xinrong Chen, Zunhai Su, Xiao Liang, Jing Xiong, Wendong Xu, He Xiao, Chaofan Tao, Wei Zhang, Ruobing Xie, Lei Jiang, Hayden Kwok-Hay So, Ngai Wong

    Abstract: Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address…

    Submitted 18 March, 2026; originally announced March 2026.

  13. arXiv:2603.16455  [pdf, ps, other]

    cs.CV

    Evo-Retriever: LLM-Guided Curriculum Evolution with Viewpoint-Pathway Collaboration for Multimodal Document Retrieval

    Authors: Weiqing Li, Jinyue Guo, Yaqi Wang, Haiyang Xiao, Yuewei Zhang, Guohua Liu, Hao Henry Wang

    Abstract: Visual-language models (VLMs) excel at data mappings, but real-world document heterogeneity and unstructuredness disrupt the consistency of cross-modal embeddings. Recent late-interaction methods enhance image-text alignment through multi-vector representations, yet traditional training with limited samples and static strategies cannot adapt to the model's dynamic evolution, causing cross-modal re…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR2026

  14. arXiv:2603.14422  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    MBD: A Model-Based Debiasing Framework Across User, Content, and Model Dimensions

    Authors: Yuantong Li, Lei Yuan, Zhihao Zheng, Weimiao Wu, Songbin Liu, Jeong Min Lee, Ali Selman Aydin, Shaofeng Deng, Junbo Chen, Xinyi Zhang, Hongjing Xia, Sam Fieldman, Matthew Kosko, Wei Fu, Du Zhang, Peiyu Yang, Albert Jin Chung, Xianlei Qiu, Miao Yu, Zhongwei Teng, Hao Chen, Sunny Baek, Hui Tang, Yang Lv, Renze Wang, et al. (5 additional authors not shown)

    Abstract: Modern recommendation systems rank candidates by aggregating multiple behavioral signals through a value model. However, many commonly used signals are inherently affected by heterogeneous biases. For example, watch time naturally favors long-form content, loop rate favors short-form content, and comment probability favors videos over images. Such biases introduce two critical issues: (1) value…

    Submitted 15 March, 2026; originally announced March 2026.

  15. arXiv:2603.13920  [pdf]

    cond-mat.mtrl-sci cs.LG

    Generative Inverse Design of Cold Metals for Low-Power Electronics

    Authors: Kedeng Wu, Yucheng Zhu, Yan Chen, Bizhu Zhang, Shuyu Liu, Xiaobin Deng, Yabei Wu, Liangliang Zhu, Hang Xiao

    Abstract: Cold metals are a class of metals with an intrinsic energy gap located close to the Fermi level, which enables cold-carrier injection for steep-slope transistors and is therefore promising for low-power electronic applications. High-throughput screening has revealed 252 three-dimensional (3D) cold metals in the Materials Project database, but database searches are inherently limited to known compo…

    Submitted 14 March, 2026; originally announced March 2026.

  16. arXiv:2603.11927  [pdf, ps, other]

    cs.MA

    CogSearch: A Cognitive-Aligned Multi-Agent Framework for Proactive Decision Support in E-Commerce Search

    Authors: Zhouwei Zhai, Mengxiang Chen, Haoyun Xia, Jin Li, Renquan Zhou, Min Yang

    Abstract: Modern e-commerce search engines, largely rooted in passive retrieval-and-ranking models, frequently fail to support complex decision-making, leaving users overwhelmed by cognitive friction. In this paper, we introduce CogSearch, a novel cognitive-oriented multi-agent framework that reimagines e-commerce search as a proactive decision support system. By synergizing four specialized agents, CogSear…

    Submitted 12 March, 2026; originally announced March 2026.

  17. arXiv:2603.10619  [pdf, ps, other]

    cs.CL

    Disentangling Similarity and Relatedness in Topic Models

    Authors: Hanlin Xiao, Mauricio A. Álvarez, Rainer Breitling

    Abstract: The recent advancement of large language models has spurred a growing trend of integrating pre-trained language model (PLM) embeddings into topic models, fundamentally reshaping how topics capture semantic structure. Classical models such as Latent Dirichlet Allocation (LDA) derive topics from word co-occurrence statistics, whereas PLM-augmented models anchor these statistics to pre-trained embedd…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: 22 pages, 6 figures, 14 tables

  18. arXiv:2603.08322  [pdf, ps, other]

    cs.AI cs.HC math.CO

    Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design

    Authors: Hai Xia, Carla P. Gomes, Bart Selman, Stefan Szeider

    Abstract: We study mathematical discovery through the lens of neurosymbolic reasoning, where an AI agent powered by a large language model (LLM), coupled with symbolic computation tools, and human strategic direction, jointly produced a new result in combinatorial design theory. The main result of this human-AI collaboration is a tight lower bound on the imbalance of Latin squares for the notoriously diffic…

    Submitted 9 March, 2026; originally announced March 2026.

  19. arXiv:2603.08013  [pdf, ps, other]

    cs.AI

    PIRA-Bench: A Transition from Reactive GUI Agents to GUI-based Proactive Intent Recommendation Agents

    Authors: Yuxiang Chai, Shunye Tang, Han Xiao, Rui Liu, Hongsheng Li

    Abstract: Current Graphical User Interface (GUI) agents operate primarily under a reactive paradigm: a user must provide an explicit instruction for the agent to execute a task. However, an intelligent AI assistant should be proactive, which is capable of anticipating user intentions directly from continuous visual inputs, such as mobile or desktop screenshots, and offering timely recommendations without ex…

    Submitted 9 March, 2026; originally announced March 2026.

  20. arXiv:2603.06600  [pdf, ps, other]

    cs.LG cs.AI

    FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

    Authors: Jiajun Xu, Jiageng Mao, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, Yue Wang

    Abstract: Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI systems. In this paper, we propose an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforce…

    Submitted 17 February, 2026; originally announced March 2026.

    Comments: 18 pages, 4 figures. † These authors jointly supervised this work: Jiageng Mao and Yue Wang

  21. VizCrit: Exploring Strategies for Displaying Computational Feedback in a Visual Design Tool

    Authors: Mingyi Li, Mengyi Chen, Sarah Luo, Yining Cao, Haijun Xia, Maitraye Das, Steven P. Dow, Jane L. E

    Abstract: Visual design instructors often provide multi-modal feedback, mixing annotations with text. Prior theory emphasizes the importance of actionable feedback, where "actionability" lies on a spectrum--from surfacing relevant design concepts to suggesting concrete fixes. How might creativity tools implement annotations that support such feedback, and how does the actionability of feedback impact novice…

    Submitted 4 March, 2026; originally announced March 2026.

  22. arXiv:2603.04035  [pdf, ps, other]

    cs.LG

    mlx-vis: GPU-Accelerated Dimensionality Reduction and Visualization on Apple Silicon

    Authors: Han Xiao

    Abstract: mlx-vis implements eight dimensionality reduction methods -- UMAP, t-SNE, PaCMAP, LocalMAP, TriMap, DREAMS, CNE, MMAE -- and NNDescent k-NN graph construction entirely in MLX for Apple Silicon Metal GPU. A built-in GPU renderer produces scatter plots and smooth animations via hardware H.264 encoding. On Fashion-MNIST (70K points, M3 Ultra), seven of eight methods embed in 2.0-4.7s and render 800-f…

    Submitted 20 March, 2026; v1 submitted 4 March, 2026; originally announced March 2026.

    Comments: 8 pages, 8 figures. Software: https://github.com/hanxiao/mlx-vis. v3: VRAM optimization, updated benchmarks, added LocalMAP and MMAE methods

  23. arXiv:2603.01499  [pdf, ps, other]

    cs.CR cs.AI

    Towards Privacy-Preserving LLM Inference via Covariant Obfuscation (Technical Report)

    Authors: Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong

    Abstract: The rapid development of large language models (LLMs) has driven the widespread adoption of cloud-based LLM inference services, while also bringing prominent privacy risks associated with the transmission and processing of private data in remote inference. For privacy-preserving LLM inference technologies to be practically applied in industrial scenarios, three core requirements must be satisfied…

    Submitted 30 March, 2026; v1 submitted 2 March, 2026; originally announced March 2026.

  24. arXiv:2603.00610  [pdf, ps, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

    Authors: Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos

    Abstract: While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, a…

    Submitted 4 March, 2026; v1 submitted 28 February, 2026; originally announced March 2026.

  25. arXiv:2603.00575  [pdf, ps, other]

    cs.AI cs.SE

    SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks

    Authors: Yucheng Zeng, Shupeng Li, Daxiang Dong, Ruijie Xu, Zimo Chen, Liwei Zheng, Yuxuan Li, Zhe Zhou, Haotian Zhao, Lun Tian, Heng Xiao, Tianshu Zhu, Longkun Hao, Jianmin Wu

    Abstract: Progress in software-engineering agents is increasingly constrained by the scarcity of executable, scalable, and realistic data for training and evaluation. This scarcity stems from three fundamental challenges in existing pipelines: environments are brittle and difficult to reproduce across languages; synthesizing realistic, system-level bugs at scale is computationally expensive; and existing da…

    Submitted 28 February, 2026; originally announced March 2026.

  26. arXiv:2602.23490  [pdf, ps, other]

    cs.HC

    Tidynote: Always-Clear Notebook Authoring

    Authors: Ruanqianqian Huang, Brian Hempel, Yining Cao, James D. Hollan, Haijun Xia, Sorin Lerner

    Abstract: Recent work identified clarity as one of the top quality attributes that notebook users value, but notebooks lack support for maintaining clarity throughout the exploratory phases of the notebook authoring workflow. We propose always-clear notebook authoring that supports both clarity and exploration, and present a Jupyter implementation called Tidynote. The key to Tidynote is three-fold: (1) a sc…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: Accepted at CHI 2026

  27. arXiv:2602.22785  [pdf, ps, other]

    cs.CV

    SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation

    Authors: Ling Wang, Hao-Xiang Guo, Xinzhou Wang, Fuchun Sun, Kai Sun, Pengkun Liu, Hang Xiao, Zhong Wang, Guangyuan Fu, Eric Li, Yang Liu, Yikai Wang

    Abstract: We introduce SceneTransporter, an end-to-end framework for structured 3D scene generation from a single image. While existing methods generate part-level 3D objects, they often fail to organize these parts into distinct instances in open-world scenes. Through a debiased clustering probe, we reveal a critical insight: this failure stems from the lack of structural constraints within the model's int…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: Published at ICLR 2026

  28. arXiv:2602.22555  [pdf, ps, other]

    cs.LG cs.AI

    Autoregressive Visual Decoding from EEG Signals

    Authors: Sicheng Dai, Hongwang Xiao, Shan Yu, Qiwei Ye

    Abstract: Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex adaptation processes involving multiple stages, making it hard to maintain consistency an…

    Submitted 9 March, 2026; v1 submitted 25 February, 2026; originally announced February 2026.

    Journal ref: ICLR 2026

  29. arXiv:2602.21788  [pdf, ps, other]

    cs.DC cs.LG

    DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism

    Authors: Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li

    Abstract: Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. Existing training frameworks predominantly rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Dynam…

    Submitted 25 February, 2026; originally announced February 2026.

  30. arXiv:2602.15547  [pdf, ps, other]

    cs.CL

    jina-embeddings-v5-text: Task-Targeted Embedding Distillation

    Authors: Mohammad Kalim Akram, Saba Sturua, Nastia Havriushenko, Quentin Herreros, Michael Günther, Maximilian Werk, Han Xiao

    Abstract: Text embedding models are widely used for semantic similarity tasks, including information retrieval, clustering, and classification. General-purpose models are typically trained with single- or multi-stage processes using contrastive loss functions. We introduce a novel training regimen that combines model distillation techniques with task-specific contrastive loss to produce compact, high-perfor…

    Submitted 17 February, 2026; originally announced February 2026.

    Comments: 14 pages, 8 figures. Model weights: https://huggingface.co/collections/jinaai/jina-embeddings-v5-text

  31. arXiv:2602.13726  [pdf, ps, other]

    cs.CV

    RGA-Net: A Vision Enhancement Framework for Robotic Surgical Systems Using Reciprocal Attention Mechanisms

    Authors: Quanjun Li, Weixuan Li, Han Xia, Junhua Zhou, Chi-Man Pun, Xuhang Chen

    Abstract: Robotic surgical systems rely heavily on high-quality visual feedback for precise teleoperation; yet, surgical smoke from energy-based devices significantly degrades endoscopic video feeds, compromising the human-robot interface and surgical outcomes. This paper presents RGA-Net (Reciprocal Gating and Attention-fusion Network), a novel deep learning framework specifically designed for smoke remova…

    Submitted 14 February, 2026; originally announced February 2026.

    Comments: Accepted by ICRA2026

  32. arXiv:2602.11547  [pdf, ps, other]

    eess.IV cs.MM

    H.265/HEVC Video Steganalysis Based on CU Block Structure Gradients and IPM Mapping

    Authors: Xiang Zhang, Haiyang Xia, Ziwen He, Wenbin Huang, Fei Peng, Zhangjie Fu

    Abstract: Existing H.265/HEVC video steganalysis research mainly focuses on detecting the steganography based on motion vectors, intra prediction modes, and transform coefficients. However, there is currently no effective steganalysis method capable of detecting steganography based on Coding Unit (CU) block structure. To address this issue, we propose, for the first time, a H.265/HEVC video steganalysis alg…

    Submitted 15 March, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

  33. arXiv:2602.11047  [pdf, ps, other]

    cs.CL

    Embedding Inversion via Conditional Masked Diffusion Language Models

    Authors: Han Xiao

    Abstract: We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes with no access to the target encoder at inference time. On 32-token sequences across th…

    Submitted 18 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

    Comments: 8 pages, 3 figures, 4 tables. Code and demo: https://github.com/jina-ai/embedding-inversion-demo

  34. arXiv:2602.10485  [pdf, ps, other]

    cs.AI

    Abstraction Generation for Generalized Planning with Pretrained Large Language Models

    Authors: Zhenhe Cui, Huaxiang Xia, Hangjun Shen, Kailun Luo, Yong He, Wei Liang

    Abstract: Qualitative Numerical Planning (QNP) serves as an important abstraction model for generalized planning (GP), which aims to compute general plans that solve multiple instances at once. Recent works show that large language models (LLMs) can function as generalized planners. This work investigates whether LLMs can serve as QNP abstraction generators for GP problems and how to fix abstractions via au…

    Submitted 10 February, 2026; originally announced February 2026.

  35. arXiv:2602.10116  [pdf, ps, other]

    cs.CV cs.RO

    SAGE: Scalable Agentic 3D Scene Generation for Embodied AI

    Authors: Hongchi Xia, Xuan Li, Zhaoshuo Li, Qianli Ma, Jiashu Xu, Ming-Yu Liu, Yin Cui, Tsung-Yi Lin, Wei-Chiu Ma, Shenlong Wang, Shuran Song, Fangyin Wei

    Abstract: Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existing scene-generation systems often rely on rule-based or task-specific pipelines, yielding artifacts and physically invalid scenes. We present SAGE, an agentic framework that, given a user-specified embodied task (e.g., "pick up a bowl and pl…

    Submitted 20 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

    Comments: Project Page: https://research.nvidia.com/labs/dir/sage/

  36. arXiv:2602.09401  [pdf, ps, other]

    cs.IR

    SARM: LLM-Augmented Semantic Anchor for End-to-End Live-Streaming Ranking

    Authors: Ruochen Yang, Yueyang Liu, Zijie Zhuang, Changxin Lao, Yuhui Zhang, Jiangxia Cao, Jia Xu, Xiang Chen, Haoke Xiao, Xiangyu Wu, Xiaoyou Zhou, Xiao Lv, Shuang Yang, Tingwen Liu, Zhaojie Liu, Han Li, Kun Gai

    Abstract: Large-scale live-streaming recommendation requires precise modeling of non-stationary content semantics under strict real-time serving constraints. In industrial deployment, two common approaches exhibit fundamental limitations: discrete semantic abstractions sacrifice descriptive precision through clustering, while dense multimodal embeddings are extracted independently and remain weakly aligned…

    Submitted 9 February, 2026; originally announced February 2026.

  37. arXiv:2602.09375  [pdf, ps, other]

    cs.LG

    Latent Poincaré Shaping for Agentic Reinforcement Learning

    Authors: Hanchen Xia, Baoyou Chen, Zelin Zang, Yutang Ge, Guojiang Zhao, Siyu Zhu

    Abstract: We propose LaPha, a method for training AlphaZero-like LLM agents in a Poincaré latent space. Under LaPha, the search process can be visualized as a tree rooted at the prompt and growing outward from the origin toward the boundary of the Poincaré ball, where negative curvature provides exponentially increasing capacity with radius. Using hyperbolic geodesic distance to rule-verified correctness, w…

    Submitted 10 March, 2026; v1 submitted 9 February, 2026; originally announced February 2026.

  38. arXiv:2602.08245  [pdf, ps, other]

    cs.RO cs.AI

    STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction

    Authors: Jinhao Li, Yuxuan Cong, Yingqiao Wang, Hao Xia, Shan Huang, Yijia Zhang, Ningyi Xu, Guohao Dai

    Abstract: Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture multimodality. However, iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. Existing acceleration methods either reduce sampling steps, bypas…

    Submitted 8 February, 2026; originally announced February 2026.

    Comments: 13 pages, 9 figures

  39. arXiv:2602.06206  [pdf, ps, other]

    cs.IT

    UAV-Enabled Short-Packet Communication via Fluid Antenna Systems

    Authors: Xusheng Zhu, Kai-Kit Wong, Hanjiang Hong, Han Xiao, Hao Xu, Tuo Wu, Chan-Byoung Chae

    Abstract: This paper develops a framework for analyzing UAV-enabled short-packet communication, leveraging fluid antenna system (FAS)-assisted relaying networks. Operating in the short-packet regime and focusing on challenging urban environments, we derive novel, closed-form expressions for the block error rate (BLER). This is achieved by modeling the spatially correlated Nakagami-$m$ fading link via a trac…

    Submitted 5 February, 2026; originally announced February 2026.

  40. arXiv:2602.06075  [pdf, ps, other]

    cs.DC

    MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

    Authors: Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Qinyi Luo, Shunye Tang, Yuxiang Chai, Weifeng Lin, Han Xiao, WenHao Wang, Siheng Chen, Zhengxi Lu, Gao Wu, Hao Wang, Liang Liu, Yong Liu

    Abstract: Current mobile GUI agent benchmarks systematically fail to assess memory capabilities, with only 5.2-11.8% memory-related tasks and no cross-session learning evaluation. We introduce MemGUI-Bench, a comprehensive memory-centric benchmark with pass@k and staged LLM-as-judge evaluation. Our contributions include: (1) a systematic memory taxonomy analyzing 11 agents across 5 architectures; (2) 128 ta…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: https://lgy0404.github.io/MemGUI-Bench/

  41. arXiv:2602.05832  [pdf, ps, other]

    cs.CV

    UI-Mem: Self-Evolving Experience Memory for Online Reinforcement Learning in Mobile GUI Agents

    Authors: Han Xiao, Guozhi Wang, Hao Wang, Shilong Liu, Yuxiang Chai, Yue Pan, Yufeng Zhou, Xiaoxin Chen, Yafei Wen, Hongsheng Li

    Abstract: Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. However, its effectiveness is severely hindered by inefficient credit assignment in long-horizon tasks and repetitive errors across tasks due to the lack of experience transfer. To address these challenges, we propose UI-Mem, a novel framework that enhances GUI online RL…

    Submitted 5 February, 2026; originally announced February 2026.

    Comments: 23 pages, 16 figures. Project page: https://ui-mem.github.io

  42. arXiv:2602.04163  [pdf, ps, other]

    cs.LG

    BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models

    Authors: Junyu Chen, Jungang Li, Jing Xiong, Wenjie Wang, Qingyao Yang, He Xiao, Zhen Li, Taiqiang Wu, Mengzhao Chen, Zhen Peng, Chaofan Tao, Long Shi, Hongxia Yang, Ngai Wong

    Abstract: Large language model (LLM) inference is often bounded by memory footprint and memory bandwidth in resource-constrained deployments, making quantization a fundamental technique for efficient serving. While post-training quantization (PTQ) maintains high fidelity at 4-bit, it deteriorates at 2-3 bits. Fundamentally, existing methods enforce a shape-invariant quantization grid (e.g., the fixed unifor…

    Submitted 3 February, 2026; originally announced February 2026.

  43. arXiv:2602.03696  [pdf, ps, other]

    cs.LG cs.CL

    Conflict-Resolving and Sharpness-Aware Minimization for Generalized Knowledge Editing with Multiple Updates

    Authors: Duy Nguyen, Hanqi Xiao, Archiki Prasad, Elias Stengel-Eskin, Hyunji Lee, Mohit Bansal

    Abstract: Large language models (LLMs) rely on internal knowledge to solve many downstream tasks, making it crucial to keep them up to date. Since full retraining is expensive, prior work has explored efficient alternatives such as model editing and parameter-efficient fine-tuning. However, these approaches often break down in practice due to poor generalization across inputs, limited stability, and knowled…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: 22 pages, 8 figures. Code link: https://github.com/duykhuongnguyen/CoRSA

  44. arXiv:2602.03438  [pdf, ps, other]

    cond-mat.mtrl-sci cs.LG

    Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine Learning

    Authors: Mathieu Luisier, Nicolas Vetsch, Alexander Maeder, Vincent Maillou, Anders Winka, Leonard Deuschle, Chen Hao Xia, Manasa Kaniselvan, Marko Mladenovic, Jiang Cao, Alexandros Nikolaos Ziogas

    Abstract: The Non-equilibrium Green's function (NEGF) formalism is a particularly powerful method to simulate the quantum transport properties of nanoscale devices such as transistors, photo-diodes, or memory cells, in the ballistic limit of transport or in the presence of various scattering sources such as electron-phonon, electron-photon, or even electron-electron interactions. The inclusion of all these m…

    Submitted 3 February, 2026; originally announced February 2026.

  45. arXiv:2602.03302  [pdf]

    cs.CV cs.AI

    Full end-to-end diagnostic workflow automation of 3D OCT via foundation model-driven AI for retinal diseases

    Authors: Jinze Zhang, Jian Zhong, Li Lin, Jiaxiong Li, Ke Ma, Naiyang Li, Meng Li, Yuan Pan, Zeyu Meng, Mengyun Zhou, Shang Huang, Shilong Yu, Zhengyu Duan, Sutong Li, Honghui Xia, Juping Liu, Dan Liang, Yantao Wei, Xiaoying Tang, Jin Yuan, Peng Xiao

    Abstract: Optical coherence tomography (OCT) has revolutionized retinal disease diagnosis with its high-resolution and three-dimensional imaging nature, yet its full diagnostic automation in clinical practices remains constrained by multi-stage workflows and conventional single-slice single-task AI models. We present Full-process OCT-based Clinical Utility System (FOCUS), a foundation model-driven framework…

    Submitted 3 February, 2026; originally announced February 2026.

  46. arXiv:2602.03183  [pdf, ps, other]

    cs.CL cs.AI

    Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch

    Authors: Hyunwoo Kim, Niloofar Mireshghallah, Michael Duan, Rui Xin, Shuyue Stella Li, Jaehun Jung, David Acuna, Qi Pang, Hanshen Xiao, G. Edward Suh, Sewoong Oh, Yulia Tsvetkov, Pang Wei Koh, Yejin Choi

    Abstract: Research involving privacy-sensitive data has always been constrained by data scarcity, standing in sharp contrast to other areas that have benefited from data scaling. This challenge is becoming increasingly urgent as modern AI agents--such as OpenClaw and Gemini Agent--are granted persistent access to highly sensitive personal information. To tackle this longstanding bottleneck and the rising ri…

    Submitted 3 February, 2026; originally announced February 2026.

    Comments: For code and data, see https://privasis.github.io

  47. arXiv:2602.01277  [pdf, ps, other]

    cs.CV

    TF-Lane: Traffic Flow Module for Robust Lane Perception

    Authors: Yihan Xie, Han Xia, Zhen Yang

    Abstract: Autonomous driving systems require robust lane perception capabilities, yet existing vision-based detection methods suffer significant performance degradation when visual sensors provide insufficient cues, such as in occluded or lane-missing scenarios. While some approaches incorporate high-definition maps as supplementary information, these solutions face challenges of high subscription costs and…

    Submitted 1 February, 2026; originally announced February 2026.

    Comments: 9 pages, 7 figures, 7 tables

    ACM Class: I.4.8

  48. arXiv:2602.00079  [pdf, ps, other]

    cs.LG cs.CV

    Embedding Compression via Spherical Coordinates

    Authors: Han Xiao

    Abstract: We present an $ε$-bounded compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit vectors concentrate around $π/2$, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of b…

    Submitted 25 March, 2026; v1 submitted 21 January, 2026; originally announced February 2026.

    Comments: Accepted at ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). 13 pages, 2 figures. Code: https://github.com/jina-ai/jzip

    MSC Class: 68T50 ACM Class: I.2.7

  49. arXiv:2602.00019  [pdf, ps, other]

    q-bio.BM cs.AI

    AutoBinder Agent: An MCP-Based Agent for End-to-End Protein Binder Design

    Authors: Fukang Ge, Jiarui Zhu, Linjie Zhang, Haowen Xiao, Xiangcheng Bao, Fangnan Xie, Danyang Chen, Yanrui Lu, Yuting Wang, Ziqian Guan, Lin Gu, Jinhao Bi, Yingying Zhu

    Abstract: Modern AI technologies for drug discovery are distributed across heterogeneous platforms, including web applications, desktop environments, and code libraries, leading to fragmented workflows, inconsistent interfaces, and high integration overhead. We present an agentic end-to-end drug design framework that leverages a Large Language Model (LLM) in conjunction with the Model Context Protocol (MCP) t…

    Submitted 16 January, 2026; originally announced February 2026.

    Comments: 4 pages, 3 figures

  50. arXiv:2601.22573  [pdf, ps, other]

    cs.CV

    DELNet: Continuous All-in-One Weather Removal via Dynamic Expert Library

    Authors: Shihong Liu, Kun Zuo, Hanguang Xiao

    Abstract: All-in-one weather image restoration methods are valuable in practice but depend on pre-collected data and require retraining for unseen degradations, leading to high cost. We propose DELNet, a continual learning framework for weather image restoration. DELNet integrates a judging valve that measures task similarity to distinguish new from known tasks, and a dynamic expert library that stores expe…

    Submitted 30 January, 2026; originally announced January 2026.

    Comments: Accepted by the ICASSP conference, not yet officially published