Showing 1–50 of 600 results for author: Hu, L

Searching in archive cs.
  1. arXiv:2604.14141

    cs.CV

    Geometric Context Transformer for Streaming 3D Reconstruction

    Authors: Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun, Liangxiao Hu, Nan Xue, Xing Zhu, Yujun Shen, Yao Yao, Yinghao Xu

    Abstract: Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data,…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: Project page: https://technology.robbyant.com/lingbot-map Code: https://github.com/robbyant/lingbot-map

  2. arXiv:2604.12668

    cs.CV

    OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner

    Authors: Haoyang Jiang, Zekun Wang, Mingyang Yi, Xiuyu Li, Lanqing Hu, Junxian Cai, Qingbin Liu, Xi Chen, Ju Fan

    Abstract: The Diffusion Probabilistic Model (DPM) achieves remarkable performance in image generation, while its increasing parameter size and computational overhead hinder its deployment in practical applications. To improve this, the existing literature focuses on obtaining a smaller model with a fixed architecture through model compression. However, in practice, DPMs usually need to be deployed on variou…

    Submitted 14 April, 2026; originally announced April 2026.

  3. arXiv:2604.08995

    cs.CV

    Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

    Authors: Zile Wang, Zexiang Liu, Jiaxing Li, Kaichen Huang, Baixin Xu, Fei Kang, Mengyin An, Peiyu Wang, Biao Jiang, Yichen Wei, Yidan Xietian, Jiangbo Pei, Liang Hu, Boyi Jiang, Hua Xue, Zidong Wang, Haofeng Sun, Wei Li, Wanli Ouyang, Xianglong He, Yang Liu, Yangguang Li, Yahui Zhou

    Abstract: With the advancement of interactive video generation, diffusion models have increasingly demonstrated their potential as world models. However, existing approaches still struggle to simultaneously achieve memory-enabled long-term temporal consistency and high-resolution real-time generation, limiting their applicability in real-world scenarios. To address this, we present Matrix-Game 3.0, a memory…

    Submitted 12 April, 2026; v1 submitted 10 April, 2026; originally announced April 2026.

    Comments: Project page: https://matrix-game-v3.github.io/

  4. arXiv:2604.08083

    cs.SE

    Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation

    Authors: Li Hu, Xiuwei Shang, Jieke Shi, Shaoyin Cheng, Junqi Zhang, Gangyang Li, Zhou Yang, Weiming Zhang, David Lo

    Abstract: Deobfuscating binary code remains a fundamental challenge in reverse engineering, as obfuscation is widely used to hinder analysis and conceal program logic. Although large language models (LLMs) have shown promise in recovering semantics from obfuscated binaries, a systematic evaluation of their effectiveness is still lacking. In this work, we present BinDeObfBench, the first comprehensive benchm…

    Submitted 9 April, 2026; originally announced April 2026.

  5. arXiv:2604.07331

    cs.RO cs.AI cs.CV

    RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

    Authors: Wenjing Margaret Mao, Jefferson Ng, Luyang Hu, Daniel Gehrig, Antonio Loquercio

    Abstract: Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metr…

    Submitted 8 April, 2026; originally announced April 2026.

    Comments: 8 pages, 4 figures. *Equal contribution by first three authors. Project webpage: https://roshi-mocap.github.io/

  6. arXiv:2604.05853   

    cs.CV

    Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

    Authors: Zonghao Ying, Haowen Dai, Lianyu Hu, Zonglei Jing, Quanchen Zou, Yaodong Yang, Aishan Liu, Xianglong Liu

    Abstract: Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and formalize the inscriptive jailbreak, where an adversary coerces a T2I system into generating images containing harmful textual payloads (e.g., fraudulent documents) embedded within visually benign scenes. Unlike traditional depictive jailbreaks that elicit…

    Submitted 8 April, 2026; v1 submitted 7 April, 2026; originally announced April 2026.

    Comments: Withdrawn for extensive revisions and inclusion of new experimental results

  7. arXiv:2604.02368

    cs.AI cs.CL

    XpertBench: Expert-Level Tasks with Rubrics-Based Evaluation

    Authors: Xue Liu, Xin Ma, Yuxin Ma, Yongchang Peng, Duo Wang, Zhoufutu Wen, Ge Zhang, Kaiyuan Zhang, Xinyu Chen, Tianci He, Jiani Hou, Liang Hu, Ziyun Huang, Yongzhe Hui, Jianpeng Jiao, Chennan Ju, Yingru Kong, Yiran Li, Mengyun Liu, Luyao Ma, Fei Ni, Yiqing Ni, Yueyan Qiu, Yanle Ren, Zilin Shi , et al. (9 additional authors not shown)

    Abstract: As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks suffer from narrow domain coverage, reliance on generalist tasks, or self-evaluation biases. To bridge this gap, we present XpertBench, a high-fidelity be…

    Submitted 7 April, 2026; v1 submitted 27 March, 2026; originally announced April 2026.

  8. arXiv:2603.23886

    cs.RO cs.AI

    AgentChemist: A Multi-Agent Experimental Robotic Platform Integrating Chemical Perception and Precise Control

    Authors: Xiangyi Wei, Fei Wang, Haotian Zhang, Xin An, Haitian Zhu, Lianrui Hu, Yang Li, Changbo Wang, Xiao He

    Abstract: Chemical laboratory automation has long been constrained by rigid workflows and poor adaptability to the long-tail distribution of experimental tasks. While most automated platforms perform well on a narrow set of standardized procedures, real laboratories involve diverse, infrequent, and evolving operations that fall outside predefined protocols. This mismatch prevents existing systems from gener…

    Submitted 24 March, 2026; originally announced March 2026.

  9. arXiv:2603.19957

    cs.CV cs.AI cs.LG

    HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

    Authors: Ruicheng Yuan, Zhenxuan Zhang, Anbang Wang, Liwei Hu, Xiangqian Hua, Yaya Peng, Jiawei Luo, Guang Yang

    Abstract: Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured repor…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: 10 pages, 1 figure, 3 tables

  10. arXiv:2603.19144

    cs.CL cs.AI

    UGID: Unified Graph Isomorphism for Debiasing Large Language Models

    Authors: Zikang Ding, Junchi Yao, Junhao Li, Yi Zhang, Wenbo Jiang, Hongbo Liu, Lijie Hu

    Abstract: Large language models (LLMs) exhibit pronounced social biases. Output-level or data-optimization-based debiasing methods cannot fully resolve these biases, and many prior works have shown that biases are embedded in internal representations. We propose Unified Graph Isomorphism for Debiasing large language models (UGID), an interna…

    Submitted 19 March, 2026; originally announced March 2026.

  11. arXiv:2603.18793

    cs.CR cs.AI

    Functional Subspace Watermarking for Large Language Models

    Authors: Zikang Ding, Junhao Li, Suling Wu, Junchi Yao, Hongbo Liu, Lijie Hu

    Abstract: Model watermarking utilizes internal representations to protect the ownership of large language models (LLMs). However, these features inevitably undergo complex distortions during realistic model modifications such as fine-tuning, quantization, or knowledge distillation, making reliable extraction extremely challenging. Despite extensive research on model-side watermarking, existing methods still…

    Submitted 19 March, 2026; originally announced March 2026.

  12. arXiv:2603.18329

    cs.AI

    FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering

    Authors: Zikang Ding, Qiying Hu, Yi Zhang, Hongji Li, Junchi Yao, Hongbo Liu, Lijie Hu

    Abstract: Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activation-level interventions can reliably induce targeted behavioral changes. However, such conclusions are typically drawn under relatively relaxed evaluation settings that overlook deployment constraints, ca…

    Submitted 18 March, 2026; originally announced March 2026.

  13. arXiv:2603.15401

    cs.SE cs.AI

    SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

    Authors: Tingxu Han, Yi Zhang, Wei Song, Chunrong Fang, Zhenyu Chen, Youcheng Sun, Lijie Hu

    Abstract: Agent skills, structured procedural knowledge packages injected at inference time, are increasingly used to augment LLM agents on software engineering tasks. However, their real utility in end-to-end development settings remains unclear. We present SWE-Skills-Bench, the first requirement-driven benchmark that isolates the marginal utility of agent skills in real-world software engineering (SWE). I…

    Submitted 16 March, 2026; originally announced March 2026.

  14. arXiv:2603.14807

    cs.CV cs.RO

    HiMemVLN: Enhancing Reliability of Open-Source Zero-Shot Vision-and-Language Navigation with Hierarchical Memory System

    Authors: Kailin Lyu, Kangyi Wu, Pengna Li, Xiuyu Hu, Qingyi Si, Cui Miao, Ning Yang, Zihang Wang, Long Xiao, Lianyu Hu, Jingyuan Sun, Ce Hao

    Abstract: LLM-based agents have demonstrated impressive zero-shot performance in vision-language navigation (VLN) tasks. However, most zero-shot methods primarily rely on closed-source LLMs as navigators, which face challenges related to high token costs and potential data leakage risks. Recent efforts have attempted to address this by using open-source LLMs combined with a spatiotemporal CoT framework, but…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: 9 pages, 7 figures

  15. arXiv:2603.13397

    cs.CV

    TennisExpert: Towards Expert-Level Analytical Sports Video Understanding

    Authors: Zhaoyu Liu, Xi Weng, Lianyu Hu, Zhe Hou, Kan Jiang, Jin Song Dong, Yang Liu

    Abstract: Tennis is one of the most widely followed sports, generating extensive broadcast footage with strong potential for professional analysis, automated coaching, and real-time commentary. However, automatic tennis understanding remains underexplored due to two key challenges: (1) the lack of large-scale benchmarks with fine-grained annotations and expert-level commentary, and (2) the difficulty of bui…

    Submitted 17 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

  16. arXiv:2603.12298

    cs.LG cs.AI

    Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

    Authors: Xinyan Jiang, Wenjing Yu, Di Wang, Lijie Hu

    Abstract: Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods deriving vectors from static activation differences are susceptible to high-dimensional noise and layer-wise semantic drift, often capturing spurious correlations rather than the target intent. To address this, we propose Global Evolutionary Refi…

    Submitted 11 March, 2026; originally announced March 2026.

  17. arXiv:2603.11325

    cs.CV

    Towards Trustworthy Selective Generation: Reliability-Guided Diffusion for Ultra-Low-Field to High-Field MRI Synthesis

    Authors: Zhenxuan Zhang, Peiyuan Jing, Ruicheng Yuan, Liwei Hu, Anbang Wang, Fanwen Wang, Yinzhe Wu, Kh Tohidul Islam, Zhaolin Chen, Zi Wang, Peter Lally, Guang Yang

    Abstract: Low-field to high-field MRI synthesis has emerged as a cost-effective strategy to enhance image quality under hardware and acquisition constraints, particularly in scenarios where access to high-field scanners is limited or impractical. Despite recent progress in diffusion models, diffusion-based approaches often struggle to balance fine-detail recovery and structural fidelity. In particular, the…

    Submitted 11 March, 2026; originally announced March 2026.

  18. arXiv:2603.10771

    cs.CL

    Word Recovery in Large Language Models Enables Character-Level Tokenization Robustness

    Authors: Zhipeng Yang, Shu Yang, Lijie Hu, Di Wang

    Abstract: Large language models (LLMs) trained with canonical tokenization exhibit surprising robustness to non-canonical inputs such as character-level tokenization, yet the mechanisms underlying this robustness remain unclear. We study this phenomenon through mechanistic interpretability and identify a core process we term word recovery. We first introduce a decoding-based method to detect word recovery,…

    Submitted 11 March, 2026; originally announced March 2026.

  19. arXiv:2603.10384

    cs.AI

    Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

    Authors: Xinyan Jiang, Ninghao Liu, Di Wang, Lijie Hu

    Abstract: Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-…

    Submitted 10 March, 2026; originally announced March 2026.

    Comments: Under review

  20. arXiv:2603.08486

    cs.CV cs.AI

    Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images

    Authors: Qishun Yang, Shu Yang, Lijie Hu, Di Wang

    Abstract: Multimodal large language models (MLLMs) face safety misalignment, where visual inputs enable harmful outputs. To address this, existing methods require explicit safety labels or contrastive data; yet, threat-related concepts are concrete and visually depictable, while safety concepts, like helpfulness, are abstract and lack visual referents. Inspired by the Self-Fulfilling mechanism underlying em…

    Submitted 14 April, 2026; v1 submitted 9 March, 2026; originally announced March 2026.

  21. arXiv:2603.06651

    cs.LG cs.AI

    Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

    Authors: Liwei Hu, Guangyao Li, Wenyong Wang, Xiaoming Zhang, Yu Xiang

    Abstract: Euclidean gradient descent algorithms barely capture the geometry of objective function-induced hypersurfaces and risk driving update trajectories off the hypersurfaces. Riemannian gradient descent algorithms address these issues but fail to represent complex hypersurfaces via a single classic manifold. We propose geodesic gradient descent (GGD), a generic and learning-rate-free Riemannian gradien…

    Submitted 27 February, 2026; originally announced March 2026.

  22. arXiv:2603.06397

    cs.IR cs.LG

    Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion

    Authors: Pengcheng Jiang, Judith Yue Li, Moonkyung Ryu, R. Lily Hu, Kun Su, Zhong Yi Wan, Liam Hebert, Hao Peng, Jiawei Han, Dima Kuzmin, Craig Boutilier

    Abstract: Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets whi…

    Submitted 6 March, 2026; originally announced March 2026.

  23. arXiv:2603.04800

    cs.CV

    MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

    Authors: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao

    Abstract: Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To addre…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026

  24. arXiv:2603.04464

    cs.LG cs.AI

    Understanding the Dynamics of Demonstration Conflict in In-Context Learning

    Authors: Difan Jiao, Di Wang, Lijie Hu

    Abstract: In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations per se can naturally contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts, we study demonstration-dependent tasks requiring models to infer underlying patterns, a process we characterize as rule infer…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: 19 pages, 12 figures, 4 tables

  25. arXiv:2603.02565

    cs.IR cs.CL cs.LG

    FlashEvaluator: Expanding Search Space with Parallel Evaluation

    Authors: Chao Feng, Yuanhao Pu, Chenghao Zhang, Shanqi Liu, Shuchang Liu, Xiang Li, Yongqi Liu, Lantao Hu, Kaiqiao Zhan, Han Li, Kun Gai

    Abstract: The Generator-Evaluator (G-E) framework, i.e., evaluating K sequences from a generator and selecting the top-ranked one according to evaluator scores, is a foundational paradigm in tasks such as Recommender Systems (RecSys) and Natural Language Processing (NLP). Traditional evaluators process sequences independently, suffering from two major limitations: (1) lack of explicit cross-sequence compari…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 23 pages, 2 figures

  26. arXiv:2603.02561

    cs.IR cs.CV cs.LG

    SOLAR: SVD-Optimized Lifelong Attention for Recommendation

    Authors: Chenghao Zhang, Chao Feng, Yuanhao Pu, Xunyong Yang, Wenhui Yu, Xiang Li, Yongqi Liu, Lantao Hu, Kaiqiao Zhan, Han Li, Kun Gai

    Abstract: Attention mechanism remains the defining operator in Transformers since it provides expressive global credit assignment, yet its $O(N^2 d)$ time and memory cost in sequence length $N$ makes long-context modeling expensive and often forces truncation or other heuristics. Linear attention reduces complexity to $O(N d^2)$ by reordering computation through kernel feature maps, but this reformulation d…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 18 pages, 4 figures

  27. arXiv:2603.00053

    cs.LG cs.AI

    Mag-Mamba: Modeling Coupled Spatiotemporal Asymmetry for POI Recommendation

    Authors: Zhuoxuan Li, Tangwei Ye, Jieyuan Pei, Haina Liang, Zhongyuan Lai, Zihan Liu, Yiming Wu, Qi Zhang, Liang Hu

    Abstract: Next Point-of-Interest (POI) recommendation is a critical task in location-based services, yet it faces the fundamental challenge of coupled spatiotemporal asymmetry inherent in urban mobility. Specifically, transition intents between locations exhibit high asymmetry and are dynamically conditioned on time. Existing methods, typically built on graph or sequence backbones, rely on symmetric operato…

    Submitted 10 February, 2026; originally announced March 2026.

    Comments: 14 pages, 7 figures

  28. arXiv:2602.22960

    cs.CV

    UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models

    Authors: Tianxing Xu, Zixuan Wang, Guangyuan Wang, Li Hu, Zhongyi Zhang, Peng Zhang, Bang Zhang, Song-Hai Zhang

    Abstract: World models based on video generation demonstrate remarkable potential for simulating interactive environments but face persistent difficulties in two key areas: maintaining long-term content consistency when scenes are revisited and enabling precise camera control from user-provided inputs. Existing methods based on explicit 3D reconstruction often compromise flexibility in unbounded scenarios a…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: Project Page: https://humanaigc.github.io/ucm-webpage/

  29. arXiv:2602.22769

    cs.AI cs.LG

    AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

    Authors: Yujie Zhao, Boqin Yuan, Junbo Huang, Haocheng Yuan, Zhongming Yu, Haozhou Xu, Lanxiang Hu, Abhilash Shankarampeta, Zimeng Huang, Wentao Ni, Yuandong Tian, Jishen Zhao

    Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for achieving strong performance. However, a significant gap exists between practical applications and current evaluation standards for agent memory: existing benchmarks primarily focus on dialogue-centric, human-agent interactions. In reality, agent m…

    Submitted 3 March, 2026; v1 submitted 26 February, 2026; originally announced February 2026.

  30. arXiv:2602.22584

    cs.CL

    Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA

    Authors: Wenwei Li, Ming Xu, Tianle Xia, Lingxiang Hu, Yiding Sun, Linfang Shang, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

    Abstract: Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. Although Retrieval-Augmented Generation (RAG) is widely adopted, deploying it in production remains challenging because industrial knowledge is inherently relational, frequently updated, and insufficient…

    Submitted 25 February, 2026; originally announced February 2026.

  31. arXiv:2602.22576

    cs.CL cs.IR cs.LG

    Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training

    Authors: Tianle Xia, Ming Xu, Lingxiang Hu, Yiding Sun, Wenwei Li, Linfang Shang, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

    Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to retrieve, but current RL-based training methods suffer from sparse outcome rewards that discard intermediate signals and…

    Submitted 25 February, 2026; originally announced February 2026.

  32. arXiv:2602.21327

    cs.LG cs.AI cs.CY

    Equitable Evaluation via Elicitation

    Authors: Elbert Du, Cynthia Dwork, Lunjia Hu, Reid McIlroy-Young, Han Shao, Linjun Zhang

    Abstract: Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others are modest to the point of omitting crucial information. Comparing the self-descriptions of equally qualified job-seekers with different self-presentation styles is therefore problematic. We build an interactive AI for skill elicitation that provides accu…

    Submitted 24 February, 2026; originally announced February 2026.

    Comments: 27 pages, 3 figures, 2 tables

  33. arXiv:2602.18888

    cs.CE

    Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis

    Authors: Ziquan Zhu, Hanruo Zhu, Siyuan Lu, Xiang Li, Yanda Meng, Gaojie Jin, Lu Yin, Lijie Hu, Di Wang, Lu Liu, Tianjin Huang

    Abstract: Adapters have become a widely adopted strategy for efficient fine-tuning of large pretrained models, particularly in resource-constrained settings. However, their performance under extreme data scarcity, common in medical imaging due to high annotation costs, privacy regulations, and fragmented datasets, remains underexplored. In this work, we present the first comprehensive study of adapter-based…

    Submitted 21 February, 2026; originally announced February 2026.

  34. arXiv:2602.17577

    cs.DS cs.LG stat.ML

    Simultaneous Blackwell Approachability and Applications to Multiclass Omniprediction

    Authors: Lunjia Hu, Kevin Tian, Chutong Yang

    Abstract: Omniprediction is a learning problem that requires suboptimality bounds for each of a family of losses $\mathcal{L}$ against a family of comparator predictors $\mathcal{C}$. We initiate the study of omniprediction in a multiclass setting, where the comparator family $\mathcal{C}$ may be infinite. Our main result is an extension of the recent binary omniprediction algorithm of [OKK25] to the multic…

    Submitted 19 February, 2026; originally announced February 2026.

  35. arXiv:2602.14257

    cs.CL cs.AI cs.IR cs.LG

    AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents

    Authors: Lingxiang Hu, Yiding Sun, Tianle Xia, Wenwei Li, Ming Xu, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

    Abstract: While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem. Current benchmarks, however, are largely restricted to idealized simulations, failing to address the practical demands of specialized domains like advertising and marketing analytics. In these fields, tasks are…

    Submitted 15 February, 2026; originally announced February 2026.

    Comments: 15 pages, 11 figures

  36. arXiv:2602.11812

    cs.AI

    Predicting LLM Output Length via Entropy-Guided Representations

    Authors: Huanyi Xie, Yubin Chen, Liangyu Wang, Lijie Hu, Di Wang

    Abstract: The long-tailed distribution of sequence lengths in LLM serving and reinforcement learning (RL) sampling causes significant computational waste due to excessive padding in batched inference. Existing methods rely on auxiliary models for static length prediction, but they incur high overhead, generalize poorly, and fail in stochastic "one-to-many" sampling scenarios. We introduce a lightweight fram…

    Submitted 2 April, 2026; v1 submitted 12 February, 2026; originally announced February 2026.

  37. arXiv:2602.10388

    cs.CL cs.AI

    Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

    Authors: Zhongzhi Li, Xuansheng Wu, Yijiang Li, Lijie Hu, Ninghao Liu

    Abstract: The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we introduce Fea…

    Submitted 12 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

  38. arXiv:2602.08862

    cs.LG cs.DS stat.ML

    Near-optimal Swap Regret Minimization for Convex Losses

    Authors: Lunjia Hu, Jon Schneider, Yifan Wu

    Abstract: We give a randomized online algorithm that guarantees near-optimal $\widetilde O(\sqrt T)$ expected swap regret against any sequence of $T$ adaptively chosen Lipschitz convex losses on the unit interval. This improves the previous best bound of $\widetilde O(T^{2/3})$ and answers an open question of Fishelson et al. [2025b]. In addition, our algorithm is efficient: it runs in $\mathsf{poly}(T)$ ti…

    Submitted 9 February, 2026; originally announced February 2026.

  39. arXiv:2602.06449

    cs.CL

    Evaluating an evidence-guided reinforcement learning framework in aligning light-parameter large language models with decision-making cognition in psychiatric clinical reasoning

    Authors: Xinxin Lin, Guangxin Dai, Yi Zhong, Xiang Li, Xue Xiao, Yixin Zhang, Zhengdong Wu, Yongbo Zheng, Runchuan Zhu, Ming Zhao, Huizi Yu, Shuo Wu, Jun Zhao, Lingming Hu, Yumei Wang, Ping Yin, Joey W. Y. Chan, Ngan Yin Chan, Sijing Chen, Yun Kwok Wing, Lin Lu, Xin Ma, Lizhou Fan

    Abstract: Large language models (LLMs) hold transformative potential for medical decision support yet their application in psychiatry remains constrained by hallucinations and superficial reasoning. This limitation is particularly acute in light-parameter LLMs which are essential for privacy-preserving and efficient clinical deployment. Existing training paradigms prioritize linguistic fluency over structur…

    Submitted 6 February, 2026; originally announced February 2026.

    Comments: 21 pages, 8 figures

    ACM Class: I.2.7

  40. arXiv:2602.05811

    cs.AI

    STProtein: predicting spatial protein expression from multi-omics data

    Authors: Zhaorui Jiang, Yingfang Yuan, Lei Hu, Wei Pang

    Abstract: The integration of spatial multi-omics data from single tissues is crucial for advancing biological research. However, a significant data imbalance impedes progress: while spatial transcriptomics data is relatively abundant, spatial proteomics data remains scarce due to technical limitations and high costs. To overcome this challenge, we propose STProtein, a novel framework leveraging graph neural…

    Submitted 5 February, 2026; originally announced February 2026.

    Comments: Accepted as an oral at SPARTA_AAAI2026. GitHub: https://github.com/zhaorui-bi/STProtein

  41. arXiv:2602.05729

    cs.CV cs.LG

    Adaptive Global and Fine-Grained Perceptual Fusion for MLLM Embeddings Compatible with Hard Negative Amplification

    Authors: Lexiang Hu, Youze Xue, Dian Li, Gang Liu, Zhouchen Lin

    Abstract: Multimodal embeddings serve as a bridge for aligning vision and language, with the two primary implementations -- CLIP-based and MLLM-based embedding models -- both limited to capturing only global semantic information. Although numerous studies have focused on fine-grained understanding, we observe that complex scenarios currently targeted by MLLM embeddings often involve a hybrid perceptual patt…

    Submitted 5 February, 2026; originally announced February 2026.

  42. arXiv:2602.04256  [pdf, ps, other]

    cs.RO cs.AI

    AppleVLM: End-to-end Autonomous Driving with Advanced Perception and Planning-Enhanced Vision-Language Models

    Authors: Yuxuan Han, Kunyuan Wu, Qianyi Shao, Renxiang Xiao, Zilu Wang, Cansen Jiang, Yi Xiao, Liang Hu, Yunjiang Lou

    Abstract: End-to-end autonomous driving has emerged as a promising paradigm integrating perception, decision-making, and control within a unified learning framework. Recently, Vision-Language Models (VLMs) have gained significant attention for their potential to enhance the robustness and generalization of end-to-end driving models in diverse and unseen scenarios. However, existing VLM-based approaches stil…

    Submitted 4 February, 2026; originally announced February 2026.

  43. arXiv:2602.02160  [pdf, ps, other]

    cs.CL

    D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

    Authors: Bowen Xu, Shaoyu Wu, Hao Jiang, Kai Liu, Xin Chen, Lulu Hu, Bin Yang

    Abstract: Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRMs lack the capability of sub-task decomposition in complex tool use scenarios, leading to Lazy Reasoning. To address this, we propose a two-stage training framework D-CORE (Decomposing task…

    Submitted 2 February, 2026; originally announced February 2026.

  44. arXiv:2602.01834  [pdf, ps, other]

    cs.RO

    Concept-Based Dictionary Learning for Inference-Time Safety in Vision Language Action Models

    Authors: Siqi Wen, Shu Yang, Shaopeng Fu, Jingfeng Zhang, Lijie Hu, Di Wang

    Abstract: Vision Language Action (VLA) models close the perception-action loop by translating multimodal instructions into executable behaviors, but this very capability magnifies safety risks: jailbreaks that merely yield toxic text in LLMs can trigger unsafe physical actions in embodied systems. Existing defenses -- alignment, filtering, or prompt hardening -- intervene too late or at the wrong modality, leavin…

    Submitted 23 March, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

  45. arXiv:2602.00782  [pdf, ps, other]

    q-bio.BM cs.AI

    Controlling Repetition in Protein Language Models

    Authors: Jiahao Zhang, Zeqing Zhang, Di Wang, Lijie Hu

    Abstract: Protein language models (PLMs) have enabled advances in structure prediction and de novo protein design, yet they frequently collapse into pathological repetition during generation. Unlike in text, where repetition merely reduces readability, in proteins it undermines structural confidence and functional viability. To address this problem, we present the first systematic study of repetition in PLMs.…

    Submitted 31 January, 2026; originally announced February 2026.

    Comments: Published as a conference paper at ICLR 2026

  46. arXiv:2602.00329  [pdf, ps, other]

    cs.LG cs.AI

    In-Run Data Shapley for Adam Optimizer

    Authors: Meng Ding, Zeqing Zhang, Di Wang, Lijie Hu

    Abstract: Reliable data attribution is essential for mitigating bias and reducing computational waste in modern machine learning, with the Shapley value serving as the theoretical gold standard. While recent "In-Run" methods bypass the prohibitive cost of retraining by estimating contributions dynamically, they heavily rely on the linear structure of Stochastic Gradient Descent (SGD) and fail to capture the…

    Submitted 8 March, 2026; v1 submitted 30 January, 2026; originally announced February 2026.

    Comments: 16 pages

  47. arXiv:2601.23149  [pdf, ps, other]

    cs.SD

    Hearing is Believing? Evaluating and Analyzing Audio Language Model Sycophancy with SYAUDIO

    Authors: Junchi Yao, Lokranjan Lakshmikanthan, Annie Zhao, Danielle Zhao, Shu Yang, Zikang Ding, Di Wang, Lijie Hu

    Abstract: Audio Language Models (ALMs) have recently shown strong capabilities in unified reasoning over speech, sound, and natural language; yet they inherit behavioral issues observed in Large Language Models, including sycophancy -- the tendency to agree with user assertions even when they contradict objective evidence. While sycophancy has been extensively studied in text and vision-language models, its m…

    Submitted 30 January, 2026; originally announced January 2026.

  48. arXiv:2601.20215  [pdf, ps, other]

    cs.IR

    Towards End-to-End Alignment of User Satisfaction via Questionnaire in Video Recommendation

    Authors: Na Li, Jiaqi Yu, Minzhi Xie, Tiantian He, Xiaoxiao Xu, Zixiu Wang, Lantao Hu, Yongqi Liu, Han Li, Kaiqiao Zhan, Kun Gai

    Abstract: Short-video recommender systems typically optimize ranking models using dense user behavioral signals, such as clicks and watch time. However, these signals are only indirect proxies of user satisfaction and often suffer from noise and bias. Recently, explicit satisfaction feedback collected through questionnaires has emerged as high-quality, direct alignment supervision, but it is extremely sparse…

    Submitted 27 January, 2026; originally announced January 2026.

  49. arXiv:2601.19675  [pdf, ps, other]

    cs.LG

    LoPRo: Enhancing Low-Rank Quantization via Permuted Block-Wise Rotation

    Authors: Hongyaoxing Gu, Lijuan Hu, Liye Yu, Haowei Li, Fangfang Liu

    Abstract: Post-training quantization (PTQ) enables effective model compression while preserving relatively high accuracy. Current weight-only PTQ methods primarily focus on the challenging sub-3-bit regime, where approaches often suffer significant accuracy degradation, typically requiring fine-tuning to achieve competitive performance. In this work, we revisit the fundamental characteristics of weight quan…

    Submitted 27 January, 2026; originally announced January 2026.

  50. arXiv:2601.19257  [pdf, ps, other]

    q-bio.BM cs.AI cs.LG

    PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary

    Authors: Kun Li, Longtao Hu, Yida Xiong, Jiajun Yu, Hongzhi Zhang, Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu

    Abstract: Molecular representation learning aims to learn vector embeddings that capture molecular structure and geometry, thereby enabling property prediction and downstream scientific applications. In many AI for science tasks, labeled data are expensive to obtain and therefore limited in availability. Under the few-shot setting, models trained with scarce supervision often learn brittle structure-propert…

    Submitted 27 January, 2026; originally announced January 2026.

    Comments: 10 pages, 4 figures, 5 tables