
Showing 1–50 of 1,080 results for author: Lin, W

Searching in archive cs.
  1. arXiv:2604.14054  [pdf, ps, other]

    cs.LG cs.CL

    $π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

    Authors: Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, Songjun Tu, Qichao Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao

    Abstract: Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In…

    Submitted 15 April, 2026; originally announced April 2026.

    Comments: 26 pages, 12 figures

  2. arXiv:2604.12400  [pdf, ps, other]

    cs.NI

    Throughput Characterization of Wireless CSMA Networks With Arbitrary Sensing and Interference Topologies

    Authors: Xinghua Sun, Wenhai Lin, Ruike Zhou

    Abstract: The performance analysis of wireless CSMA networks is notoriously difficult due to the intricate sensing and interference relationships among links. Even the fundamental problem of throughput characterization remains open when sensing and interference topologies are both arbitrary. In this paper, we develop a new analytical framework for throughput characterization in wireless CSMA networks with a…

    Submitted 14 April, 2026; originally announced April 2026.

  3. arXiv:2604.11095  [pdf, ps, other]

    cs.LG cs.AI

    Bottleneck Tokens for Unified Multimodal Retrieval

    Authors: Siyu Sun, Jing Ren, Zhaohe Liao, Dongxiao Mao, Xiangyuan Ren, Yiyi Zhang, Haohua Zhao, Weixiong Lin, Jiang Shaohua, Liqing Zhang, Yuchao Zheng

    Abstract: Adapting decoder-only multimodal large language models (MLLMs) for unified multimodal retrieval faces two structural gaps. First, existing methods rely on implicit pooling, which overloads the hidden state of a standard vocabulary token (e.g., <EOS>) as the sequence-level representation, a mechanism never designed for information aggregation. Second, contrastive fine-tuning specifies what the embe…

    Submitted 13 April, 2026; originally announced April 2026.

  4. arXiv:2604.10923  [pdf, ps, other]

    cs.CL cs.AI

    Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation

    Authors: Zihao Cheng, Zeming Liu, Yingyu Shan, Xinyi Wang, Xiangrong Zhu, Yunpu Ma, Hongru Wang, Yuhang Guo, Wei Lin, Yunhong Wang

    Abstract: While large language model--powered agents can self-evolve by accumulating experience or by dynamically creating new assets (i.e., tools or expert agents), existing frameworks typically treat these two evolutionary processes in isolation. This separation overlooks their intrinsic interdependence: the former is inherently bounded by a manually predefined static toolset, while the latter generates n…

    Submitted 12 April, 2026; originally announced April 2026.

    Comments: Accepted by ACL 2026 Main

  5. arXiv:2604.10425  [pdf, ps, other]

    cs.CV

    DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain

    Authors: Song Jin, Juntian Zhang, Xun Zhang, Zeying Tian, Fei Jiang, Guojun Yin, Wei Lin, Yong Liu, Rui Yan

    Abstract: Recent advancements in Vision-Language Models (VLMs) have revolutionized general visual understanding. However, their application in the food domain remains constrained by benchmarks that rely on coarse-grained categories, single-view imagery, and inaccurate metadata. To bridge this gap, we introduce DiningBench, a hierarchical, multi-view benchmark designed to evaluate VLMs across three levels of…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: ACL 2026 Main

  6. arXiv:2604.10208  [pdf, ps, other]

    cs.LG

    Mild Over-Parameterization Benefits Asymmetric Tensor PCA

    Authors: Shihong Ding, Weicheng Lin, Cong Fang

    Abstract: Asymmetric Tensor PCA (ATPCA) is a prototypical model for studying the trade-offs between sample complexity, computation, and memory. Existing algorithms for this problem typically require at least $d^{\left\lceil\overline{k}/2\right\rceil}$ state memory cost to recover the signal, where $d$ is the vector dimension and $\overline{k}$ is the tensor order. We focus on the setting where…

    Submitted 11 April, 2026; originally announced April 2026.

  7. arXiv:2604.10056  [pdf, ps, other]

    cs.CV

    U$^{2}$Flow: Uncertainty-Aware Unsupervised Optical Flow Estimation

    Authors: Xunpei Sun, Wenwei Lin, Yi Chang, Gang Chen

    Abstract: Unsupervised optical flow methods typically lack reliable uncertainty estimation, limiting their robustness and interpretability. We propose U$^{2}$Flow, the first recurrent unsupervised framework that jointly estimates optical flow and per-pixel uncertainty. The core innovation is a decoupled learning strategy that derives uncertainty supervision from augmentation consistency via a Laplace-based…

    Submitted 11 April, 2026; originally announced April 2026.

    Comments: Accepted as an oral presentation at CVPR 2026

  8. arXiv:2604.09421  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

    Authors: Junqi Liu, Yun Zhang, Xiaoxia Huang, Long Xu, Weisi Lin

    Abstract: Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to a single-task scenario. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), enhancing both prediction accuracy and coding efficiency. First, we construc…

    Submitted 10 April, 2026; originally announced April 2026.

    Comments: Submitted to IEEE Transactions on Circuits and Systems for Video Technology

  9. arXiv:2604.06658  [pdf]

    cs.CV

    GPAFormer: Graph-guided Patch Aggregation Transformer for Efficient 3D Medical Image Segmentation

    Authors: Chung-Ming Lo, I-Yun Liu, Wei-Yang Lin

    Abstract: Deep learning has been widely applied to 3D medical image segmentation tasks. However, due to the diversity of imaging modalities, the high-dimensional nature of the data, and the heterogeneity of anatomical structures, achieving both segmentation accuracy and computational efficiency in multi-organ segmentation remains a challenge. This study proposes GPAFormer, a lightweight network architecture…

    Submitted 8 April, 2026; originally announced April 2026.

  10. arXiv:2604.05347  [pdf, ps, other]

    eess.IV cs.CV cs.MM

    CI-ICM: Channel Importance-driven Learned Image Coding for Machines

    Authors: Yun Zhang, Junle Liu, Huan Zhang, Zhaoqing Pan, Gangyi Jiang, Weisi Lin

    Abstract: Traditional human vision-centric image compression methods are suboptimal for machine vision-centric compression due to different visual properties and feature characteristics. To address this problem, we propose a Channel Importance-driven learned Image Coding for Machines (CI-ICM), aiming to maximize the performance of machine vision tasks under a given bitrate constraint. First, we propose a Chann…

    Submitted 6 April, 2026; originally announced April 2026.

  11. arXiv:2604.04135  [pdf, ps, other]

    cs.CV

    NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

    Authors: Shuhong Liu, Chenyu Bao, Ziteng Cui, Xuangeng Chu, Bin Ren, Lin Gu, Xiang Chen, Mingrui Li, Long Ma, Marcos V. Conde, Radu Timofte, Yun Liu, Ryo Umagami, Tomohiro Hashimoto, Zijian Hu, Yuan Gan, Tianhan Xu, Yusuke Kurose, Tatsuya Harada, Junwei Yuan, Gengjia Chang, Xining Ge, Mache You, Qida Cao, Zeliang Li, et al. (81 additional authors not shown)

    Abstract: This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participa…

    Submitted 5 April, 2026; originally announced April 2026.

  12. arXiv:2604.00599  [pdf, ps, other]

    cs.LG

    Predicting Dynamics of Ultra-Large Complex Systems by Inferring Governing Equations

    Authors: Qi Shao, Duxin Chen, Jiawen Chen, Yujie Zeng, Athen Ma, Wenwu Yu, Vito Latora, Wei Lin

    Abstract: Predicting the behavior of ultra-large complex systems, from climate to biological and technological networks, is a central unsolved challenge. Existing approaches face a fundamental trade-off: equation discovery methods provide interpretability but fail to scale, while neural networks scale but operate as black boxes and often lose reliability over long times. Here, we introduce the Sparse Identi…

    Submitted 1 April, 2026; originally announced April 2026.

    Comments: 15 pages, 5 figures, under review

  13. arXiv:2604.00058  [pdf]

    q-bio.GN cs.AI cs.LG

    GenoBERT: A Language Model for Accurate Genotype Imputation

    Authors: Lei Huang, Chuan Qiu, Kuan-Jui Su, Anqi Liu, Yun Gong, Weiqiang Lin, Lindong Jiang, Chen Zhao, Meng Song, Jeffrey Deng, Qing Tian, Zhe Luo, Ping Gong, Hui Shen, Chaoyang Zhang, Hong-Wen Deng

    Abstract: Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-at…

    Submitted 31 March, 2026; originally announced April 2026.

  14. arXiv:2604.00022  [pdf, ps, other]

    cs.CL cs.AI

    Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce

    Authors: Liang Chen, Qi Liu, Wenhuan Lin, Feng Liang

    Abstract: Multi-dimensional rubric-based dialogue evaluation is widely used to assess conversational AI, yet its criterion validity -- whether quality scores are associated with the downstream outcomes they are meant to serve -- remains largely untested. We address this gap through a two-phase study on a major Chinese matchmaking platform, testing a 7-dimension evaluation rubric (implemented via LLM-as-Judg…

    Submitted 11 March, 2026; originally announced April 2026.

  15. DipGuava: Disentangling Personalized Gaussian Features for 3D Head Avatars from Monocular Video

    Authors: Jeonghaeng Lee, Seok Keun Choi, Zhixuan Li, Weisi Lin, Sanghoon Lee

    Abstract: While recent 3D head avatar creation methods attempt to animate facial dynamics, they often fail to capture personalized details, limiting realism and expressiveness. To fill this gap, we present DipGuava (Disentangled and Personalized Gaussian UV Avatar), a novel 3D Gaussian head avatar creation method that successfully generates avatars with personalized attributes from monocular video. DipGuava…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: AAAI 2026

  16. arXiv:2603.27013  [pdf, ps, other]

    cs.GR

    PhySkin: Physics-based Bone-driven Neural Garment Simulation

    Authors: Astitva Srivastava, Hsiao-yu Chen, Ryan Goldade, Philipp Herholz, Zhongshi Jiang, Gene Wei-Chin Lin, Lingchen Yang, Nikolaos Sarafianos, Tuur Stuyck, Egor Larionov

    Abstract: Recent advances in digital avatar technology have enabled the generation of compelling virtual characters, but deploying these avatars on compute-constrained devices poses significant challenges for achieving realistic garment deformations. While physics-based simulations yield accurate results, they are computationally prohibitive for real-time applications. Conversely, linear blend skinning offe…

    Submitted 27 March, 2026; originally announced March 2026.

  17. arXiv:2603.26034  [pdf, ps, other]

    cs.CL

    AgentCollab: A Self-Evaluation-Driven Collaboration Paradigm for Efficient LLM Agents

    Authors: Wenbo Gao, Renxi Liu, Xian Wang, Fang Guo, Shuai Yang, Xi Chen, Hui-Ling Zhen, Hanting Chen, Weizhe Lin, Xiaosong Li, Yaoyuan Wang

    Abstract: Autonomous agents powered by large language models (LLMs) perform complex tasks through long-horizon reasoning and tool interaction, where a fundamental trade-off arises between execution efficiency and reasoning robustness. Models at different capability-cost levels offer complementary advantages: lower-cost models enable fast execution but may struggle on difficult reasoning segments, while stro…

    Submitted 26 March, 2026; originally announced March 2026.

  18. arXiv:2603.24477  [pdf, ps, other]

    cs.SE cs.LG

    Composer 2 Technical Report

    Authors: Cursor Research: Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair, Charlie Snell, Chen Lu, Chen Shen, Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, et al. (31 additional authors not shown)

    Abstract: Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learni…

    Submitted 25 March, 2026; v1 submitted 25 March, 2026; originally announced March 2026.

  19. arXiv:2603.22241  [pdf, ps, other]

    cs.CL

    MemDLM: Memory-Enhanced DLM Training

    Authors: Zehua Pei, Hui-Ling Zhen, Weizhe Lin, Sinno Jialin Pan, Yunhe Wang, Mingxuan Yuan, Bei Yu

    Abstract: Diffusion Language Models (DLMs) offer attractive advantages over Auto-Regressive (AR) models, such as full-attention parallel decoding and flexible generation. However, standard DLM training uses a static, single-step masked prediction objective that never exposes the model to the progressive denoising dynamics of inference, and forces all contextual information to be maintained purely through to…

    Submitted 13 April, 2026; v1 submitted 23 March, 2026; originally announced March 2026.

  20. arXiv:2603.20897  [pdf, ps, other]

    cs.CY cs.AI cs.AR

    The data heat island effect: quantifying the impact of AI data centers in a warming world

    Authors: Andrea Marinoni, Erik Cambria, Luca Dal Zilio, Weisi Lin, Mauro Dalla Mura, Jocelyn Chanussot, Edoardo Ragusa, Chi Yan Tso, Yihao Zhu, Benjamin Horton

    Abstract: The strong and continuous increase of AI-based services leads to the steady proliferation of AI data centres worldwide with the unavoidable escalation of their power consumption. It is unknown how this energy demand for computational purposes will impact the surrounding environment. Here, we focus our attention on the heat dissipation of AI hyperscalers. Taking advantage of land surface temperatur…

    Submitted 1 April, 2026; v1 submitted 21 March, 2026; originally announced March 2026.

  21. arXiv:2603.19152  [pdf, ps, other]

    cs.CL cs.AI

    VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models

    Authors: Chonghan Liu, Yimin Du, Qi An, Xin He, Cunqi Zhai, Fei Tan, Weijia Lin, Xiaochun Gong, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang

    Abstract: Large language models frequently exhibit suboptimal performance on low-resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we propose Variable Entropy Policy Optimization (VEPO), which leverages Reinforcement Learning with Verifiable Rewards to incorporate deterministic structural constraints into the policy alignment process.…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: 23 pages. Includes figures and tables. Conference submission

  22. arXiv:2603.16935  [pdf, ps, other]

    cs.CV cs.AI

    GenLie: A Global-Enhanced Lie Detection Network under Sparsity and Semantic Interference

    Authors: Zongshun Zhang, Yao Liu, Qiao Liu, Xuefeng Peng, Peiyuan Jiang, Jiaye Yang, Daibing Yao, Wei Lin

    Abstract: Video-based lie detection aims to identify deceptive behaviors from visual cues. Despite recent progress, its core challenge lies in learning sparse yet discriminative representations. Deceptive signals are typically subtle and short-lived, easily overwhelmed by redundant information, while individual and contextual variations introduce strong identity-related noise. To address this issue, we prop…

    Submitted 14 March, 2026; originally announced March 2026.

    Comments: Accepted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026

  23. arXiv:2603.16241  [pdf, ps, other]

    cs.CV

    Exclusivity-Guided Mask Learning for Semi-Supervised Crowd Instance Segmentation and Counting

    Authors: Jiyang Huang, Hongru Cheng, Wei Lin, Jia Wan, Antoni B. Chan

    Abstract: Semi-supervised crowd analysis is a prominent area of research, as unlabeled data are typically abundant and inexpensive to obtain. However, traditional point-based annotations constrain performance because individual regions are inherently ambiguous, and consequently, learning fine-grained structural semantics from sparse annotations remains an unresolved challenge. In this paper, we first propo…

    Submitted 17 March, 2026; originally announced March 2026.

  24. arXiv:2603.14951  [pdf, ps, other]

    cs.CV

    GT-PCQA: Geometry-Texture Decoupled Point Cloud Quality Assessment with MLLM

    Authors: Guohua Zhang, Jian Jin, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao

    Abstract: With the rapid advancement of Multi-modal Large Language Models (MLLMs), MLLM-based Image Quality Assessment (IQA) methods have shown promising generalization. However, directly extending these MLLM-based IQA methods to PCQA remains challenging. On the one hand, existing PCQA datasets are limited in scale, which hinders stable and effective instruction tuning of MLLMs. On the other hand, due to la…

    Submitted 16 March, 2026; originally announced March 2026.

  25. arXiv:2603.14944  [pdf, ps, other]

    cs.LG

    Ultra-Early Prediction of Tipping Points: Integrating Dynamical Measures with Reservoir Computing

    Authors: Xin Li, Qunxi Zhu, Chengli Zhao, Bolin Zhao, Xue Zhang, Xiaojun Duan, Wei Lin

    Abstract: Complex dynamical systems, such as climate, ecosystems, and economics, can undergo catastrophic and potentially irreversible regime changes, often triggered by environmental parameter drift and stochastic disturbances. These critical thresholds, known as tipping points, pose a prediction problem of both theoretical and practical significance, yet remain largely unresolved. To address this, we articu…

    Submitted 16 March, 2026; originally announced March 2026.

  26. arXiv:2603.13842  [pdf, ps, other]

    cs.RO cs.AI

    Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

    Authors: Zhexi Lian, Haoran Wang, Xuerun Yan, Weimeng Lin, Xianhong Zhang, Yongyu Chen, Jia Hu

    Abstract: End-to-end autonomous driving is typically built upon imitation learning (IL), yet its performance is constrained by the quality of human demonstrations. To overcome this limitation, recent methods incorporate reinforcement learning (RL) through sequential fine-tuning. However, such a paradigm remains suboptimal: sequential RL fine-tuning can introduce policy drift and often leads to a performance…

    Submitted 9 April, 2026; v1 submitted 14 March, 2026; originally announced March 2026.

    Comments: 11 pages, 7 figures, 6 tables

  27. arXiv:2603.10702  [pdf, ps, other]

    cs.CV

    UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations

    Authors: Yaqi Zhao, Wang Lin, Zijian Zhang, Miles Yang, Jingyuan Chen, Wentao Zhang, Zhao Zhong, Liefeng Bo

    Abstract: Current unified multimodal models typically rely on discrete visual tokenizers to bridge the modality gap. However, discretization inevitably discards fine-grained semantic information, leading to suboptimal performance in visual understanding tasks. Conversely, directly modeling continuous semantic representations (e.g., CLIP, SigLIP) poses significant challenges in high-dimensional generative mo…

    Submitted 11 March, 2026; originally announced March 2026.

  28. arXiv:2603.10578  [pdf, ps, other]

    cs.CV cs.DB

    R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment

    Authors: Zhuangzi Li, Jian Jin, Shilv Cai, Weisi Lin

    Abstract: Immersive Computer Graphics (CG) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; and second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six k…

    Submitted 11 March, 2026; originally announced March 2026.

  29. arXiv:2603.09264  [pdf]

    cs.MM

    TPIFM: A Task-Aware Model for Evaluating Perceptual Interaction Fluency in Remote AR Collaboration

    Authors: Jiarun Song, Ninghao Wan, Fuzheng Yang, Weisi Lin

    Abstract: Remote Collaborative Augmented Reality (RCAR) enables geographically distributed users to collaborate by integrating virtual and physical environments. However, because RCAR relies on real-time transmission, it is susceptible to delay and stalling impairments under constrained network conditions. Perceptual interaction fluency (PIF), defined as the perceived pace and responsiveness of collaboratio…

    Submitted 10 March, 2026; originally announced March 2026.

  30. arXiv:2603.09261  [pdf]

    cs.HC cs.MM

    From Perception to Cognition: How Latency Affects Interaction Fluency and Social Presence in VR Conferencing

    Authors: Jiarun Song, Ninghao Wan, FuZheng Yang, Weisi Lin

    Abstract: Virtual reality (VR) conferencing has the potential to provide geographically dispersed users with an immersive environment, enabling rich social interactions and user experience using avatars. However, remote communication in VR inevitably introduces end-to-end (E2E) latency, which can significantly impact user experience. To clarify the impact of latency, we conducted subjective experiments to a…

    Submitted 10 March, 2026; originally announced March 2026.

  31. arXiv:2603.08035  [pdf, ps, other]

    cs.AI cs.LG

    CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

    Authors: Dengcan Liu, Fengkai Yang, Xiaohan Wang, Shurui Yan, Jiajun Chai, Jiahao Li, Yikun Ban, Zhendong Mao, Wei Lin, Guojun Yin

    Abstract: Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpretability and heavy reliance on costly expert annotations. While recent rubric-based approaches enhance evaluation transparency, they lack systematic quality control, yielding noisy and redundant criteria, failing to mitigate persistent biases (e.g.,…

    Submitted 9 March, 2026; originally announced March 2026.

  32. arXiv:2603.07545  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    DreamSAC: Learning Hamiltonian World Models via Symmetry Exploration

    Authors: Jinzhou Tang, Fan Feng, Minghao Fu, Wenjun Lin, Biwei Huang, Keze Wang

    Abstract: Learned world models excel at interpolative generalization but fail at extrapolative generalization to novel physical properties. This limitation arises because they learn statistical correlations rather than the environment's underlying generative rules, such as physical invariances and conservation laws. We argue that learning these invariances is key to robust extrapolation. To achieve this, we…

    Submitted 8 March, 2026; originally announced March 2026.

    Comments: 19 pages, 5 figures

  33. arXiv:2603.07436  [pdf, ps, other]

    cs.CV

    RPG-SAM: Reliability-Weighted Prototypes and Geometric Adaptive Threshold Selection for Training-Free One-Shot Polyp Segmentation

    Authors: Weikun Lin, Yunhao Bai, Yan Wang

    Abstract: Training-free one-shot segmentation offers a scalable alternative to expert annotations where knowledge is often transferred from support images and foundation models. But existing methods often treat all pixels in support images and query response intensities models in a homogeneous way. They ignore the regional heterogeneity in support images and response heterogeneity in query. To resolve this, we p…

    Submitted 13 April, 2026; v1 submitted 7 March, 2026; originally announced March 2026.

    Comments: 8 pages, 3 figures

  34. arXiv:2603.06595  [pdf, ps, other]

    cs.CL

    Rethinking Personalization in Large Language Models at the Token Level

    Authors: Chenheng Zhang, Yijun Lu, Lizhe Fang, Chunyuan Zheng, Jiajun Chai, Xiaohan Wang, Guojun Yin, Wei Lin, Yisen Wang, Zhouchen Lin

    Abstract: With large language models (LLMs) now performing strongly across diverse tasks, there is growing demand for them to personalize outputs for individual users. Personalization is typically framed as an additional layer on top of a base NLP task, requiring model responses to meet user-specific needs while still accomplishing the underlying task. From a token-level perspective, different tokens in a r…

    Submitted 4 February, 2026; originally announced March 2026.

  35. arXiv:2603.05963  [pdf, ps, other]

    cs.CV cs.AI

    Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models

    Authors: Siyuan Yang, Jun Liu, Hao Cheng, Chong Wang, Shijian Lu, Hedvig Kjellstrom, Weisi Lin, Alex C. Kot

    Abstract: Recent advances in large-scale pretrained vision models have demonstrated impressive capabilities across a wide range of downstream tasks, including cross-modal and multi-modal scenarios. However, their direct application to 3D human skeleton data remains challenging due to fundamental differences in data format. Moreover, the scarcity of large-scale skeleton datasets and the need to incorporate s…

    Submitted 6 March, 2026; originally announced March 2026.

    Comments: Submitted to IEEE TPAMI, under review

  36. arXiv:2603.04946  [pdf, ps, other]

    cs.CL

    LocalSUG: Geography-Aware LLM for Query Suggestion in Local-Life Services

    Authors: Jinwen Chen, Shuai Gong, Shiwen Zhang, Zheng Zhang, Yachao Zhao, Lingxiang Wang, Haibo Zhou, Yuan Zhan, Wei Lin, Hainan Zhang

    Abstract: In local-life service platforms, the query suggestion module plays a crucial role in enhancing user experience by generating candidate queries based on user input prefixes, thus reducing user effort and accelerating search. Traditional multi-stage cascading systems rely heavily on historical top queries, limiting their ability to address long-tail demand. While LLMs offer strong semantic generaliz…

    Submitted 5 March, 2026; originally announced March 2026.

  37. arXiv:2603.04915  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    EVMbench: Evaluating AI Agents on Smart Contract Security

    Authors: Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, Olivia Watkins

    Abstract: Smart contracts on public blockchains now manage large amounts of value, and vulnerabilities in these systems can lead to substantial losses. As AI agents become more capable at reading, writing, and running code, it is natural to ask how well they can already navigate this landscape, both in ways that improve security and in ways that might increase risk. We introduce EVMbench, an evaluation that…

    Submitted 5 March, 2026; originally announced March 2026.

  38. arXiv:2603.03726  [pdf, ps, other]

    cs.CV

    QD-PCQA: Quality-Aware Domain Adaptation for Point Cloud Quality Assessment

    Authors: Guohua Zhang, Jian Jin, Meiqin Liu, Chao Yao, Weisi Lin

    Abstract: No-Reference Point Cloud Quality Assessment (NR-PCQA) still struggles with generalization, primarily due to the scarcity of annotated point cloud datasets. Since the Human Visual System (HVS) drives perceptual quality assessment independently of media types, prior knowledge on quality learned from images can be repurposed for point clouds. This insight motivates adopting Unsupervised Domain Adapta…

    Submitted 16 March, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  39. arXiv:2603.03447  [pdf, ps, other]

    cs.CV

    Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

    Authors: Weicai Yan, Yuhong Dai, Qi Ran, Haodong Li, Wang Lin, Hao Liao, Xing Xie, Tao Jin, Jianxun Lian

    Abstract: Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated content to meet real-time constraints. In this work, we instantiate AI companions through two gaming sce…

    Submitted 22 March, 2026; v1 submitted 3 March, 2026; originally announced March 2026.

  40. arXiv:2603.02908  [pdf, ps, other]

    cs.AI

    SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

    Authors: Qi Zhang, Yifei Wang, Xiaohan Wang, Jiajun Chai, Guojun Yin, Wei Lin, Yisen Wang

    Abstract: In recent years, pre-trained large language models have achieved remarkable success across diverse tasks. Besides the pivotal role of self-supervised pre-training, their effectiveness in downstream applications also depends critically on the post-training process, which adapts models to task-specific data and objectives. However, this process inevitably introduces model shifts that can influence p…

    Submitted 3 March, 2026; originally announced March 2026.

  41. arXiv:2603.02238  [pdf, ps, other]

    cs.LG cs.FL cs.LO

    Length Generalization Bounds for Transformers

    Authors: Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin

    Abstract: Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be able to compute a length generalization bound, beyond which the model is guaranteed to generalize. This paper concerns the open problem of the computability of such generalization bounds for CRA…

    Submitted 13 February, 2026; originally announced March 2026.

  42. arXiv:2603.01683  [pdf, ps, other]

    cs.CL cs.AI

    Surgical Post-Training: Cutting Errors, Keeping Knowledge

    Authors: Wenye Lin, Kai Han

    Abstract: Enhancing the reasoning capabilities of Large Language Models (LLMs) via post-training is often constrained by the trade-off between efficiency and catastrophic forgetting. While prior research emphasizes the role of on-policy data in mitigating forgetting, we uncover--and validate both theoretically and empirically--an overlooked yet critical mechanism: the implicit regularization inherent in Dir…

    Submitted 2 March, 2026; originally announced March 2026.

    Comments: 15 pages

  43. arXiv:2603.00040  [pdf, ps, other]

    cs.LG cs.AI

    Attn-QAT: 4-Bit Attention With Quantization-Aware Training

    Authors: Peiyuan Zhang, Matthew Noto, Wenxuan Tan, Chengquan Jiang, Will Lin, Wei Zhou, Hao Zhang

    Abstract: Achieving reliable 4-bit attention is a prerequisite for end-to-end FP4 computation on emerging FP4-capable GPUs, yet attention remains the main obstacle due to FP4's tiny dynamic range and attention's heavy-tailed activations. This paper presents the first systematic study of 4-bit quantization-aware training (QAT) for attention. We find that "drop-in" QAT, which naively combines an FP4 forward p…

    Submitted 6 March, 2026; v1 submitted 8 February, 2026; originally announced March 2026.

  44. arXiv:2602.23374  [pdf, ps, other]

    cs.IR cs.AI cs.CL

    Higress-RAG: A Holistic Optimization Framework for Enterprise Retrieval-Augmented Generation via Dual Hybrid Retrieval, Adaptive Routing, and CRAG

    Authors: Weixi Lin

    Abstract: The integration of Large Language Models (LLMs) into enterprise knowledge management systems has been catalyzed by the Retrieval-Augmented Generation (RAG) paradigm, which augments parametric memory with non-parametric external data. However, the transition from proof-of-concept to production-grade RAG systems is hindered by three persistent challenges: low retrieval precision for complex queries,…

    Submitted 30 December, 2025; originally announced February 2026.

    Comments: 7 pages, 5 figures, our submissions are not yet published

    ACM Class: H.3.3; I.2.7; D.2.11

  45. arXiv:2602.23153  [pdf, ps, other]

    cs.CV cs.AI

    Efficient Encoder-Free Fourier-based 3D Large Multimodal Model

    Authors: Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Yiming Wang, Fabio Poiesi

    Abstract: Large Multimodal Models (LMMs) that process 3D data typically rely on heavy, pre-trained visual encoders to extract geometric features. While recent 2D LMMs have begun to eliminate such encoders for efficiency and scalability, extending this paradigm to 3D remains challenging due to the unordered and large-scale nature of point clouds. This leaves a critical unanswered question: How can we design…

    Submitted 28 March, 2026; v1 submitted 26 February, 2026; originally announced February 2026.

    Journal ref: CVPR 2026 camera ready

  46. arXiv:2602.22659  [pdf, ps, other]

    cs.CV cs.MM

    Scaling Audio-Visual Quality Assessment Dataset via Crowdsourcing

    Authors: Renyu Yang, Jian Jin, Lili Meng, Meiqin Liu, Yilin Wang, Balu Adsumilli, Weisi Lin

    Abstract: Audio-visual quality assessment (AVQA) research has been stalled by limitations of existing datasets: they are typically small in scale, with insufficient diversity in content and quality, and annotated only with overall scores. These shortcomings provide limited support for model development and multimodal perception research. We propose a practical approach for AVQA dataset construction. First,…

    Submitted 26 February, 2026; originally announced February 2026.

    Comments: Accepted to ICASSP 2026. 5 pages (main paper) + 8 pages (supplementary material)

  47. arXiv:2602.20666  [pdf, ps, other]

    cs.CV

    BoxSplitGen: A Generative Model for 3D Part Bounding Boxes in Varying Granularity

    Authors: Juil Koo, Wei-Tung Lin, Chanho Park, Chanhyeok Park, Minhyuk Sung

    Abstract: Human creativity follows a perceptual process, moving from abstract ideas to finer details during creation. While 3D generative models have advanced dramatically, models specifically designed to assist human imagination in 3D creation -- particularly for detailing abstractions from coarse to fine -- have not been explored. We propose a framework that enables intuitive and interactive 3D shape gene…

    Submitted 24 February, 2026; originally announced February 2026.

    Comments: Project page: https://boxsplitgen.github.io

  48. arXiv:2602.19891  [pdf]

    eess.IV cs.CV

    Using Unsupervised Domain Adaptation Semantic Segmentation for Pulmonary Embolism Detection in Computed Tomography Pulmonary Angiogram (CTPA) Images

    Authors: Wen-Liang Lin, Yun-Chien Cheng

    Abstract: While deep learning has demonstrated considerable promise in computer-aided diagnosis for pulmonary embolism (PE), practical deployment in Computed Tomography Pulmonary Angiography (CTPA) is often hindered by "domain shift" and the prohibitive cost of expert annotations. To address these challenges, an unsupervised domain adaptation (UDA) framework is proposed, utilizing a Transformer backbone and…

    Submitted 23 February, 2026; originally announced February 2026.

  49. arXiv:2602.16473  [pdf, ps, other]

    cs.LG cs.FL cs.LO

    Synthesis and Verification of Transformer Programs

    Authors: Hongjian Jiang, Matthew Hague, Philipp Rümmer, Anthony Widjaja Lin

    Abstract: C-RASP is a simple programming language that was recently shown to capture concepts expressible by transformers. In this paper, we develop new algorithmic techniques for automatically verifying C-RASPs. To this end, we establish a connection to the verification of synchronous dataflow programs in Lustre, which enables us to exploit state-of-the-art model checkers utilizing highly optimized SMT-sol…

    Submitted 18 February, 2026; originally announced February 2026.

  50. arXiv:2602.14536  [pdf, ps, other]

    cs.CL cs.AI

    Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets

    Authors: Yuchen Yang, Wenze Lin, Enhao Huang, Zhixuan Chu, Hongbin Zhou, Lan Tao, Yiming Li, Zhan Qin, Kui Ren

    Abstract: Large Language Models (LLMs) have seen remarkable advancements, achieving state-of-the-art results in diverse applications. Fine-tuning, an important step for adapting LLMs to specific downstream tasks, typically involves further training on corresponding datasets. However, a fundamental discrepancy exists between current fine-tuning datasets and the token-level optimization mechanism of LLMs: mos…

    Submitted 5 April, 2026; v1 submitted 16 February, 2026; originally announced February 2026.