Showing 1–50 of 781 results for author: Zheng, W

Searching in archive cs.
  1. arXiv:2604.09514

    cs.CL cs.HC

    Many Ways to Be Fake: Benchmarking Fake News Detection Under Strategy-Driven AI Generation

    Authors: Xinyu Wang, Sai Koneru, Wenbo Zhang, Wenliang Zheng, Saksham Ranjan, Sarah Rajtmajer

    Abstract: Recent advances in large language models (LLMs) have enabled the large-scale generation of highly fluent and deceptive news-like content. While prior work has often treated fake news detection as a binary classification problem, modern fake news increasingly arises through human-AI collaboration, where strategic inaccuracies are embedded within otherwise accurate and credible narratives. These mix…

    Submitted 10 April, 2026; originally announced April 2026.

  2. arXiv:2604.07386

    cs.CR

    Label Leakage Attacks in Machine Unlearning: A Parameter and Inversion-Based Approach

    Authors: Weidong Zheng, Kongyang Chen, Yao Huang, Yuanwei Guo, Yatie Xiao

    Abstract: With the widespread application of artificial intelligence technologies in face recognition and other fields, data privacy and security issues have received extensive attention, especially the "right to be forgotten" emphasized by numerous privacy protection laws. Existing technologies have proposed various unlearning methods, but they may inadvertently leak the categories of unlearned data. Th…

    Submitted 7 April, 2026; originally announced April 2026.

  3. arXiv:2604.07361

    cs.LG

    BLEG: LLM Functions as Powerful fMRI Graph-Enhancer for Brain Network Analysis

    Authors: Rui Dong, Zitong Wang, Jiaxing Li, Weihuang Zheng, Youyong Kong

    Abstract: Graph Neural Networks (GNNs) have been widely used in diverse brain network analysis tasks based on preprocessed functional magnetic resonance imaging (fMRI) data. However, their performance is constrained by high feature sparsity and the inherent limitations of domain knowledge within uni-modal neurographs. Meanwhile, large language models (LLMs) have demonstrated powerful representation capabi…

    Submitted 10 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

  4. arXiv:2604.06589

    cs.RO

    BiDexGrasp: Coordinated Bimanual Dexterous Grasps across Object Geometries and Sizes

    Authors: Mu Lin, Yi-Lin Wei, Jiaxuan Chen, Yuhao Lin, Shuoyu Chen, Jiangran Lyu, Jiayi Chen, Yansong Tang, He Wang, Wei-Shi Zheng

    Abstract: Bimanual dexterous grasping is a fundamental and promising area in robotics, yet its progress is constrained by the lack of comprehensive datasets and powerful generation models. In this work, we propose BiDexGrasp, which consists of a large-scale bimanual dexterous grasp dataset and a novel generation model. For the dataset, we propose a novel bimanual grasp synthesis pipeline to efficiently annotate physi…

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: Project Page: https://frenkielm.github.io/BiDexGrasp.github.io/

  5. arXiv:2604.05876

    cs.CL

    Mechanistic Circuit-Based Knowledge Editing in Large Language Models

    Authors: Tianyi Zhao, Yinhan He, Wendy Zheng, Chen Chen

    Abstract: Deploying Large Language Models (LLMs) in real-world dynamic environments raises the challenge of updating their pre-trained knowledge. While existing knowledge editing methods can reliably patch isolated facts, they frequently suffer from a "Reasoning Gap", where the model recalls the edited fact but fails to utilize it in multi-step reasoning chains. To bridge this gap, we introduce MCircKE (\un…

    Submitted 7 April, 2026; originally announced April 2026.

  6. arXiv:2604.04186

    cs.DS

    DAG Covers: The Steiner Point Effect

    Authors: Sujoy Bhore, Hsien-Chih Chang, Jonathan Conroy, Arnold Filtser, Eunjin Oh, Nicole Wein, Da Wei Zheng

    Abstract: Given a weighted digraph $G$, a $(t,g,\mu)$-DAG cover is a collection of $g$ dominating DAGs $D_1,\dots,D_g$ such that all distances are approximately preserved: for every pair $(u,v)$ of vertices, $\min_i d_{D_i}(u,v)\le t\cdot d_{G}(u,v)$, and the total number of non-$G$ edges is bounded by $|(\cup_i D_i)\setminus G|\le \mu$. Assadi, Hoppenworth, and Wein [STOC 25] and Filtser [SODA 26] studied DAG c…

    Submitted 5 April, 2026; originally announced April 2026.

  7. arXiv:2604.04160

    eess.AS cs.SD eess.SP

    AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis

    Authors: Tianhua Qi, Wenming Zheng, Björn W. Schuller, Zhaojie Luo, Haizhou Li

    Abstract: Emotion is essential in spoken communication, yet most existing frameworks in speech emotion modeling rely on predefined categories or low-dimensional continuous attributes, which offer limited expressive capacity. Recent advances in speech emotion captioning and synthesis have shown that textual descriptions provide a more flexible and interpretable alternative for representing affective characte…

    Submitted 5 April, 2026; originally announced April 2026.

    Comments: Submitted to IEEE Transactions

  8. arXiv:2604.01457

    cs.CL

    Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs

    Authors: Tianyi Zhao, Yinhan He, Wendy Zheng, Yujie Zhang, Chen Chen

    Abstract: Large language models are often not just wrong, but "confidently wrong": when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mech…

    Submitted 1 April, 2026; originally announced April 2026.

  9. arXiv:2604.00813

    cs.CV cs.AI cs.RO

    DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

    Authors: Sicheng Zuo, Zixun Xie, Wenzhao Zheng, Shaoqing Xu, Fang Li, Hanbing Li, Long Chen, Zhi-Xin Yang, Jiwen Lu

    Abstract: End-to-end autonomous driving has evolved from the conventional paradigm based on sparse perception into vision-language-action (VLA) models, which focus on learning language descriptions as an auxiliary task to facilitate planning. In this paper, we propose an alternative Vision-Geometry-Action (VGA) paradigm that advocates dense 3D geometry as the critical cue for autonomous driving. As vehicles…

    Submitted 7 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

    Comments: Code is available at https://github.com/wzzheng/DVGT

  10. arXiv:2603.27896

    cs.SE

    Large Language Models in Game Development: Implications for Gameplay, Playability, and Player Experience

    Authors: Keeryn Johnson, Muhammad Ahmed, Charlie Lang, Sahib Thethi, Wilson Zheng, Ronnie de Souza Santos

    Abstract: This paper investigates how the integration of large language models influences gameplay, playability, and player experience in game development. We report a collaborative autoethnographic study of two game projects in which LLMs were embedded as architectural components. Reflective narratives and development artifacts were analyzed using gameplay, playability, and player experience as guiding con…

    Submitted 29 March, 2026; originally announced March 2026.

  11. arXiv:2603.27599

    cs.CV

    You Only Erase Once: Erasing Anything without Bringing Unexpected Content

    Authors: Yixing Zhu, Qing Zhang, Wenju Xu, Wei-Shi Zheng

    Abstract: We present YOEO, an approach for object erasure. Unlike recent diffusion-based methods, which struggle to erase target objects without generating unexpected content within the masked regions due to a lack of sufficient paired training data and of explicit constraints on content generation, our method produces high-quality object erasure results free of unwanted objects or artifacts while faithfu…

    Submitted 29 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR2026

  12. arXiv:2603.27516

    cs.CV

    SGS-Intrinsic: Semantic-Invariant Gaussian Splatting for Sparse-View Indoor Inverse Rendering

    Authors: Jiahao Niu, Rongjia Zheng, Wenju Xu, Wei-Shi Zheng, Qing Zhang

    Abstract: We present SGS-Intrinsic, an indoor inverse rendering framework that works well for sparse-view images. Unlike existing 3D Gaussian Splatting (3DGS) based methods that focus on object-centric reconstruction and fail under sparse-view settings, our method achieves high-quality geometry reconstruction and accurate disentanglement of material and illumination. The core idea is to con…

    Submitted 31 March, 2026; v1 submitted 29 March, 2026; originally announced March 2026.

    Comments: CVPR2026

  13. arXiv:2603.26680

    cs.CL cs.AI

    AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

    Authors: Jianfei Xiao, Xiang Yu, Chengbing Wang, Wuqiang Zheng, Xinyu Lin, Kaining Liu, Hongxun Ding, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He

    Abstract: As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a critical frontier. However, progress is currently bottlenecked by the absence of a gold-standard evaluation benchmark. Existing benchmarks either overlook personalized information management that is critical for personalization or rely heavily on synthetic dialogues, which exhibit an inherent distr…

    Submitted 9 March, 2026; originally announced March 2026.

  14. arXiv:2603.26109

    cs.CV

    SDDF: Specificity-Driven Dynamic Focusing for Open-Vocabulary Camouflaged Object Detection

    Authors: Jiaming Liang, Yifeng Zhan, Chunlin Liu, Weihua Zheng, Bingye Peng, Qiwei Liang, Boyang Cai, Xiaochun Mai, Qiang Nie

    Abstract: Open-vocabulary object detection (OVOD) aims to detect known and unknown objects in the open world by leveraging text prompts. Benefiting from the emergence of large-scale vision-language pre-trained models, OVOD has demonstrated strong zero-shot generalization capabilities. However, when dealing with camouflaged objects, the detector often fails to distinguish and localize objects because the vi…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR2026

  15. arXiv:2603.25741

    cs.CV cs.AI cs.RO

    Vega: Learning to Drive with Natural Language Instructions

    Authors: Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

    Abstract: Vision-language-action models have reshaped autonomous driving to incorporate language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) conta…

    Submitted 30 March, 2026; v1 submitted 26 March, 2026; originally announced March 2026.

    Comments: Code is available at https://github.com/zuosc19/Vega

  16. arXiv:2603.25058

    cs.CV

    Learning Explicit Continuous Motion Representation for Dynamic Gaussian Splatting from Monocular Videos

    Authors: Xuankai Zhang, Junjin Xiao, Shangwei Huang, Wei-shi Zheng, Qing Zhang

    Abstract: We present an approach for high-quality dynamic Gaussian Splatting from monocular videos. To this end, we go one step beyond previous methods by explicitly modeling the continuous position and orientation deformation of dynamic Gaussians, using SE(3) B-spline motion bases with a compact set of control points. To improve computational efficiency while enhancing the ability to model…

    Submitted 26 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026

  17. arXiv:2603.22851

    cs.CV cs.AI

    UniQueR: Unified Query-based Feedforward 3D Reconstruction

    Authors: Chensheng Peng, Quentin Herau, Jiezhi Yang, Yichen Xie, Yihan Hu, Wenzhao Zheng, Matthew Strong, Masayoshi Tomizuka, Wei Zhan

    Abstract: We present UniQueR, a unified query-based feedforward framework for efficient and accurate 3D reconstruction from unposed images. Existing feedforward models such as DUSt3R, VGGT, and AnySplat typically predict per-pixel point maps or pixel-aligned Gaussians, which remain fundamentally 2.5D and limited to visible surfaces. In contrast, UniQueR formulates reconstruction as a sparse 3D query inferen…

    Submitted 24 March, 2026; originally announced March 2026.

  18. arXiv:2603.21790

    cs.CG cs.DS

    Charting the Diameter Computation Landscape of Geometric Intersection Graphs in Three Dimensions and Higher

    Authors: Timothy M. Chan, Hsien-Chih Chang, Jie Gao, Sándor Kisfaludi-Bak, Hung Le, Da Wei Zheng

    Abstract: Recent research on computing the diameter of geometric intersection graphs has made significant strides, primarily focusing on the 2D case where truly subquadratic-time algorithms were given for simple objects such as unit-disks and (axis-aligned) squares. However, in three or higher dimensions, there is no known truly subquadratic-time algorithm for any intersection graph of non-trivial objects,…

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: SoCG '26

  19. arXiv:2603.20290

    cs.CV cs.RO eess.IV

    Transparent Fragments Contour Estimation via Visual-Tactile Fusion for Autonomous Reassembly

    Authors: Qihao Lin, Borui Chen, Yuping Zhou, Jianing Wu, Yulan Guo, Weishi Zheng, Chongkun Xia

    Abstract: Contour estimation of transparent fragments is very important for autonomous reassembly, especially in fields such as precision optical instrument repair, cultural relic restoration, and the repair of other precious devices broken in accidents. Unlike general intact transparent objects, the contour estimation of transparent fragments faces greater challenges due to strict optical propert…

    Submitted 18 March, 2026; originally announced March 2026.

    Comments: 17 pages, 22 figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  20. arXiv:2603.19219

    cs.CV cs.LG

    DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding

    Authors: Dong Zhuo, Wenzhao Zheng, Sicheng Zuo, Siming Yan, Lu Hou, Jie Zhou, Jiwen Lu

    Abstract: With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed for monocular and 2D scenes, leading to inefficiency and inter-view inconsistency when applied to high-resolution multi-view driving scenes. To address this, we…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Project Page: https://paryi555.github.io/DriveTok/ Code: https://github.com/paryi555/DriveTok

  21. arXiv:2603.19048

    cs.CV

    Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos

    Authors: Weijia Dou, Wenzhao Zheng, Weiliang Chen, Yu Zheng, Jie Zhou, Jiwen Lu

    Abstract: Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a m…

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Code available at https://github.com/tj12323/SGC

  22. arXiv:2603.18446

    cs.CL cs.LG

    UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference

    Authors: Lang Zhou, Shuxuan Li, Zhuohao Li, Shi Liu, Zhilin Zhao, Wei-Shi Zheng

    Abstract: Long-context inference remains challenging for large language models due to attention dilution and out-of-distribution degradation. Context selection mitigates this limitation by attending to a subset of key-value cache entries, yet most methods allocate a fixed context budget throughout decoding despite highly non-uniform token-level contextual demands. To address this issue, we propose Uncertain…

    Submitted 18 March, 2026; originally announced March 2026.

  23. arXiv:2603.16806

    cs.RO cs.AI

    DexGrasp-Zero: A Morphology-Aligned Policy for Zero-Shot Cross-Embodiment Dexterous Grasping

    Authors: Yuliang Wu, Yanhan Lin, WengKit Lao, Yuhao Lin, Yi-Lin Wei, Wei-Shi Zheng, Ancong Wu

    Abstract: To meet the demands of increasingly diverse dexterous hand hardware, it is crucial to develop a policy that enables zero-shot cross-embodiment grasping without redundant re-learning. Cross-embodiment alignment is challenging due to heterogeneous hand kinematics and physical constraints. Existing approaches typically predict intermediate motion targets and retarget them to each embodiment, which ma…

    Submitted 17 March, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

  24. arXiv:2603.15042

    cs.DC cs.OS

    Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing

    Authors: Zhenyuan Yang, Wenxin Zheng, Mingyu Li, Haibo Chen

    Abstract: Existing GPU spatial sharing systems face a three-way tradeoff: resource utilization, performance isolation, and semantic determinism. Hardware partitioning suffers from hardware under-utilization. Hardware multiplexing fails to avoid performance interference. Recently proposed software-based GPU kernel slicing reshapes floating-point reduction orders, destroying semantic determinism and inducing…

    Submitted 3 April, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  25. arXiv:2603.14845

    cs.LG cs.AI

    Integrating Weather Foundation Model and Satellite to Enable Fine-Grained Solar Irradiance Forecasting

    Authors: Ziqing Ma, Kai Ying, Xinyue Gu, Tian Zhou, Tianyu Zhu, Haifan Zhang, Peisong Niu, Wang Zheng, Cong Bai, Liang Sun

    Abstract: Accurate day-ahead solar irradiance forecasting is essential for integrating solar energy into the power grid. However, it remains challenging due to the pronounced diurnal cycle and inherently complex cloud dynamics. Current methods either lack fine-scale resolution (e.g., numerical weather prediction, weather foundation models) or degrade at longer lead times (e.g., satellite extrapolation). We…

    Submitted 17 March, 2026; v1 submitted 16 March, 2026; originally announced March 2026.

  26. arXiv:2603.05075

    cs.CV

    UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark

    Authors: Yanlin Li, Minghui Guo, Kaiwen Zhang, Shize Zhang, Yiran Zhao, Haodong Li, Congyue Zhou, Weijie Zheng, Yushen Yan, Shengqiong Wu, Wei Ji, Lei Cui, Furu Wei, Hao Fei, Mong-Li Lee, Wynne Hsu

    Abstract: In real-world multimodal applications, systems usually need to comprehend arbitrarily combined and interleaved multimodal inputs from users, while also generating outputs in any interleaved multimedia form. This capability defines the goal of any-to-any interleaved multimodal learning under a unified paradigm of understanding and generation, posing new challenges and opportunities for advancing Mu…

    Submitted 5 March, 2026; originally announced March 2026.

    Comments: 70 pages, 63 figures, 30 tables, CVPR

  27. arXiv:2603.04806

    cs.HC

    SparkTales: Facilitating Cross-Language Collaborative Storytelling through Coordinator-AI Collaboration

    Authors: Wenxin Zhao, Peng Zhang, Hansu Gu, Haoxuan Zhou, Xiaojie Huo, Lin Wang, Wen Zheng, Tun Lu, Ning Gu

    Abstract: Cross-language collaborative storytelling plays a vital role in children's language learning and cultural development, fostering both expressive ability and intercultural awareness. Yet, in practice, children's participation is often shallow, and facilitating such sessions places heavy cognitive and organizational burdens on coordinators, who must coordinate language support, maintain children's e…

    Submitted 4 March, 2026; originally announced March 2026.

  28. arXiv:2603.02137

    cs.IR cs.CV

    NextAds: Towards Next-generation Personalized Video Advertising

    Authors: Yiyan Xu, Ruoxuan Xia, Wuqiang Zheng, Fengbin Zhu, Wenjie Wang, Fuli Feng

    Abstract: With the rapid growth of online video consumption, video advertising has become increasingly dominant in the digital advertising landscape. Yet diverse users and viewing contexts make one-size-fits-all ad creatives insufficient for consistent effectiveness, underlining the importance of personalization. In practice, most personalized video advertising systems follow a retrieval-based paradigm, se…

    Submitted 2 March, 2026; originally announced March 2026.

  29. arXiv:2603.00502

    cs.LG

    Trinity: A Scenario-Aware Recommendation Framework for Large-Scale Cold-Start Users

    Authors: Wenhao Zheng, Wang Lu, Fangshuang Tang, Yiyang Lu, Jun Yang, Pengcheng Xiong, Yulan Yan

    Abstract: Early-stage users in a new scenario intensify cold-start challenges, yet prior works often address only parts of the problem through model architecture. Launching a new user experience to replace an established product involves sparse behavioral signals, low-engagement cohorts, and unstable model performance. We argue that effective recommendations require the synergistic integration of feature en…

    Submitted 28 February, 2026; originally announced March 2026.

    Journal ref: WWW 2026

  30. arXiv:2602.23259

    cs.CV cs.AI cs.RO

    Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving

    Authors: Jiangxin Sun, Feng Xue, Teng Long, Chang Liu, Jian-Fang Hu, Wei-Shi Zheng, Nicu Sebe

    Abstract: With advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has made great progress recently. Currently, IL-based methods have become a mainstream paradigm: models rely on standard driving behaviors given by experts, and learn to minimize the discrepancy between their actions and expert actions. However, this objective of "only driving like the…

    Submitted 26 February, 2026; originally announced February 2026.

  31. arXiv:2602.21553

    cs.IR cs.AI cs.LG

    Revisiting RAG Retrievers: An Information Theoretic Benchmark

    Authors: Wenqing Zheng, Dmitri Kalaev, Noah Fatsi, Daniel Barcklow, Owen Reinert, Igor Melnyk, Senthil Kumar, C. Bayan Bruss

    Abstract: Retrieval-Augmented Generation (RAG) systems rely critically on the retriever module to surface relevant context for large language models. Although numerous retrievers have recently been proposed, each built on different ranking principles such as lexical matching, dense embeddings, or graph citations, there remains a lack of systematic understanding of how these mechanisms differ and overlap. Ex…

    Submitted 24 February, 2026; originally announced February 2026.

  32. arXiv:2602.21015

    cs.CV

    From Perception to Action: An Interactive Benchmark for Vision Reasoning

    Authors: Yuhao Wu, Maojia Song, Yihuai Lan, Lei Wang, Zhiqiang Hu, Yao Xiao, Heng Zhou, Weihua Zheng, Dylan Raharja, Soujanya Poria, Roy Ka-Wei Lee

    Abstract: Understanding the physical structure is essential for real-world applications such as embodied agents, interactive design, and long-horizon manipulation. Yet, prevailing Vision-Language Model (VLM) evaluations still center on structure-agnostic, single-turn setups (e.g., VQA), which fail to assess agents' ability to reason about how geometry, contact, and support relations jointly constrain what a…

    Submitted 24 February, 2026; originally announced February 2026.

    Comments: Work in progress. Website: https://social-ai-studio.github.io/CHAIN/

  33. arXiv:2602.19049

    cs.CL cs.LG

    IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning

    Authors: Yinhan He, Yaochen Zhu, Mingjia Shi, Wendy Zheng, Lin Su, Xiaoqing Wang, Qi Guo, Jundong Li

    Abstract: Large language models increasingly rely on long chains of thought to improve accuracy, yet such gains come with substantial inference-time costs. We revisit token-efficient post-training and argue that existing sequence-level reward-shaping methods offer limited control over how reasoning effort is allocated across tokens. To bridge the gap, we propose IAPO, an information-theoretic post-training…

    Submitted 22 February, 2026; originally announced February 2026.

  34. arXiv:2602.18532

    cs.CV cs.AI cs.RO

    VLANeXt: Recipes for Building Strong VLA Models

    Authors: Xiao-Ming Wu, Bin Fan, Kang Liao, Jian-Jian Jiang, Runze Yang, Yihang Luo, Zhonghua Wu, Wei-Shi Zheng, Chen Change Loy

    Abstract: Following the rise of large foundation models, Vision-Language-Action models (VLAs) emerged, leveraging strong visual and language understanding for general-purpose policy learning. Yet, the current VLA landscape remains fragmented and exploratory. Although many groups have proposed their own VLA models, inconsistencies in training protocols and evaluation settings make it difficult to identify wh…

    Submitted 20 February, 2026; originally announced February 2026.

    Comments: 17 pages, 11 figures, Project Page: https://dravenalg.github.io/VLANeXt/

  35. arXiv:2602.17701

    eess.SP cs.LG

    Deep Neural Network Architectures for Electrocardiogram Classification: A Comprehensive Evaluation

    Authors: Yun Song, Wenjia Zheng, Tiedan Chen, Ziyu Wang, Jiazhao Shi, Yisong Chen

    Abstract: With the rising prevalence of cardiovascular diseases, electrocardiograms (ECG) remain essential for the non-invasive detection of cardiac abnormalities. This study presents a comprehensive evaluation of deep neural network architectures for automated arrhythmia classification, integrating temporal modeling, attention mechanisms, and ensemble strategies. To address data scarcity in minority classe…

    Submitted 7 February, 2026; originally announced February 2026.

  36. arXiv:2602.17040

    cs.GR

    Fuse3D: Generating 3D Assets Controlled by Multi-Image Fusion

    Authors: Xuancheng Jin, Rengan Xie, Wenting Zheng, Rui Wang, Hujun Bao, Yuchi Huo

    Abstract: Recently, generating 3D assets with the control of condition images has achieved impressive quality. However, existing 3D generation methods are limited to handling a single control objective and lack the ability to utilize multiple images to independently control different regions of a 3D asset, which hinders their flexibility in applications. We propose Fuse3D, a novel method that enables genera…

    Submitted 12 November, 2025; originally announced February 2026.

  37. arXiv:2602.16124

    cs.IR cs.AI cs.LG

    Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System

    Authors: Jiang Zhang, Yubo Wang, Wei Chang, Lu Han, Xingying Cheng, Feng Zhang, Min Li, Songhao Jiang, Wei Zheng, Harry Tran, Zhen Wang, Lei Chen, Yueming Wang, Benyu Zhang, Xiangjun Fan, Bi Xue, Qifan Wang

    Abstract: Approximate nearest neighbor (ANN) search is widely used in the retrieval stage of large-scale recommendation systems. In this stage, candidate items are indexed using their learned embedding vectors, and ANN search is executed for each user (or item) query to retrieve a set of relevant items. However, ANN-based retrieval has two key limitations. First, item embeddings and their indices are typica…

    Submitted 17 February, 2026; originally announced February 2026.

  38. arXiv:2602.12253

    cs.GT cs.LG

    Is Online Linear Optimization Sufficient for Strategic Robustness?

    Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

    Abstract: We consider bidding in repeated Bayesian first-price auctions. Bidding algorithms that achieve optimal regret have been extensively studied, but their strategic robustness to the seller's manipulation remains relatively underexplored. Bidding algorithms based on no-swap-regret algorithms achieve both desirable properties, but are suboptimal in terms of statistical and computational efficiency. In…

    Submitted 12 February, 2026; originally announced February 2026.

    Comments: 26 pages

  39. arXiv:2602.12159

    cs.RO cs.AI

    3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting

    Authors: Wancai Zheng, Hao Chen, Xianlong Lu, Linlin Ou, Xinyi Yu

    Abstract: Object navigation is a core capability of embodied intelligence, enabling an agent to locate target objects in unknown environments. Recent advances in vision-language models (VLMs) have facilitated zero-shot object navigation (ZSON). However, existing methods often rely on scene abstractions that convert environments into semantic maps or textual representations, causing high-level decision makin…

    Submitted 12 February, 2026; originally announced February 2026.

  40. arXiv:2602.10604

    cs.CL cs.AI

    Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

    Authors: Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi, Daxin Jiang, Dehua Ma, Deshan Sun, et al. (191 additional authors not shown)

    Abstract: We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/f…

    Submitted 23 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

    Comments: Technical report for Step 3.5 Flash

  41. arXiv:2602.09023

    cs.RO

    TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

    Authors: Qinwen Xu, Jiaming Liu, Rui Zhou, Shaojun Shi, Nuowei Han, Zhuoyang Liu, Chenyang Gu, Shuo Gu, Yang Yue, Gao Huang, Wenzhao Zheng, Sirui Han, Peng Jia, Shanghang Zhang

    Abstract: Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and insufficient real-world interaction. While online reinforcement learning (RL) has shown promise in improving general foundation models, applying RL to VLA manipulation in real-world settings is still hindered by low exploration efficiency and a restricted…

    Submitted 19 March, 2026; v1 submitted 9 February, 2026; originally announced February 2026.

  42. arXiv:2602.08145  [pdf, ps, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Reliable and Responsible Foundation Models: A Comprehensive Survey

    Authors: Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, Shengbang Tong, Lingfeng Shen, Rafael Rafailov, Runjia Li, Zhaoyang Wang, Yiyang Zhou, Chenhang Cui, Yu Wang, Wenhao Zheng, Huichi Zhou, Jindong Gu, Zhaorun Chen, Peng Xia, Tony Lee, Thomas Zollo, et al. (27 additional authors not shown)

    Abstract: Foundation models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models (i.e., Text-to-Image Models and Image-Editing Models), and Video Generative Models, have become essential tools with broad applications across various domains such as law, medicine, education, finance, science, and beyond. As these models see increasing real-world deployment,…

    Submitted 4 February, 2026; originally announced February 2026.

    Comments: TMLR camera-ready version

  43. arXiv:2602.06994  [pdf, ps, other]

    q-bio.NC cs.AI cs.CV

    SurfAge-Net: A Hierarchical Surface-Based Network for Interpretable Fine-Grained Brain Age Prediction

    Authors: Rongzhao He, Dalin Zhu, Ying Wang, Songhong Yue, Leilei Zhao, Yu Fu, Dan Wu, Bin Hu, Weihao Zheng

    Abstract: Brain age prediction serves as a powerful framework for assessing brain status and detecting deviations associated with neurodevelopmental and neurodegenerative disorders. However, most existing approaches emphasize whole-brain age prediction and therefore overlook the pronounced regional heterogeneity of brain maturation that is crucial for detecting localized atypical trajectories. To address th…

    Submitted 28 January, 2026; originally announced February 2026.

  44. arXiv:2602.04454  [pdf, ps, other]

    cs.CV

    Seg-ReSearch: Segmentation with Interleaved Reasoning and External Search

    Authors: Tianming Liang, Qirui Du, Jian-Fang Hu, Haichao Jiang, Zicheng Lin, Wei-Shi Zheng

    Abstract: Segmentation based on language has been a popular topic in computer vision. While recent advances in multimodal large language models (MLLMs) have endowed segmentation systems with reasoning capabilities, these efforts remain confined by the frozen internal knowledge of MLLMs, which limits their potential for real-world scenarios that involve up-to-date information or domain-specific concepts. In…

    Submitted 4 February, 2026; originally announced February 2026.

  45. arXiv:2602.04167  [pdf, ps, other]

    cs.CV

    Point2Insert: Video Object Insertion via Sparse Point Guidance

    Authors: Yu Zhou, Xiaoyan Yang, Bojia Zi, Lihan Zhang, Ruijie Sun, Weishi Zheng, Haibin Huang, Chi Zhang, Xuelong Li

    Abstract: This paper introduces Point2Insert, a sparse-point-based framework for flexible and user-friendly object insertion in videos, motivated by the growing popularity of accurate, low-effort object placement. Existing approaches face two major challenges: mask-based insertion methods require labor-intensive mask annotations, while instruction-based methods struggle to place objects at precise locations…

    Submitted 3 February, 2026; originally announced February 2026.

  46. arXiv:2602.03595  [pdf, ps, other]

    cs.CV

    Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation

    Authors: Haichao Jiang, Tianming Liang, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Current methods mainly rely on large-scale supervised fine-tuning (SFT) of Multi-modal Large Language Models (MLLMs). However, this paradigm suffers from heavy data dependence and limited scalability against the rapid evolution of MLLMs. Although recent zero-shot approaches offer a flexible alter…

    Submitted 6 February, 2026; v1 submitted 3 February, 2026; originally announced February 2026.

  47. arXiv:2602.01753  [pdf, ps, other]

    cs.CV

    ObjEmbed: Towards Universal Multimodal Object Embeddings

    Authors: Shenghao Fu, Yukun Su, Fengyun Rao, Jing Lyu, Xiaohua Xie, Wei-Shi Zheng

    Abstract: Aligning objects with corresponding textual descriptions is a fundamental challenge and a realistic requirement in vision-language understanding. While recent multimodal embedding models excel at global image-text alignment, they often struggle with fine-grained alignment between image regions and specific phrases. In this work, we present ObjEmbed, a novel MLLM embedding model that decomposes the…

    Submitted 2 February, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

  48. arXiv:2601.19904  [pdf, ps, other]

    cs.AR cs.AI cs.CL cs.DC cs.PF

    DABench-LLM: Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators for LLMs

    Authors: Ziyu Hu, Zhiqing Zhong, Weijian Zheng, Zhijing Ye, Xuwei Tan, Xueru Zhang, Zheng Xie, Rajkumar Kettimuthu, Xiaodong Yu

    Abstract: The exponential growth of large language models has outpaced the capabilities of traditional CPU and GPU architectures due to the slowdown of Moore's Law. Dataflow AI accelerators present a promising alternative; however, there remains a lack of in-depth performance analysis and standardized benchmarking methodologies for LLM training. We introduce DABench-LLM, the first benchmarking framework des…

    Submitted 4 December, 2025; originally announced January 2026.

  49. arXiv:2601.18192  [pdf, ps, other]

    cs.CV cs.HC cs.MM

    MindCine: Multimodal EEG-to-Video Reconstruction with Large-Scale Pretrained Models

    Authors: Tian-Yi Zhou, Xuan-Hao Liu, Bao-Liang Lu, Wei-Long Zheng

    Abstract: Reconstructing human dynamic visual perception from electroencephalography (EEG) signals is of great research significance given EEG's non-invasiveness and high temporal resolution. However, EEG-to-video reconstruction remains challenging due to: 1) Single Modality: existing studies solely align EEG signals with the text modality, which ignores other modalities and are prone to suffer from overfit…

    Submitted 26 January, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

  50. arXiv:2601.16667  [pdf, ps, other]

    cs.RO cs.CV

    ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance

    Authors: Zhuohao Li, Yinghao Li, Jian-Jian Jiang, Lang Zhou, Tianyu Zhang, Jiadong Yin, Mu Lin, Yi-Lin Wei, Wei-Shi Zheng

    Abstract: Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, previous methods fuse proprioceptive signals directly with vision-language features, resulting in state-dominant bias and false completions despite visible execution failures. We systematically analyze this failure mode, attributing it to mo…

    Submitted 11 March, 2026; v1 submitted 23 January, 2026; originally announced January 2026.