
Showing 1–50 of 481 results for author: Hao, Y

Searching in archive cs.
  1. arXiv:2604.11453  [pdf, ps, other]

    cs.NI

    Programmable Packet Scheduling with Dynamic Reordering at Line Rate

    Authors: Zekun Wang, Binghao Yue, Yichen Deng, Weitao Pan, Jiangyi Shi, Yue Hao

    Abstract: High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of scheduling algorithms; however, their semantics are restricted to packet-level ordering and cannot dynamically reorder buffered packets, which limits the support for dynamic-ordering algorithms such as pFab… ▽ More

    Submitted 13 April, 2026; originally announced April 2026.

    Comments: 14 pages, 12 body

  2. arXiv:2604.08523  [pdf, ps, other]

    cs.CL cs.AI

    ClawBench: Can AI Agents Complete Everyday Online Tasks?

    Authors: Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen

    Abstract: AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work, spanning 144 live platforms across 15 c… ▽ More

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: Project page: https://claw-bench.com

  3. arXiv:2604.05424  [pdf, ps, other]

    cs.AI cs.CL

    PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection

    Authors: Siyuan Cheng, Bozhong Tian, Yanchao Hao, Zheng Wei

    Abstract: The emergence of re… ▽ More

    Submitted 7 April, 2026; originally announced April 2026.

    Comments: ACL 2026 Findings

  4. arXiv:2604.04987  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling

    Authors: Yongchang Hao, Lili Mou

    Abstract: Speculative sampling (SpS) has been successful in accelerating the decoding throughput of auto-regressive large language models by leveraging smaller draft models. SpS strictly enforces the generated distribution to match that of the verifier LLM. This is unnecessarily restrictive as slight variations of the verifier's distribution, such as sampling with top-$k$ or temperature, would also be accep… ▽ More

    Submitted 4 April, 2026; originally announced April 2026.

    Comments: Camera-ready version. Accepted at ICLR 2026

  5. arXiv:2604.02289  [pdf, ps, other]

    cs.CV cs.AI

    Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

    Authors: Chongjie Ye, Cheng Cao, Chuanyu Pan, Yiming Hao, Yihao Zhi, Yuanming Hu, Xiaoguang Han

    Abstract: Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high-quality 3D assets are scarce, making 3D synthesis under-constrained. Existing methods often rely on indirect pipelines that edit in 2D and lift resul… ▽ More

    Submitted 2 April, 2026; originally announced April 2026.

  6. arXiv:2604.00824  [pdf, ps, other]

    cs.SE

    Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs

    Authors: CodeArts Model Team, Yang Ye, Jingyuan Tan, Tianyue Jiang, Ruizhe Ye, Qiankun He, Jiarui Yang, Jian Dong, Sicong Liang, Chongjian Yue, Peibai Xu, Lufan Lu, Shiguan Pang, Taotao Qian, Junbao Hu, Yuechan Hao, Ensheng Shi, Qi Zhang, Yi Hao, Na Fan, Xin Tan, Shuai Yao, Zhiwei Shen, Zongchen Li, Yanlin Wang , et al. (2 additional authors not shown)

    Abstract: Training effective software engineering agents requires large volumes of task-specific trajectories, incurring substantial data construction costs. Inspired by the "Less-Is-More" hypothesis in mathematical reasoning, we investigate its extension to agentic scenarios and propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training traje… ▽ More

    Submitted 6 April, 2026; v1 submitted 1 April, 2026; originally announced April 2026.

    Comments: 17 pages, 5 figures

  7. arXiv:2603.29644  [pdf, ps, other]

    cs.LG

    Disentangled Graph Prompting for Out-Of-Distribution Detection

    Authors: Cheng Yang, Yu Hao, Qi Zhang, Chuan Shi

    Abstract: When testing data and training data come from different distributions, deep neural networks (DNNs) will face significant safety risks in practical applications. Therefore, out-of-distribution (OOD) detection techniques, which can identify OOD samples at test time and alert the system, are urgently needed. Existing graph OOD detection methods usually characterize fine-grained in-distribution (ID) p… ▽ More

    Submitted 31 March, 2026; originally announced March 2026.

    Comments: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (TKDE)

  8. arXiv:2603.29452  [pdf, ps, other]

    cs.RO

    CReF: Cross-modal and Recurrent Fusion for Depth-conditioned Humanoid Locomotion

    Authors: Yuan Hao, Ruiqi Yu, Shixin Luo, Guoteng Zhang, Jun Wu, Qiuguo Zhu

    Abstract: Stable traversal over geometrically complex terrain increasingly requires exteroceptive perception, yet prior perceptive humanoid locomotion methods often remain tied to explicit geometric abstractions, either by mediating control through robot-centric 2.5D terrain representations or by shaping depth learning with auxiliary geometry-related targets. Such designs inherit the representational bias o… ▽ More

    Submitted 31 March, 2026; v1 submitted 31 March, 2026; originally announced March 2026.

  9. arXiv:2603.28610  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

    Authors: Huanxuan Liao, Zhongtao Jiang, Yupu Hao, Yuqiao Tan, Shizhu He, Ben Wang, Jun Zhao, Kun Xu, Kang Liu

    Abstract: Multimodal Large Language Models (MLLMs) achieve stronger visual understanding by scaling input fidelity, yet the resulting visual token growth makes jointly sustaining high spatial resolution and long temporal context prohibitive. We argue that the bottleneck lies not in how post-encoding representations are compressed but in the volume of pixels the encoder receives, and address it with ResAdapt… ▽ More

    Submitted 31 March, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

    Comments: work in progress

  10. arXiv:2603.27136  [pdf, ps, other]

    cs.SE cs.CY

    The First Issue Matters: Linking Task-Level Characteristics to Long-Term Newcomer Retention in OSS

    Authors: Yichen Hao, Weiwei Xu, Kai Gao, Xiaofang Zhang

    Abstract: Sustaining newcomer participation is critical for the long-term health of open-source communities. Although prior research has explored various task recommendation approaches to help newcomers resolve their first-issue, these methods overlook how characteristics of first-issues may influence newcomers' long-term retention, limiting our understanding of whether initial success leads to sustained pa… ▽ More

    Submitted 28 March, 2026; originally announced March 2026.

  11. arXiv:2603.25107  [pdf, ps, other]

    cs.CV

    Label What Matters: Modality-Balanced and Difficulty-Aware Multimodal Active Learning

    Authors: Yuqiao Zeng, Xu Wang, Tengfei Liang, Yiqing Hao, Yi Jin, Hui Yu

    Abstract: Multimodal learning integrates complementary information from different modalities such as image, text, and audio to improve model performance, but its success relies on large-scale labeled data, which is costly to obtain. Active learning (AL) mitigates this challenge by selectively annotating informative samples. In multimodal settings, many approaches implicitly assume that modality importance i… ▽ More

    Submitted 26 March, 2026; originally announced March 2026.

  12. arXiv:2603.22993  [pdf, ps, other]

    math.CO cs.DM

    Backward Arcs in Hamilton Oriented Cycles and Paths in Directed Graphs with Independence Number Two

    Authors: S. Gerke, Q. Guo, G. Gutin, Y. Hao, W. Veeranonchai, A. Yeo

    Abstract: In a digraph $D=(V,A)$, an oriented path is a sequence $P=x_1x_2\dots x_p$ of distinct vertices such that either $x_ix_{i+1}\in A$ or $x_{i+1}x_{i}\in A$ or both for every $i\in [p-1]$. If $x_ix_{i+1}\in A$ in $P$, then $x_ix_{i+1}$ is a forward arc of $P$; otherwise, $x_{i+1}x_{i}$ is a backward arc. The independence number $\alpha(D)$ is the maximum integer $p$ such that $D$ has a set of $p$ vertices… ▽ More

    Submitted 24 March, 2026; originally announced March 2026.

  13. arXiv:2603.22728  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models

    Authors: Heinrich Dinkel, Jiahao Zhou, Guanbo Wang, Yadong Niu, Junbo Zhang, Yufeng Hao, Ying Liu, Ke Li, Wenwu Wang, Zhiyong Wu, Jian Luan

    Abstract: This paper presents the Interspeech 2026 Audio Encoder Capability Challenge, a benchmark specifically designed to evaluate and advance the performance of pre-trained audio encoders as front-end modules for Large Audio Language Models (LALMs). While LALMs have shown remarkable understanding of complex acoustic scenes, their performance depends on the semantic richness of the underlying audio encode… ▽ More

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: Interspeech 2026 Challenge

  14. arXiv:2603.22626  [pdf, ps, other]

    cs.CV

    PIVM: Diffusion-Based Prior-Integrated Variation Modeling for Anatomically Precise Abdominal CT Synthesis

    Authors: Dinglun He, Baoming Zhang, Xu Wang, Yao Hao, Deshan Yang, Ye Duan

    Abstract: Abdominal CT data are limited by high annotation costs and privacy constraints, which hinder the development of robust segmentation and diagnostic models. We present a Prior-Integrated Variation Modeling (PIVM) framework, a diffusion-based method for anatomically accurate CT image synthesis. Instead of generating full images from noise, PIVM predicts voxel-wise intensity variations relative to org… ▽ More

    Submitted 23 March, 2026; originally announced March 2026.

    Comments: Accepted at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026 (Oral). Equal contribution by the first three authors

  15. arXiv:2603.20939  [pdf, ps, other]

    cs.CL cs.AI cs.HC cs.IR stat.ML

    User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

    Authors: Yuren Hao, Shuhaib Mehri, ChengXiang Zhai, Dilek Hakkani-Tür

    Abstract: Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retri… ▽ More

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 21 pages including appendices

    ACM Class: I.2.7

  16. arXiv:2603.20620  [pdf, ps, other]

    cs.AI

    Reasoning Traces Shape Outputs but Models Won't Say So

    Authors: Yijie Hao, Lingjie Chen, Ali Emami, Joyce Ho

    Abstract: Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what drives model outputs, and whether models will honestly report their influence. We introduce Thought Injection, a method that injects synthetic reasoning snippets into a model's <think> trace, then measures whether the model follows the injected reasoning and ack… ▽ More

    Submitted 20 March, 2026; originally announced March 2026.

  17. arXiv:2603.19470  [pdf, ps, other]

    cs.LG cs.AI

    Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL

    Authors: Chenlu Ye, Xuanchang Zhang, Yifan Hao, Zhou Yu, Ziji Zhang, Abhinav Gullapalli, Hao Chen, Jing Huang, Tong Zhang

    Abstract: Off-policy problems, such as policy staleness and training-inference mismatch, have become a major bottleneck for training stability and further exploration in LLM RL. As inference efficiency is optimized, the distribution gap between the inference and updated policy grows, leading to heavy-tailed importance ratios. Heavy-tailed ratios arise when the policy is locally sharp, which further inflates sharp… ▽ More

    Submitted 19 March, 2026; originally announced March 2026.

  18. arXiv:2603.18757  [pdf, ps, other]

    cs.CV

    DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection

    Authors: Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Shaohui Peng, Yongwei Zhao, Ling Li

    Abstract: Domain Adaptive Object Detection (DAOD) aims to transfer detectors from a labeled source domain to an unlabeled target domain. Existing DAOD methods employ multi-granularity feature alignment to learn domain-invariant representations. However, the local connectivity of their CNN-based backbone and detection head restricts alignment to local regions, failing to extract global domain-invariant featu… ▽ More

    Submitted 19 March, 2026; originally announced March 2026.

    Comments: Accepted by CVPR 2026

  19. arXiv:2603.14730  [pdf, ps, other]

    cs.LG

    GNNVerifier: Graph-based Verifier for LLM Task Planning

    Authors: Yu Hao, Qiuyu Wang, Cheng Yang, Yawen Li, Zhiqiang Zhang, Chuan Shi

    Abstract: Large language models (LLMs) facilitate the development of autonomous agents. As a core component of such agents, task planning aims to decompose complex natural language requests into concrete, solvable sub-tasks. Since LLM-generated plans are frequently prone to hallucinations and sensitive to long-context prompts, recent research has introduced plan verifiers to identify and correct potential… ▽ More

    Submitted 17 March, 2026; v1 submitted 15 March, 2026; originally announced March 2026.

    Comments: 17 pages, 12 figures

  20. arXiv:2603.11896  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

    Authors: Lu Wang, Zhuoran Jin, Yupu Hao, Yubo Chen, Kang Liu, Yulong Ao, Jun Zhao

    Abstract: Multimodal large language models (MLLMs) have shown strong performance on offline video understanding, but most are limited to offline inference or have weak online reasoning, making multi-turn interaction over continuously arriving video streams difficult. Existing streaming methods typically use an interleaved perception-generation paradigm, which prevents concurrent perception and generation an… ▽ More

    Submitted 12 March, 2026; originally announced March 2026.

  21. arXiv:2603.11515  [pdf, ps, other]

    cs.AI

    Multi-Agent Collaboration for Automated Design Exploration on High Performance Computing Systems

    Authors: Harshitha Menon, Charles F. Jekel, Kevin Korner, Brian Gunnarson, Nathan K. Brown, Michael Stees, M. Giselle Fernandez-Godino, Walter Nissen, Meir H. Shachar, Dane M. Sterbentz, William J. Schill, Yue Hao, Robert Rieben, William Quadros, Steve Owen, Scott Mitchell, Ismael D. Boureima, Jonathan L. Belof

    Abstract: Today's scientific challenges, from climate modeling to Inertial Confinement Fusion design to novel material design, require exploring huge design spaces. In order to enable high-impact scientific discovery, we need to scale up our ability to test hypotheses, generate results, and learn from them rapidly. We present MADA (Multi-Agent Design Assistant), a Large Language Model (LLM) powered multi-ag… ▽ More

    Submitted 12 March, 2026; originally announced March 2026.

  22. arXiv:2603.05232  [pdf, ps, other]

    cs.LG

    SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity

    Authors: Hanyong Shao, Yingbo Hao, Ting Song, Yan Xia, Di Zhang, Shaohan Huang, Xun Wu, Songchen Xu, Le Xu, Li Dong, Zewen Chi, Yi Zou, Furu Wei

    Abstract: NVIDIA's 2:4 Sparse Tensor Cores deliver 2x throughput but demand strict 50% pruning -- a ratio that collapses LLM reasoning accuracy (Qwen3: 54% to 15%). Milder $(2N-2):2N$ patterns (e.g., 6:8, 25% pruning) preserve accuracy yet receive no hardware support, falling back to dense execution without any benefit from sparsity. We present SlideSparse, the first system to unlock Sparse Tensor Core acce… ▽ More

    Submitted 5 March, 2026; originally announced March 2026.

  23. arXiv:2603.05185  [pdf, ps, other]

    cs.RO

    Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation

    Authors: Pengfei Yi, Yingjie Ma, Wenjiang Xu, Yanan Hao, Shuai Gan, Wanting Li, Shanlin Zhong

    Abstract: Balancing high-level semantic reasoning with low-level reactive control remains a core challenge in visual robotic manipulation. While Vision-Language Models (VLMs) excel at cognitive planning, their inference latency precludes real-time execution. Conversely, fast Vision-Language-Action (VLA) models often lack the semantic depth required for complex, long-horizon tasks. To bridge this gap, we int… ▽ More

    Submitted 5 March, 2026; originally announced March 2026.

  24. arXiv:2603.05168  [pdf, ps, other]

    cs.CL

    Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

    Authors: Di Zhang, Xun Wu, Shaohan Huang, Yudong Wang, Hanyong Shao, Yingbo Hao, Zewen Chi, Li Dong, Ting Song, Yan Xia, Zhifang Sui, Furu Wei

    Abstract: Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propo… ▽ More

    Submitted 5 March, 2026; originally announced March 2026.

  25. arXiv:2603.03138  [pdf, ps, other]

    cs.RO

    Look Forward to Walk Backward: Efficient Terrain Memory for Backward Locomotion with Forward Vision

    Authors: Shixin Luo, Songbo Li, Yuan Hao, Yaqi Wang, Jun Zheng, Jun Wu, Qiuguo Zhu

    Abstract: Legged robots with egocentric forward-facing depth cameras can couple exteroception and proprioception to achieve robust forward agility on complex terrain. When these robots walk backward, the forward-only field of view provides no preview. Purely proprioceptive controllers can remain stable on moderate ground when moving backward but cannot fully exploit the robot's capabilities on complex terra… ▽ More

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: Accepted for 2026 IEEE International Conference on Robotics and Automation (ICRA)

  26. arXiv:2603.00607  [pdf, ps, other]

    cs.CV cs.AI

    IdGlow: Dynamic Identity Modulation for Multi-Subject Generation

    Authors: Honghao Cai, Xiangyuan Wang, Yunhao Bai, Tianze Zhou, Sijie Xu, Yuyang Hao, Zezhou Cui, Yuyuan Yang, Wei Zhu, Yibo Chen, Xu Tang, Yao Hu, Zhen Li

    Abstract: Multi-subject image generation requires seamlessly harmonizing multiple reference identities within a coherent scene. However, existing methods relying on rigid spatial masks or localized attention often struggle with the "stability-plasticity dilemma," particularly failing in tasks that require complex structural deformations, such as identity-preserving age transformation. To address this, we pr… ▽ More

    Submitted 28 February, 2026; originally announced March 2026.

  27. arXiv:2602.23798  [pdf, ps, other]

    cs.LG cs.AI cs.CR cs.DC

    MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

    Authors: Tiantong Wang, Xinyu Yan, Tiantong Wu, Yurong Hao, Yong Jiang, Fei Huang, Wei Yang Bryan Lim

    Abstract: Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for ra… ▽ More

    Submitted 27 February, 2026; originally announced February 2026.

  28. arXiv:2602.17641  [pdf, ps, other]

    cs.LG cs.AI

    FAMOSE: A ReAct Approach to Automated Feature Discovery

    Authors: Keith Burghardt, Jienan Liu, Sadman Sakib, Yuning Hao, Bo Li

    Abstract: Feature engineering remains a critical yet challenging bottleneck in machine learning, particularly for tabular data, as identifying optimal features from an exponentially large feature space traditionally demands substantial domain expertise. To address this challenge, we introduce FAMOSE (Feature AugMentation and Optimal Selection agEnt), a novel framework that leverages the ReAct paradigm to au… ▽ More

    Submitted 19 February, 2026; originally announced February 2026.

    Comments: 23 pages, 6 figures

  29. arXiv:2602.17205  [pdf]

    astro-ph.IM astro-ph.CO astro-ph.GA cs.AI

    Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising

    Authors: Yuduo Guo, Hao Zhang, Mingyu Li, Fujiang Yu, Yunjing Wu, Yuhan Hao, Song Huang, Yongming Liang, Xiaojing Lin, Xinyang Li, Jiamin Wu, Zheng Cai, Qionghai Dai

    Abstract: The detection limit of astronomical imaging observations is limited by several noise sources. Some of that noise is correlated between neighbouring image pixels and exposures, so in principle could be learned and corrected. We present an astronomical self-supervised transformer-based denoising algorithm (ASTERIS) that integrates spatiotemporal information across multiple exposures. Benchmarking o… ▽ More

    Submitted 19 February, 2026; originally announced February 2026.

    Comments: Published in Science. This is the author's version of the work. It is posted here by permission of the AAAS for personal use, not for redistribution

  30. arXiv:2602.12966  [pdf, ps, other]

    cs.CL cs.SE

    ProbeLLM: Automating Principled Diagnosis of LLM Failures

    Authors: Yue Huang, Zhengzhe Jiang, Yuchen Ma, Yu Jiang, Xiangqi Wang, Yujun Zhou, Yuexing Hao, Kehan Guo, Pin-Yu Chen, Stefan Feuerriegel, Xiangliang Zhang

    Abstract: Understanding how and why large language models (LLMs) fail is becoming a central challenge as models rapidly evolve and static evaluations fall behind. While automated probing has been enabled by dynamic test generation, existing approaches often discover isolated failure cases, lack principled control over exploration, and provide limited insight into the underlying structure of model weaknesses… ▽ More

    Submitted 13 February, 2026; originally announced February 2026.

  31. "Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy

    Authors: Xuehan Huang, Canwen Wang, Yifei Hao, Daijin Yang, Ray LC

    Abstract: Chatbots are increasingly applied to domains previously reserved for human actors. One such domain is comedy, whereby both the general public working with ChatGPT and research-based LLM-systems have tried their hands on making humor. In formative interviews with professional comedians and video analyses of stand-up comedy in humans, we found that human performers often use their ethnic, gender, co… ▽ More

    Submitted 13 February, 2026; originally announced February 2026.

    Comments: 27 pages, 5 figures. Conditionally Accepted to CHI '26

  32. arXiv:2602.08220  [pdf, ps, other]

    cs.CL

    Pretraining with Token-Level Adaptive Latent Chain-of-Thought

    Authors: Boyi Zeng, Yiqin Hao, He Li, Shixiang Song, Feichen Song, Zitong Wang, Siyuan Huang, Yi Xu, ZiWei He, Xinbing Wang, Zhouhan Lin

    Abstract: Scaling large language models by increasing parameters and training data is increasingly constrained by limited high-quality corpora and rising communication costs. This work explores an alternative axis: increasing per-token computation without expanding parameters, by internalizing latent Chain-of-Thought (CoT) into pretraining. We propose Pretraining with Token-Level Adaptive Latent CoT (adapti… ▽ More

    Submitted 10 March, 2026; v1 submitted 8 February, 2026; originally announced February 2026.

    Comments: 15 pages

  33. arXiv:2602.06409  [pdf, ps, other]

    cs.CR

    VENOMREC: Cross-Modal Interactive Poisoning for Targeted Promotion in Multimodal LLM Recommender Systems

    Authors: Guowei Guan, Yurong Hao, Jiaming Zhang, Tiantong Wu, Fuyao Zhang, Tianxiang Chen, Longtao Huang, Cyril Leung, Wei Yang Bryan Lim

    Abstract: Multimodal large language models (MLLMs) are pushing recommender systems (RecSys) toward content-grounded retrieval and ranking via cross-modal fusion. We find that while cross-modal consensus often mitigates conventional poisoning that manipulates interaction logs or perturbs a single modality, it also introduces a new attack surface where synchronised multimodal poisoning can reliably steer fuse… ▽ More

    Submitted 6 February, 2026; originally announced February 2026.

  34. arXiv:2602.02579  [pdf, ps, other]

    cs.OS cs.AI

    ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation

    Authors: Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Chongyang Qiu, Pengfei Wang

    Abstract: The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retrieved RAG documents (by a user query) and reprocess selected tokens to recover cross-attention between these pre-calculated KV caches. However, we identify a fundamental "crowding-out effect" in current… ▽ More

    Submitted 4 February, 2026; v1 submitted 31 January, 2026; originally announced February 2026.

  35. arXiv:2602.00709  [pdf, ps, other]

    cs.AI cs.LG

    Physics-informed Diffusion Generation for Geomagnetic Map Interpolation

    Authors: Wenda Li, Tongya Zheng, Kaixuan Chen, Shunyu Liu, Haoze Jiang, Yunzhi Hao, Rui Miao, Zujie Ren, Mingli Song, Hang Shi, Gang Chen

    Abstract: Geomagnetic map interpolation aims to infer unobserved geomagnetic data at spatial points, yielding critical applications in navigation and resource exploration. However, existing methods for scattered data interpolation are not specifically designed for geomagnetic maps, which inevitably leads to suboptimal performance due to detection noise and the laws of physics. Therefore, we propose a Physic… ▽ More

    Submitted 31 January, 2026; originally announced February 2026.

    Comments: 5 pages, 2 figures, IEEE ICASSP'26

  36. arXiv:2601.19578  [pdf, ps, other]

    cs.CL

    Yunque DeepResearch Technical Report

    Authors: Yuxuan Cai, Xinyi Lai, Peng Yuan, Weiting Liu, Huajian Li, Mingda Li, Xinghua Wang, Shengxie Zheng, Yanchao Hao, Yuyang Yin, Zheng Wei

    Abstract: Deep research has emerged as a transformative capability for autonomous agents, empowering Large Language Models to navigate complex, open-ended tasks. However, realizing its full potential is hindered by critical limitations, including escalating contextual noise in long-horizon tasks, fragility leading to cascading errors, and a lack of modular extensibility. To address these challenges, we intr… ▽ More

    Submitted 27 January, 2026; originally announced January 2026.

  37. arXiv:2601.18184  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    VibeVoice-ASR Technical Report

    Authors: Zhiliang Peng, Jianwei Yu, Yaoyao Chang, Zilong Wang, Li Dong, Yingbo Hao, Yujie Tu, Chenyu Yang, Wenhui Wang, Songchen Xu, Yutao Sun, Hangbo Bao, Weijiang Xu, Yi Zhu, Zehua Wang, Ting Song, Yan Xia, Zewen Chi, Shaohan Huang, Liang Wang, Chuang Ding, Shuai Wang, Xie Chen, Furu Wei

    Abstract: This report presents VibeVoice-ASR, a general-purpose speech understanding framework built upon VibeVoice, designed to address the persistent challenges of context fragmentation and multi-speaker complexity in long-form audio (e.g., meetings, podcasts) that remain despite recent advancements in short-form speech recognition. Unlike traditional pipelined approaches that rely on audio chunking, Vibe… ▽ More

    Submitted 14 March, 2026; v1 submitted 26 January, 2026; originally announced January 2026.

  38. arXiv:2601.10365  [pdf, ps, other]

    cs.RO

    FastStair: Learning to Run Up Stairs with Humanoid Robots

    Authors: Yan Liu, Tao Yu, Haolin Song, Hongbo Zhu, Nianzong Hu, Yuzhi Hao, Xiuyong Yao, Xizhe Zang, Hua Chen, Jie Zhao

    Abstract: Running up stairs is effortless for humans but remains extremely challenging for humanoid robots due to the simultaneous requirements of high agility and strict stability. Model-free reinforcement learning (RL) can generate dynamic locomotion, yet implicit stability rewards and heavy reliance on task-specific reward shaping tend to result in unsafe behaviors, especially on stairs; conversely, mode… ▽ More

    Submitted 15 January, 2026; originally announced January 2026.

  39. arXiv:2601.09081  [pdf, ps, other]

    cs.DS cs.NI

    A Grouped Sorting Queue Supporting Dynamic Updates for Timer Management in High-Speed Network Interface Cards

    Authors: Zekun Wang, Binghao Yue, Weitao Pan, Jianyi Shi, Yue Hao

    Abstract: With the hardware offloading of network functions, network interface cards (NICs) undertake massive stateful, high-precision, and high-throughput tasks, where timers serve as a critical enabling component. However, existing timer management schemes suffer from heavy software load, low precision, lack of hardware update support, and overflow. This paper proposes two novel operations for priority qu… ▽ More

    Submitted 13 January, 2026; originally announced January 2026.

  40. arXiv:2601.08808  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

    Authors: Yao Tang, Li Dong, Yaru Hao, Qingxiu Dong, Furu Wei, Jiatao Gu

    Abstract: Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate token… ▽ More

    Submitted 13 January, 2026; originally announced January 2026.

    Comments: 21 pages. Code available at https://github.com/GMLR-Penn/Multiplex-Thinking

  41. PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception

    Authors: Guotao Li, Shaoyun Xu, Yuexing Hao, Yang Wang, Yuhui Sun

    Abstract: Distributed Multi-Agent Path Finding (MAPF) integrated with Multi-Agent Reinforcement Learning (MARL) has emerged as a prominent research focus, enabling real-time cooperative decision-making in partially observable environments through inter-agent communication. However, due to insufficient collaborative and perceptual capabilities, existing methods are inadequate for scaling across diverse envir… ▽ More

    Submitted 5 January, 2026; originally announced January 2026.

    Comments: 8 pages, 7 figures, 3 tables. Accepted to IROS 2025

  42. arXiv:2512.22170  [pdf, ps, other

    cs.LG cs.CV

    SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models

    Authors: Jiesong Lian, Ruizhe Zhong, Zixiang Zhou, Xiaoyue Mi, Long Hu, Yuan Zhou, Qinglin Lu, Yixue Hao, Junchi Yan

    Abstract: Post-training alignment of video generation models with human preferences is a critical goal. Developing effective Reward Models (RMs) for this process faces significant methodological hurdles. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. Concurrently, the architectural design of VLM-based RMs, particularly their output mechanisms, remai…

    Submitted 16 March, 2026; v1 submitted 17 December, 2025; originally announced December 2025.

    Comments: 16 pages, 9 figures

  43. arXiv:2512.15744  [pdf, ps, other

    cs.LG

    How Do Graph Signals Affect Recommendation: Unveiling the Mystery of Low and High-Frequency Graph Signals

    Authors: Feng Liu, Hao Cang, Huanhuan Yuan, Jiaqing Fan, Yongjing Hao, Fuzhen Zhuang, Guanfeng Liu, Pengpeng Zhao

    Abstract: Spectral graph neural networks (GNNs) are highly effective in modeling graph signals, with their success in recommendation often attributed to low-pass filtering. However, recent studies highlight the importance of high-frequency signals. The role of low-frequency and high-frequency graph signals in recommendation remains unclear. This paper aims to bridge this gap by investigating the influence o…

    Submitted 10 December, 2025; originally announced December 2025.

  44. arXiv:2512.10778  [pdf, ps, other

    cs.SD cs.MM eess.AS

    Building Audio-Visual Digital Twins with Smartphones

    Authors: Zitong Lan, Yiwei Tang, Yuhan Wang, Haowen Lai, Yiduo Hao, Mingmin Zhao

    Abstract: Digital twins today are almost entirely visual, overlooking acoustics, a core component of spatial realism and interaction. We introduce AV-Twin, the first practical system that constructs editable audio-visual digital twins using only commodity smartphones. AV-Twin combines mobile RIR capture and a visual-assisted acoustic field model to efficiently reconstruct room acoustics. It further recovers…

    Submitted 11 December, 2025; originally announced December 2025.

    Comments: Under Mobisys 2026 review, single blind

  45. arXiv:2512.09200  [pdf, ps, other

    cs.IR

    Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

    Authors: Liang Luo, Yuxin Chen, Zhengyu Zhang, Mengyue Hang, Andrew Gu, Buyun Zhang, Boyang Liu, Chen Chen, Chengze Fan, Dong Liang, Fan Yang, Feifan Gu, Huayu Li, Jade Nie, Jiayi Xu, Jiyan Yang, Jongsoo Park, Laming Chen, Longhao Jin, Qianru Li, Qin Huang, Shali Jiang, Shiwen Shen, Shuaiwen Wang, Sihan Zeng , et al. (17 additional authors not shown)

    Abstract: The rapidly evolving landscape of products, surfaces, policies, and regulations poses significant challenges for deploying state-of-the-art recommendation models at industry scale, primarily due to data fragmentation across domains and escalating infrastructure costs that hinder sustained quality improvements. To address this challenge, we propose Lattice, a recommendation framework centered aro…

    Submitted 14 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

    Comments: Accepted to KDD 2026

  46. arXiv:2512.08987  [pdf, ps, other

    cs.CV cs.AI

    3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization

    Authors: Yuze Hao, Linchao Zhu, Yi Yang

    Abstract: Inverse design aims to design the input variables of a physical system to optimize a specified objective function, typically formulated as a search or optimization problem. However, in 3D domains, the design space grows exponentially, rendering exhaustive grid-based searches infeasible. Recent advances in deep learning have accelerated inverse design by providing powerful generative priors and dif…

    Submitted 6 December, 2025; originally announced December 2025.

    Comments: Accepted at NeurIPS 2025

  47. arXiv:2512.08785  [pdf, ps, other

    cs.CV

    LoFA: Learning to Predict Personalized Priors for Fast Adaptation of Visual Generative Models

    Authors: Yiming Hao, Mutian Xu, Chongjie Ye, Jie Qin, Shunlin Lu, Yipeng Qin, Xiaoguang Han

    Abstract: Personalizing visual generative models to meet specific user needs has gained increasing attention, yet current methods like Low-Rank Adaptation (LoRA) remain impractical due to their demand for task-specific data and lengthy optimization. While a few hypernetwork-based approaches attempt to predict adaptation weights directly, they struggle to map fine-grained user prompts to complex LoRA distrib…

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: Project page: https://jaeger416.github.io/lofa/

  48. arXiv:2512.07137  [pdf, ps, other

    cs.RO cs.MA

    Time-Varying Formation Tracking Control of Wheeled Mobile Robots With Region Constraint: A Generalized Udwadia-Kalaba Framework

    Authors: Yijie Kang, Yuqing Hao, Qingyun Wang, Guanrong Chen

    Abstract: In this article, the time-varying formation tracking control of wheeled mobile robots with region constraint is investigated from a generalized Udwadia-Kalaba framework. The communication network is modeled as a directed and weighted graph that has a spanning tree with the leader being the root. By reformulating the time-varying formation tracking control objective as an equality constrained equat…

    Submitted 26 February, 2026; v1 submitted 7 December, 2025; originally announced December 2025.

    Comments: 17 pages, 9 figures

  49. arXiv:2512.00427  [pdf

    cs.RO physics.optics

    Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control

    Authors: Mengting Yu, Shuiying Xiang, Changjian Xie, Yonghang Chen, Haowen Zhao, Xingxing Guo, Yahui Zhang, Yanan Han, Yue Hao

    Abstract: Robotic continuous control tasks impose stringent demands on the energy efficiency and latency of computing architectures due to their high-dimensional state spaces and real-time interaction requirements. Conventional electronic computing platforms face computational bottlenecks, whereas the fusion of photonic computing and spiking reinforcement learning (RL) offers a promising alternative. Here,…

    Submitted 29 November, 2025; originally announced December 2025.

  50. arXiv:2511.22172  [pdf, ps, other

    cs.CV

    Guiding the Inner Eye: A Framework for Hierarchical and Flexible Visual Grounded Reasoning

    Authors: Zhaoyang Wei, Wenchao Ding, Yanchao Hao, Xi Chen

    Abstract: Models capable of "thinking with images" by dynamically grounding their reasoning in visual evidence represent a major leap in multimodal AI. However, replicating and advancing this ability is non-trivial, with current methods often trapped between the instability of end-to-end reinforcement learning (RL) and the rigidity of supervised fine-tuning (SFT). This leads to models that either struggle t…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: 9 pages