Showing 1–50 of 270 results for author: Zhu, P

  1. arXiv:2512.13119  [pdf, ps, other]

    cs.CR

    Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks

    Authors: Keke Tang, Tianyu Hao, Xiaofei Wang, Weilong Peng, Denghui Zhang, Peican Zhu, Zhihong Tian

    Abstract: Most adversarial attacks on point clouds perturb a large number of points, causing widespread geometric changes and limiting applicability in real-world scenarios. While recent works explore sparse attacks by modifying only a few points, such approaches often struggle to maintain effectiveness due to the limited influence of individual perturbations. In this paper, we propose SCP, a sparse and coo…

    Submitted 15 December, 2025; originally announced December 2025.

    Comments: Accepted by AAAI'2026 (Oral)

  2. arXiv:2512.10932  [pdf, ps, other]

    cs.CV cs.AI

    BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

    Authors: Shengao Wang, Wenqi Wang, Zecheng Wang, Max Whitton, Michael Wakeham, Arjun Chandra, Joey Huang, Pengyue Zhu, Helen Chen, David Li, Jeffrey Li, Shawn Li, Andrew Zagula, Amy Zhao, Andrew Zhu, Sayaka Nakamura, Yuki Yamamoto, Jerry Jun Yokono, Aaron Mueller, Bryan A. Plummer, Kate Saenko, Venkatesh Saligrama, Boqing Gong

    Abstract: Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that extensively improves upon BabyVLM-V1 through a longitudinal, multifaceted pretraining set, a versatile model, and, most importantly, DevCV Toolbox for cognitive…

    Submitted 11 December, 2025; originally announced December 2025.

  3. arXiv:2512.10581  [pdf, ps, other]

    cs.CV

    Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration

    Authors: Wenlong Jiao, Heyang Lee, Ping Wang, Pengfei Zhu, Qinghua Hu, Dongwei Ren

    Abstract: All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework, yet existing methods increasingly rely on complex architectures (e.g., Mixture-of-Experts, diffusion models) and elaborate degradation prompt strategies. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying…

    Submitted 11 December, 2025; originally announced December 2025.

  4. arXiv:2512.09629  [pdf, ps, other]

    cs.AI cs.LG

    An End-to-end Planning Framework with Agentic LLMs and PDDL

    Authors: Emanuele La Malfa, Ping Zhu, Samuele Marro, Sara Bernardini, Michael Wooldridge

    Abstract: We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domain Definition Language) model, where the domain and problem are iteratively refined by sub-modules (agents) to address common planning requirements, such as time constraints and optimality, as well as ambiguitie…

    Submitted 10 December, 2025; originally announced December 2025.

    Comments: Code: https://github.com/EmanueleLM/MultiAgentPlanning

  5. arXiv:2511.18900  [pdf, ps, other]

    cs.GR cs.CV

    MatMart: Material Reconstruction of 3D Objects via Diffusion

    Authors: Xiuchao Wu, Pengfei Zhu, Jiangjing Lyu, Xinguo Liu, Jie Guo, Yanwen Guo, Weiwei Xu, Chengfei Lyu

    Abstract: Applying diffusion models to physically-based material estimation and generation has recently gained prominence. In this paper, we propose MatMart, a novel material reconstruction framework for 3D objects, offering the following advantages. First, MatMart adopts a two-stage reconstruction, starting with accurate material prediction from inputs and followed by prior-guided material generation for unobse…

    Submitted 24 November, 2025; originally announced November 2025.

  6. arXiv:2511.12079  [pdf, ps, other]

    cs.CV

    Point Cloud Quantization through Multimodal Prompting for 3D Understanding

    Authors: Hongxuan Li, Wencheng Zhu, Huiying Xu, Xinzhong Zhu, Pengfei Zhu

    Abstract: Vector quantization has emerged as a powerful tool in large-scale multimodal models, unifying heterogeneous representations through discrete token encoding. However, its effectiveness hinges on robust codebook design. Current prototype-based approaches relying on trainable vectors or clustered centroids fall short in representativeness and interpretability, even as multimodal alignment demonstrate…

    Submitted 19 November, 2025; v1 submitted 15 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026. 11 pages, 7 figures

  7. arXiv:2511.10698  [pdf, ps, other]

    cs.CR

    Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges

    Authors: Meixia He, Peican Zhu, Le Cheng, Yangming Guo, Manman Yuan, Keke Tang

    Abstract: Recent studies have demonstrated that hypergraph neural networks (HGNNs) are susceptible to adversarial attacks. However, existing methods rely on the specific information mechanisms of target HGNNs, overlooking the common vulnerability caused by the significant differences in hyperedge pivotality along aggregation paths in most HGNNs, thereby limiting the transferability and effectiveness of atta…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: AAAI 2026, Accept

  8. arXiv:2511.05904  [pdf]

    cs.SI physics.chem-ph

    The Role and Mechanism of Deep Statistical Machine Learning In Biological Target Screening and Immune Microenvironment Regulation of Asthma

    Authors: Pengwei Zhu

    Abstract: As an important source of small molecule drugs, natural products show remarkable biological activities with their rich types and unique structures. However, due to the limited number of samples and structural complexity, the rapid discovery of lead compounds is limited. Therefore, in this study, natural inhibitors of phosphodiesterase 4 (PDE4) and Phosphodiesterase 7 (PDE7) were screened by combin…

    Submitted 8 November, 2025; originally announced November 2025.

  9. arXiv:2511.03245  [pdf, ps, other]

    cs.CV

    Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning

    Authors: Liwei Luo, Shuaitengyuan Li, Dongwei Ren, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, remarkable progress has been made in large-scale pre-trained model tuning, and inference efficiency is becoming more crucial for practical deployment. Early exiting in conjunction with multi-stage predictors, when cooperated with a parameter-efficient fine-tuning strategy, offers a straightforward way to achieve an inference-efficient model. However, a key challenge remains unresolved: H…

    Submitted 5 November, 2025; originally announced November 2025.

    Comments: Accepted by ICCV2025

  10. arXiv:2510.26149  [pdf, ps, other]

    cs.CV

    BasicAVSR: Arbitrary-Scale Video Super-Resolution via Image Priors and Enhanced Motion Compensation

    Authors: Wei Shang, Wanying Zhang, Shuhang Gu, Pengfei Zhu, Qinghua Hu, Dongwei Ren

    Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we propose a strong baseline BasicAVSR for AVSR by integrating four key components: 1) adaptive multi-scale frequency priors g…

    Submitted 6 November, 2025; v1 submitted 30 October, 2025; originally announced October 2025.

    Comments: 13 pages, 10 figures, 5 tables

    ACM Class: I.4.3

  11. arXiv:2510.20877  [pdf, ps, other]

    cs.LG cs.AI

    Multimodal Negative Learning

    Authors: Baoquan Gong, Xiyuan Gao, Pengfei Zhu, Qinghua Hu, Bing Cao

    Abstract: Multimodal learning systems often encounter challenges related to modality imbalance, where a dominant modality may overshadow others, thereby hindering the learning of weak modalities. Conventional approaches often force weak modalities to align with dominant ones in "Learning to be (the same)" (Positive Learning), which risks suppressing the unique information inherent in the weak modalities. To…

    Submitted 23 October, 2025; originally announced October 2025.

    Comments: Published in NeurIPS 2025

  12. arXiv:2510.14049  [pdf, ps, other]

    cs.LG cs.MS

    CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations

    Authors: Guangyi Chen, Yunlong Deng, Peiyuan Zhu, Yan Li, Yifan Shen, Zijian Li, Kun Zhang

    Abstract: Causal Representation Learning (CRL) aims to uncover the data-generating process and identify the underlying causal variables and relations, whose evaluation remains inherently challenging due to the requirement of known ground-truth causal variables and causal structure. Existing evaluations often rely on either simplistic synthetic datasets or downstream performance on real-world tasks, generall…

    Submitted 17 October, 2025; v1 submitted 15 October, 2025; originally announced October 2025.

  13. arXiv:2510.11246  [pdf, ps, other]

    cs.CR

    Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems

    Authors: Pengyu Zhu, Lijun Li, Yaxing Lyu, Li Sun, Sen Su, Jing Shao

    Abstract: LLM-based multi-agent systems (MAS) demonstrate increasing integration into next-generation applications, but their safety in backdoor attacks remains largely underexplored. However, existing research has focused exclusively on single-agent backdoor attacks, overlooking the novel attack surfaces introduced by agent collaboration in MAS. To bridge this gap, we present the first Distributed Backdoor…

    Submitted 13 October, 2025; originally announced October 2025.

  14. arXiv:2510.09071  [pdf]

    cs.CV cs.RO

    Visual Anomaly Detection for Reliable Robotic Implantation of Flexible Microelectrode Array

    Authors: Yitong Chen, Xinyao Xu, Ping Zhu, Xinyong Han, Fangbo Qin, Shan Yu

    Abstract: Flexible microelectrode (FME) implantation into brain cortex is challenging due to the deformable fiber-like structure of FME probe and the interaction with critical bio-tissue. To ensure reliability and safety, the implantation process should be monitored carefully. This paper develops an image-based anomaly detection framework based on the microscopic cameras of the robotic FME implantation syst…

    Submitted 10 October, 2025; originally announced October 2025.

    Comments: Accepted by IROS 2025

  15. arXiv:2510.08392  [pdf, ps, other]

    eess.AS cs.SD

    MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

    Authors: Guobin Ma, Jixun Yao, Ziqian Ning, Yuepeng Jiang, Lingxin Xiong, Lei Xie, Pengcheng Zhu

    Abstract: Zero-shot voice conversion (VC) aims to transfer timbre from a source speaker to any unseen target speaker while preserving linguistic content. Growing application scenarios demand models with streaming inference capabilities. This has created a pressing need for models that are simultaneously fast, lightweight, and high-fidelity. However, existing streaming methods typically rely on either autore…

    Submitted 21 December, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

  16. arXiv:2510.04585  [pdf, ps, other]

    cs.RO

    Everything-Grasping (EG) Gripper: A Universal Gripper with Synergistic Suction-Grasping Capabilities for Cross-Scale and Cross-State Manipulation

    Authors: Jianshu Zhou, Jing Shu, Tianle Pan, Puchen Zhu, Jiajun An, Huayu Zhang, Junda Huang, Upinder Kaur, Xin Ma, Masayoshi Tomizuka

    Abstract: Grasping objects across vastly different sizes and physical states-including both solids and liquids-with a single robotic gripper remains a fundamental challenge in soft robotics. We present the Everything-Grasping (EG) Gripper, a soft end-effector that synergistically integrates distributed surface suction with internal granular jamming, enabling cross-scale and cross-state manipulation without…

    Submitted 6 October, 2025; originally announced October 2025.

    Comments: 19 pages, 10 figures, journal

  17. arXiv:2510.03744  [pdf, ps, other]

    cs.LG cs.AI cs.DC cs.NE physics.geo-ph

    HydroFusion-LMF: Semi-Supervised Multi-Network Fusion with Large-Model Adaptation for Long-Term Daily Runoff Forecasting

    Authors: Qianfei Fan, Jiayu Wei, Peijun Zhu, Wensheng Ye, Meie Fang

    Abstract: Accurate decade-scale daily runoff forecasting in small watersheds is difficult because signals blend drifting trends, multi-scale seasonal cycles, regime shifts, and sparse extremes. Prior deep models (DLinear, TimesNet, PatchTST, TiDE, Nonstationary Transformer, LSTNet, LSTM) usually target single facets and under-utilize unlabeled spans, limiting regime adaptivity. We propose HydroFusion-LMF, a…

    Submitted 4 October, 2025; originally announced October 2025.

    Comments: V1

  18. arXiv:2510.02345  [pdf, ps, other]

    cs.CL cs.AI cs.DC cs.LG cs.NE

    Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression

    Authors: Peijun Zhu, Ning Yang, Jiayu Wei, Jinghang Wu, Haijun Zhang

    Abstract: Mixture-of-Experts (MoE) Large Language Models (LLMs) face a trilemma of load imbalance, parameter redundancy, and communication overhead. We introduce a unified framework based on dynamic expert clustering and structured compression to address these issues cohesively. Our method employs an online clustering procedure that periodically regroups experts using a fused metric of parameter and activat…

    Submitted 27 September, 2025; originally announced October 2025.

    Comments: 12 pages, 2 figures, 3 tables. Under review as a conference paper at ICLR 2026

  19. arXiv:2509.26574  [pdf, ps, other]

    cs.AI cond-mat.other cs.CL hep-th quant-ph

    Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

    Authors: Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Lifan Yuan, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Ziming Ji, Indranil Das, Junyi Cao, Yufeng Du, Jiabin Yu, Peixue Wu, Jinchen He, Yifan Su, Yikun Jiang, Yujie Zhang, Chang Liu, Ze-Min Huang, Weizhen Jia, Yunkai Wang, Farshid Jafarpour, Yong Zhao, et al. (39 additional authors not shown)

    Abstract: While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present the CritPt (Complex Research using Integr…

    Submitted 20 November, 2025; v1 submitted 30 September, 2025; originally announced September 2025.

    Comments: 39 pages, 6 figures, 6 tables

  20. arXiv:2509.14579  [pdf, ps, other]

    cs.SD

    Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis

    Authors: Qingyu Liu, Yushen Chen, Zhikang Niu, Chunhui Wang, Yunting Yang, Bowen Zhang, Jian Zhao, Pengcheng Zhu, Kai Yu, Xie Chen

    Abstract: Flow-matching-based text-to-speech (TTS) models have shown high-quality speech synthesis. However, most current flow-matching-based TTS models still rely on reference transcripts corresponding to the audio prompt for synthesis. This dependency prevents cross-lingual voice cloning when audio prompt transcripts are unavailable, particularly for unseen languages. The key challenges for flow-matching-…

    Submitted 20 September, 2025; v1 submitted 17 September, 2025; originally announced September 2025.

    Comments: 5 pages, 2 figures

  21. arXiv:2508.19789  [pdf, ps, other]

    cs.CV

    StableIntrinsic: Detail-preserving One-step Diffusion Model for Multi-view Material Estimation

    Authors: Xiuchao Wu, Pengfei Zhu, Jiangjing Lyu, Xinguo Liu, Jie Guo, Yanwen Guo, Weiwei Xu, Chengfei Lyu

    Abstract: Recovering material information from images has been extensively studied in computer graphics and vision. Recent works in material estimation leverage diffusion model showing promising results. However, these diffusion-based methods adopt a multi-step denoising strategy, which is time-consuming for each estimation. Such stochastic inference also conflicts with the deterministic material estimation…

    Submitted 27 August, 2025; originally announced August 2025.

  22. arXiv:2508.17615  [pdf, ps, other]

    cs.IT

    Average Achievable Rate Analysis of Cell-Free Massive MIMO in the Finite Blocklength Regime with Imperfect CSI

    Authors: Kai Chen, Feng Ye, Jiamin Li, Pengcheng Zhu, Dongming Wang, Xiaohu You

    Abstract: Acquiring perfect channel state information (CSI) introduces substantial challenges in cell-free massive MIMO (CF-mMIMO) systems, primarily due to the large dimensionality of channel parameters, especially under ultra-reliable low-latency communication (uRLLC) constraints. Furthermore, the impact of imperfect CSI on the average achievable rate within the finite blocklength regime remains largely u…

    Submitted 25 August, 2025; v1 submitted 24 August, 2025; originally announced August 2025.

  23. arXiv:2508.07225  [pdf, ps, other]

    eess.IV cs.CV q-bio.QM

    HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation

    Authors: Xuepeng Liu, Zheng Jiang, Pinan Zhu, Hanyu Liu, Chao Li

    Abstract: Spatial transcriptomics (ST) reveals spatial heterogeneity of gene expression, yet its resolution is limited by current platforms. Recent methods enhance resolution via H&E-stained histology, but three major challenges persist: (1) isolating expression-relevant features from visually complex H&E images; (2) achieving spatially precise multimodal alignment in diffusion-based frameworks; and (3) mod…

    Submitted 10 August, 2025; originally announced August 2025.

    Comments: 10 pages, 5 figures, includes comparisons with TESLA, HiStoGene, and iStar; submitted to arXiv 2025

    MSC Class: 92C40; 68T07 ACM Class: I.2.10; I.4.8

  24. arXiv:2507.23508  [pdf, ps, other]

    cs.CV

    Hyperbolic Cycle Alignment for Infrared-Visible Image Fusion

    Authors: Timing Li, Bing Cao, Jiahe Feng, Haifang Cao, Qinghua Hu, Pengfei Zhu

    Abstract: Image fusion synthesizes complementary information from multiple sources, mitigating the inherent limitations of unimodal imaging systems. Accurate image registration is essential for effective multi-source data fusion. However, existing registration methods, often based on image translation in Euclidean space, fail to handle cross-modal misalignment effectively, resulting in suboptimal alignment…

    Submitted 31 July, 2025; originally announced July 2025.

  25. arXiv:2507.22336  [pdf, ps, other]

    eess.IV cs.CV

    A Segmentation Framework for Accurate Diagnosis of Amyloid Positivity without Structural Images

    Authors: Penghan Zhu, Shurui Mei, Shushan Chen, Xiaobo Chu, Shanbo He, Ziyi Liu

    Abstract: This study proposes a deep learning-based framework for automated segmentation of brain regions and classification of amyloid positivity using positron emission tomography (PET) images alone, without the need for structural MRI or CT. A 3D U-Net architecture with four layers of depth was trained and validated on a dataset of 200 F18-florbetapir amyloid-PET scans, with a 130/20/50 train/validation…

    Submitted 29 July, 2025; originally announced July 2025.

  26. arXiv:2507.18870  [pdf, ps, other]

    cs.CV

    Transferable and Undefendable Point Cloud Attacks via Medial Axis Transform

    Authors: Keke Tang, Yuze Gao, Weilong Peng, Xiaofei Wang, Meie Fang, Peican Zhu

    Abstract: Studying adversarial attacks on point clouds is essential for evaluating and improving the robustness of 3D deep learning models. However, most existing attack methods are developed under ideal white-box settings and often suffer from limited transferability to unseen models and insufficient robustness against common defense mechanisms. In this paper, we propose MAT-Adv, a novel adversarial attack…

    Submitted 24 July, 2025; originally announced July 2025.

  27. arXiv:2507.18576  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

    Authors: Shanghai AI Lab: Yicheng Bao, Guanxu Chen, Mingkang Chen, Yunhao Chen, Chiyu Chen, Lingjie Chen, Sirui Chen, Xinquan Chen, Jie Cheng, Yu Cheng, Dengke Deng, Yizhuo Ding, Dan Ding, Xiaoshan Ding, Yi Ding, Zhichen Dong, Lingxiao Du, Yuyu Fan, Xinshun Feng, Yanwei Fu, Yuxuan Gao, Ruijun Ge, Tianle Gu, et al. (93 additional authors not shown)

    Abstract: We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training, supported by a suite of multi-principled verifiers. Unlike previous alignment methods such as RLHF that simply learn…

    Submitted 7 August, 2025; v1 submitted 24 July, 2025; originally announced July 2025.

    Comments: 47 pages, 18 figures, authors are listed in alphabetical order by their last names; v3 modifies minor issues

  28. arXiv:2507.13415  [pdf, ps, other]

    cs.MM cs.AI

    SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection

    Authors: Peican Zhu, Yubo Jing, Le Cheng, Bin Chen, Xiaodong Cui, Lianwei Wu, Keke Tang

    Abstract: Previous studies on multimodal fake news detection mainly focus on the alignment and integration of cross-modal features, as well as the application of text-image consistency. However, they overlook the semantic enhancement effects of large multimodal models and pay little attention to the emotional features of news. In addition, people find that fake news is more inclined to contain negative emot…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: Accepted by SMC 2025

  29. arXiv:2507.12204  [pdf, ps, other]

    cs.HC

    Tao-Technology for Teen Mobile Use: Harmonizing Adaptation, Autonomy, and Reflection

    Authors: Pengyu Zhu, Janghee Cho

    Abstract: Adolescents' mobile technology use is often regulated through rigid control mechanisms that fail to account for their autonomy and natural usage patterns. Drawing on Taoist philosophy, particularly Wu Wei, Yin-Yang, and Zi Ran, this position paper proposes Tao-Technology, a self-organizing, adaptive regulatory framework. Integrating insights from Reflective Informatics and Information Ecologies, w…

    Submitted 16 July, 2025; originally announced July 2025.

  30. arXiv:2507.09882  [pdf, ps, other]

    cs.LG

    AdaBrain-Bench: Benchmarking Brain Foundation Models for Brain-Computer Interface Applications

    Authors: Jiamin Wu, Zichen Ren, Junyu Wang, Pengyu Zhu, Yonghao Song, Mianxin Liu, Qihao Zheng, Lei Bai, Wanli Ouyang, Chunfeng Song

    Abstract: Non-invasive Brain-Computer Interfaces (BCI) offer a safe and accessible means of connecting the human brain to external devices, with broad applications in home and clinical settings to enhance human capabilities. However, the high noise level and limited task-specific data in non-invasive signals constrain decoding capabilities. Recently, the adoption of self-supervised pre-training is transform…

    Submitted 5 August, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

  31. arXiv:2507.09647  [pdf, ps, other]

    cs.MM cs.AI

    KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection

    Authors: Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo

    Abstract: In recent years, the rampant spread of misinformation on social media has made accurate detection of multimodal fake news a critical research focus. However, previous research has not adequately understood the semantics of images, and models struggle to discern news authenticity with limited textual information. Meanwhile, treating all emotional types of news uniformly without tailored approaches…

    Submitted 17 July, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM MM 2025

  32. arXiv:2507.07708  [pdf, ps, other]

    cs.CV

    Motion-Aware Adaptive Pixel Pruning for Efficient Local Motion Deblurring

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Pengfei Zhu, Qinghua Hu, Wangmeng Zuo

    Abstract: Local motion blur in digital images originates from the relative motion between dynamic objects and static imaging systems during exposure. Existing deblurring methods face significant challenges in addressing this problem due to their inefficient allocation of computational resources and inadequate handling of spatially varying blur patterns. To overcome these limitations, we first propose a trai…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ACMMM 2025

    ACM Class: I.4.3

  33. arXiv:2507.05248  [pdf, ps, other]

    cs.CL

    Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models

    Authors: Ziqi Miao, Lijun Li, Yuan Xiong, Zhenhua Liu, Pengyu Zhu, Jing Shao

    Abstract: Contextual priming, where earlier stimuli covertly bias later judgments, offers an unexplored attack surface for large language models (LLMs). We uncover a contextual priming vulnerability in which the previous response in the dialogue can steer its subsequent behavior toward policy-violating content. While existing jailbreak attacks largely rely on single-turn or multi-turn prompt manipulations,…

    Submitted 21 November, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

    Comments: 20 pages, 10 figures. Code and data available at https://github.com/Dtc7w3PQ/Response-Attack

  34. arXiv:2507.00690  [pdf, ps, other]

    cs.CV cs.CR

    Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack

    Authors: Keke Tang, Ziyong Du, Weilong Peng, Xiaofei Wang, Peican Zhu, Ligang Liu, Zhihong Tian

    Abstract: Adversarial attacks on point clouds often impose strict geometric constraints to preserve plausibility; however, such constraints inherently limit transferability and undefendability. While deformation offers an alternative, existing unstructured approaches may introduce unnatural distortions, making adversarial point clouds conspicuous and undermining their plausibility. In this paper, we propose…

    Submitted 1 July, 2025; originally announced July 2025.

  35. arXiv:2506.18962  [pdf, ps, other]

    cs.HC

    UniMind: Unleashing the Power of LLMs for Unified Multi-Task Brain Decoding

    Authors: Weiheng Lu, Chunfeng Song, Jiamin Wu, Pengyu Zhu, Yuchen Zhou, Weijian Mai, Qihao Zheng, Wanli Ouyang

    Abstract: Decoding human brain activity from electroencephalography (EEG) signals is a central challenge at the intersection of neuroscience and artificial intelligence, enabling diverse applications in mental state assessment, clinical monitoring, and human-machine interaction. Recent efforts have extensively explored EEG-based brain foundation models for generalized brain decoding, employing large-scale t…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 19 pages, 4 figures

  36. arXiv:2506.12796  [pdf, ps, other]

    cs.CL

    Surprise Calibration for Better In-Context Learning

    Authors: Zhihang Tan, Jingrui Hou, Ping Wang, Qibiao Hu, Peng Zhu

    Abstract: In-context learning (ICL) has emerged as a powerful paradigm for task adaptation in large language models (LLMs), where models infer underlying task structures from a few demonstrations. However, ICL remains susceptible to biases that arise from prior knowledge and contextual demonstrations, which can degrade the performance of LLMs. Existing bias calibration methods typically apply fixed class pr…

    Submitted 17 June, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

    Comments: 16 pages, 11 figures

    MSC Class: I.2.7

  37. arXiv:2506.12708  [pdf, ps, other]

    cs.DC cs.AI cs.AR cs.LG

    Serving Large Language Models on Huawei CloudMatrix384

    Authors: Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, Shirui Lu, Zhao Qiu, Peiyang Li, Xianyu Chang, Zhengzhong Yu, Fangzheng Miao, Jia Zheng, Ying Li, Yuan Feng, Bei Wang, Zaijian Zong, Mosong Zhou, Wenli Zhou, Houjiang Chen, Xingyu Liao, Yipeng Li, et al. (21 additional authors not shown)

    Abstract: The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-leve…

    Submitted 19 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: 59 pages, 24 figures

  38. arXiv:2506.09113  [pdf, ps, other]

    cs.CV

    Seedance 1.0: Exploring the Boundaries of Video Generation Models

    Authors: Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, Xunsong Li, Yifu Li, Shanchuan Lin, Zhijie Lin, Jiawei Liu, Shu Liu, Xiaonan Nie, Zhiwu Qing, Yuxi Ren, Li Sun, Zhi Tian, Rui Wang, Sen Wang, Guoqiang Wei, Guohong Wu, et al. (19 additional authors not shown)

    Abstract: Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core tec…

    Submitted 28 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: Seedance 1.0 Technical Report

  39. arXiv:2506.06818  [pdf, ps, other]

    cs.CV

    Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation

    Authors: Chao Yin, Hao Li, Kequan Yang, Jide Li, Pinpin Zhu, Xiaoqiang Li

    Abstract: While promptable segmentation (e.g., SAM) has shown promise for various segmentation tasks, it still requires manual visual prompts for each object to be segmented. In contrast, task-generic promptable segmentation aims to reduce the need for such detailed prompts by employing only a task-generic prompt to guide segmentation across all test samples. However, when applied to Camouflaged Ob…

    Submitted 14 August, 2025; v1 submitted 7 June, 2025; originally announced June 2025.

    Comments: accepted by ACM MM2025

  40. arXiv:2505.22995  [pdf, ps, other]

    eess.AS cs.SD

    LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting

    Authors: Pai Zhu, Quan Wang, Dhruuv Agarwal, Kurt Partridge

    Abstract: Custom keyword spotting (KWS) allows detecting user-defined spoken keywords from streaming audio. This is achieved by comparing the embeddings from voice enrollments and input audio. State-of-the-art custom KWS models are typically trained contrastively using utterances whose keywords are randomly sampled from training dataset. These KWS models often struggle with confusing keywords, such as "blue…

    Submitted 28 May, 2025; originally announced May 2025.

  41. arXiv:2505.20511  [pdf, ps, other]

    cs.CL

    Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects

    Authors: Chengyan Wu, Yiqiang Cai, Yang Liu, Pengxu Zhu, Yun Xue, Ziwei Gong, Julia Hirschberg, Bolei Ma

    Abstract: While text-based emotion recognition methods have achieved notable success, real-world dialogue systems often demand a more nuanced emotional understanding than any single modality can offer. Multimodal Emotion Recognition in Conversations (MERC) has thus emerged as a crucial direction for enhancing the naturalness and emotional understanding of human-computer interaction. Its goal is to accuratel…

    Submitted 9 September, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

    Comments: EMNLP 2025 Findings

  42. arXiv:2505.14814  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples

    Authors: Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang

    Abstract: Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversa…

    Submitted 24 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  43. arXiv:2505.14085  [pdf, ps, other]

    cs.NI

    CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration

    Authors: Pengyan Zhu, Tingting Yang

    Abstract: Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation. Generative large language models (LLMs) are gradually becoming key enablers for the integration of semantic communication and computation. However, due to the limited computational resources of edge devices and the increasing complexity of heterogene…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 14 pages, 7 figures including subplots

  44. arXiv:2505.13990  [pdf, ps, other]

    cs.CL

    DecIF: Improving Instruction-Following through Meta-Decomposition

    Authors: Tingfeng Hui, Pengyu Zhu, Bowen Ping, Ling Tang, Guanting Dong, Yaqi Zhang, Sen Su

    Abstract: Instruction-following has emerged as a crucial capability for large language models (LLMs). However, existing approaches often rely on pre-existing documents or external resources to synthesize instruction-following data, which limits their flexibility and generalizability. In this paper, we introduce DecIF, a fully autonomous, meta-decomposition guided framework that generates diverse and high-qu…

    Submitted 10 June, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: We release the source code and SFT data in this version

  45. arXiv:2505.12910  [pdf, ps, other]

    cs.SI cs.AI

    SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs

    Authors: Le Cheng, Peican Zhu, Yangming Guo, Chao Gao, Zhen Wang, Keke Tang

    Abstract: Source detection on graphs has demonstrated high efficacy in identifying rumor origins. Despite advances in machine learning-based methods, many fail to capture intrinsic dynamics of rumor propagation. In this work, we present SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs, which harnesses the recent success of the state space model Mamba, known for…

    Submitted 4 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI25

  46. arXiv:2505.12894  [pdf, ps, other]

    cs.SI cs.AI

    HyperDet: Source Detection in Hypergraphs via Interactive Relationship Construction and Feature-rich Attention Fusion

    Authors: Le Cheng, Peican Zhu, Yangming Guo, Keke Tang, Chao Gao, Zhen Wang

    Abstract: Hypergraphs offer superior modeling capabilities for social networks, particularly in capturing group phenomena that extend beyond pairwise interactions in rumor propagation. Existing approaches in rumor source detection predominantly focus on dyadic interactions, which inadequately address the complexity of more intricate relational structures. In this study, we present a novel approach for Sourc…

    Submitted 4 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI25

  47. arXiv:2505.09323  [pdf, ps, other]

    eess.IV cs.CV

    Q-space Guided Collaborative Attention Translation Network for Flexible Diffusion-Weighted Images Synthesis

    Authors: Pengli Zhu, Yingji Fu, Nanguang Chen, Anqi Qiu

    Abstract: In this study, we propose a novel Q-space Guided Collaborative Attention Translation Network (Q-CATN) for multi-shell, high-angular resolution DWI (MS-HARDI) synthesis from flexible q-space sampling, leveraging the commonly acquired structural MRI data. Q-CATN employs a collaborative attention mechanism to effectively extract complementary information from multiple modalities and dynamically adjust…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025

  48. arXiv:2505.08822  [pdf, other]

    cs.CY cs.LG physics.soc-ph

    The Geography of Transportation Cybersecurity: Visitor Flows, Industry Clusters, and Spatial Dynamics

    Authors: Yuhao Wang, Kailai Wang, Songhua Hu, Yunpeng Zhang, Gino Lim, Pengyu Zhu

    Abstract: The rapid evolution of the transportation cybersecurity ecosystem, encompassing cybersecurity, automotive, and transportation and logistics sectors, will lead to the formation of distinct spatial clusters and visitor flow patterns across the US. This study examines the spatiotemporal dynamics of visitor flows, analyzing how socioeconomic factors shape industry clustering and workforce distribution…

    Submitted 12 May, 2025; originally announced May 2025.

  49. arXiv:2505.06975  [pdf, ps, other]

    cs.CV

    High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Pengfei Zhu, Qinghua Hu, Wangmeng Zuo

    Abstract: The primary challenge in accelerating image super-resolution lies in reducing computation while maintaining performance and adaptability. Motivated by the observation that high-frequency regions (e.g., edges and textures) are most critical for reconstruction, we propose a training-free adaptive masking module for acceleration that dynamically focuses computation on these challenging areas. Specifi…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 10 pages, 6 figures, 5 tables

    ACM Class: I.4.3

  50. arXiv:2505.06920  [pdf, ps, other]

    cs.CV

    Bi-directional Self-Registration for Misaligned Infrared-Visible Image Fusion

    Authors: Timing Li, Bing Cao, Pengfei Zhu, Bin Xiao, Qinghua Hu

    Abstract: Acquiring accurately aligned multi-modal image pairs is fundamental for achieving high-quality multi-modal image fusion. To address the lack of ground truth in current multi-modal image registration and fusion methods, we propose a novel self-supervised Bi-directional Self-Registration framework (B-SR). Specifically, B-SR utilizes a proxy data generator (PDG) an…

    Submitted 11 May, 2025; originally announced May 2025.