Showing 1–50 of 83 results for author: Pei, W

Searching in archive cs.
  1. arXiv:2603.28458  [pdf, ps, other]

    cs.LG cs.AI

    HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

    Authors: Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Zhaohui Wang, Jiexi Wu, Zhixin Pan, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di Yin, Xing Sun, Muhan Zhang

    Abstract: Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical key for each query through a lightweight indexer, then computing attention only on the selected subset. While the downstream sparse attention itself scales favorably, the indexer must still scan the entire prefix for every query, introducing an per…

    Submitted 6 April, 2026; v1 submitted 30 March, 2026; originally announced March 2026.

  2. arXiv:2603.10990  [pdf, ps, other]

    cs.CV

    Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

    Authors: Zhengyao Fang, Zexi Jia, Yijia Zhong, Pengcheng Luo, Jinchao Zhang, Guangming Lu, Jun Yu, Wenjie Pei

    Abstract: Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that appear visually authentic to real-world photography remains challenging. This is partly due to biases in existing evaluation paradigms: human ratings and preference-trained metrics often favor visually vivid images with exaggerated saturation and contrast, which make generations often…

    Submitted 11 March, 2026; originally announced March 2026.

    Comments: accepted by CVPR2026

  3. arXiv:2603.09480  [pdf, ps, other]

    cs.CV

    Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity

    Authors: Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Guangming Lu, Jun Yu, Wenjie Pei

    Abstract: Vision-language models (VLMs) face significant computational inefficiencies caused by excessive generation of visual tokens. While prior work shows that a large fraction of visual tokens are redundant, existing compression methods struggle to balance importance preservation and information diversity. To address this, we propose PruneSID, a training-free Synergistic Importance-Diversity approach fe…

    Submitted 11 March, 2026; v1 submitted 10 March, 2026; originally announced March 2026.

    Comments: accepted by ICLR2026

  4. arXiv:2603.00413  [pdf, ps, other]

    cs.CV cs.GR

    DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects

    Authors: Changpu Li, Shuang Wu, Songlin Tang, Guangming Lu, Jun Yu, Wenjie Pei

    Abstract: Reconstructing transparent objects from a set of multi-view images is a challenging task due to the complicated nature and indeterminate behavior of light propagation. Typical methods are primarily tailored to specific scenarios, such as objects following a uniform topology, exhibiting ideal transparency and surface specular reflections, or with only surface materials, which substantially constrai…

    Submitted 27 February, 2026; originally announced March 2026.

  5. arXiv:2601.22716  [pdf, ps, other]

    cs.LG cs.AI

    Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation

    Authors: Pingzhi Tang, Ruijie Zhou, Fanxu Meng, Wenjie Pei, Muhan Zhang

    Abstract: Current quantization methods for LLMs predominantly rely on block-wise structures to maintain efficiency, often at the cost of representational flexibility. In this work, we demonstrate that element-wise quantization can be made as efficient as block-wise scaling while providing strictly superior expressive power by modeling the scaling manifold as continuous low-rank matrices ($S = BA$). We propo…

    Submitted 30 January, 2026; originally announced January 2026.

  6. arXiv:2601.01150  [pdf, ps, other]

    cs.LG cs.NE

    Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification

    Authors: Wenbin Pei, Ruohao Dai, Bing Xue, Mengjie Zhang, Qiang Zhang, Yiu-Ming Cheung

    Abstract: Time series classification is a fundamental machine learning task with broad real-world applications. Although many deep learning methods have proven effective in learning time-series data for classification, they were originally developed under the assumption of balanced data distributions. Once data distribution is uneven, these methods tend to ignore the minority class that is typically of high…

    Submitted 3 January, 2026; originally announced January 2026.

  7. arXiv:2512.22014  [pdf, ps, other]

    cs.LG

    HWL-HIN: A Hypergraph-Level Hypergraph Isomorphism Network as Powerful as the Hypergraph Weisfeiler-Lehman Test with Application to Higher-Order Network Robustness

    Authors: Chengyu Tian, Wenbin Pei

    Abstract: Robustness in complex systems is of significant engineering and economic importance. However, conventional attack-based a posteriori robustness assessments incur prohibitive computational overhead. Recently, deep learning methods, such as Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs), have been widely employed as surrogates for rapid robustness prediction. Nevertheless, the…

    Submitted 26 December, 2025; originally announced December 2025.

  8. arXiv:2511.13576  [pdf, ps, other]

    cs.CR cs.HC

    Exploring the Effectiveness of Google Play Store's Privacy Transparency Channels

    Authors: Anhao Xiang, Weiping Pei, Chuan Yue

    Abstract: With the requirements and emphases on privacy transparency placed by regulations such as GDPR and CCPA, the Google Play Store requires Android developers to more responsibly communicate their apps' privacy practices to potential users by providing the proper information via the data safety, privacy policy, and permission manifest privacy transparency channels. However, it is unclear how effective…

    Submitted 17 November, 2025; originally announced November 2025.

  9. arXiv:2510.24594  [pdf, ps, other]

    cs.HC

    Detecting the Use of Generative AI in Crowdsourced Surveys: Implications for Data Integrity

    Authors: Dapeng Zhang, Marina Katoh, Weiping Pei

    Abstract: The widespread adoption of generative AI (GenAI) has introduced new challenges in crowdsourced data collection, particularly in survey-based research. While GenAI offers powerful capabilities, its unintended use in crowdsourcing, such as generating automated survey responses, threatens the integrity of empirical research and complicates efforts to understand public opinion and behavior. In this st…

    Submitted 29 October, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: Accepted by CSCW 2025 workshop Beyond Information: Online Participatory Culture and Information Disorder

  10. arXiv:2510.20217  [pdf, ps, other]

    cs.CV

    EditInfinity: Image Editing with Binary-Quantized Generative Models

    Authors: Jiahuan Wang, Yuxin Chen, Jun Yu, Guangming Lu, Wenjie Pei

    Abstract: Adapting pretrained diffusion-based generative models for text-driven image editing with negligible tuning overhead has demonstrated remarkable potential. A classical adaptation paradigm, as followed by these methods, first infers the generative trajectory inversely for a given source image by image inversion, then performs image editing along the inferred trajectory guided by the target text prom…

    Submitted 7 November, 2025; v1 submitted 23 October, 2025; originally announced October 2025.

    Comments: 28 pages, 13 figures, accepted by The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

  11. arXiv:2507.17242  [pdf]

    cs.HC eess.SP q-bio.NC

    High-Density EEG Enables the Fastest Visual Brain-Computer Interfaces

    Authors: Gege Ming, Weihua Pei, Sen Tian, Xiaogang Chen, Xiaorong Gao, Yijun Wang

    Abstract: Brain-computer interface (BCI) technology establishes a direct communication pathway between the brain and external devices. Current visual BCI systems suffer from insufficient information transfer rates (ITRs) for practical use. Spatial information, a critical component of visual perception, remains underexploited in existing systems because the limited spatial resolution of recording methods hin…

    Submitted 23 July, 2025; originally announced July 2025.

  12. arXiv:2507.16310  [pdf, ps, other]

    cs.CV

    MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

    Authors: Yanchen Liu, Yanan Sun, Zhening Xing, Junyao Gao, Kai Chen, Wenjie Pei

    Abstract: Existing text-to-video methods struggle to transfer motion smoothly from a reference object to a target object with significant differences in appearance or structure between them. To address this challenge, we introduce MotionShot, a training-free framework capable of parsing reference-target correspondences in a fine-grained manner, thereby achieving high-fidelity motion transfer while preservin…

    Submitted 22 July, 2025; originally announced July 2025.

  13. arXiv:2506.23120  [pdf, ps, other]

    cs.CV

    Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation

    Authors: Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi, Guangming Lu, Daojing He, Wenjie Pei, Li Jiang

    Abstract: Recent advances in point cloud perception have demonstrated remarkable progress in scene understanding through vision-language alignment leveraging large language models (LLMs). However, existing methods may still encounter challenges in handling complex instructions that require accurate spatial reasoning, even if the 3D point cloud data provides detailed spatial cues such as size and position fo…

    Submitted 29 June, 2025; originally announced June 2025.

  14. arXiv:2506.07358  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    Lightweight Joint Audio-Visual Deepfake Detection via Single-Stream Multi-Modal Learning Framework

    Authors: Kuiyuan Zhang, Wenjie Pei, Rushi Lan, Yifang Guo, Zhongyun Hua

    Abstract: Deepfakes are AI-synthesized multimedia data that may be abused for spreading misinformation. Deepfake generation involves both visual and audio manipulation. To detect audio-visual deepfakes, previous studies commonly employ two relatively independent sub-models to learn audio and visual features, respectively, and fuse them subsequently for deepfake detection. However, this may underutilize the…

    Submitted 8 June, 2025; originally announced June 2025.

  15. arXiv:2504.11879  [pdf, other]

    cs.CV

    Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval

    Authors: Yushuai Sun, Zikun Zhou, Dongmei Jiang, Yaowei Wang, Jun Yu, Guangming Lu, Wenjie Pei

    Abstract: Asymmetric retrieval is a typical scenario in real-world retrieval systems, where compatible models of varying capacities are deployed on platforms with different resource configurations. Existing methods generally train pre-defined networks or subnetworks with capacities specifically designed for pre-determined platforms, using compatible learning. Nevertheless, these methods suffer from limited…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025

  16. arXiv:2503.17811  [pdf, ps, other]

    cs.CL cs.AI cs.DB

    Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models

    Authors: Wenqi Pei, Hailing Xu, Hengyuan Zhao, Shizheng Hou, Han Chen, Zining Zhang, Pingyi Luo, Bingsheng He

    Abstract: Natural Language to SQL (NL2SQL) has seen significant advancements with large language models (LLMs). However, these models often depend on closed-source systems and high computational resources, posing challenges in data privacy and deployment. In contrast, small language models (SLMs) struggle with NL2SQL tasks, exhibiting poor performance and incompatibility with existing frameworks. To address…

    Submitted 18 August, 2025; v1 submitted 22 March, 2025; originally announced March 2025.

    Comments: DL4C @ ICLR 2025

  17. arXiv:2503.14824  [pdf, ps, other]

    cs.CV

    Prototype Perturbation for Relaxing Alignment Constraints in Backward-Compatible Learning

    Authors: Zikun Zhou, Yushuai Sun, Wenjie Pei, Xin Li, Yaowei Wang

    Abstract: The traditional paradigm to update retrieval models requires re-computing the embeddings of the gallery data, a time-consuming and computationally intensive process known as backfilling. To circumvent backfilling, Backward-Compatible Learning (BCL) has been widely explored, which aims to train a new model compatible with the old one. Many previous works focus on effectively aligning the embeddings…

    Submitted 7 March, 2026; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted to IEEE TMM

  18. arXiv:2503.08387  [pdf, ps, other]

    cs.CV

    Recognition-Synergistic Scene Text Editing

    Authors: Zhengyao Fang, Pengyuan Lyu, Jingjing Wu, Chengquan Zhang, Jun Yu, Guangming Lu, Wenjie Pei

    Abstract: Scene text editing aims to modify text content within scene images while maintaining style consistency. Traditional methods achieve this by explicitly disentangling style and content from the source image and then fusing the style with the target content, while ensuring content consistency using a pre-trained recognition model. Despite notable progress, these methods suffer from complex pipelines,…

    Submitted 10 March, 2026; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: accepted by CVPR2025

  19. arXiv:2502.15027  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.HC

    InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

    Authors: Henry Hengyuan Zhao, Wenqi Pei, Yifei Tao, Haiyang Mei, Mike Zheng Shou

    Abstract: Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework, which can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench which evaluates interactive intelligence us…

    Submitted 7 November, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by EMNLP 2025 Findings

  20. arXiv:2412.13466  [pdf, other]

    cs.LG

    Federated Unlearning Model Recovery in Data with Skewed Label Distributions

    Authors: Xinrui Yu, Wenbin Pei, Bing Xue, Qiang Zhang

    Abstract: In federated learning, federated unlearning is a technique that provides clients with a rollback mechanism that allows them to withdraw their data contribution without training from scratch. However, existing research has not considered scenarios with skewed label distributions. Unfortunately, the unlearning of a client with skewed data usually results in biased models and makes it difficult to de…

    Submitted 20 December, 2024; v1 submitted 17 December, 2024; originally announced December 2024.

  21. arXiv:2412.10461  [pdf, other]

    cs.LG cs.AI cs.NE

    EvoSampling: A Granular Ball-based Evolutionary Hybrid Sampling with Knowledge Transfer for Imbalanced Learning

    Authors: Wenbin Pei, Ruohao Dai, Bing Xue, Mengjie Zhang, Qiang Zhang, Yiu-Ming Cheung, Shuyin Xia

    Abstract: Class imbalance would lead to biased classifiers that favor the majority class and disadvantage the minority class. Unfortunately, from a practical perspective, the minority class is of importance in many real-life applications. Hybrid sampling methods address this by oversampling the minority class to increase the number of its instances, followed by undersampling to remove low-quality instances.…

    Submitted 12 December, 2024; originally announced December 2024.

  22. arXiv:2410.11278  [pdf, other]

    cs.LG

    UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba

    Authors: Li Wu, Wenbin Pei, Jiulong Jiao, Qiang Zhang

    Abstract: Multivariate Time series forecasting is crucial in domains such as transportation, meteorology, and finance, especially for predicting extreme weather events. State-of-the-art methods predominantly rely on Transformer architectures, which utilize attention mechanisms to capture temporal dependencies. However, these methods are hindered by quadratic time complexity, limiting the model's scalability…

    Submitted 15 October, 2024; originally announced October 2024.

  23. arXiv:2409.00014  [pdf, other]

    cs.CV cs.AI

    DivDiff: A Conditional Diffusion Model for Diverse Human Motion Prediction

    Authors: Hua Yu, Yaqing Hou, Wenbin Pei, Qiang Zhang

    Abstract: Diverse human motion prediction (HMP) aims to predict multiple plausible future motions given an observed human motion sequence. It is a challenging task due to the diversity of potential human motions while ensuring an accurate description of future human motions. Current solutions are either low-diversity or limited in expressiveness. Recent denoising diffusion models (DDPM) hold potential gener…

    Submitted 16 August, 2024; originally announced September 2024.

  24. arXiv:2408.01669  [pdf, other]

    cs.CV cs.MM

    SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

    Authors: Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders the model from evolving toward stronger multimodal understanding capabilities. To address these lim…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024. Project page: https://synopground.github.io/

  25. arXiv:2407.19542  [pdf, other]

    cs.CV

    UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation

    Authors: Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei

    Abstract: Typical inverse rendering methods focus on learning implicit neural scene representations by modeling the geometry, materials and illumination separately, which entails significant computations for optimization. In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  26. arXiv:2407.19507  [pdf, other]

    cs.CV cs.AI

    WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

    Authors: Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Transcription-only Supervised Text Spotting aims to learn text spotters relying only on transcriptions but no text boundaries for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastiv…

    Submitted 13 January, 2025; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  27. arXiv:2406.18958  [pdf, other]

    cs.CV

    AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

    Authors: Yanan Sun, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen

    Abstract: The field of text-to-image (T2I) generation has made significant progress in recent years, largely driven by advancements in diffusion models. Linguistic control enables effective content creation, but struggles with fine-grained control over image generation. This challenge has been explored, to a great extent, by incorporating additional user-supplied spatial conditions, such as depth maps and e…

    Submitted 18 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by ECCV 2024, code and dataset available in https://github.com/open-mmlab/AnyControl

  28. arXiv:2405.09185  [pdf, other]

    cs.SI cs.NE

    Influence Maximization in Hypergraphs Using A Genetic Algorithm with New Initialization and Evaluation Methods

    Authors: Xilong Qu, Wenbin Pei, Yingchao Yang, Xirong Xu, Renquan Zhang, Qiang Zhang

    Abstract: Influence maximization (IM) is a crucial optimization task related to analyzing complex networks in the real world, such as social networks, disease propagation networks, and marketing networks. Publications to date about the IM problem focus mainly on graphs, which fail to capture high-order interaction relationships from the real world. Therefore, the use of hypergraphs for addressing the IM pro…

    Submitted 15 May, 2024; originally announced May 2024.

  29. arXiv:2404.10322  [pdf, other]

    cs.CV

    Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation

    Authors: Jiapeng Su, Qi Fan, Guangming Lu, Fanglin Chen, Wenjie Pei

    Abstract: Few-shot semantic segmentation (FSS) has achieved great success on segmenting objects of novel classes, supported by only a few annotated samples. However, existing FSS methods often underperform in the presence of domain shifts, especially when encountering new domain styles that are unseen during training. It is suboptimal to directly adapt or generalize the entire model to new domains in the fe…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  30. arXiv:2402.00404  [pdf, other]

    cs.NE

    Improving Critical Node Detection Using Neural Network-based Initialization in a Genetic Algorithm

    Authors: Chanjuan Liu, Shike Ge, Zhihan Chen, Wenbin Pei, Enqiang Zhu, Yi Mei, Hisao Ishibuchi

    Abstract: The Critical Node Problem (CNP) is concerned with identifying the critical nodes in a complex network. These nodes play a significant role in maintaining the connectivity of the network, and removing them can negatively impact network performance. CNP has been studied extensively due to its numerous real-world applications. Among the different versions of CNP, CNP-1a has gained the most popularity…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 14 pages, 13 figures

  31. arXiv:2401.00755  [pdf, other]

    cs.LG

    Saliency-Aware Regularized Graph Neural Network

    Authors: Wenjie Pei, Weina Xu, Zongze Wu, Weichao Li, Jinfan Wang, Guangming Lu, Xiangrong Wang

    Abstract: The crux of graph classification lies in the effective representation learning for the entire graph. Typical graph neural networks focus on modeling the local dependencies when aggregating features of neighboring nodes, and obtain the representation for the entire graph by aggregating node features. Such methods have two potential limitations: 1) the global node saliency w.r.t. graph classificatio…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by Artificial Intelligence Journal with minor revision

  32. arXiv:2312.10608  [pdf, other]

    cs.CV

    Robust 3D Tracking with Quality-Aware Shape Completion

    Authors: Jingwen Zhang, Zikun Zhou, Guangming Lu, Jiandong Tian, Wenjie Pei

    Abstract: 3D single object tracking remains a challenging problem due to the sparsity and incompleteness of the point clouds. Existing algorithms attempt to address the challenges in two strategies. The first strategy is to learn dense geometric features based on the captured sparse point cloud. Nevertheless, it is quite a formidable task since the learned dense geometric features are with high uncertainty…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: A detailed version of the paper accepted by AAAI 2024

  33. arXiv:2312.10376  [pdf, other]

    cs.CV

    SA$^2$VP: Spatially Aligned-and-Adapted Visual Prompt

    Authors: Wenjie Pei, Tongqi Xia, Fanglin Chen, Jinsong Li, Jiandong Tian, Guangming Lu

    Abstract: As a prominent parameter-efficient fine-tuning technique in NLP, prompt tuning is being explored its potential in computer vision. Typical methods for visual prompt tuning follow the sequential modeling paradigm stemming from NLP, which represents an input image as a flattened sequence of token embeddings and then learns a set of unordered parameterized tokens prefixed to the sequence representati…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  34. arXiv:2312.01431  [pdf, ps, other]

    cs.CV

    D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition

    Authors: Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian, Jun Yu

    Abstract: Adapting pre-trained image models to video modality has proven to be an effective strategy for robust few-shot action recognition. In this work, we explore the potential of adapter tuning in image-to-video model adaptation and propose a novel video adapter tuning framework, called Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter). It features a lightweight design, low adaptatio…

    Submitted 30 June, 2025; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted by ICCV2025

  35. arXiv:2308.14061  [pdf, other]

    cs.CV

    Hierarchical Contrastive Learning for Pattern-Generalizable Image Corruption Detection

    Authors: Xin Feng, Yifeng Xu, Guangming Lu, Wenjie Pei

    Abstract: Effective image restoration with large-size corruptions, such as blind image inpainting, entails precise detection of corruption region masks which remains extremely challenging due to diverse shapes and patterns of corruptions. In this work, we present a novel method for automatic corruption detection, which allows for blind corruption restoration without known corruption masks. Specifically, we…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  36. arXiv:2308.05104  [pdf, other]

    cs.CV

    Scene-Generalizable Interactive Segmentation of Radiance Fields

    Authors: Songlin Tang, Wenjie Pei, Xin Tao, Tanghui Jia, Guangming Lu, Yu-Wing Tai

    Abstract: Existing methods for interactive segmentation in radiance fields entail scene-specific optimization and thus cannot generalize across different scenes, which greatly limits their applicability. In this work we make the first attempt at Scene-Generalizable Interactive Segmentation in Radiance Fields (SGISRF) and propose a novel SGISRF method, which can perform 3D object segmentation for novel (unse…

    Submitted 9 August, 2023; originally announced August 2023.

  37. arXiv:2308.03529  [pdf, other]

    cs.CV

    Feature Decoupling-Recycling Network for Fast Interactive Segmentation

    Authors: Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei

    Abstract: Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input without considering the invariant nature of the source image. As a result, extracting features from the source image is repeated in each interaction, resulting in substantial computational redundancy. In this work, we propose the Feature Decoupling-Recycling Network (FDRN…

    Submitted 8 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023

  38. arXiv:2308.03177  [pdf, other]

    cs.CV

    Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement

    Authors: Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei

    Abstract: Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge. This paper proposes a novel approach to improve point cloud few-shot segmentation (PC-FSS) models. Unlike existing PC-FSS methods that directly utilize categorical information from support prototypes to recognize novel classes in que…

    Submitted 8 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023

  39. arXiv:2303.14384  [pdf, other]

    cs.CV

    Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

    Authors: Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, Yaowei Wang, Zhenyu He

    Abstract: This paper aims to solve the video object segmentation (VOS) task in a scribble-supervised manner, in which VOS models are not only trained by the sparse scribble annotations but also initialized with the sparse target scribbles for inference. Thus, the annotation burdens for both training and initialization can be substantially lightened. The difficulties of scribble-supervised VOS lie in two asp…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: This project is available at https://github.com/mkg1204/RHMNet-for-SSVOS

  40. arXiv:2301.06690  [pdf, other]

    cs.CV

    Audio2Gestures: Generating Diverse Gestures from Audio

    Authors: Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

    Abstract: People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during infe…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.06720

  41. arXiv:2212.01131  [pdf, other]

    cs.CV

    Activating the Discriminability of Novel Classes for Few-shot Segmentation

    Authors: Dianwen Mei, Wei Zhuo, Jiandong Tian, Guangming Lu, Wenjie Pei

    Abstract: Despite the remarkable success of existing methods for few-shot segmentation, there remain two crucial challenges. First, the feature learning for novel classes is suppressed during the training on base classes in that the novel classes are always treated as background. Thus, the semantics of novel classes are not well learned. Second, most of existing methods fail to consider the underlying seman…

    Submitted 2 December, 2022; originally announced December 2022.

  42. arXiv:2211.15143  [pdf, other]

    cs.CV cs.LG

    Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations

    Authors: Bin Wang, Wenbin Pei, Bing Xue, Mengjie Zhang

    Abstract: Deep convolutional neural networks have proven their effectiveness, and have been acknowledged as the most dominant method for image classification. However, a severe drawback of deep convolutional neural networks is poor explainability. Unfortunately, in many real-world applications, users need to understand the rationale behind the predictions of deep convolutional neural networks when determini…

    Submitted 25 March, 2025; v1 submitted 28 November, 2022; originally announced November 2022.

  43. arXiv:2211.14705  [pdf, other]

    cs.CV

    Semantic-Aware Local-Global Vision Transformer

    Authors: Jiatong Zhang, Zengwei Yao, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Vision Transformers have achieved remarkable progresses, among which Swin Transformer has demonstrated the tremendous potential of Transformer for vision tasks. It surmounts the key challenge of high computational complexity by performing local self-attention within shifted windows. In this work we propose the Semantic-Aware Local-Global Vision Transformer (SALG), to further investigate two potent…

    Submitted 26 November, 2022; originally announced November 2022.

  44. arXiv:2210.16834  [pdf, other]

    cs.CV

    Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid

    Authors: Jing Xu, Xu Luo, Xinglin Pan, Wenjie Pei, Yanan Li, Zenglin Xu

    Abstract: Few-shot learning (FSL) targets at generalization of vision models towards unseen tasks without sufficient annotations. Despite the emergence of a number of few-shot learning methods, the sample selection bias problem, i.e., the sensitivity to the limited amount of support data, has not been well understood. In this paper, we find that this problem usually occurs when the positions of support samp…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  45. arXiv:2208.14093  [pdf, other]

    cs.CV

    SSORN: Self-Supervised Outlier Removal Network for Robust Homography Estimation

    Authors: Yi Li, Wenjie Pei, Zhenyu He

    Abstract: The traditional homography estimation pipeline consists of four main steps: feature detection, feature matching, outlier removal and transformation estimation. Recent deep learning models intend to address the homography estimation problem using a single convolutional network. While these models are trained in an end-to-end fashion to simplify the homography estimation problem, they lack the featu…

    Submitted 30 August, 2022; originally announced August 2022.

  46. arXiv:2208.06162  [pdf, other]

    cs.CV

    Layout-Bridging Text-to-Image Synthesis

    Authors: Jiadong Liang, Wenjie Pei, Feng Lu

    Abstract: The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, could only capture keywords in the text that indicates common objects or actions but fail to learn their spatial distribution patterns. An effective way to circu…

    Submitted 12 August, 2022; originally announced August 2022.

  47. arXiv:2207.12941  [pdf, other]

    cs.CV eess.IV

    Learning Generalizable Latent Representations for Novel Degradations in Super Resolution

    Authors: Fengjun Li, Xin Feng, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Typical methods for blind image super-resolution (SR) focus on dealing with unknown degradations by directly estimating them or learning the degradation representations in a latent space. A potential limitation of these methods is that they assume the unknown degradations can be simulated by the integration of various handcrafted degradations (e.g., bicubic downsampling), which is not necessarily…

    Submitted 25 July, 2022; originally announced July 2022.

  48. arXiv:2207.12049  [pdf, other]

    cs.CV

    Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations

    Authors: Wenjie Pei, Shuang Wu, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu

    Abstract: While fine-tuning based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been addressed well is the potential class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain the overfi…

    Submitted 25 July, 2022; originally announced July 2022.

  49. arXiv:2207.11549  [pdf, other]

    cs.CV

    Self-Support Few-Shot Semantic Segmentation

    Authors: Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Existing few-shot segmentation methods have achieved great progress based on the support-query matching framework. But they still heavily suffer from the limited coverage of intra-class variations from the few-shot supports provided. Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those to different objects of same class, we propose a novel…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  50. arXiv:2207.11184  [pdf, other]

    cs.CV

    Multi-Faceted Distillation of Base-Novel Commonality for Few-shot Object Detection

    Authors: Shuang Wu, Wenjie Pei, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu

    Abstract: Most of existing methods for few-shot object detection follow the fine-tuning paradigm, which potentially assumes that the class-agnostic generalizable knowledge can be learned and transferred implicitly from base classes with abundant samples to novel classes with limited samples via such a two-stage training strategy. However, it is not necessarily true since the object detector can hardly disti…

    Submitted 3 November, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022