Showing 1–50 of 128 results for author: Pu, J

  1. arXiv:2512.15766  [pdf, ps, other]

    cs.PL cs.AI cs.DC cs.PF

    LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models

    Authors: Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qingran Wu, Sheng Cheng, Ming Cai

    Abstract: Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop transformations remains challenging due to inherent complexities, including cost modeling for optimization objectives. Recent studies have explored the potential of Large Language Models (LLMs) for code…

    Submitted 12 December, 2025; originally announced December 2025.

    Comments: Accepted to ASPLOS 2026

  2. arXiv:2512.01672  [pdf, ps, other]

    cs.LG cs.AI

    ICAD-LLM: One-for-All Anomaly Detection via In-Context Learning with Large Language Models

    Authors: Zhongyuan Wu, Jingyuan Wang, Zexuan Cheng, Yilong Zhou, Weizhi Wang, Juhua Pu, Chao Li, Changqing Ma

    Abstract: Anomaly detection (AD) is a fundamental task of critical importance across numerous domains. Current systems increasingly operate in rapidly evolving environments that generate diverse yet interconnected data modalities -- such as time series, system logs, and tabular records -- as exemplified by modern IT systems. Effective AD methods in such environments must therefore possess two critical capab…

    Submitted 1 December, 2025; originally announced December 2025.

  3. arXiv:2511.21767  [pdf]

    eess.IV cs.AI cs.CV q-bio.TO

    LAYER: A Quantitative Explainable AI Framework for Decoding Tissue-Layer Drivers of Myofascial Low Back Pain

    Authors: Zixue Zeng, Anthony M. Perti, Tong Yu, Grant Kokenberger, Hao-En Lu, Jing Wang, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M. Cormack, Allison C. Bean, Ryan P. Nussbaum, Emily Landis-Walkenhorst, Kang Kim, Ajay D. Wasan, Jiantao Pu

    Abstract: Myofascial pain (MP) is a leading cause of chronic low back pain, yet its tissue-level drivers remain poorly defined and lack reliable image biomarkers. Existing studies focus predominantly on muscle while neglecting fascia, fat, and other soft tissues that play integral biomechanical roles. We developed an anatomically grounded explainable artificial intelligence (AI) framework, LAYER (Layer-wise…

    Submitted 25 November, 2025; originally announced November 2025.

  4. arXiv:2511.14349  [pdf, ps, other]

    cs.CV

    ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

    Authors: Junfu Pu, Teng Wang, Yixiao Ge, Yuying Ge, Chen Li, Ying Shan

    Abstract: The proliferation of hour-long videos (e.g., lectures, podcasts, documentaries) has intensified demand for efficient content structuring. However, existing approaches are constrained by small-scale training with annotations that are typically short and coarse, restricting generalization to nuanced transitions in long videos. We introduce ARC-Chapter, the first large-scale video chaptering model trai…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: Project Page: https://arcchapter.github.io/index_en.html

  5. arXiv:2511.13079  [pdf, ps, other]

    cs.CV

    Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving

    Authors: Jiacheng Tang, Mingyue Feng, Jiachao Liu, Yaonong Wang, Jian Pu

    Abstract: Modular design of planning-oriented autonomous driving has markedly advanced end-to-end systems. However, existing architectures remain constrained by an over-reliance on ego status, hindering generalization and robust scene understanding. We identify the root cause as an inherent design within these architectures that allows ego status to be easily leveraged as a shortcut. Specifically, the prema…

    Submitted 18 November, 2025; v1 submitted 17 November, 2025; originally announced November 2025.

    Comments: Accepted to AAAI 2026 (Oral)

  6. arXiv:2510.25138  [pdf, ps, other]

    cs.RO

    Learning Spatial-Aware Manipulation Ordering

    Authors: Yuxiang Yan, Zhiyuan Zhou, Xin Gao, Guanghao Li, Shenglin Li, Jiaqi Chen, Qunyan Pu, Jian Pu

    Abstract: Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that direc…

    Submitted 28 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025

  7. arXiv:2510.25117  [pdf, ps, other]

    cs.CL

    A Survey on Unlearning in Large Language Models

    Authors: Ruichen Qiu, Jiajun Tan, Jiayue Pu, Honglin Wang, Xiao-Shan Gao, Fei Sun

    Abstract: Large Language Models (LLMs) demonstrate remarkable capabilities, but their training on massive corpora poses significant risks from memorized sensitive information. To mitigate these issues and align with legal standards, unlearning has emerged as a critical technique to selectively erase specific knowledge from LLMs without compromising their overall performance. This survey provides a systemati…

    Submitted 17 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

  8. arXiv:2510.24102  [pdf, ps, other]

    cs.CL

    Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks

    Authors: Yihan Wang, Peiyu Liu, Runyu Chen, Jiaxing Pu, Wei Xu

    Abstract: Text-to-SQL technology has evolved rapidly, with diverse academic methods achieving impressive results. However, deploying these techniques in real-world systems remains challenging due to limited integration tools. To address this gap, we introduce Squrve, a unified, modular, and extensive Text-to-SQL framework designed to bring together research advances and real-world applications. Squrve fi…

    Submitted 28 October, 2025; originally announced October 2025.

  9. arXiv:2510.08551  [pdf, ps, other]

    cs.CV

    ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

    Authors: Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang

    Abstract: On-the-fly 3D reconstruction from monocular image sequences is a long-standing challenge in computer vision, critical for applications such as real-to-sim, AR/VR, and robotics. Existing methods face a major tradeoff: per-scene optimization yields high fidelity but is computationally expensive, whereas feed-forward foundation models enable real-time inference but struggle with accuracy and robustne…

    Submitted 9 October, 2025; originally announced October 2025.

  10. arXiv:2510.08263  [pdf, ps, other]

    cs.AI

    Co-TAP: Three-Layer Agent Interaction Protocol Technical Report

    Authors: Shunyu An, Miao Wang, Yongchao Li, Dong Wan, Lina Wang, Ling Qin, Liqin Gao, Congyao Fan, Zhiyong Mao, Jiange Pu, Wenji Xia, Dong Zhao, Zhaohui Hao, Rui Hu, Ji Lu, Guiyue Zhou, Baoyu Tang, Yanqin Gao, Yongsheng Du, Daigang Xu, Lingjun Huang, Baoli Wang, Xiwen Zhang, Luyao Wang, Shilong Liu

    Abstract: This paper proposes Co-TAP (T: Triple, A: Agent, P: Protocol), a three-layer agent interaction protocol designed to address the challenges faced by multi-agent systems across the three core dimensions of Interoperability, Interaction and Collaboration, and Knowledge Sharing. We have designed and proposed a layered solution composed of three core protocols: the Human-Agent Interaction Protocol (HAI…

    Submitted 28 October, 2025; v1 submitted 9 October, 2025; originally announced October 2025.

  11. arXiv:2510.06969  [pdf, ps, other]

    cs.CV cs.AI

    Learning Global Representation from Queries for Vectorized HD Map Construction

    Authors: Shoumeng Qiu, Xinrun Li, Yang Long, Xiangyang Xue, Varun Ojha, Jian Pu

    Abstract: The online construction of vectorized high-definition (HD) maps is a cornerstone of modern autonomous driving systems. State-of-the-art approaches, particularly those based on the DETR framework, formulate this as an instance detection problem. However, their reliance on independent, learnable object queries results in a predominantly local query perspective, neglecting the inherent global represe…

    Submitted 8 October, 2025; originally announced October 2025.

    Comments: 16 pages

  12. arXiv:2509.26463  [pdf, ps, other]

    cs.SE

    ErrorPrism: Reconstructing Error Propagation Paths in Cloud Service Systems

    Authors: Junsong Pu, Yichen Li, Zhuangbin Chen, Jinyang Liu, Zhihan Jiang, Jianjun Chen, Rui Shi, Zibin Zheng, Tieying Zhang

    Abstract: Reliability management in cloud service systems is challenging due to the cascading effect of failures. Error wrapping, a practice prevalent in modern microservice development, enriches errors with context at each layer of the function call stack, constructing an error chain that describes a failure from its technical origin to its business impact. However, this also presents a significant traceab…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 12 pages, 6 figures, 1 table, this paper has been accepted by the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025

    ACM Class: D.2.5

  13. arXiv:2509.23375  [pdf, ps, other]

    cs.CV

    CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation

    Authors: Yifan Yang, Yuxiang Yan, Boda Liu, Jian Pu

    Abstract: Point clouds collected from real-world environments are often incomplete due to factors such as limited sensor resolution, single viewpoints, occlusions, and noise. These challenges make point cloud completion essential for various applications. A key difficulty in this task is predicting the overall shape and reconstructing missing regions from highly incomplete point clouds. To address this, we…

    Submitted 27 September, 2025; originally announced September 2025.

    Comments: Accepted to IROS2025

  14. arXiv:2509.22796  [pdf, ps, other]

    cs.CR cs.LG

    What Do They Fix? LLM-Aided Categorization of Security Patches for Critical Memory Bugs

    Authors: Xingyu Li, Juefei Pu, Yifan Wu, Xiaochen Zou, Shitong Zhu, Qiushi Wu, Zheng Zhang, Joshua Hsu, Yue Dong, Zhiyun Qian, Kangjie Lu, Trent Jaeger, Michael De Lucia, Srikanth V. Krishnamurthy

    Abstract: Open-source software projects are foundational to modern software ecosystems, with the Linux kernel standing out as a critical exemplar due to its ubiquity and complexity. Although security patches are continuously integrated into the Linux mainline kernel, downstream maintainers often delay their adoption, creating windows of vulnerability. A key reason for this lag is the difficulty in identifyi…

    Submitted 26 September, 2025; originally announced September 2025.

  15. arXiv:2509.18094  [pdf, ps, other]

    cs.CV cs.AI

    UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

    Authors: Ye Liu, Zongyang Ma, Junfu Pu, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

    Abstract: Recent advances in Large Multi-modal Models (LMMs) have demonstrated their remarkable success as general-purpose multi-modal assistants, with a particular focus on holistic image- and video-language understanding. Conversely, less attention has been given to scaling fine-grained pixel-level understanding capabilities, where the models are expected to realize pixel-level alignment between visual si…

    Submitted 10 November, 2025; v1 submitted 22 September, 2025; originally announced September 2025.

    Comments: NeurIPS 2025 Camera Ready. Project Page: https://polyu-chenlab.github.io/unipixel/

  16. arXiv:2509.09183  [pdf, ps, other]

    cs.CV cs.AI

    Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection

    Authors: Jiasheng Guo, Xin Gao, Yuxiang Yan, Guanghao Li, Jian Pu

    Abstract: Low-light object detection is crucial for many real-world applications but remains challenging due to degraded image quality. While recent studies have shown that RAW images offer superior potential over RGB images, existing approaches either use RAW-RGB images with information loss or employ complex frameworks. To address these issues, we propose a lightweight and self-adaptive Image Signal Processing (…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: 11 pages, 6 figures, conference

    Journal ref: ICCV 2025

  17. arXiv:2507.21017  [pdf, ps, other]

    cs.AI

    MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them

    Authors: Weichen Zhang, Yiyou Sun, Pohao Huang, Jiayue Pu, Heyue Lin, Dawn Song

    Abstract: Hallucinations pose critical risks for large language model (LLM)-based agents, often manifesting as hallucinative actions resulting from fabricated or misinterpreted information within the cognitive context. While recent studies have exposed such failures, existing evaluations remain fragmented and lack a principled testbed. In this paper, we present MIRAGE-Bench--Measuring Illusions in Risky AGE…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: Code and data: https://github.com/sunblaze-ucb/mirage-bench.git

  18. arXiv:2507.20939  [pdf, ps, other]

    cs.CV

    ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

    Authors: Yuying Ge, Yixiao Ge, Chen Li, Teng Wang, Junfu Pu, Yizhuo Li, Lu Qiu, Jin Ma, Lisheng Duan, Xinyu Zuo, Jinwen Luo, Weibo Gu, Zexuan Li, Xiaojing Zhang, Yangyu Tao, Han Hu, Di Wang, Ying Shan

    Abstract: Real-world user-generated short videos, especially those distributed on platforms such as WeChat Channel and TikTok, dominate the mobile internet. However, current large multimodal models lack essential temporally-structured, detailed, and in-depth video comprehension capabilities, which are the cornerstone of effective video search and recommendation, as well as emerging video applications. Under…

    Submitted 28 July, 2025; originally announced July 2025.

    Comments: Project Page: https://tencentarc.github.io/posts/arc-video-announcement/

  19. arXiv:2507.01213  [pdf, ps, other]

    cs.CL

    AF-MAT: Aspect-aware Flip-and-Fuse xLSTM for Aspect-based Sentiment Analysis

    Authors: Adamu Lawan, Juhua Pu, Haruna Yunusa, Muhammad Lawan, Mahmoud Basi, Muhammad Adam

    Abstract: Aspect-based Sentiment Analysis (ABSA) is a crucial NLP task that extracts fine-grained opinions and sentiments from text, such as product reviews and customer feedback. Existing methods often trade off efficiency for performance: traditional LSTM or RNN models struggle to capture long-range dependencies, transformer-based methods are computationally costly, and Mamba-based approaches rely on CUDA…

    Submitted 14 August, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: 9 pages, 4 figures

  20. arXiv:2505.23868  [pdf, ps, other]

    cs.LG cs.AI

    Noise-Robustness Through Noise: A Framework combining Asymmetric LoRA with Poisoning MoE

    Authors: Zhaokun Wang, Jinyu Guo, Jingwen Pu, Lingfeng Chen, Hongli Pu, Jie Ou, Libo Qin, Wenhong Tian

    Abstract: Current parameter-efficient fine-tuning methods for adapting pre-trained language models to downstream tasks are susceptible to interference from noisy data. Conventional noise-handling approaches either rely on laborious data pre-processing or employ model architecture modifications prone to error accumulation. In contrast to existing noise-process paradigms, we propose a noise-robust adaptation…

    Submitted 20 October, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted to NeurIPS 2025

  21. arXiv:2505.19648  [pdf, other]

    cs.LO cs.AI

    Model Enumeration of Two-Variable Logic with Quadratic Delay Complexity

    Authors: Qiaolan Meng, Juhua Pu, Hongting Niu, Yuyi Wang, Yuanhong Wang, Ondřej Kuželka

    Abstract: We study the model enumeration problem of the function-free, finite domain fragment of first-order logic with two variables ($FO^2$). Specifically, given an $FO^2$ sentence $Γ$ and a positive integer $n$, how can one enumerate all the models of $Γ$ over a domain of size $n$? In this paper, we devise a novel algorithm to address this problem. The delay complexity, the time required between producin…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 16 pages, 4 figures and to be published in Fortieth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS)

  22. arXiv:2504.15927  [pdf, ps, other]

    cs.SI cs.AI

    New Recipe for Semi-supervised Community Detection: Clique Annealing under Crystallization Kinetics

    Authors: Ling Cheng, Jiashu Pu, Ruicheng Liang, Qian Shao, Hezhe Qiao, Feida Zhu

    Abstract: Semi-supervised community detection methods are widely used for identifying specific communities due to label scarcity. Existing semi-supervised community detection methods typically involve two learning stages, initial identification and subsequent adjustment, and often start from an unreasonable community core candidate. Moreover, these methods encounter scalability issues…

    Submitted 6 October, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2203.05898 by other authors

  23. arXiv:2504.13471  [pdf, other]

    cs.CL

    From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs

    Authors: Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu

    Abstract: Large Language Models (LLMs) have significantly advanced artificial intelligence by optimizing traditional Natural Language Processing (NLP) workflows, facilitating their integration into various systems. Many such NLP systems, including ours, directly incorporate LLMs. However, this approach either incurs high costs or yields suboptimal performance after fine-tuning. In this paper, we in…

    Submitted 11 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

  24. arXiv:2504.09839  [pdf, other]

    cs.SD cs.AI cs.CR cs.LG

    SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis

    Authors: Zhisheng Zhang, Derui Wang, Qianyi Yang, Pengyang Huang, Junhan Pu, Yuxin Cao, Kai Ye, Jie Hao, Yixian Yang

    Abstract: Speech synthesis technology has brought great convenience, while the widespread usage of realistic deepfake audio has triggered hazards. Malicious adversaries may collect victims' speech without authorization and clone a similar voice for illegal exploitation (e.g., telecom fraud). However, the existing defense methods cannot effectively prevent deepfake exploitation and are vulnerable to robu…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: Accepted to USENIX Security 2025

  25. arXiv:2503.23740  [pdf, other]

    cs.CL cs.AI

    LANID: LLM-assisted New Intent Discovery

    Authors: Lu Fan, Jiashu Pu, Rongsheng Zhang, Xiao-Ming Wu

    Abstract: Task-oriented Dialogue Systems (TODS) often face the challenge of encountering new intents. New Intent Discovery (NID) is a crucial task that aims to identify these novel intents while maintaining the capability to recognize existing ones. Previous efforts to adapt TODS to new intents have struggled with inadequate semantic representation or have depended on external knowledge, which is often not…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: Published in LREC-COLING 2024

  26. arXiv:2503.19736  [pdf]

    eess.IV cs.CV

    GRN+: A Simplified Generative Reinforcement Network for Tissue Layer Analysis in 3D Ultrasound Images for Chronic Low-back Pain

    Authors: Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Xin Meng, Jiantao Pu

    Abstract: 3D ultrasound delivers high-resolution, real-time images of soft tissues, which is essential for pain research. However, manually distinguishing various tissues for quantitative analysis is labor-intensive. To streamline this process, we developed and validated GRN+, a novel multi-model framework that automates layer segmentation with minimal annotated data. GRN+ combines a ResNet-based generator…

    Submitted 25 March, 2025; originally announced March 2025.

  27. arXiv:2503.19735  [pdf]

    eess.IV cs.CV

    InterSliceBoost: Identifying Tissue Layers in Three-dimensional Ultrasound Images for Chronic Lower Back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Matthew Cartier, Xiaoyan Zhao, Pengyu Chen, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison C. Bean, Ryan P. Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Kang Kim, Ajay D. Wasan, Jiantao Pu

    Abstract: Available studies on chronic lower back pain (cLBP) typically focus on one or a few specific tissues rather than conducting a comprehensive layer-by-layer analysis. Since three-dimensional (3-D) images often contain hundreds of slices, manual annotation of these anatomical structures is both time-consuming and error-prone. We aim to develop and validate a novel approach called InterSliceBoost to e…

    Submitted 25 March, 2025; originally announced March 2025.

  28. arXiv:2503.17712  [pdf, other]

    cs.CV cs.AI

    Multi-modality Anomaly Segmentation on the Road

    Authors: Heng Gao, Zhuolin He, Shoumeng Qiu, Xiangyang Xue, Jian Pu

    Abstract: Semantic segmentation allows autonomous driving cars to understand the surroundings of the vehicle comprehensively. However, it is also crucial for the model to detect obstacles that may jeopardize the safety of autonomous driving systems. Based on our experiments, we find that current uni-modal anomaly segmentation frameworks tend to produce high anomaly scores for non-anomalous regions in images…

    Submitted 22 March, 2025; originally announced March 2025.

  29. arXiv:2503.12086  [pdf, other]

    cs.CV

    FA-BARF: Frequency Adapted Bundle-Adjusting Neural Radiance Fields

    Authors: Rui Qian, Chenyangguang Zhang, Yan Di, Guangyao Zhai, Ruida Zhang, Jiayu Guo, Benjamin Busam, Jian Pu

    Abstract: Neural Radiance Fields (NeRF) have recently exhibited highly effective performance for photorealistic novel view synthesis. However, a key limitation is their reliance on a hand-crafted frequency annealing strategy to recover 3D scenes with imperfect camera poses. The strategy exploits a temporal low-pass filter to guarantee convergence while decelerating the joint optimization of implici…

    Submitted 15 March, 2025; originally announced March 2025.

  30. arXiv:2502.20767  [pdf, other]

    cs.RO

    A2DO: Adaptive Anti-Degradation Odometry with Deep Multi-Sensor Fusion for Autonomous Navigation

    Authors: Hui Lai, Qi Chen, Junping Zhang, Jian Pu

    Abstract: Accurate localization is essential for the safe and effective navigation of autonomous vehicles, and Simultaneous Localization and Mapping (SLAM) is a cornerstone technology in this context. However, the performance of the SLAM system can deteriorate under challenging conditions such as low light, adverse weather, or obstructions due to sensor degradation. We present A2DO, a novel end-to-end multi…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 6+1 pages, 6 figures, accepted by ICRA

  31. arXiv:2502.16115  [pdf, other]

    cs.LG cs.CV stat.ML

    Detecting OOD Samples via Optimal Transport Scoring Function

    Authors: Heng Gao, Zhuolin He, Jian Pu

    Abstract: To deploy machine learning models in the real world, researchers have proposed many OOD detection algorithms to help models identify unknown samples during the inference phase and prevent them from making untrustworthy predictions. Unlike methods that rely on extra data for outlier exposure training, post hoc methods detect Out-of-Distribution (OOD) samples by developing scoring functions, which a…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

  32. arXiv:2502.11037  [pdf, other]

    cs.LG cs.AI cs.CV

    Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

    Authors: Xin Gao, Jian Pu

    Abstract: Multi-View Representation Learning (MVRL) aims to derive a unified representation from multi-view data by leveraging shared and complementary information across views. However, when views are irregularly missing, the incomplete data can lead to representations that lack sufficiency and consistency. To address this, we propose Multi-View Permutation of Variational Auto-Encoders (MVP), which excavat…

    Submitted 28 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 10 pages, 4 figures, ICLR 2025

  33. arXiv:2502.06318  [pdf, other]

    cs.SE

    Tracezip: Efficient Distributed Tracing via Trace Compression

    Authors: Zhuangbin Chen, Junsong Pu, Zibin Zheng

    Abstract: Distributed tracing serves as a fundamental building block in the monitoring and testing of cloud service systems. To reduce computational and storage overheads, the de facto practice is to capture fewer traces via sampling. However, existing work faces a trade-off between the completeness of tracing and system overhead. On one hand, head-based sampling indiscriminately selects requests to trace w…

    Submitted 13 April, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted by The 34th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2025)

  34. arXiv:2501.17690  [pdf]

    cs.CV cs.AI cs.LG

    Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment

    Authors: Zixue Zeng, Xiaoyan Zhao, Matthew Cartier, Tong Yu, Jing Wang, Xin Meng, Zhiyu Sheng, Maryam Satarpour, John M Cormack, Allison Bean, Ryan Nussbaum, Maya Maurer, Emily Landis-Walkenhorst, Dinesh Kumbhare, Kang Kim, Ajay Wasan, Jiantao Pu

    Abstract: We introduce a novel segmentation-aware joint training framework called generative reinforcement network (GRN) that integrates segmentation loss feedback to optimize both image generation and segmentation performance in a single stage. An image enhancement technique called segmentation-guided enhancement (SGE) is also developed, where the generator produces images tailored specifically for the seg…

    Submitted 25 November, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  35. arXiv:2501.15368  [pdf, other]

    cs.CL cs.SD eess.AS

    Baichuan-Omni-1.5 Technical Report

    Authors: Yadong Li, Jun Liu, Tao Zhang, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan, Chong Li, Yuanbo Fang, Dongdong Kuang, Mingrui Wang, Chenglin Zhu, Youwei Zhang, Hongyu Guo, Fengyu Zhang, Yuran Wang, Bowen Ding, Wei Song, Xu Li, Yuqi Huo, Zheng Liang, et al. (68 additional authors not shown)

    Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pip…

    Submitted 25 January, 2025; originally announced January 2025.

  36. arXiv:2412.19645  [pdf, other]

    cs.CV

    VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

    Authors: Tao Wu, Yong Zhang, Xiaodong Cun, Zhongang Qi, Junfu Pu, Huanzhang Dou, Guangcong Zheng, Ying Shan, Xi Li

    Abstract: Zero-shot customized video generation has gained significant attention due to its substantial application potential. Existing methods rely on additional models to extract and inject reference subject features, assuming that the Video Diffusion Model (VDM) alone is insufficient for zero-shot customized video generation. However, these methods often struggle to maintain consistent subject appearance…

    Submitted 29 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: Project Page: https://wutao-cs.github.io/VideoMaker/

  37. arXiv:2412.14821  [pdf, other]

    cs.CV

    PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation

    Authors: Shoumeng Qiu, Xinrun Li, XiangYang Xue, Jian Pu

    Abstract: Although multiview fusion has demonstrated potential in LiDAR segmentation, its dependence on computationally intensive point-based interactions, arising from the lack of fixed correspondences between views such as range view and Bird's-Eye View (BEV), hinders its practical deployment. This paper challenges the prevailing notion that multiview fusion is essential for achieving high performance. We…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: AAAI 2025

  38. arXiv:2412.04939  [pdf, ps, other]

    cs.CV

    Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

    Authors: Zehao Wang, Xinpeng Liu, Yudonglin Zhang, Xiaoqian Wu, Zhou Fang, Yifan Fang, Junfu Pu, Cewu Lu, Yong-Lu Li

    Abstract: Multimodal Large Language Models (MLLMs) have garnered significant attention recently and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, etc. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations, achieving notable improvements, these methods primarily focus on mitigating hallucination…

    Submitted 20 December, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI-26

  39. arXiv:2411.15041  [pdf, other]

    cs.AI cs.CL

    mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA

    Authors: Tao Zhang, Ziqi Zhang, Zongyang Ma, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Yuxuan Zhao, Zehua Xie, Jin Ma, Ying Shan, Weiming Hu

    Abstract: Advanced Multimodal Large Language Models (MLLMs) struggle with recent Knowledge-based VQA tasks, such as INFOSEEK and Encyclopedic-VQA, due to their limited and frozen knowledge scope, often leading to ambiguous and inaccurate responses. Thus, multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge, effectively expandin…

    Submitted 22 November, 2024; originally announced November 2024.

  40. arXiv:2411.04746  [pdf, ps, other]

    cs.CV

    Taming Rectified Flow for Inversion and Editing

    Authors: Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, Ying Shan

    Abstract: Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation. Despite their robust generative capabilities, these models often struggle with inversion inaccuracies, which could further limit their effectiveness in downstream tasks such as image and video editing. To address this issue, we propose RF-Solver,…

    Submitted 12 June, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: ICML 2025; GitHub: https://github.com/wangjiangshan0725/RF-Solver-Edit

  41. arXiv:2410.12324  [pdf, other]

    cs.RO cs.CV

    PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM

    Authors: Guanghao Li, Yu Cao, Qi Chen, Yifan Yang, Jian Pu

    Abstract: In point-line SLAM systems, the utilization of line structural information and the optimization of lines are two significant problems. The former is usually addressed through structural regularities, while the latter typically involves using minimal parameter representations of lines in optimization. However, separating these two steps leads to the loss of constraint information to each other. We…

    Submitted 18 October, 2024; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: 8 pages, 4 figures

  42. arXiv:2409.10063  [pdf, other]

    cs.CV cs.AI cs.RO

    GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction

    Authors: Anqi Shi, Yuze Cai, Xiangyu Chen, Jian Pu, Zeyu Fu, Hong Lu

    Abstract: High-definition (HD) maps are essential for autonomous driving systems. Traditionally, an expensive and labor-intensive pipeline is implemented to construct HD maps, which is limited in scalability. In recent years, crowdsourcing and online mapping have emerged as two alternative methods, but they have limitations respectively. In this paper, we provide a novel methodology, namely global map const…

    Submitted 17 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

  43. arXiv:2408.15379  [pdf, other]

    cs.CL

    DualKanbaFormer: An Efficient Selective Sparse Framework for Multimodal Aspect-based Sentiment Analysis

    Authors: Adamu Lawan, Juhua Pu, Haruna Yunusa, Muhammad Lawan, Aliyu Umar, Adamu Sani Yahya, Mahmoud Basi

    Abstract: Multimodal Aspect-based Sentiment Analysis (MABSA) enhances sentiment detection by integrating textual data with complementary modalities, such as images, to provide a more refined and comprehensive understanding of sentiment. However, conventional attention mechanisms, despite notable benchmarks, are hindered by quadratic complexity, limiting their ability to fully capture global contextual depen…

    Submitted 19 April, 2025; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: 12 pages, 2 figures, and 3 tables

  44. arXiv:2408.05645  [pdf]

    eess.IV cs.CV cs.LG

    BeyondCT: A deep learning model for predicting pulmonary function from chest CT scans

    Authors: Kaiwen Geng, Zhiyi Shi, Xiaoyan Zhao, Alaa Ali, Jing Wang, Joseph Leader, Jiantao Pu

    Abstract: Background: Pulmonary function tests (PFTs) and computed tomography (CT) imaging are vital in diagnosing, managing, and monitoring lung diseases. A common issue in practice is the lack of access to recorded pulmonary functions despite available chest CT scans. Purpose: To develop and validate a deep learning algorithm for predicting pulmonary function directly from chest CT scans. M…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 5 tables, 7 figures, 22 pages

  45. arXiv:2408.01669  [pdf, other]

    cs.CV cs.MM

    SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

    Authors: Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders the model from evolving toward stronger multimodal understanding capabilities. To address these lim…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024. Project page: https://synopground.github.io/

  46. arXiv:2407.13254  [pdf, other]

    cs.CV

    Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

    Authors: Shoumeng Qiu, Jie Chen, Xinrun Li, Ru Wan, Xiangyang Xue, Jian Pu

    Abstract: In this paper, we introduce a novel knowledge distillation approach for the semantic segmentation task. Unlike previous methods that rely on power-trained teachers or other modalities to provide additional knowledge, our approach does not require complex teacher models or information from extra sensors. Specifically, for the teacher model training, we propose to noise the label and then incorporat…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ECCV 2024

  47. arXiv:2407.10534  [pdf, other]

    cs.CV

    Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

    Authors: Rong Ma, Jie Chen, Xiangyang Xue, Jian Pu

    Abstract: Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space acr…

    Submitted 9 December, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  48. arXiv:2407.10347  [pdf, other]

    cs.CL

    Enhancing Long-Range Dependency with State Space Model and Kolmogorov-Arnold Networks for Aspect-Based Sentiment Analysis

    Authors: Adamu Lawan, Juhua Pu, Haruna Yunusa, Aliyu Umar, Muhammad Lawan

    Abstract: Aspect-based Sentiment Analysis (ABSA) evaluates sentiments toward specific aspects of entities within the text. However, attention mechanisms and neural network models struggle with syntactic constraints. The quadratic complexity of attention mechanisms also limits their adoption for capturing long-range dependencies between aspect and opinion words in ABSA. This complexity can lead to the misint…

    Submitted 26 December, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures and 3 tables. arXiv admin note: text overlap with arXiv:2405.13013

  49. arXiv:2407.07479  [pdf, other]

    cs.CV

    How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

    Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

    Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  50. arXiv:2407.05376  [pdf, other]

    cs.RO

    Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

    Authors: Jiayu Guo, Mingyue Feng, Pengfei Zhu, Chengjun Li, Jian Pu

    Abstract: In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and pl…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures