
Showing 1–50 of 245 results for author: Nie, J

Searching in archive cs.
  1. arXiv:2603.27508

    cs.SD

    Investigation on the Robustness of Acoustic Foundation Models on Post Exercise Speech

    Authors: Xiangyuan Xue, Yuyu Wang, Ruijie Yao, Xiaoyue Ni, Xiaofan Jiang, Jingping Nie

    Abstract: Automatic speech recognition (ASR) has been extensively studied on neutral and stationary speech, yet its robustness under post-exercise physiological shift remains underexplored. Compared with resting speech, post-exercise speech often contains micro-breaths, non-semantic pauses, unstable phonation, and repetitions caused by reduced breath support, making transcription more difficult. In this wor…

    Submitted 29 March, 2026; originally announced March 2026.

  2. arXiv:2603.26341

    cs.CV

    HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network

    Authors: Mingyu Zhang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Xiaowei Zhu, Jiajia Nie, Yinwei Wei, Yupeng Hu

    Abstract: Composed Image Retrieval (CIR) is a challenging image retrieval paradigm. It aims to retrieve target images from large-scale image databases that are consistent with the modification semantics, based on a multimodal query composed of a reference image and modification text. Although existing methods have made significant progress in cross-modal alignment and feature fusion, a key flaw remains: the…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: Accepted by ICASSP 2026

  3. arXiv:2603.25754

    cs.IT

    DUGC-VRNet: Joint VR Recognition and Channel Estimation for Spatially Non-Stationary XL-MIMO

    Authors: Jinhao Nie, Guangchi Zhang, Miao Cui, Hao Fu, Xiaoli Chu

    Abstract: In this letter, we address spatially non-stationary near-field channel estimation for extremely large-scale multiple-input multiple-output (XL-MIMO) systems with a hybrid combining architecture. One key challenge in the considered problem lies in the fact that conventional channel estimation algorithms typically struggle to effectively identify and adapt to the partial antenna visibility caused by varying…

    Submitted 24 March, 2026; originally announced March 2026.

  4. arXiv:2603.23638

    cs.AI

    Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

    Authors: Yi Han, Lingfei Qian, Yan Wang, Yueru He, Xueqing Peng, Dongji Feng, Yankai Chen, Haohang Li, Yupeng Cao, Jimin Huang, Xue Liu, Jian-Yun Nie, Sophia Ananiadou

    Abstract: Large language models (LLMs) have enabled agentic systems that can reason, plan, and act across complex tasks, but it remains unclear whether they can allocate resources effectively under uncertainty. Unlike short-horizon reactive decisions, allocation requires committing scarce resources over time while balancing competing objectives and preserving flexibility for future needs. We introduce Enter…

    Submitted 24 March, 2026; originally announced March 2026.

  5. arXiv:2603.22323

    cs.LG cs.AI

    A Multi-Task Targeted Learning Framework for Lithium-Ion Battery State-of-Health and Remaining Useful Life

    Authors: Chenhan Wang, Zhengyi Bao, Huipin Lin, Jiahao Nie, Chunxiang Zhu

    Abstract: Accurately predicting the state-of-health (SOH) and remaining useful life (RUL) of lithium-ion batteries is crucial for ensuring the safe and efficient operation of electric vehicles while minimizing associated risks. However, current deep learning methods are limited in their ability to selectively extract features and model time dependencies for these two parameters. Moreover, most existing meth…

    Submitted 20 March, 2026; originally announced March 2026.

    Comments: https://github.com/wch1121/Joint-prediction-of-SOH-and-RUL

  6. arXiv:2603.18397

    cs.LG

    FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra

    Authors: Jianan Nie, Peng Gao

    Abstract: Mass spectrometry (MS) stands as a cornerstone analytical technique for molecular identification, yet de novo structure elucidation from spectra remains challenging due to the combinatorial complexity of chemical space and the inherent ambiguity of spectral fragmentation patterns. Recent deep learning approaches, including autoregressive sequence models, scaffold-based methods, and graph diffusion…

    Submitted 18 March, 2026; originally announced March 2026.

  7. arXiv:2603.16292

    cs.CL cs.AI

    Attention-guided Evidence Grounding for Spoken Question Answering

    Authors: Ke Yang, Bolin Chen, Yuejie Li, Yueying Hua, Jianhao Nie, Yueping He, Bowen Li, Chengjun Mao

    Abstract: Spoken Question Answering (Spoken QA) presents a challenging cross-modal problem: effectively aligning acoustic queries with textual knowledge while avoiding the latency and error propagation inherent in cascaded ASR-based systems. In this paper, we introduce Attention-guided Evidence Grounding (AEG), a novel end-to-end framework that leverages the internal cross-modal attention of Speech Large La…

    Submitted 17 March, 2026; v1 submitted 17 March, 2026; originally announced March 2026.

    Comments: Accepted to ICME 2026

  8. arXiv:2603.08721

    cs.AR cs.LG cs.SE

    KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware

    Authors: Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao

    Abstract: New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels -- a time-consuming, laborious, and error-prone process that cannot scale across diverse hardware targets. This prevents emerging hardware platforms from reaching the market efficiently. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it…

    Submitted 10 February, 2026; originally announced March 2026.

  9. arXiv:2603.07918

    cs.CV

    Enhancing Unregistered Hyperspectral Image Super-Resolution via Unmixing-based Abundance Fusion Learning

    Authors: Yingkai Zhang, Tao Zhang, Jing Nie, Ying Fu

    Abstract: Unregistered hyperspectral image (HSI) super-resolution (SR) typically aims to enhance a low-resolution HSI using an unregistered high-resolution reference image. In this paper, we propose an unmixing-based fusion framework that decouples spatial-spectral information to simultaneously mitigate the impact of unregistered fusion and enhance the learnability of SR models. Specifically, we first utili…

    Submitted 8 March, 2026; originally announced March 2026.

  10. arXiv:2602.24133

    cs.CV

    FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking

    Authors: Sifan Zhou, Jiahao Nie, Ziyu Zhao, Yichao Cao, Xiaobo Lu

    Abstract: In 3D point cloud object tracking, the motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from fundamental limitations: (1) error accumulation due to decoupled optimization caused by explicit foreground segmentation prior to motion estimation, and (2) computational bott…

    Submitted 15 March, 2026; v1 submitted 27 February, 2026; originally announced February 2026.

    Comments: Accepted in ACM MM 2025

  11. arXiv:2602.22547

    cs.IR cs.LG

    Towards Dynamic Dense Retrieval with Routing Strategy

    Authors: Zhan Su, Fengran Mo, Jinghan Zhang, Yuchen Hui, Jia Ao Sun, Bingbing Wen, Jian-Yun Nie

    Abstract: The de facto paradigm for applying dense retrieval (DR) to new tasks involves fine-tuning a pre-trained model for a specific task. However, this paradigm has two significant limitations: (1) It is difficult to adapt the DR to a new domain if the training dataset is limited. (2) Old DR models are simply replaced by newer models that are trained from scratch when the former are no longer up…

    Submitted 25 February, 2026; originally announced February 2026.

  12. arXiv:2602.20019

    cs.LG cs.AI

    Learning Discriminative and Generalizable Anomaly Detector for Dynamic Graph with Limited Supervision

    Authors: Yuxing Tian, Yiyan Qi, Fengran Mo, Weixu Zhang, Jian Guo, Jian-Yun Nie

    Abstract: Dynamic graph anomaly detection (DGAD) is critical for many real-world applications but remains challenging due to the scarcity of labeled anomalies. Existing methods are either unsupervised or semi-supervised: unsupervised methods avoid the need for labeled anomalies but often produce ambiguous boundaries, whereas semi-supervised methods can overfit to the limited labeled anomalies and generalize p…

    Submitted 23 February, 2026; originally announced February 2026.

    Comments: 21 pages, 7 figures

  13. arXiv:2602.19969

    cs.CL cs.AI

    ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting

    Authors: Yuxing Tian, Fengran Mo, Weixu Zhang, Yiyan Qi, Jian-Yun Nie

    Abstract: The strong capabilities of recent Large Language Models (LLMs) have made them highly effective for zero-shot re-ranking tasks. Attention-based re-ranking methods, which derive relevance scores directly from attention weights, offer an efficient and interpretable alternative to generation-based re-ranking methods. However, they still face two major limitations. First, attention signals are highly co…

    Submitted 23 February, 2026; originally announced February 2026.

    Comments: Accepted by EACL 2026

  14. arXiv:2602.16990

    cs.AI cs.CE

    Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

    Authors: Yan Wang, Yi Han, Lingfei Qian, Yueru He, Xueqing Peng, Dongji Feng, Zhuohan Xie, Vincent Jim Zhang, Rosie Guo, Fengran Mo, Jimin Huang, Yankai Chen, Xue Liu, Jian-Yun Nie

    Abstract: Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short-sighted under market volatility and may conflict with a user's long-term goals. Treating what users chose as the sole ground truth, therefore, conflates behavioral imitation with decision quality. We introduce Conv-FinRe, a conversational and longi…

    Submitted 18 February, 2026; originally announced February 2026.

  15. arXiv:2602.12783

    cs.IR cs.AI

    SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

    Authors: Yuejie Li, Ke Yang, Yueying Hua, Berlin Chen, Jianhao Nie, Yueping He, Caixin Kang

    Abstract: Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spok…

    Submitted 13 February, 2026; originally announced February 2026.

  16. arXiv:2602.10016

    cs.IR cs.AI

    Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

    Authors: Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, et al. (4 additional authors not shown)

    Abstract: Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify…

    Submitted 13 February, 2026; v1 submitted 10 February, 2026; originally announced February 2026.

    Comments: 10 pages, 4 figures

  17. arXiv:2602.05218

    cs.CV

    Boosting SAM for Cross-Domain Few-Shot Segmentation via Conditional Point Sparsification

    Authors: Jiahao Nie, Yun Xing, Wenbin An, Qingsong Zhao, Jiawei Shao, Yap-Peng Tan, Alex C. Kot, Shijian Lu, Xuelong Li

    Abstract: Motivated by the success of the Segment Anything Model (SAM) in promptable segmentation, recent studies leverage SAM to develop training-free solutions for few-shot segmentation, which aims to predict object masks in the target image based on a few reference exemplars. These SAM-based methods typically rely on point matching between reference and target images and use the matched dense points as p…

    Submitted 4 February, 2026; originally announced February 2026.

  18. arXiv:2602.05217

    cs.CV

    Cross-Domain Few-Shot Segmentation via Multi-view Progressive Adaptation

    Authors: Jiahao Nie, Guanqiao Fu, Wenbin An, Yap-Peng Tan, Alex C. Kot, Shijian Lu

    Abstract: Cross-Domain Few-Shot Segmentation aims to segment categories in data-scarce domains conditioned on a few exemplars. Typical methods first establish few-shot capability in a large-scale source domain and then adapt it to target domains. However, due to the limited quantity and diversity of target samples, existing methods still exhibit constrained performance. Moreover, the source-trained model's…

    Submitted 4 February, 2026; originally announced February 2026.

  19. arXiv:2602.05215

    cs.CV

    E.M.Ground: A Temporal Grounding Vid-LLM with Holistic Event Perception and Matching

    Authors: Jiahao Nie, Wenbin An, Gongjie Zhang, Yicheng Xu, Yap-Peng Tan, Alex C. Kot, Shijian Lu

    Abstract: Despite recent advances in Video Large Language Models (Vid-LLMs), Temporal Video Grounding (TVG), which aims to precisely localize time segments corresponding to query events, remains a significant challenge. Existing methods often match start and end frames by comparing frame features with two separate tokens, relying heavily on exact timestamps. However, this approach fails to capture the event…

    Submitted 4 February, 2026; originally announced February 2026.

  20. arXiv:2601.20706

    cs.AR cs.AI cs.DC

    Beyond GEMM-Centric NPUs: Enabling Efficient Diffusion LLM Sampling

    Authors: Binglei Lou, Haoran Wu, Yao Lai, Jiayi Nie, Can Xiao, Xuan Guo, Rika Antonova, Robert Mullins, Aaron Zhao

    Abstract: Diffusion Large Language Models (dLLMs) introduce iterative denoising to enable parallel token generation, but their sampling phase displays fundamentally different characteristics compared to GEMM-centric transformer layers. Profiling on modern GPUs reveals that sampling can account for up to 70% of total model inference latency, primarily due to substantial memory loads and writes from vocabulary…

    Submitted 28 January, 2026; originally announced January 2026.

  21. arXiv:2601.20318

    cs.CV

    CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting

    Authors: Jiyuan Xu, Wenyu Zhang, Xin Jing, Shuai Chen, Shuai Zhang, Jiahao Nie

    Abstract: Current methods for multivariate time series forecasting can be classified into channel-dependent and channel-independent models. Channel-dependent models learn cross-channel features but often overfit the channel ordering, which hampers adaptation when channels are added or reordered. Channel-independent models treat each channel in isolation to increase flexibility, yet this neglects inter-chann…

    Submitted 27 February, 2026; v1 submitted 28 January, 2026; originally announced January 2026.

    Comments: 22 pages, 10 figures, ICLR 2026

  22. arXiv:2601.14896

    cs.CL

    Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation

    Authors: Rui Qi, Fengran Mo, Yufeng Chen, Xue Zhang, Shuo Wang, Hongliang Li, Jinan Xu, Meng Jiang, Jian-Yun Nie, Kaiyu Huang

    Abstract: Multilingual retrieval-augmented generation (MRAG) requires models to effectively acquire and integrate beneficial external knowledge from multilingual collections. However, most existing studies employ a unitive process where queries of equivalent semantics across different languages are processed through a single-turn retrieval and subsequent optimization. Such a "one-size-fits-all" strategy i…

    Submitted 21 January, 2026; originally announced January 2026.

  23. arXiv:2601.14716

    cs.LG cs.AI cs.CL

    PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning

    Authors: Yao Lu, Dengdong Fan, Jianzheng Nie, Fan Xu, Jie Chen, Bin Zhou, Yonghong Tian

    Abstract: We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model…

    Submitted 21 January, 2026; originally announced January 2026.

  24. arXiv:2601.09028

    cs.CL cs.AI cs.IR

    OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

    Authors: Fengran Mo, Zhan Su, Yuchen Hui, Jinghan Zhang, Jia Ao Sun, Zheyuan Liu, Chao Zhang, Tetsuya Sakai, Jian-Yun Nie

    Abstract: The development of large language models (LLMs) has achieved superior performance in a range of downstream tasks, including LLM-based retrieval-augmented generation (RAG). The quality of generated content heavily relies on the usefulness of the retrieved information and the capacity of LLMs' internal information processing mechanism to incorporate it in answer generation. It is generally assumed t…

    Submitted 23 January, 2026; v1 submitted 13 January, 2026; originally announced January 2026.

    Comments: Accepted by ACM WWW 2026

  25. arXiv:2512.20556

    cs.CV

    Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios

    Authors: Mingwei Tang, Jiahao Nie, Guang Yang, Ziqing Cui, Jie Li

    Abstract: Image fusion aims to synthesize a single high-quality image from a pair of inputs captured under challenging conditions, such as differing exposure levels or focal depths. A core challenge lies in effectively handling disparities in dynamic range and focus depth between the inputs. With the advent of vision-language models, recent methods incorporate textual descriptions as auxiliary guidance to e…

    Submitted 23 December, 2025; originally announced December 2025.

    Comments: Accepted to WACV 2026

  26. arXiv:2512.09200

    cs.IR

    Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

    Authors: Liang Luo, Yuxin Chen, Zhengyu Zhang, Mengyue Hang, Andrew Gu, Buyun Zhang, Boyang Liu, Chen Chen, Chengze Fan, Dong Liang, Fan Yang, Feifan Gu, Huayu Li, Jade Nie, Jiayi Xu, Jiyan Yang, Jongsoo Park, Laming Chen, Longhao Jin, Qianru Li, Qin Huang, Shali Jiang, Shiwen Shen, Shuaiwen Wang, Sihan Zeng, et al. (17 additional authors not shown)

    Abstract: The rapidly evolving landscape of products, surfaces, policies, and regulations poses significant challenges for deploying state-of-the-art recommendation models at industry scale, primarily due to data fragmentation across domains and escalating infrastructure costs that hinder sustained quality improvements. To address this challenge, we propose Lattice, a recommendation framework centered aro…

    Submitted 14 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

    Comments: Accepted to KDD 2026

  27. arXiv:2511.18171

    cs.AI

    BPMN to PDDL: Translating Business Workflows for AI Planning

    Authors: Jasper Nie, Christian Muise, Victoria Armstrong

    Abstract: Business Process Model and Notation (BPMN) is a widely used standard for modelling business processes. While automated planning has been proposed as a method for simulating and reasoning about BPMN workflows, most implementations remain incomplete or limited in scope. This project builds upon prior theoretical work to develop a functional pipeline that translates BPMN 2.0 diagrams into PDDL repres…

    Submitted 22 November, 2025; originally announced November 2025.

    Comments: 8 pages, 3 figures. Code and generated PDDL outputs available at https://github.com/QuMuLab/bpmn-to-pddl-translation

    ACM Class: I.2.8; D.2.11

  28. arXiv:2511.17946

    cs.CL cs.AI

    Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models

    Authors: Shuo Zhang, Fabrizio Gotti, Fengran Mo, Jian-Yun Nie

    Abstract: Hallucination in large language models (LLMs) is a fundamental challenge, particularly in open-domain question answering. Prior work attempts to detect hallucination with model-internal signals such as token-level entropy or generation consistency, while the connection between pretraining data exposure and hallucination is underexplored. Existing studies show that LLMs underperform on long-tail kn…

    Submitted 22 November, 2025; originally announced November 2025.

  29. arXiv:2511.17196

    cs.CV

    Real Noise Decoupling for Hyperspectral Image Denoising

    Authors: Yingkai Zhang, Tao Zhang, Jing Nie, Ying Fu

    Abstract: Hyperspectral image (HSI) denoising is a crucial step in enhancing the quality of HSIs. Noise modeling methods can fit noise distributions to generate synthetic HSIs to train denoising networks. However, the noise in captured HSIs is usually complex and difficult to model accurately, which significantly limits the effectiveness of these approaches. In this paper, we propose a multi-stage noise-dec…

    Submitted 21 November, 2025; originally announced November 2025.

  30. arXiv:2511.17044

    cs.IR

    Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters

    Authors: Zhan Su, Fengran Mo, Jinghan Zhang, Yuchen Hui, Jiaao Sun, Jian-yun Nie

    Abstract: Parametric Retrieval-Augmented Generation (PRAG) is a RAG approach that integrates external knowledge directly into model parameters using a LoRA adapter, aiming at reducing the inference cost compared to traditional RAG. However, current PRAG approaches adopt a one-to-one document encoding scheme, using a dedicated LoRA adapter for each individual document. This scheme introduces two maj…

    Submitted 23 January, 2026; v1 submitted 21 November, 2025; originally announced November 2025.

  31. arXiv:2511.15580

    cs.CV cs.AI

    CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking

    Authors: Sifan Zhou, Yichao Cao, Jiahao Nie, Yuqian Fu, Ziyu Zhao, Xiaobo Lu, Shuo Wang

    Abstract: 3D single object tracking (SOT) in LiDAR point clouds is a critical task in computer vision and autonomous driving. Despite the great success already achieved, the inherent sparsity of point clouds introduces a dual-redundancy challenge that limits existing trackers: (1) vast spatial redundancy from background noise impairs accuracy, and (2) informational redundancy within the foreground hinders e…

    Submitted 22 November, 2025; v1 submitted 19 November, 2025; originally announced November 2025.

    Comments: Accepted by AAAI 2026 (Oral)

  32. arXiv:2511.07803

    cs.CY cs.AI

    Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring

    Authors: Wenhao Xu, Akshatha Arodi, Jian-Yun Nie, Arsene Fansi Tchango

    Abstract: Modern slavery affects millions of people worldwide, and regulatory frameworks such as Modern Slavery Acts now require companies to publish detailed disclosures. However, these statements are often vague and inconsistent, making manual review time-consuming and difficult to scale. While NLP offers a promising path forward, high-stakes compliance tasks require more than accurate classification: the…

    Submitted 10 November, 2025; originally announced November 2025.

    Comments: To appear at AAAI-26 (Social Impact Track)

  33. arXiv:2511.01293

    cs.CV

    Detecting Generated Images by Fitting Natural Image Distributions

    Authors: Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, Bo Han

    Abstract: The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of nat…

    Submitted 3 November, 2025; originally announced November 2025.

    Comments: 25 pages, 9 figures, NeurIPS 2025 spotlight

  34. arXiv:2510.11695

    cs.CL

    When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

    Authors: Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou

    Abstract: Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets, and rely on unverified data. To address these gaps, we introduce Agent Market Arena (AMA), the first lifelong, real-time benchmark for evaluating LLM-based…

    Submitted 29 October, 2025; v1 submitted 13 October, 2025; originally announced October 2025.

  35. arXiv:2510.08886

    cs.CL cs.CE cs.IR

    FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

    Authors: Yan Wang, Keyi Wang, Shanshan Yang, Jaisal Patel, Jeff Zhao, Fengran Mo, Xueqing Peng, Lingfei Qian, Jimin Huang, Guojun Xiong, Yankai Chen, Víctor Gutiérrez-Basulto, Xiao-Yang Liu, Xue Liu, Jian-Yun Nie

    Abstract: Going beyond simple text processing, financial auditing requires detecting semantic, structural, and numerical inconsistencies across large-scale disclosures. As financial reports are filed in XBRL, a structured XML format governed by accounting standards, auditing becomes a structured information extraction and reasoning problem involving concept alignment, taxonomy-defined relations, and cross-d…

    Submitted 18 February, 2026; v1 submitted 9 October, 2025; originally announced October 2025.

  36. arXiv:2510.08825

    cs.CL

    Search-on-Graph: Iterative Informed Navigation for Large Language Model Reasoning on Knowledge Graphs

    Authors: Jia Ao Sun, Hao Yu, Fabrizio Gotti, Fengran Mo, Yihong Wu, Yuchen Hui, Jian-Yun Nie

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning abilities yet remain unreliable on knowledge-intensive, multi-hop questions -- they miss long-tail facts, hallucinate when uncertain, and their internal knowledge lags behind real-world change. Knowledge graphs (KGs) offer a structured source of relational evidence, but existing KGQA methods face fundamental trade-offs: compiling…

    Submitted 9 October, 2025; originally announced October 2025.

  37. arXiv:2510.00977

    cs.LG cs.CL

    It Takes Two: Your GRPO Is Secretly DPO

    Authors: Yihong Wu, Liheng Ma, Lei Ding, Muzhi Li, Xinyu Wang, Kejia Chen, Zhan Su, Zhanguang Zhang, Chenyang Huang, Yingxue Zhang, Mark Coates, Jian-Yun Nie

    Abstract: Group Relative Policy Optimization (GRPO) has emerged as a prominent reinforcement learning algorithm for post-training Large Language Models. Different from critic-based methods such as PPO, GRPO estimates the advantage function using group-level statistics to reduce the variance of policy gradient estimators. While the prevailing view attributes GRPO's effectiveness to large group sizes for accu…

    Submitted 29 January, 2026; v1 submitted 1 October, 2025; originally announced October 2025.

  38. arXiv:2509.24214

    cs.CV

    Scalable Audio-Visual Masked Autoencoders for Efficient Affective Video Facial Analysis

    Authors: Xuecheng Wu, Junxiao Xue, Xinyi Yin, Yunyun Shi, Liangyu Fu, Danlei Huang, Yifan Wang, Jia Zhang, Jiayu Nie, Jun Wang

    Abstract: Affective video facial analysis (AVFA) has emerged as a key research field for building emotion-aware intelligent systems, yet this field continues to suffer from limited data availability. In recent years, the self-supervised learning (SSL) technique of Masked Autoencoders (MAE) has gained momentum, with growing adaptations in its audio-visual contexts. While scaling has proven essential for brea…

    Submitted 28 September, 2025; originally announced September 2025.

  39. arXiv:2509.22951

    cs.PF cs.AI

    Tiny-QMoE

    Authors: Jack Cashman, Jiaqi Nie

    Abstract: The QMoE model provides a practical approach for compression of massive Mixture-of-Experts (MoE) models. QMoE offers a solution geared towards memory limitations that often reach terabyte scales, and it has the advantage of working with high sparsity models which implicitly lend themselves to compression techniques. QMoE also has the advantage of only taking MoE models into account and does not ev…

    Submitted 26 September, 2025; originally announced September 2025.

  40. arXiv:2509.18137

    cs.LG cs.AI

    LoRALib: A Standardized Benchmark for Evaluating LoRA-MoE Methods

    Authors: Shaoheng Wang, Yao Lu, Yuqi Li, Yaxin Gao, Jiaqi Nie, Shanqing Yu, Yingli Tian, Qi Xuan

    Abstract: As a parameter-efficient fine-tuning (PEFT) method, low-rank adaptation (LoRA) can save significant costs in storage and computing, but its strong adaptability to a single task is often accompanied by insufficient cross-task generalization capabilities. To improve this, existing work combines LoRA with mixture-of-experts (MoE) to enhance the model's adaptability through expert modules and routing…

    Submitted 14 September, 2025; originally announced September 2025.

  41. arXiv:2509.15473

    eess.AS cs.CL cs.LG cs.SD

    Breathing and Semantic Pause Detection and Exertion-Level Classification in Post-Exercise Speech

    Authors: Yuyu Wang, Wuyue Xia, Huaxiu Yao, Jingping Nie

    Abstract: Post-exercise speech contains rich physiological and linguistic cues, often marked by semantic pauses, breathing pauses, and combined breathing-semantic pauses. Detecting these events enables assessment of recovery rate, lung function, and exertion-related abnormalities. However, existing works on identifying and distinguishing different types of pauses in this context are limited. In this work, b…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: 6 pages, 3rd ACM International Workshop on Intelligent Acoustic Systems and Applications (IASA 25)

  42. arXiv:2509.13723

    cs.CL

    DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning

    Authors: Yaxin Gao, Yao Lu, Zongfei Zhang, Jiaqi Nie, Shanqing Yu, Qi Xuan

    Abstract: Large language models (LLMs) have achieved remarkable success in many natural language processing (NLP) tasks. To achieve more accurate output, the prompts used to drive LLMs have become increasingly longer, which incurs higher computational costs. To address this prompt inflation problem, prompt compression has been proposed. However, most existing methods require training a small auxiliary model…

    Submitted 18 September, 2025; v1 submitted 17 September, 2025; originally announced September 2025.

  43. arXiv:2509.10070

    quant-ph cs.DM cs.DS math.CO

    Toward Minimum Graphic Parity Networks

    Authors: Yixin Cao, Yiren Lu, Junhong Nie, Xiaoming Sun, Guojing Tian

    Abstract: Quantum circuits composed of CNOT and $R_z$ are fundamental building blocks of many quantum algorithms, so optimizing the synthesis of such quantum circuits is crucial. We address this problem from a theoretical perspective by studying the graphic parity network synthesis problem. A graphic parity network for a graph $G$ is a quantum circuit composed solely of CNOT gates where each edge of $G$ is…

    Submitted 12 September, 2025; originally announced September 2025.

  44. arXiv:2509.09505

    cs.AR

    Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

    Authors: Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T. H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao

    Abstract: LLMs now form the backbone of AI agents for a diverse array of applications, including tool use, command-line agents, and web or computer use agents. These agentic LLM inference tasks are fundamentally different from chatbot-focused inference -- they often have much larger context lengths to capture complex, prolonged inputs, such as entire webpage DOMs or complicated tool call trajectories. This,…

    Submitted 24 September, 2025; v1 submitted 11 September, 2025; originally announced September 2025.

  45. arXiv:2508.12271

    cs.CV

    SNNSIR: A Simple Spiking Neural Network for Stereo Image Restoration

    Authors: Ronghua Xu, Jin Xie, Jing Nie, Jiale Cao, Yanwei Pang

    Abstract: Spiking Neural Networks (SNNs), characterized by discrete binary activations, offer high computational efficiency and low energy consumption, making them well-suited for computation-intensive tasks such as stereo image restoration. In this work, we propose SNNSIR, a simple yet effective Spiking Neural Network for Stereo Image Restoration, specifically designed under the spike-driven paradigm where…

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: 11 pages

  46. arXiv:2508.10955

    cs.CV cs.CL cs.MM

    Empowering Multimodal LLMs with External Tools: A Comprehensive Survey

    Authors: Wenbin An, Jiahao Nie, Yaqiang Wu, Feng Tian, Shijian Lu, Qinghua Zheng

    Abstract: By integrating the perception capabilities of multimodal encoders with the generative power of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), exemplified by GPT-4V, have achieved great success in various multimodal tasks, pointing toward a promising pathway to artificial general intelligence. Despite this progress, the limited quality of multimodal data, poor performance o…

    Submitted 14 August, 2025; originally announced August 2025.

    Comments: 21 pages, 361 references

  47. arXiv:2508.08634

    cs.IR cs.CL

    Adaptive Personalized Conversational Information Retrieval

    Authors: Fengran Mo, Yuchen Hui, Yuxing Tian, Zhaoxuan Tan, Chuan Meng, Zhan Su, Kaiyu Huang, Jian-Yun Nie

    Abstract: Personalized conversational information retrieval (CIR) systems aim to satisfy users' complex information needs through multi-turn interactions by considering user profiles. However, not all search queries require personalization. The challenge lies in appropriately incorporating personalization elements into search when needed. Most existing studies implicitly incorporate users' personal informat…

    Submitted 12 August, 2025; originally announced August 2025.

    Comments: Accepted by CIKM 2025

  48. arXiv:2508.06902

    cs.CV

    eMotions: A Large-Scale Dataset and Audio-Visual Fusion Network for Emotion Analysis in Short-form Videos

    Authors: Xuecheng Wu, Dingkang Yang, Danlei Huang, Xinyi Yin, Yifan Wang, Jia Zhang, Jiayu Nie, Liangyu Fu, Yang Liu, Junxiao Xue, Hadi Amirpour, Wei Zhou

    Abstract: Short-form videos (SVs) have become a vital part of our online routine for acquiring and sharing information. Their multimodal complexity poses new challenges for video analysis, highlighting the need for video emotion analysis (VEA) within the community. Given the limited availability of SVs emotion data, we introduce eMotions, a large-scale dataset consisting of 27,996 videos with full-scale ann…

    Submitted 9 August, 2025; originally announced August 2025.

  49. arXiv:2508.04001

    cs.IR cs.CL

    ConvMix: A Mixed-Criteria Data Augmentation Framework for Conversational Dense Retrieval

    Authors: Fengran Mo, Jinghan Zhang, Yuchen Hui, Jia Ao Sun, Zhichao Xu, Zhan Su, Jian-Yun Nie

    Abstract: Conversational search aims to satisfy users' complex information needs via multiple-turn interactions. The key challenge lies in revealing real users' search intent from the context-dependent queries. Previous studies achieve conversational search by fine-tuning a conversational dense retriever with relevance judgments between pairs of context-dependent queries and documents. However, this trainin…

    Submitted 12 November, 2025; v1 submitted 5 August, 2025; originally announced August 2025.

    Comments: Accepted by AAAI 2026

  50. arXiv:2508.03999

    cs.LG

    Tensorized Clustered LoRA Merging for Multi-Task Interference

    Authors: Zhan Su, Fengran Mo, Guojun Liang, Jinghan Zhang, Bingbing Wen, Prayag Tiwari, Jian-Yun Nie

    Abstract: Despite the success of the monolithic dense paradigm of large language models (LLMs), the LoRA adapters offer an efficient solution by fine-tuning small task-specific modules and merging them with the base model. However, in multi-task settings, merging LoRA adapters trained on heterogeneous sources frequently causes task interference, degrading downstream performance. To address this, we…

    Submitted 5 August, 2025; originally announced August 2025.