-
Towards Arbitrary Motion Completing via Hierarchical Continuous Representation
Authors:
Chenghao Xu,
Guangtao Lyu,
Qi Liu,
Jiexi Yan,
Muli Yang,
Cheng Deng
Abstract:
Physical motions are inherently continuous, and higher camera frame rates typically contribute to improved smoothness and temporal coherence. For the first time, we explore continuous representations of human motion sequences, featuring the ability to interpolate, in-between, and even extrapolate any input motion sequence at arbitrary frame rates. To achieve this, we propose a novel parametric activation-induced hierarchical implicit representation framework, referred to as NAME, based on Implicit Neural Representations (INRs). Our method introduces a hierarchical temporal encoding mechanism that extracts features from motion sequences at multiple temporal scales, enabling effective capture of intricate temporal patterns. Additionally, we integrate a custom parametric activation function, powered by Fourier transformations, into the MLP-based decoder to enhance the expressiveness of the continuous representation. This parametric formulation significantly augments the model's ability to represent complex motion behaviors with high accuracy. Extensive evaluations across several benchmark datasets demonstrate the effectiveness and robustness of our proposed approach.
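The abstract does not specify the activation's form. As a minimal sketch of what a Fourier-based parametric activation inside an INR-style MLP decoder could look like (the class names, the sum-of-sinusoids form, and all dimensions are our assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class FourierActivation(nn.Module):
    """Hypothetical parametric activation: a learnable sum of sinusoids,
    x -> sum_k a_k * sin(w_k * x + phi_k), applied elementwise."""
    def __init__(self, num_terms: int = 4):
        super().__init__()
        self.amp = nn.Parameter(torch.ones(num_terms) / num_terms)
        self.freq = nn.Parameter(2.0 ** torch.arange(num_terms).float())
        self.phase = nn.Parameter(torch.zeros(num_terms))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast over a trailing "term" dimension, then sum it out.
        x = x.unsqueeze(-1)
        return (self.amp * torch.sin(self.freq * x + self.phase)).sum(-1)

class INRDecoder(nn.Module):
    """MLP that maps a (timestamp, feature) pair to a pose vector, so the
    motion can be queried at arbitrary continuous timestamps."""
    def __init__(self, feat_dim: int, pose_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), FourierActivation(),
            nn.Linear(hidden, hidden), FourierActivation(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, t: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([t, feat], dim=-1))
```

Because the decoder consumes a continuous timestamp t, the same trained network can be queried between, before, or after observed frames, which is what makes interpolation and extrapolation at arbitrary frame rates possible.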
Submitted 24 December, 2025;
originally announced December 2025.
-
A Turn Toward Better Alignment: Few-Shot Generative Adaptation with Equivariant Feature Rotation
Authors:
Chenghao Xu,
Qi Liu,
Jiexi Yan,
Muli Yang,
Cheng Deng
Abstract:
Few-shot image generation aims to effectively adapt a source generative model to a target domain using very few training images. Most existing approaches introduce consistency constraints (typically instance-level or distribution-level loss functions) to directly align the distribution patterns of source and target domains within their respective latent spaces. However, these strategies often fall short: overly strict constraints can amplify the negative effects of the domain gap, leading to distorted or uninformative content, while overly relaxed constraints may fail to leverage the source domain effectively. This limitation primarily stems from the inherent discrepancy in the underlying distribution structures of the source and target domains. The scarcity of target samples further compounds this issue by hindering accurate estimation of the target domain's distribution. To overcome these limitations, we propose Equivariant Feature Rotation (EFR), a novel adaptation strategy that aligns source and target domains at two complementary levels within a self-rotated proxy feature space. Specifically, we perform adaptive rotations within a parameterized Lie Group to transform both source and target features into an equivariant proxy space, where alignment is conducted. These learnable rotation matrices serve to bridge the domain gap by preserving intra-domain structural information without distortion, while the alignment optimization facilitates effective knowledge transfer from the source to the target domain. Comprehensive experiments on a variety of commonly used datasets demonstrate that our method significantly enhances the generative performance within the targeted domain.
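For intuition, a learnable rotation in the spirit described here can be parameterized through the Lie algebra: optimize an unconstrained matrix, skew-symmetrize it, and exponentiate. A minimal sketch (names and dimensions are assumptions, not EFR's implementation):

```python
import torch
import torch.nn as nn

class LearnableRotation(nn.Module):
    """Sketch of a learnable SO(d) rotation: parameterize a skew-symmetric
    matrix A and take R = exp(A), which is always a valid rotation."""
    def __init__(self, dim: int):
        super().__init__()
        self.raw = nn.Parameter(torch.zeros(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skew = self.raw - self.raw.T      # A = -A^T lies in so(d)
        rot = torch.matrix_exp(skew)      # exp maps so(d) onto SO(d)
        return x @ rot.T                  # rotate features into the proxy space
```

Since rotations preserve norms and inner products, aligning rotated features cannot distort intra-domain structure, which matches the motivation given above.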
Submitted 24 December, 2025;
originally announced December 2025.
-
Semantic Radio Access Networks: Architecture, State-of-the-Art, and Future Directions
Authors:
Rui Meng,
Zixuan Huang,
Jingshu Yan,
Mengying Sun,
Yiming Liu,
Chenyuan Feng,
Xiaodong Xu,
Zhidi Zhang,
Song Gao,
Ping Zhang,
Tony Q. S. Quek
Abstract:
The Radio Access Network (RAN) serves as the bridge between user devices and the core network in mobile communication systems, responsible for the transmission and reception of wireless signals and air interface management. In recent years, Semantic Communication (SemCom) has emerged as a transformative communication paradigm that prioritizes meaning-level transmission over conventional bit-level delivery, thus providing improved spectrum efficiency, anti-interference ability in complex environments, flexible resource allocation, and enhanced user experience for RAN. However, there is still a lack of comprehensive reviews on the integration of SemCom into RAN. Motivated by this, we systematically explore recent advancements in Semantic RAN (SemRAN). We begin by introducing the fundamentals of RAN and SemCom, identifying the limitations of conventional RAN, and outlining the overall architecture of SemRAN. Subsequently, we review representative techniques of SemRAN across the physical layer, data link layer, network layer, and security plane. Furthermore, we envision future services and applications enabled by SemRAN, alongside its current standardization progress. Finally, we conclude by identifying critical research challenges and outlining forward-looking directions to guide subsequent investigations in this burgeoning field.
Submitted 23 December, 2025;
originally announced December 2025.
-
Revealing Perception and Generation Dynamics in LVLMs: Mitigating Hallucinations via Validated Dominance Correction
Authors:
Guangtao Lyu,
Xinyi Cheng,
Chenghao Xu,
Qi Liu,
Muli Yang,
Fen Fang,
Huilin Chen,
Jiexi Yan,
Xu Yang,
Cheng Deng
Abstract:
Large Vision-Language Models (LVLMs) have shown remarkable capabilities, yet hallucinations remain a persistent challenge. This work presents a systematic analysis of the internal evolution of visual perception and token generation in LVLMs, revealing two key patterns. First, perception follows a three-stage GATE process: early layers perform a Global scan, intermediate layers Approach and Tighten on core content, and later layers Explore supplementary regions. Second, generation exhibits an SAD (Subdominant Accumulation to Dominant) pattern, where hallucinated tokens arise from the repeated accumulation of subdominant tokens lacking support from either attention (visual perception) or the feed-forward network (internal knowledge). Guided by these findings, we devise the VDC (Validated Dominance Correction) strategy, which detects unsupported tokens and replaces them with validated dominant ones to improve output reliability. Extensive experiments across multiple models and benchmarks confirm that VDC substantially mitigates hallucinations.
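A loose, illustrative reading of the correction rule in code (the support scores, the single threshold tau, and the fallback policy are all assumptions; the paper's actual criterion may differ):

```python
import torch

def validated_dominance_correction(logits, attn_support, ffn_support,
                                   tau: float = 0.1):
    """Hypothetical decoding-time check: if the top-scoring token has neither
    attention (visual) nor FFN (knowledge) support above tau, fall back to
    the highest-probability token that does."""
    probs = logits.softmax(-1)
    order = probs.argsort(descending=True)
    for tok in order.tolist():
        if attn_support[tok] >= tau or ffn_support[tok] >= tau:
            return tok                # first validated dominant token
    return order[0].item()            # nothing validated: keep the argmax
```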
Submitted 21 December, 2025;
originally announced December 2025.
-
Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation
Authors:
Guangtao Lyu,
Chenghao Xu,
Qi Liu,
Jiexi Yan,
Muli Yang,
Fen Fang,
Cheng Deng
Abstract:
Music to 3D dance generation aims to synthesize realistic and rhythmically synchronized human dance from music. While existing methods often rely on additional genre labels to further improve dance generation, such labels are typically noisy, coarse, unavailable, or insufficient to capture the diversity of real-world music, which can result in rhythm misalignment or stylistic drift. In contrast, we observe that tempo, a core property reflecting musical rhythm and pace, remains relatively consistent across datasets and genres, typically ranging from 60 to 200 BPM. Based on this finding, we propose TempoMoE, a hierarchical tempo-aware Mixture-of-Experts module that enhances the diffusion model's rhythm perception. TempoMoE organizes motion experts into tempo-structured groups for different tempo ranges, with multi-scale beat experts capturing fine- and long-range rhythmic dynamics. A Hierarchical Rhythm-Adaptive Routing module dynamically selects and fuses experts from music features, enabling flexible, rhythm-aligned generation without manual genre labels. Extensive experiments demonstrate that TempoMoE achieves state-of-the-art results in dance quality and rhythm alignment.
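As a rough sketch of what tempo-structured grouping with hierarchical routing might look like (the group count, expert count, and soft two-level gating are our assumptions):

```python
import torch
import torch.nn as nn

class TempoRouter(nn.Module):
    """Illustrative hierarchical MoE routing: experts are grouped by tempo
    range; a group gate picks the range, an inner gate mixes beat experts."""
    def __init__(self, feat_dim, n_groups=4, experts_per_group=3, out_dim=256):
        super().__init__()
        self.group_gate = nn.Linear(feat_dim, n_groups)
        self.expert_gate = nn.Linear(feat_dim, n_groups * experts_per_group)
        self.experts = nn.ModuleList(
            nn.Linear(feat_dim, out_dim)
            for _ in range(n_groups * experts_per_group)
        )
        self.n_groups, self.k = n_groups, experts_per_group

    def forward(self, music_feat):                                # (B, F)
        g = self.group_gate(music_feat).softmax(-1)               # (B, G)
        e = self.expert_gate(music_feat)
        e = e.view(-1, self.n_groups, self.k).softmax(-1)         # (B, G, K)
        weights = (g.unsqueeze(-1) * e).flatten(1)                # (B, G*K)
        outs = torch.stack([f(music_feat) for f in self.experts], 1)
        return (weights.unsqueeze(-1) * outs).sum(1)              # (B, D)
```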
Submitted 21 December, 2025;
originally announced December 2025.
-
DSO-VSA: a Variable Stiffness Actuator with Decoupled Stiffness and Output Characteristics for Rehabilitation Robotics
Authors:
Maozeng Zhang,
Ke Shi,
Huijun Li,
Tongshu Chen,
Jiejun Yan,
Aiguo Song
Abstract:
Stroke-induced motor impairment often results in substantial loss of upper-limb function, creating a strong demand for rehabilitation robots that enable safe and transparent physical human-robot interaction (pHRI). Variable stiffness actuators are well suited for such applications. However, in most existing designs, stiffness is coupled with the deflection angle, complicating both modeling and control. To address this limitation, this paper presents a variable stiffness actuator featuring decoupled stiffness and output behavior for rehabilitation robotics. The system integrates a variable stiffness mechanism that combines a variable-length lever with a hypocycloidal straight-line mechanism to achieve a linear torque-deflection relationship and continuous stiffness modulation from near zero to theoretically infinite. It also incorporates a differential transmission mechanism based on a planetary gear system that enables dual-motor load sharing. A cascade PI controller is further developed on the basis of the differential configuration, in which the position-loop term jointly regulates stiffness and deflection angle, effectively suppressing stiffness fluctuations and output disturbances. The performance of the prototype was experimentally validated through tests of stiffness calibration, stiffness regulation, torque control, decoupling characteristics, and dual-motor load sharing, indicating its potential for rehabilitation exoskeletons and other pHRI systems.
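The abstract does not give the controller equations; for orientation only, a generic cascade PI step (gains, state layout, and the velocity feedforward term are assumed, not the paper's tuned controller) might look like:

```python
def cascade_pi_step(theta_ref, theta, dtheta_ref, dtheta, state, dt,
                    kp_pos=8.0, ki_pos=0.5, kp_vel=2.0, ki_vel=0.1):
    """One step of a cascade PI loop: an outer position loop produces a
    velocity setpoint; an inner velocity loop produces the motor command."""
    e_pos = theta_ref - theta
    state["i_pos"] += e_pos * dt
    vel_cmd = dtheta_ref + kp_pos * e_pos + ki_pos * state["i_pos"]

    e_vel = vel_cmd - dtheta
    state["i_vel"] += e_vel * dt
    return kp_vel * e_vel + ki_vel * state["i_vel"]

# Usage: state = {"i_pos": 0.0, "i_vel": 0.0}; call once per control tick.
```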
Submitted 21 December, 2025;
originally announced December 2025.
-
Lattice-decoupled rotatable stripe-like charge order within the strange metal phase of 2M-WS2
Authors:
Kebin Xiao,
Yunkai Guo,
Daran Fu,
Yuqiang Fang,
Yating Hu,
Jingming Yan,
Yucong Peng,
Yuyang Wang,
Yongkang Ju,
Peizhe Tang,
Xiangang Wan,
Fuqiang Huang,
Qi-Kun Xue,
Wei Li
Abstract:
In quantum materials, charge orders typically stabilize in specific crystallographic orientations, though their formation mechanisms may vary. Here, using low-temperature scanning tunneling microscopy (STM), we discover a lattice-decoupled rotatable stripe-like charge order coexisting with superconductivity in 2M-WS2. The charge order manifests five distinct orientations across different sample regions, yet maintains an identical wavelength. This directional decoupling from the host lattice challenges existing paradigms. First-principles calculations of phonon spectra and the nesting function fail to explain the ordering mechanism. Intriguingly, the transition temperature of the charge orders exhibits spatial variations (21-46 K), coinciding with the temperature range of the recently reported strange metal phase in this material. This correlation suggests that the interplay between strong electronic correlations and electron-phonon coupling must be critically evaluated to elucidate the emergence of this unconventional charge order.
Submitted 20 December, 2025;
originally announced December 2025.
-
External Hippocampus: Topological Cognitive Maps for Guiding Large Language Model Reasoning
Authors:
Jian Yan
Abstract:
This paper proposes the External Hippocampus framework, which models language model reasoning from a cognitive dynamics perspective as the flow of information energy in semantic space. Unlike traditional weight-space optimization methods, this framework constructs topological cognitive maps through dimensionality reduction projection, enabling precise navigation and intervention of energy flow at test time while avoiding substantial computational requirements and demonstrating predictable intervention patterns. The method effectively addresses the cognitive deadlock problem in multi-step reasoning for small models. Experiments on models with <=7B parameters show that map-guided methods achieve 81.20% accuracy on 500 challenging problems (+16.80% relative to baseline) and reduce reasoning time by >=15x. Key findings reveal that reasoning stagnation manifests as a "Cognitive Vortex" and low-entropy potential wells, while temperature perturbations effectively restart energy flow. The framework requires no additional training, possesses autonomous growth capability, and provides an efficient and controllable topology-aware solution for small model reasoning.
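The "temperature perturbation" remedy admits a very small illustration. A toy version (the entropy floor and boost factor are invented values, not the paper's):

```python
import math

def restart_on_stagnation(probs, temperature, entropy_floor=0.5, boost=1.5):
    """When the next-token distribution collapses into a low-entropy well
    (a stagnation signature), temporarily raise the sampling temperature
    to re-energize exploration."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return temperature * boost if entropy < entropy_floor else temperature
```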
Submitted 23 December, 2025; v1 submitted 19 December, 2025;
originally announced December 2025.
-
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs
Authors:
Rujiao Long,
Yang Li,
Xingyao Zhang,
Weixun Wang,
Tianqianjin Lin,
Xi Zhao,
Yuchi Xu,
Wenbo Su,
Junchi Yan,
Bo Zheng
Abstract:
Exploration capacity shapes both inference-time performance and reinforcement learning (RL) training for large (vision-) language models, as stochastic sampling often yields redundant reasoning paths with little high-level diversity. This paper proposes Reasoning Palette, a novel latent-modulation framework that endows the model with a stochastic latent variable for strategic contextualization, guiding its internal planning prior to token generation. This latent context is inferred from the mean-pooled embedding of a question-answer pair via a variational autoencoder (VAE), where each sampled latent potentially encodes a distinct reasoning context. During inference, a sampled latent is decoded into learnable token prefixes and prepended to the input prompt, modulating the model's internal reasoning trajectory. In this way, the model performs internal sampling over reasoning strategies prior to output generation, which shapes the style and structure of the entire response sequence. A brief supervised fine-tuning (SFT) warm-up phase allows the model to adapt to this latent conditioning. Within RL optimization, Reasoning Palette facilitates structured exploration by enabling on-demand injection of diverse reasoning modes, significantly enhancing exploration efficiency and sustained learning capability. Experiments across multiple reasoning benchmarks demonstrate that our method enables interpretable and controllable modulation of the (vision-) language model's strategic behavior, thereby achieving consistent performance gains over standard RL methods.
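A compressed sketch of the latent-to-prefix path as we understand it from the abstract (module names, latent size, and prefix length are assumptions):

```python
import torch
import torch.nn as nn

class LatentPrefix(nn.Module):
    """Sample a latent from a VAE posterior over the pooled QA embedding and
    decode it into soft prefix embeddings prepended to the prompt."""
    def __init__(self, embed_dim, latent_dim=64, prefix_len=8):
        super().__init__()
        self.to_mu = nn.Linear(embed_dim, latent_dim)
        self.to_logvar = nn.Linear(embed_dim, latent_dim)
        self.decode = nn.Linear(latent_dim, prefix_len * embed_dim)
        self.prefix_len, self.embed_dim = prefix_len, embed_dim

    def forward(self, pooled_qa):                               # (B, E)
        mu, logvar = self.to_mu(pooled_qa), self.to_logvar(pooled_qa)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        prefix = self.decode(z).view(-1, self.prefix_len, self.embed_dim)
        return prefix   # prepend to input token embeddings before the LM
```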
Submitted 18 December, 2025;
originally announced December 2025.
-
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Authors:
Wanghan Xu,
Yuhao Zhou,
Yifan Zhou,
Qinglong Cao,
Shuo Li,
Jia Bu,
Bo Liu,
Yixin Chen,
Xuming He,
Xiangyu Zhao,
Xiang Zhuang,
Fengxiang Wang,
Zhiwang Zhou,
Qiantai Feng,
Wenxuan Huang,
Jiaqi Wei,
Hao Wu,
Yuejin Yang,
Guangshuai Wang,
Sheng Xu,
Ziyan Huang,
Xinyao Liu,
Jiyao Liu,
Cheng Tang,
Wei Li
, et al. (82 additional authors not shown)
Abstract:
Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI), the ability to autonomously conceive, investigate, and reason across scientific domains, remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep research, idea generation, dry/wet experiments, and experimental reasoning. SGI-Bench comprises over 1,000 expert-curated, cross-disciplinary samples inspired by Science's 125 Big Questions, enabling systematic evaluation of state-of-the-art LLMs. Results reveal gaps: low exact match (10-20%) in deep research despite step-level alignment; ideas lacking feasibility and detail; high code executability but low execution result accuracy in dry experiments; low sequence fidelity in wet protocols; and persistent multimodal comparative-reasoning challenges. We further introduce Test-Time Reinforcement Learning (TTRL), which optimizes retrieval-augmented novelty rewards at inference, enhancing hypothesis novelty without reference answers. Together, our PIM-grounded definition, workflow-centric benchmark, and empirical insights establish a foundation for AI systems that genuinely participate in scientific discovery.
Submitted 18 December, 2025;
originally announced December 2025.
-
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding
Authors:
Chenkai Xu,
Yijie Jin,
Jiajun Li,
Yi Tu,
Guoping Long,
Dandan Tu,
Mingcong Song,
Hongjie Si,
Tianqi Hou,
Junchi Yan,
Zhijie Deng
Abstract:
Diffusion Large Language Models (dLLMs) have demonstrated significant potential for high-speed inference. However, current confidence-driven decoding strategies are constrained by limited parallelism, typically achieving only 1-3 tokens per forward pass (TPF). In this work, we identify that the degree of parallelism during dLLM inference is highly sensitive to the Token Filling Order (TFO). Then, we introduce Lookahead PArallel Decoding (LoPA), a training-free, plug-and-play algorithm that identifies a superior TFO and hence accelerates inference. LoPA concurrently explores distinct candidate TFOs via parallel branches, and selects the one with the highest potential for future parallelism based on branch confidence. We apply LoPA to the state-of-the-art D2F model and observe a substantial enhancement in decoding efficiency. Notably, LoPA increases the TPF of D2F-Dream to 10.1 on GSM8K while maintaining performance superior to the Dream baseline. Furthermore, to facilitate this unprecedented degree of parallelism, we develop a specialized multi-device inference system featuring Branch Parallelism (BP), which achieves a single-sample throughput of 1073.9 tokens per second under multi-GPU deployment. The code is available at https://github.com/zhijie-group/LoPA.
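A toy rendering of the branch-selection idea (the scoring rule here, mean per-position confidence, is a guess; consult the linked repository for the actual criterion):

```python
import torch

def select_branch(model, x_branches):
    """Run several candidate token-filling orders (branches) in one batched
    forward pass and keep the branch whose positions have the highest mean
    confidence, as a proxy for future parallelism."""
    logits = model(x_branches)                  # (B, L, V): one row per branch
    conf = logits.softmax(-1).max(-1).values    # per-position confidence
    scores = conf.mean(-1)                      # (B,): mean branch confidence
    return x_branches[scores.argmax()]
```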
Submitted 22 December, 2025; v1 submitted 18 December, 2025;
originally announced December 2025.
-
Towards Practical Large-scale Dynamical Heterogeneous Graph Embedding: Cold-start Resilient Recommendation
Authors:
Mabiao Long,
Jiaxi Liu,
Yufeng Li,
Hao Xiong,
Junchi Yan,
Kefan Wang,
Yi Cao,
Jiandong Ding
Abstract:
Deploying dynamic heterogeneous graph embeddings in production faces key challenges of scalability, data freshness, and cold-start. This paper introduces a practical, two-stage solution that balances deep graph representation with low-latency incremental updates. Our framework combines HetSGFormer, a scalable graph transformer for static learning, with Incremental Locally Linear Embedding (ILLE), a lightweight, CPU-based algorithm for real-time updates. HetSGFormer captures global structure with linear scalability, while ILLE provides rapid, targeted updates to incorporate new data, thus avoiding costly full retraining. This dual approach is cold-start resilient, leveraging the graph to create meaningful embeddings from sparse data. On billion-scale graphs, A/B tests show HetSGFormer achieved up to a 6.11% lift in Advertiser Value over previous methods, while the ILLE module added another 3.22% lift and improved embedding refresh timeliness by 83.2%. Our work provides a validated framework for deploying dynamic graph learning in production environments.
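ILLE is not specified in detail here, but classical locally linear embedding suggests what an incremental update could look like: solve for reconstruction weights of a new node over its neighbors in feature space, then transfer those weights to the neighbors' existing embeddings. A hedged sketch (regularization and neighbor selection are assumptions):

```python
import numpy as np

def ille_embed(x_new, neighbor_feats, neighbor_embs, reg=1e-3):
    """Incremental LLE-style update: a regularized least-squares solve keeps
    the per-node cost low enough for CPU-only real-time refreshes."""
    diff = neighbor_feats - x_new                       # (k, d)
    gram = diff @ diff.T                                # local Gram matrix
    gram += reg * (np.trace(gram) + 1e-12) * np.eye(len(gram))
    w = np.linalg.solve(gram, np.ones(len(gram)))
    w /= w.sum()                                        # reconstruction weights
    return w @ neighbor_embs                            # embedding of new node
```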
Submitted 15 December, 2025;
originally announced December 2025.
-
Experimental Demonstration and Transformation Mechanism of Quenchable Two-dimensional Diamond
Authors:
Jiayin Li,
Guoshuai Du,
Lili Zhao,
Wuxiao Han,
Jiaxin Ming,
Shang Chen,
Pengcheng Zhao,
Lu Bai,
Jiaohui Yan,
Yubing Du,
Jiajia Feng,
Hongliang Dong,
Ke Jin,
Weigao Xu,
Bin Chen,
Jianguo Zhang,
Yabin Chen
Abstract:
Two-dimensional (2D) diamond has aroused tremendous interest in nanoelectronics and optoelectronics, owing to its superior properties and flexible characteristics compared to bulk diamond. Despite significant efforts, great challenges lie in the experimental synthesis and transformation conditions of 2D diamond. Herein, we have demonstrated the experimental preparation of high-quality 2D diamond with controlled thickness and distinguished properties, realized by laser-heating few-layer graphene in a diamond anvil cell. The quenched 2D diamond exhibited a narrow T2g Raman peak (linewidth ~3.6 cm^-1) and intense photoluminescence of SiV^- (linewidth ~6.1 nm) and NV^0 centers. In terms of the transformation mechanism, atomic structures of hybrid phase interfaces suggested that the intermediate rhombohedral phase subtly mediates the transition from hexagonal graphite to cubic diamond. Furthermore, the tunable optical bandgap and thermal stability of 2D diamond sensitively depend on its sp3 concentration. We believe our results can shed light on the structural design and preparation of many carbon allotropes and further uncover the underlying transition mechanism.
Submitted 14 December, 2025;
originally announced December 2025.
-
Robust Variational Bayes by Min-Max Median Aggregation
Authors:
Jiawei Yan,
Ju Liu,
Weidong Liu,
Jiyuan Tu
Abstract:
We propose a robust and scalable variational Bayes (VB) framework designed to effectively handle contamination and outliers in datasets. Our approach partitions the data into $m$ disjoint subsets and formulates a joint optimization problem based on robust aggregation principles. A key insight is that the full posterior distribution is equivalent to the minimizer of the mean Kullback-Leibler (KL) divergence from the $m$-powered local posterior distributions. To enhance robustness, we replace the mean KL divergence with a min-max median formulation. The min-max formulation not only ensures consistency between the KL minimizer and the Evidence Lower Bound (ELBO) maximizer but also facilitates the establishment of improved statistical rates for the mean of the variational posterior. We observe a notable discrepancy in the $m$-powered marginal log-likelihood function contingent on the presence of local latent variables. To address this, we treat these two scenarios separately to guarantee the consistency of the aggregated variational posterior. Specifically, when local latent variables are present, we introduce an aggregate-and-rescale strategy. Theoretically, we provide a non-asymptotic analysis of our proposed posterior, incorporating a refined analysis of the Bernstein-von Mises (BvM) theorem to accommodate a diverging number of subsets $m$. Our findings indicate that the two-stage approach yields a smaller approximation error compared to directly aggregating the $m$-powered local posteriors. Furthermore, we establish a nearly optimal statistical rate for the mean of the proposed posterior, advancing existing theories related to min-max median estimators. The efficacy of our method is demonstrated through extensive simulation studies.
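In symbols, the aggregation idea above can be paraphrased as follows (this is our reading of the abstract; the KL direction, the weighting, and the precise min-max reformulation of the median are specified in the paper):

```latex
% Full posterior as the mean-KL minimizer over m-powered local posteriors:
\widehat{q} \;=\; \arg\min_{q}\; \frac{1}{m}\sum_{i=1}^{m}
  \mathrm{KL}\big(q \,\|\, p_i^{(m)}\big)
% Robust variant: replace the mean by a median over subsets, solved via a
% min-max reformulation:
\widehat{q}_{\mathrm{rob}} \;=\; \arg\min_{q}\;
  \operatorname*{median}_{1 \le i \le m}\, \mathrm{KL}\big(q \,\|\, p_i^{(m)}\big)
```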
Submitted 14 December, 2025;
originally announced December 2025.
-
Analyzing Planner Design Trade-offs for MAPF under Realistic Simulation
Authors:
Jingtian Yan,
Zhifei Li,
William Kang,
Stephen F. Smith,
Jiaoyang Li
Abstract:
Multi-Agent Path Finding (MAPF) algorithms are increasingly deployed in industrial warehouses and automated manufacturing facilities, where robots must operate reliably under real-world physical constraints. However, existing MAPF evaluation frameworks typically rely on simplified robot models, leaving a substantial gap between algorithmic benchmarks and practical performance. Recent frameworks, such as SMART, incorporate kinodynamic modeling and offer the MAPF community a platform for large-scale, realistic evaluation. Building on this capability, this work investigates how key planner design choices influence performance under realistic execution settings. We systematically study three fundamental factors: (1) the relationship between solution optimality and execution performance, (2) the sensitivity of system performance to inaccuracies in kinodynamic modeling, and (3) the interaction between model accuracy and plan optimality. Empirically, we examine these factors to understand how these design choices affect performance in realistic scenarios. We highlight open challenges and research directions to steer the community toward practical, real-world deployment.
Submitted 10 December, 2025;
originally announced December 2025.
-
Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds
Authors:
Shaofeng Zhang,
Xuanqi Chen,
Xiangdong Zhang,
Sitong Wu,
Junchi Yan
Abstract:
Most existing self-supervised learning (SSL) approaches for 3D point clouds are dominated by generative methods based on Masked Autoencoders (MAE). However, these generative methods have been proven to struggle to capture high-level discriminative features effectively, leading to poor performance on linear probing and other downstream tasks. In contrast, contrastive methods excel in discriminative feature representation and generalization ability on image data. Despite this, contrastive learning (CL) in 3D data remains scarce. Besides, simply applying CL methods designed for 2D data to 3D fails to effectively learn 3D local details. To address these challenges, we propose a novel Dual-Branch Center-Surrounding Contrast (CSCon) framework. Specifically, we apply masking to the center and surrounding parts separately, constructing dual-branch inputs with center-biased and surrounding-biased representations to better capture rich geometric information. Meanwhile, we introduce a patch-level contrastive loss to further enhance both high-level information and local sensitivity. Under the FULL and ALL protocols, CSCon achieves performance comparable to generative methods; under the MLP-LINEAR, MLP-3, and ONLY-NEW protocols, our method attains state-of-the-art results, even surpassing cross-modal approaches. In particular, under the MLP-LINEAR protocol, our method outperforms the baseline (Point-MAE) by 7.9%, 6.7%, and 10.3% on the three variants of ScanObjectNN, respectively. The code will be made publicly available.
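The dual-branch input construction admits a short sketch. One plausible reading (the mask ratio and the distance-to-centroid criterion are our assumptions):

```python
import torch

def center_surrounding_split(points, mask_ratio=0.6):
    """Rank points by distance to the cloud centroid, then mask the central
    region in one branch and the surrounding region in the other; each
    branch is encoded independently for the contrastive objective."""
    centroid = points.mean(dim=0, keepdim=True)
    dist = (points - centroid).norm(dim=-1)
    order = dist.argsort()                          # near -> far
    n_mask = int(mask_ratio * len(points))
    center_masked = order[:n_mask]                  # branch 1: hide the center
    surround_masked = order[-n_mask:]               # branch 2: hide the rim
    return center_masked, surround_masked
```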
Submitted 9 December, 2025;
originally announced December 2025.
-
Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank
Authors:
Shaofeng Zhang,
Xuanqi Chen,
Ning Liao,
Haoxiang Zhao,
Xiaoxing Wang,
Haoru Tan,
Sitong Wu,
Xiaosong Jia,
Qi Fan,
Junchi Yan
Abstract:
The dominance of denoising generative models (e.g., diffusion, flow-matching) in visual synthesis is tempered by their substantial training costs and inefficiencies in representation learning. While injecting discriminative representations via auxiliary alignment has proven effective, this approach still faces key limitations: the reliance on external, pre-trained encoders introduces overhead and domain shift. A dispersion-based strategy that encourages strong separation among in-batch latent representations alleviates this specific dependency. To assess the effect of the number of negative samples in generative modeling, we propose Repulsor, a plug-and-play training framework that requires no external encoders. Our method integrates a memory bank mechanism that maintains a large, dynamically updated queue of negative samples across training iterations. This decouples the number of negatives from the mini-batch size, providing abundant and high-quality negatives for a contrastive objective without a multiplicative increase in computational cost. A low-dimensional projection head is used to further minimize memory and bandwidth overhead. Repulsor offers three principal advantages: (1) it is self-contained, eliminating dependency on pretrained vision foundation models and their associated forward-pass overhead; (2) it introduces no additional parameters or computational cost during inference; and (3) it enables substantially faster convergence, achieving superior generative quality more efficiently. On ImageNet-256, Repulsor achieves a state-of-the-art FID of 2.40 within 400k steps, significantly outperforming comparable methods.
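The memory-bank mechanism is standard enough to sketch: a fixed-size ring buffer of projected latents, updated each iteration, serves as the negative set. An illustrative version (sizes, normalization, and the exact loss are assumptions; the paper defines Repulsor's actual objective):

```python
import torch
import torch.nn.functional as F

class NegativeQueue:
    """MoCo-style ring buffer of projected negatives: decouples the negative
    count from the mini-batch size."""
    def __init__(self, dim=128, size=65536):
        self.bank = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, z):                        # z: (B, dim), post-projection
        z = F.normalize(z, dim=1)
        idx = (self.ptr + torch.arange(len(z))) % self.bank.shape[0]
        self.bank[idx] = z
        self.ptr = int(idx[-1] + 1) % self.bank.shape[0]

def repulsion_loss(z, queue, temp=0.2):
    """Push current latents away from everything stored in the bank."""
    z = F.normalize(z, dim=1)
    sim = z @ queue.bank.T / temp                # (B, size)
    return torch.logsumexp(sim, dim=1).mean()
```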
Submitted 13 December, 2025; v1 submitted 9 December, 2025;
originally announced December 2025.
-
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
Authors:
Yuning Gong,
Yifei Liu,
Yifan Zhan,
Muyao Niu,
Xueying Li,
Yuanjun Liao,
Jiaming Chen,
Yuanyuan Gao,
Jiaqi Chen,
Minming Chen,
Li Zhou,
Yuning Zhang,
Wei Wang,
Xiaoqing Hou,
Huaxi Huang,
Shixiang Tang,
Le Ma,
Dingwen Zhang,
Xue Yang,
Junchi Yan,
Yanchi Zhang,
Yinqiang Zheng,
Xiao Sun,
Zhihang Zhong
Abstract:
Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavy, or constrained by legacy pipelines, resulting in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time rendering of various Gaussian Splatting representations and meshes. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians each frame. Such inference also enables us to apply feedforward generative post-processing. The platform further offers a plug-in three.js library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, under identical 3DGS assets, Visionary achieves superior rendering efficiency compared to current Web viewers due to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproduction, comparison, and deployment of 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.
Submitted 9 December, 2025;
originally announced December 2025.
-
Spatial Retrieval Augmented Autonomous Driving
Authors:
Xiaosong Jia,
Chenhe Zhang,
Yule Jiang,
Songbur Wong,
Zhiyuan Zhang,
Chen Chen,
Shaofeng Zhang,
Xuanhe Zhou,
Xue Yang,
Junchi Yan,
Yu-Gang Jiang
Abstract:
Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under limited view scope, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers are able to recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, introducing offline retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making it a plug-and-play extension for existing AD tasks.
For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality could enhance the performance of certain tasks. We will open-source dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
Submitted 7 December, 2025;
originally announced December 2025.
-
Neural reconstruction of 3D ocean wave hydrodynamics from camera sensing
Authors:
Jiabin Liu,
Zihao Zhou,
Jialei Yan,
Anxin Guo,
Alvise Benetazzo,
Hui Li
Abstract:
Precise three-dimensional (3D) reconstruction of wave free surfaces and associated velocity fields is essential for developing a comprehensive understanding of ocean physics. To address the high computational cost of dense visual reconstruction in long-term ocean wave observation tasks and the challenges introduced by persistent visual occlusions, we propose a wave free-surface visual reconstruction neural network, designed as an attention-augmented pyramid architecture tailored to the multi-scale and temporally continuous characteristics of wave motions. Using physics-based constraints, we perform time-resolved reconstruction of nonlinear 3D velocity fields from the evolving free-surface boundary. Experiments under real-sea conditions demonstrate millimetre-level wave elevation prediction in the central region, dominant-frequency errors below 0.01 Hz, precise estimation of high-frequency spectral power laws, and high-fidelity 3D reconstruction of nonlinear velocity fields, while enabling dense reconstruction of two million points in only 1.35 s. Built on a stereo-vision dataset, the model outperforms conventional visual reconstruction approaches and maintains strong generalization in occluded conditions, owing to its global multi-scale attention and its learned encoding of wave propagation dynamics.
Submitted 4 December, 2025;
originally announced December 2025.
-
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
Authors:
Ruilin Li,
Yibin Wang,
Wenhong Zhu,
Chenglin Li,
Jinghao Zhang,
Chenliang Li,
Junchi Yan,
Jiaqi Wang
Abstract:
Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts sought to tune the knowledge layers of LLMs, proving effective for making selective edits. However, a significant gap exists between their performance in controlled, teacher-forcing evaluations and their real-world effectiveness in lifelong learning scenarios, which greatly limits their practical applicability. This work's empirical analysis reveals two recurring issues associated with this gap: (1) Most traditional methods lead the edited model to overfit to the new fact, thereby degrading pre-trained capabilities; (2) There is a critical absence of a knowledge consolidation stage, leaving new facts insufficiently integrated into LLMs' inference-time behavior under autoregressive generation, thereby leading to a mismatch between parametric knowledge and actual generation behavior. To this end, we propose Edit-then-Consolidate, a novel knowledge editing paradigm that aims to bridge the gap between theoretical knowledge editing methods and their real-world applicability. Specifically, (1) our framework mitigates overfitting via Targeted Proximal Supervised Fine-Tuning (TPSFT) that localizes the edit via a trust-region objective to limit policy drift; (2) Then, a consolidation stage using Group Relative Policy Optimization (GRPO) aligns the edited knowledge with CoT-based inference policy by optimizing trajectory-level behavior under comprehensive reward signals. Extensive experiments demonstrate our framework consistently improves editing reliability and generalization under real-world evaluations, while better preserving locality and pre-trained capabilities.
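The consolidation stage relies on group-relative rewards; the core normalization is small enough to show directly (this is the generic GRPO advantage computation, not code from the paper):

```python
import torch

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO-style training: normalize
    each sampled trajectory's reward against the mean/std of its own group
    of rollouts for the same prompt.

    rewards: (num_groups, samples_per_group) tensor of scalar rewards."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True).clamp_min(1e-6)
    return (rewards - mean) / std
```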
Submitted 4 December, 2025;
originally announced December 2025.
-
PanFoMa: A Lightweight Foundation Model and Benchmark for Pan-Cancer
Authors:
Xiaoshui Huang,
Tianlin Zhu,
Yifan Zuo,
Xue Xia,
Zonghan Wu,
Jiebin Yan,
Dingli Hua,
Zongyi Xu,
Yuming Fang,
Jian Zhang
Abstract:
Single-cell RNA sequencing (scRNA-seq) is essential for decoding tumor heterogeneity. However, pan-cancer research still faces two key challenges: learning discriminative and efficient single-cell representations, and establishing a comprehensive evaluation benchmark. In this paper, we introduce PanFoMa, a lightweight hybrid neural network that combines the strengths of Transformers and state-space models to achieve a balance between performance and efficiency. PanFoMa consists of a front-end local-context encoder with shared self-attention layers to capture complex, order-independent gene interactions; and a back-end global sequential feature decoder that efficiently integrates global context using a linear-time state-space model. This modular design preserves the expressive power of Transformers while leveraging the scalability of Mamba to enable transcriptome modeling, effectively capturing both local and global regulatory signals. To enable robust evaluation, we also construct a large-scale pan-cancer single-cell benchmark, PanFoMaBench, containing over 3.5 million high-quality cells across 33 cancer subtypes, curated through a rigorous preprocessing pipeline. Experimental results show that PanFoMa outperforms state-of-the-art models on our pan-cancer benchmark (+4.0%) and across multiple public tasks, including cell type annotation (+7.4%), batch integration (+4.0%) and multi-omics integration (+3.1%). The code is available at https://github.com/Xiaoshui-Huang/PanFoMa.
Submitted 2 December, 2025;
originally announced December 2025.
-
BlinkBud: Detecting Hazards from Behind via Sampled Monocular 3D Detection on a Single Earbud
Authors:
Yunzhe Li,
Jiajun Yan,
Yuzhou Wei,
Kechen Liu,
Yize Zhao,
Chong Zhang,
Hongzi Zhu,
Li Lu,
Shan Chang,
Minyi Guo
Abstract:
Failing to be aware of speeding vehicles approaching from behind poses a huge threat to the road safety of pedestrians and cyclists. In this paper, we propose BlinkBud, which utilizes a single earbud and a paired phone to detect, online, hazardous objects approaching from behind a user. The core idea is to accurately track visually identified objects utilizing a small number of sampled camera images taken from the earbud. To minimize the power consumption of the earbud and the phone while guaranteeing the best tracking accuracy, a novel 3D object tracking algorithm is devised, integrating both a Kalman filter based trajectory estimation scheme and an optimal image sampling strategy based on reinforcement learning. Moreover, the impact of constant user head movements on the tracking accuracy is significantly eliminated by leveraging the estimated pitch and yaw angles to correct the object depth estimation and align the camera coordinate system to the user's body coordinate system, respectively. We implement a prototype BlinkBud system and conduct extensive real-world experiments. Results show that BlinkBud is lightweight, with ultra-low mean power consumption of 29.8 mW and 702.6 mW on the earbud and smartphone, respectively, and can accurately detect hazards with a low average false positive ratio (FPR) and false negative ratio (FNR) of 4.90% and 1.47%, respectively.
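The abstract names a Kalman filter based trajectory estimator. As orientation, a stripped-down 1D constant-velocity predict/update cycle (BlinkBud's real filter is 3D and fused with head-pose correction; all noise values here are placeholders):

```python
import numpy as np

def kalman_step(x, P, z, dt, q=1e-2, r=0.5):
    """One constant-velocity Kalman step for a tracked object's 1D position.
    x: state [position, velocity]; P: state covariance; z: new measurement."""
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition
    H = np.array([[1.0, 0.0]])                # we only measure position
    Q = q * np.eye(2)                         # process noise
    R = np.array([[r]])                       # measurement noise

    x = F @ x                                 # predict
    P = F @ P @ F.T + Q
    y = z - H @ x                             # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + (K @ y).ravel()                   # update
    P = (np.eye(2) - K @ H) @ P
    return x, P
```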
Submitted 1 December, 2025;
originally announced December 2025.
-
DriveVGGT: Visual Geometry Transformer for Autonomous Driving
Authors:
Xiaosong Jia,
Yanhao Liu,
Junqi You,
Renqiu Xia,
Yu Hong,
Junchi Yan
Abstract:
Feed-forward reconstruction has recently gained significant attention, with VGGT being a notable example. However, directly applying VGGT to autonomous driving (AD) systems leads to sub-optimal results due to the different priors between the two tasks. In AD systems, several important new priors need to be considered: (i) The overlap between camera views is minimal, as autonomous driving sensor setups are designed to achieve coverage at a low cost. (ii) The camera intrinsics and extrinsics are known, which introduces more constraints on the output and also enables the estimation of absolute scale. (iii) The relative positions of all cameras remain fixed even though the ego vehicle is in motion. To fully integrate these priors into a feed-forward framework, we propose DriveVGGT, a scale-aware 4D reconstruction framework specifically designed for autonomous driving data. Specifically, we propose a Temporal Video Attention (TVA) module to process multi-camera videos independently, which better leverages the spatiotemporal continuity within each single-camera sequence. Then, we propose a Multi-camera Consistency Attention (MCA) module to conduct window attention with normalized relative pose embeddings, aiming to establish consistency relationships across different cameras while restricting each token to attend only to nearby frames. Finally, we extend the standard VGGT heads by adding an absolute scale head and an ego vehicle pose head. Experiments show that DriveVGGT outperforms VGGT, StreamVGGT, and fastVGGT on autonomous driving data, while extensive ablation studies verify the effectiveness of the proposed designs.
Submitted 27 November, 2025;
originally announced November 2025.
-
Bridging Planning and Execution: Multi-Agent Path Finding Under Real-World Deadlines
Authors:
Jingtian Yan,
Shuai Zhou,
Stephen F. Smith,
Jiaoyang Li
Abstract:
The Multi-Agent Path Finding (MAPF) problem aims to find collision-free paths for multiple agents while optimizing objectives such as the sum of costs or makespan. MAPF has wide applications in domains like automated warehouses, manufacturing systems, and airport logistics. However, most MAPF formulations assume a simplified robot model for planning, which overlooks execution-time factors such as kinodynamic constraints, communication latency, and controller variability. This gap between planning and execution is problematic for time-sensitive applications. To bridge this gap, we propose REMAP, an execution-informed MAPF planning framework that can be combined with leading search-based MAPF planners with minor changes. Our framework integrates the proposed ExecTimeNet to accurately estimate execution time based on planned paths. We demonstrate our method on the MAPF with Real-world Deadlines (MAPF-RD) problem, where agents must reach their goals before a predefined wall-clock time. We integrate our framework with two popular MAPF methods, MAPF-LNS and CBS. Experiments show that REMAP achieves up to 20% improvement in solution quality over baseline methods (e.g., constant execution speed estimators) on benchmark maps with up to 300 agents.
Submitted 26 November, 2025;
originally announced November 2025.
-
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
Authors:
Qingyun Li,
Shuran Ma,
Junwei Luo,
Yi Yu,
Yue Zhou,
Fengxiang Wang,
Xudong Lu,
Xiaoxing Wang,
Xin He,
Yushi Chen,
Xue Yang,
Junchi Yan
Abstract:
With Transformers achieving outstanding performance on individual remote sensing (RS) tasks, we are now approaching the realization of a unified model that excels across multiple tasks through multi-task learning (MTL). Compared to single-task approaches, MTL methods offer improved generalization, enhanced scalability, and greater practical applicability. Recently, vision language models (VLMs) have achieved promising results in RS image understanding, grounding, and ultra-high-resolution (UHR) image reasoning, respectively. Moreover, the unified text-based interface demonstrates significant potential for MTL. Hence, in this work, we present RSCoVLM, a simple yet flexible VLM baseline for RS MTL. Firstly, we create the data curation engine, including data acquisition, offline processing and integrating, as well as online loading and weighting. This data engine effectively handles the complex RS data environment and generates flexible vision-language conversations. Furthermore, we propose a unified dynamic-resolution strategy to address the diverse image scales inherent in RS imagery. For UHR images, we introduce the Zoom-in Chain mechanism together with its corresponding dataset, LRS-VQA-Zoom. These strategies are flexible and effectively mitigate the computational burden. Additionally, we significantly enhance the model's object detection capability and propose a novel evaluation protocol that ensures fair comparison between VLMs and conventional detection models. Extensive experiments demonstrate that RSCoVLM achieves state-of-the-art performance across diverse tasks, outperforming existing RS VLMs and even rivaling specialized expert models. All the training and evaluating tools, model weights, and datasets have been fully open-sourced to support reproducibility. We expect that this baseline will promote further progress toward general-purpose RS models.
Submitted 26 November, 2025;
originally announced November 2025.
-
LaGen: Towards Autoregressive LiDAR Scene Generation
Authors:
Sizhuo Zhou,
Xiaosong Jia,
Fanrui Zhang,
Junjie Li,
Juyong Zhang,
Yukang Feng,
Jianwen Sun,
Songbur Wong,
Junqi You,
Junchi Yan
Abstract:
Generative world models for autonomous driving (AD) have become a trending topic. Unlike the widely studied image modality, in this work we explore generative world models for LiDAR data. Existing generation methods for LiDAR data only support single frame generation, while existing prediction approaches require multiple frames of historical input and can only deterministically predict multiple frames at once, lacking interactivity. Both paradigms fail to support long-horizon interactive generation. To this end, we introduce LaGen, which to the best of our knowledge is the first framework capable of frame-by-frame autoregressive generation of long-horizon LiDAR scenes. LaGen is able to take a single-frame LiDAR input as a starting point and effectively utilize bounding box information as conditions to generate high-fidelity 4D scene point clouds. In addition, we introduce a scene decoupling estimation module to enhance the model's interactive generation capability for object-level content, as well as a noise modulation module to mitigate error accumulation during long-horizon generation. We construct a protocol based on nuScenes for evaluating long-horizon LiDAR scene generation. Experimental results comprehensively demonstrate LaGen outperforms state-of-the-art LiDAR generation and prediction models, especially on the later frames.
Submitted 26 November, 2025;
originally announced November 2025.
-
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Authors:
Sen Nie,
Jie Zhang,
Jianxin Yan,
Shiguang Shan,
Xilin Chen
Abstract:
Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating image semantics on Large Vision-Language Models (LVLMs). However, existing methods struggle with controllability and fail to precisely manipulate the semantics of specific concepts in the image. We attribute this limitation to semantic entanglement in the patch-token representations on which adversarial attacks typically operate: global context aggregated by self-attention in the vision encoder dominates individual patch features, making them unreliable handles for precise local semantic manipulation. Our systematic investigation reveals a key insight: value features (V) computed within the transformer attention block serve as much more precise handles for manipulation. We show that V suppresses global-context channels, allowing it to retain high-entropy, disentangled local semantic information. Building on this discovery, we propose V-Attack, a novel method designed for precise local semantic attacks. V-Attack targets the value features and introduces two core components: (1) a Self-Value Enhancement module to refine V's intrinsic semantic richness, and (2) a Text-Guided Value Manipulation module that leverages text prompts to locate the source concept and optimize it toward a target concept. By bypassing the entangled patch features, V-Attack achieves highly effective semantic control. Extensive experiments across diverse LVLMs, including LLaVA, InternVL, DeepseekVL and GPT-4o, show that V-Attack improves the attack success rate by an average of 36% over state-of-the-art methods, exposing critical vulnerabilities in modern vision-language understanding. Our code and data are available at https://github.com/Summu77/V-Attack.
Submitted 25 November, 2025;
originally announced November 2025.
-
Evolution of Cybersecurity Subdisciplines: A Science of Science Study
Authors:
Yao Chen,
Jeff Yan
Abstract:
The science of science is an emerging field that studies the practice of science itself. We present the first study of the cybersecurity discipline from a science of science perspective. We examine the evolution of two comparable interdisciplinary communities in cybersecurity: the Symposium on Usable Privacy and Security (SOUPS) and Financial Cryptography and Data Security (FC).
Submitted 24 November, 2025;
originally announced November 2025.
-
Covariate Connectivity Combined Clustering for Weighted Networks
Authors:
Zeyu Hu,
Wenrui Li,
Jun Yan,
Panpan Zhang
Abstract:
Community detection is a central task in network analysis, with applications in social, biological, and technological systems. Traditional algorithms rely primarily on network topology, which can fail when community signals are partly encoded in node-specific attributes. Existing covariate-assisted methods often assume the number of clusters is known, involve computationally intensive inference, or are not designed for weighted networks. We propose $\text{C}^4$: Covariate Connectivity Combined Clustering, an adaptive spectral clustering algorithm that integrates network connectivity and node-level covariates into a unified similarity representation. $\text{C}^4$ balances the two sources of information through a data-driven tuning parameter, estimates the number of communities via an eigengap heuristic, and avoids reliance on costly sampling-based procedures. Simulation studies show that $\text{C}^4$ achieves higher accuracy and robustness than competing approaches across diverse scenarios. Application to an airport reachability network demonstrates the method's scalability, interpretability, and practical utility for real-world weighted networks.
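To make the idea concrete, here is a minimal sketch of covariate-assisted spectral clustering with an eigengap heuristic in the spirit of $\text{C}^4$; the Gaussian covariate kernel, the linear fusion rule with a fixed weight, and all names are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def c4_cluster(A, X, alpha=0.5, k_max=10):
    """Toy covariate-assisted spectral clustering.
    A: (n, n) weighted adjacency; X: (n, p) node covariates;
    alpha: weight balancing connectivity vs. covariates."""
    # Covariate similarity via a Gaussian kernel (an assumed choice).
    D2 = squareform(pdist(X, "sqeuclidean"))
    K = np.exp(-D2 / np.median(D2[D2 > 0]))

    # Combined similarity: connectivity plus covariate information.
    S = (1 - alpha) * A / A.max() + alpha * K

    # Symmetric normalized Laplacian of the combined similarity.
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))

    # Eigengap heuristic: choose k at the largest gap in the spectrum.
    w, V = np.linalg.eigh(L)
    k = max(2, int(np.argmax(np.diff(w[:k_max]))) + 1)

    # k-means on the row-normalized leading eigenvectors.
    U = V[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```

The fixed `alpha` above only marks where the paper's data-driven tuning parameter would enter; the abstract states that the balance is chosen adaptively.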
Submitted 21 November, 2025;
originally announced November 2025.
-
Budget-Aware Tool-Use Enables Effective Agent Scaling
Authors:
Tengxiao Liu,
Zifeng Wang,
Jin Miao,
I-Hung Hsu,
Jun Yan,
Jiefeng Chen,
Rujun Han,
Fangyuan Xu,
Yanfei Chen,
Ke Jiang,
Samira Daruki,
Yi Liang,
William Yang Wang,
Tomas Pfister,
Chen-Yu Lee
Abstract:
Scaling test-time computation improves performance across different tasks on large language models (LLMs), which has also been extended to tool-augmented agents. For these agents, scaling involves not only "thinking" in tokens but also "acting" via tool calls. The number of tool calls directly bounds the agent's interaction with the external environment. However, we find that simply granting agents a larger tool-call budget fails to improve performance, as they lack "budget awareness" and quickly hit a performance ceiling. To address this, we study how to scale such agents effectively under explicit tool-call budgets, focusing on web search agents. We first introduce the Budget Tracker, a lightweight plug-in that provides the agent with continuous budget awareness, enabling simple yet effective scaling. We further develop BATS (Budget Aware Test-time Scaling), an advanced framework that leverages this awareness to dynamically adapt its planning and verification strategy, deciding whether to "dig deeper" on a promising lead or "pivot" to new paths based on remaining resources. To analyze cost-performance scaling in a controlled manner, we formalize a unified cost metric that jointly accounts for token and tool consumption. We provide the first systematic study on budget-constrained agents, showing that budget-aware methods produce more favorable scaling curves and push the cost-performance Pareto frontier. Our work offers empirical insights toward a more transparent and principled understanding of scaling in tool-augmented agents.
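As a rough illustration of what "budget awareness" can mean in practice, the sketch below implements a hypothetical tracker and a linear token-plus-tool-call cost; the paper's actual Budget Tracker interface and unified cost metric are not specified in the abstract, so every name and the cost form here are assumptions:

```python
from dataclasses import dataclass

@dataclass
class BudgetTracker:
    """Hypothetical plug-in keeping an agent aware of its budget."""
    max_tool_calls: int
    max_tokens: int
    lambda_tool: float = 100.0   # assumed token-equivalent cost per tool call
    tool_calls: int = 0
    tokens: int = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> None:
        self.tokens += tokens
        self.tool_calls += tool_calls

    @property
    def unified_cost(self) -> float:
        # One plausible unified metric: tokens plus weighted tool calls.
        return self.tokens + self.lambda_tool * self.tool_calls

    def remaining_fraction(self) -> float:
        used = max(self.tool_calls / self.max_tool_calls,
                   self.tokens / self.max_tokens)
        return max(0.0, 1.0 - used)

    def status_prompt(self) -> str:
        # Injected into the agent's context so it can decide whether to
        # "dig deeper" on a lead or "pivot" given remaining resources.
        return (f"[budget] {self.max_tool_calls - self.tool_calls} tool calls "
                f"remaining ({self.remaining_fraction():.0%} of budget left).")
```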
Submitted 21 November, 2025;
originally announced November 2025.
-
How Noise Benefits AI-generated Image Detection
Authors:
Jiazhen Yan,
Ziqiang Li,
Fan Wang,
Kai Zeng,
Zhangjie Fu
Abstract:
The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, out-of-distribution generalization remains a persistent challenge. We trace this weakness to spurious shortcuts exploited during training, and we further observe that small feature-space perturbations can mitigate shortcut dominance. To address this problem in a more controllable manner, we propose Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle. Specifically, we construct positive-incentive noise in the feature space via cross-attention fusion of visual and categorical semantic features. During optimization, the noise is injected into the feature space to fine-tune the visual encoder, suppressing shortcut-sensitive directions while amplifying stable forensic cues, thereby enabling the extraction of more robust and generalized artifact representations. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 42 distinct generative models. Our method achieves new state-of-the-art performance, with a notable improvement of 5.4% in average accuracy over existing approaches.
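A minimal sketch of the cross-attention noise generator described above, assuming CLIP-style feature dimensions; the module name, shapes, and the simple additive injection are guesses at the design, not the released PiN-CLIP code:

```python
import torch
import torch.nn as nn

class PiNGenerator(nn.Module):
    """Hypothetical noise generator: visual tokens attend to categorical
    text features, and the fused result is projected into a feature-space
    perturbation that is added back onto the visual features."""
    def __init__(self, d=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.proj = nn.Linear(d, d)

    def forward(self, vis_feat, cls_feat):
        # vis_feat: (B, N, d) visual tokens; cls_feat: (B, K, d) class text features.
        fused, _ = self.attn(query=vis_feat, key=cls_feat, value=cls_feat)
        noise = self.proj(fused)
        return vis_feat + noise   # inject noise into the feature space
```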
Submitted 20 November, 2025;
originally announced November 2025.
-
Forecasting the Constraint on the Hu-Sawicki $f(R)$ Modified Gravity in the CSST $3\times2$pt Photometric Survey
Authors:
Jun-Hui Yan,
Yan Gong,
Qi Xiong,
Xuelei Chen,
Qi Guo,
Ming Li,
Yun Liu,
Wenxiang Pei
Abstract:
We forecast the constraint on the Hu-Sawicki $f(R)$ model from the photometric survey operated by the Chinese Space Station Survey Telescope (CSST). The simulated $3\times2$pt data of galaxy clustering, weak lensing, and galaxy-galaxy lensing measurements within 100 deg$^{2}$ are used in the analysis. The mock observational maps are constructed from a light cone with redshift sampling and noise. The angular power spectra are measured with pseudo-$C_\ell$ estimators and compared to theory in the same basis, using validated weighting functions and an analytic covariance matrix that includes Gaussian, connected non-Gaussian, and super-sample terms. We model the theoretical spectra using two methods: the first uses MGCAMB to compute the linear modified-gravity clustering power spectra, and the second adopts the FREmu emulator with a nonlinear $\Lambda$CDM baseline prescription. Parameter inference is performed with Cobaya, and the cosmological and modified-gravity parameters are sampled within the emulator training domain and jointly fitted with the systematic parameters. We find that the predictions from the two methods are in good agreement on the overlapping large scales, and the emulator method correctly provides additional high-$\ell$ information. The $1\sigma$ upper bounds on $\log_{10}|f_{R0}|$ are found to be $<-5.42$ for the cosmic-shear-only case and $<-5.29$ for the 100 deg$^2$ CSST $3\times2$pt probe. The full CSST photometric survey, with a 17,500 deg$^2$ survey area, is expected to further improve the constraint precision by about one order of magnitude. Our results demonstrate that the CSST $3\times2$pt survey can deliver strict tests of $f(R)$ gravity.
Submitted 11 December, 2025; v1 submitted 20 November, 2025;
originally announced November 2025.
-
Asymmetric Ramsey numbers of trees
Authors:
Jun Yan
Abstract:
Let $n \geq \nu$, let $T$ be an $n$-vertex tree with bipartition class sizes $t_1 \geq t_2$, and let $S$ be a $\nu$-vertex tree with bipartition class sizes $\tau_1 \geq \tau_2$. Using four natural constructions, we show that the Ramsey number $R(T,S)$ is lower bounded by $\underline{R}(T,S)=\max\{n+\tau_2,\,\nu+\min\{t_2,\nu\},\,\min\{2t_1,2\nu\},\,2\tau_1\}-1$.
Our main result shows that there exists a constant $c>0$ such that for all sufficiently large integers $n \geq \nu$, if (i) $\Delta(T)\leq cn/\log n$ and $\Delta(S)\leq c\nu/\log\nu$, (ii) $\tau_2\geq t_2$, and (iii) $\nu\geq t_1$, then $R(T,S)=\underline{R}(T,S)$. In particular, this determines the exact Ramsey numbers for a large family of pairs of trees. We also provide examples showing that $R(T,S)$ can exceed $\underline{R}(T,S)$ if any one of the three assumptions (i), (ii), and (iii) is removed.
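Since the lower bound $\underline{R}(T,S)$ is a closed-form expression, it is easy to evaluate; the helper below is purely illustrative:

```python
def ramsey_lower_bound(n, nu, t1, t2, tau1, tau2):
    """Evaluate the lower bound from the four constructions:
    max{n + tau2, nu + min(t2, nu), min(2*t1, 2*nu), 2*tau1} - 1.
    Assumes n >= nu, t1 >= t2, tau1 >= tau2, t1 + t2 = n, tau1 + tau2 = nu.
    """
    return max(n + tau2,
               nu + min(t2, nu),
               min(2 * t1, 2 * nu),
               2 * tau1) - 1

# Example: T and S both trees on 100 vertices with balanced bipartitions
# (t1 = t2 = tau1 = tau2 = 50): the bound is max(150, 150, 100, 100) - 1 = 149.
print(ramsey_lower_bound(100, 100, 50, 50, 50, 50))  # 149
```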
Submitted 19 November, 2025;
originally announced November 2025.
-
Know Your Intent: An Autonomous Multi-Perspective LLM Agent Framework for DeFi User Transaction Intent Mining
Authors:
Qian'ang Mao,
Yuxuan Zhang,
Jiaman Chen,
Wenjun Zhou,
Jiaqi Yan
Abstract:
As Decentralized Finance (DeFi) develops, understanding user intent behind DeFi transactions is crucial yet challenging due to complex smart contract interactions, multifaceted on-/off-chain factors, and opaque hex logs. Existing methods lack deep semantic insight. To address this, we propose the Transaction Intent Mining (TIM) framework. TIM leverages a DeFi intent taxonomy built on grounded theory and a multi-agent Large Language Model (LLM) system to robustly infer user intents. A Meta-Level Planner dynamically coordinates domain experts to decompose multiple perspective-specific intent analyses into solvable subtasks. Question Solvers handle the tasks with multi-modal on-/off-chain data, while a Cognitive Evaluator mitigates LLM hallucinations and ensures verifiability. Experiments show that TIM significantly outperforms machine learning models, single LLMs, and single-agent baselines. We also analyze core challenges in intent inference. This work helps provide a more reliable understanding of user motivations in DeFi, offering context-aware explanations for complex blockchain activity.
Submitted 19 November, 2025;
originally announced November 2025.
-
DGS-Net: Distillation-Guided Gradient Surgery for CLIP Fine-Tuning in AI-Generated Image Detection
Authors:
Jiazhen Yan,
Ziqiang Li,
Fan Wang,
Boyu Wang,
Zhangjie Fu
Abstract:
The rapid progress of generative models such as GANs and diffusion models has led to the widespread proliferation of AI-generated images, raising concerns about misinformation, privacy violations, and trust erosion in digital media. Although large-scale multimodal models like CLIP offer strong transferable representations for detecting synthetic content, fine-tuning them often induces catastrophic forgetting, which degrades pre-trained priors and limits cross-domain generalization. To address this issue, we propose the Distillation-guided Gradient Surgery Network (DGS-Net), a novel framework that preserves transferable pre-trained priors while suppressing task-irrelevant components. Specifically, we introduce a gradient-space decomposition that separates harmful and beneficial descent directions during optimization. By projecting task gradients onto the orthogonal complement of harmful directions and aligning with beneficial ones distilled from a frozen CLIP encoder, DGS-Net achieves unified optimization of prior preservation and irrelevant suppression. Extensive experiments on 50 generative models demonstrate that our method outperforms state-of-the-art approaches by an average margin of 6.6%, achieving superior detection performance and generalization across diverse generation techniques.
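The core projection step of gradient surgery is compact enough to sketch; assuming flattened gradient vectors, one plausible form is the following (how DGS-Net estimates the harmful and beneficial directions is not detailed in the abstract):

```python
import torch

def surgery_step(g_task, g_harmful, g_beneficial, beta=1.0):
    """One illustrative gradient-surgery direction: remove the component
    of the task gradient along the harmful direction, then mix in the
    direction distilled from the frozen CLIP encoder. All inputs are
    flattened 1-D tensors; `beta` is an assumed mixing weight."""
    h = g_harmful / (g_harmful.norm() + 1e-12)
    g_proj = g_task - (g_task @ h) * h        # project onto orthogonal complement
    return g_proj + beta * g_beneficial       # align with the beneficial prior
```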
Submitted 17 November, 2025;
originally announced November 2025.
-
MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics
Authors:
Jing Li,
Yifan Wang,
Jiafeng Yan,
Renlong Zhang,
Bin Yang
Abstract:
Infrared and visible image fusion aims to integrate complementary multi-modal information into a single fused result. However, existing methods 1) fail to account for the degradation of visible images under adverse weather conditions, thereby compromising fusion performance; and 2) rely on fixed network architectures, limiting their adaptability to diverse degradation scenarios. To address these issues, we propose a one-stop degradation-aware image fusion framework for multi-degradation scenarios driven by a large language model (MdaIF). Given the distinct scattering characteristics of different degradation scenarios (e.g., haze, rain, and snow) in atmospheric transmission, a mixture-of-experts (MoE) system is introduced to tackle image fusion across multiple degradation scenarios. To adaptively extract diverse weather-aware degradation knowledge and scene feature representations, collectively referred to as the semantic prior, we employ a pre-trained vision-language model (VLM) in our framework. Guided by the semantic prior, we propose a degradation-aware channel attention module (DCAM), which employs degradation prototype decomposition to facilitate multi-modal feature interaction in the channel domain. In addition, to achieve effective expert routing, the semantic prior and channel-domain modulated features are utilized to guide the MoE, enabling robust image fusion in complex degradation scenarios. Extensive experiments validate the effectiveness of our MdaIF, demonstrating superior performance over SOTA methods.
Submitted 16 November, 2025;
originally announced November 2025.
-
Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models
Authors:
Yongxian Wei,
Yilin Zhao,
Li Shen,
Xinrui Chen,
Runxi Cheng,
Sinan Du,
Hao Yu,
Gang Liu,
Jiahong Yan,
Chun Yuan,
Dian Li
Abstract:
Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a problem generator that reasons explicitly to plan problem directions before synthesis and adapts difficulty to the solver's ability. Specifically, we construct related problem pairs and augment them with intermediate problem-design CoT produced by a reasoning model. These data bootstrap the generator's problem-design strategies. Then, we treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty and produce complementary problems near the edge of the solver's competence. Extensive experiments on 10 mathematical and general reasoning benchmarks show that our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models. Moreover, a solver trained on the synthesized data provides improved rewards for continued generator training, enabling co-evolution and yielding a further 0.7% performance gain. Our code will be made publicly available.
Submitted 15 December, 2025; v1 submitted 12 November, 2025;
originally announced November 2025.
-
Diagnostics for Semiparametric Accelerated Failure Time Models with R Package afttest
Authors:
Woojung Bae,
Dongrak Choi,
Jun Yan,
Sangwook Kang
Abstract:
The semiparametric accelerated failure time (AFT) model is a useful alternative to the widely used Cox proportional hazards model, as it directly links the logarithm of the failure time to the covariates, yielding more interpretable regression coefficients. However, diagnostic procedures for the semiparametric AFT model have received relatively little attention. This paper introduces afttest, an R package that implements recently developed diagnostic tools for the semiparametric AFT model. The package supports diagnostic procedures for models fitted with either rank-based or least-squares methods. It provides functions to assess model assumptions, including the overall adequacy, the link function, and the functional form of each covariate. The test statistics are Kolmogorov-type suprema of transformed aggregated martingale residual processes. The p-values are obtained by approximating the null distribution with an efficient multiplier bootstrap procedure. Additionally, the package offers graphical tools to compare the observed stochastic processes with a number of approximated realizations. Applications of the package to the well-known Mayo Clinic primary biliary cirrhosis study are presented.
Submitted 12 November, 2025;
originally announced November 2025.
-
GenePheno: Interpretable Gene Knockout-Induced Phenotype Abnormality Prediction from Gene Sequences
Authors:
Jingquan Yan,
Yuwei Miao,
Lei Yu,
Yuzhi Guo,
Xue Xiao,
Lin Xu,
Junzhou Huang
Abstract:
Exploring how genetic sequences shape phenotypes is a fundamental challenge in biology and a key step toward scalable, hypothesis-driven experimentation. The task is complicated by the large modality gap between sequences and phenotypes, as well as the pleiotropic nature of gene-phenotype relationships. Existing sequence-based efforts focus on the degree to which variants of specific genes alter a limited set of phenotypes, while general methods for predicting gene-knockout-induced phenotype abnormalities rely heavily on curated genetic information as inputs, which limits scalability and generalizability. As a result, the task of broadly predicting the presence of multiple phenotype abnormalities under gene knockout directly from gene sequences remains underexplored. We introduce GenePheno, the first interpretable multi-label prediction framework that predicts knockout-induced phenotypic abnormalities from gene sequences. GenePheno employs a contrastive multi-label learning objective that captures inter-phenotype correlations, complemented by an exclusive regularization that enforces biological consistency. It further incorporates a gene function bottleneck layer, offering human-interpretable concepts that reflect the functional mechanisms behind phenotype formation. To support progress in this area, we curate four datasets with canonical gene sequences as input and multi-label phenotypic abnormalities induced by gene knockouts as targets. Across these datasets, GenePheno achieves state-of-the-art gene-centric $F_{\text{max}}$ and phenotype-centric AUC, and case studies demonstrate its ability to reveal gene functional mechanisms.
Submitted 14 November, 2025; v1 submitted 12 November, 2025;
originally announced November 2025.
-
OG-PCL: Efficient Sparse Point Cloud Processing for Human Activity Recognition
Authors:
Jiuqi Yan,
Chendong Xu,
Dongyu Liu
Abstract:
Human activity recognition (HAR) with millimeter-wave (mmWave) radar offers a privacy-preserving and robust alternative to camera- and wearable-based approaches. In this work, we propose the Occupancy-Gated Parallel-CNN Bi-LSTM (OG-PCL) network to process sparse 3D radar point clouds produced by mmWave sensing. Designed for lightweight deployment, the proposed OG-PCL has a parameter size of only 0.83M yet achieves 91.75% accuracy on the RadHAR dataset, outperforming existing baselines such as 2D CNN, PointNet, and 3D CNN methods. Through ablation studies, we validate the advantages of the tri-view parallel structure in preserving spatial information across three dimensions while maintaining efficiency. We further introduce the Occupancy-Gated Convolution (OGConv) block and demonstrate the necessity of its occupancy compensation mechanism for handling sparse point clouds. The proposed OG-PCL thus offers a compact yet accurate framework for real-time radar-based HAR on lightweight platforms.
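The abstract does not spell out OGConv, but an occupancy-compensated convolution can be sketched in the spirit of partial convolutions; treat the rescaling rule and the mask update below as assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyGatedConv2d(nn.Module):
    """Sketch of an occupancy-compensated convolution for sparse inputs.
    Responses over sparsely occupied windows are rescaled by the fraction
    of occupied cells so that empty regions do not dilute activations."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        # Fixed all-ones kernel used only to count occupied cells.
        self.register_buffer("ones", torch.ones(1, 1, k, k))

    def forward(self, x, occupancy):
        # occupancy: (B, 1, H, W) binary mask of non-empty cells.
        y = self.conv(x * occupancy)
        count = F.conv2d(occupancy, self.ones, padding=self.conv.padding[0])
        y = y * (self.ones.numel() / count.clamp(min=1.0))  # compensation
        return y, (count > 0).float()            # features, updated mask
```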
Submitted 11 November, 2025;
originally announced November 2025.
-
Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory
Authors:
Fangnuo Wu,
Mingkai Dong,
Wenjun Cai,
Jingsheng Yan,
Haibo Chen
Abstract:
The \emph{Partial Cache-Coherence (PCC)} model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like Compute Express Link (CXL). However, PCC's relaxation of global cache coherence compromises the correctness of existing single-machine software.
This paper focuses on building consistent and efficient indexes on PCC platforms. We show that existing indexes designed for cache-coherent platforms can be made consistent on PCC platforms following SP guidelines, i.e., we identify \emph{sync-data} and \emph{protected-data} according to the index's concurrency control mechanisms, and synchronize them accordingly. However, conversion with SP guidelines introduces performance overhead. To mitigate the overhead, we identify several unique performance bottlenecks on PCC platforms, and propose P$^3$ guidelines (i.e., using Out-of-\underline{P}lace update, Re\underline{P}licated shared variable, S\underline{P}eculative Reading) to improve the efficiency of converted indexes on PCC platforms.
With SP and P$^3$ guidelines, we convert and optimize two indexes (CLevelHash and BwTree) for PCC platforms. Evaluation shows that converted indexes' throughput improves by up to 16$\times$ following P$^3$ guidelines, and the optimized indexes outperform their message-passing-based and disaggregated-memory-based counterparts by up to 16$\times$ and 19$\times$, respectively.
Submitted 9 November, 2025;
originally announced November 2025.
-
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
Authors:
Qibing Ren,
Zhijie Zheng,
Jiaxuan Guo,
Junchi Yan,
Lizhuang Ma,
Jing Shao
Abstract:
In this work, we study the risks of collective financial fraud in large-scale multi-agent systems powered by large language model (LLM) agents. We investigate whether agents can collaborate in fraudulent behaviors, how such collaboration amplifies risks, and what factors influence fraud success. To support this research, we present MultiAgentFraudBench, a large-scale benchmark for simulating financial fraud scenarios based on realistic online interactions. The benchmark covers 28 typical online fraud scenarios, spanning the full fraud lifecycle across both public and private domains. We further analyze key factors affecting fraud success, including interaction depth, activity level, and fine-grained collaboration failure modes. Finally, we propose a series of mitigation strategies, including adding content-level warnings to fraudulent posts and dialogues, using LLMs as monitors to block potentially malicious agents, and fostering group resilience through information sharing at the societal level. Notably, we observe that malicious agents can adapt to environmental interventions. Our findings highlight the real-world risks of multi-agent financial fraud and suggest practical measures for mitigating them. Code is available at https://github.com/zheng977/MutiAgent4Fraud.
Submitted 9 November, 2025;
originally announced November 2025.
-
Nonparametric Block Bootstrap Kolmogorov-Smirnov Goodness-of-Fit Test
Authors:
Mathew Chandy,
Elizabeth Schifano,
Jun Yan,
Xianyang Zhang
Abstract:
The Kolmogorov--Smirnov (KS) test is a widely used statistical test that assesses the conformity of a sample to a specified distribution. Its efficacy, however, diminishes with serially dependent data and when parameters within the hypothesized distribution are unknown. For independent data, parametric and nonparametric bootstrap procedures are available to adjust for estimated parameters. For serially dependent stationary data, parametric bootstrap has been developed with a working serial dependence structure. A counterpart for the nonparametric bootstrap approach, which needs a bias correction, has not been studied. Addressing this gap, our study introduces a bias correction method employing a nonparametric block bootstrap, which approximates the distribution of the KS statistic in assessing the goodness-of-fit of the marginal distribution of a stationary series, accounting for unspecified serial dependence and unspecified parameters. We assess its effectiveness through simulations, scrutinizing both its size and power. The practicality of our method is further illustrated with an examination of stock returns from the S\&P 500 index, showcasing its utility in real-world applications.
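For intuition, a circular moving-block bootstrap for the KS statistic with re-estimated parameters might look like the sketch below; it shows only the resampling skeleton and omits the bias correction that is the paper's actual contribution:

```python
import numpy as np
from scipy import stats

def block_bootstrap_ks(x, block_len=20, n_boot=500, dist=stats.norm, seed=0):
    """Circular moving-block bootstrap of the KS statistic, with the
    distribution's parameters re-estimated on each resample."""
    rng = np.random.default_rng(seed)
    n = len(x)
    params = dist.fit(x)
    ks_obs = stats.kstest(x, dist(*params).cdf).statistic

    xx = np.concatenate([x, x[:block_len]])        # circular extension
    n_blocks = int(np.ceil(n / block_len))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        xs = np.concatenate([xx[s:s + block_len] for s in starts])[:n]
        boot[b] = stats.kstest(xs, dist(*dist.fit(xs)).cdf).statistic

    return ks_obs, float(np.mean(boot >= ks_obs))  # statistic, p-value
```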
Submitted 7 November, 2025;
originally announced November 2025.
-
PECL: A Heterogeneous Parallel Multi-Domain Network for Radar-Based Human Activity Recognition
Authors:
Jiuqi Yan,
Chendong Xu,
Dongyu Liu
Abstract:
Radar systems are increasingly favored for medical applications because they provide non-intrusive monitoring with high privacy and robustness to lighting conditions. However, existing research typically relies on single-domain radar signals and overlooks the temporal dependencies inherent in human activity, which complicates the classification of similar actions. To address this issue, we designed the Parallel-EfficientNet-CBAM-LSTM (PECL) network to process data in three complementary domains: Range-Time, Doppler-Time, and Range-Doppler. PECL combines a channel-spatial attention module and temporal units to capture more features and dynamic dependencies during action sequences, improving both accuracy and robustness. The experimental results show that PECL achieves an accuracy of 96.16% on the same dataset, outperforming existing methods by at least 4.78%. PECL also performs best in distinguishing between easily confused actions. Despite its strong performance, PECL maintains moderate model complexity, with 23.42M parameters and 1324.82M FLOPs. Its parameter-efficient design further reduces computational cost.
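Structurally, a three-branch parallel network with a temporal unit can be sketched as follows; the tiny CNN stand-ins replace the EfficientNet-CBAM branches, so this illustrates only the data flow, not the reported model:

```python
import torch
import torch.nn as nn

class TinyBranch(nn.Module):
    """Stand-in for one EfficientNet-CBAM branch (greatly simplified)."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim))

    def forward(self, x):
        return self.net(x)

class PECLSketch(nn.Module):
    """Three parallel branches for the Range-Time, Doppler-Time, and
    Range-Doppler maps, fused per frame and fed to an LSTM that models
    temporal dependencies across the action sequence."""
    def __init__(self, n_classes=6, d=64):
        super().__init__()
        self.branches = nn.ModuleList([TinyBranch(d) for _ in range(3)])
        self.lstm = nn.LSTM(3 * d, 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, rt, dt, rd):
        # Each input: (batch, frames, 1, H, W).
        feats = []
        for x, branch in zip((rt, dt, rd), self.branches):
            b, f = x.shape[:2]
            feats.append(branch(x.flatten(0, 1)).view(b, f, -1))
        h, _ = self.lstm(torch.cat(feats, dim=-1))
        return self.head(h[:, -1])               # classify from last step
```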
Submitted 7 November, 2025;
originally announced November 2025.
-
Detecting FRB by DANCE: a method based on DEnsity ANalysis and Cluster Extraction
Authors:
Mao Yuan,
Jiarui Niu,
Yi Feng,
Xu-ning Lv,
Chenchen Miao,
Lingqi Meng,
Bo Peng,
Li Deng,
Jingye Yan,
Weiwei Zhu
Abstract:
Fast radio bursts (FRBs) are transient signals exhibiting diverse strengths and emission bandwidths. Traditional single-pulse search techniques are widely employed for FRB detection; yet weak, narrow-band bursts often remain undetectable due to low signal-to-noise ratios (SNR) in integrated profiles. We developed DANCE, a detection tool based on cluster analysis of the original spectrum. It is specifically designed to detect and isolate weak, narrow-band FRBs, providing direct visual identification of their emission properties. This method performs density clustering on reconstructed, RFI-cleaned observational data, enabling the extraction of targeted clusters in the time-frequency domain that correspond to the genuine FRB emission range. Our simulations show that DANCE successfully extracts all true signals with SNR > 5 and achieves a detection precision exceeding 93%. Furthermore, through the practical detection of FRB 20201124A, DANCE has demonstrated a significant advantage in finding previously undetectable weak bursts, particularly those with distinct narrow-band features or occurring in proximity to stronger bursts.
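A toy version of the density-clustering stage can be written with off-the-shelf DBSCAN; the SNR estimate, thresholds, and scalings below are placeholders rather than DANCE's actual settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_burst_clusters(spec, snr_thresh=2.0, eps=3.0, min_samples=10):
    """Cluster bright time-frequency pixels of an RFI-cleaned dynamic
    spectrum `spec` (n_freq, n_time); dense clusters mark candidate
    bursts and directly expose their time/frequency extent."""
    med = np.median(spec)
    mad = np.median(np.abs(spec - med)) + 1e-12
    snr = (spec - med) / (1.4826 * mad)          # robust per-pixel SNR
    f_idx, t_idx = np.nonzero(snr > snr_thresh)
    if len(t_idx) == 0:
        return []
    pts = np.column_stack([t_idx, f_idx]).astype(float)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    boxes = []                                   # one box per cluster
    for lab in set(labels) - {-1}:               # -1 marks noise points
        sel = labels == lab
        boxes.append((t_idx[sel].min(), t_idx[sel].max(),
                      f_idx[sel].min(), f_idx[sel].max()))
    return boxes
```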
Submitted 6 November, 2025;
originally announced November 2025.
-
AStF: Motion Style Transfer via Adaptive Statistics Fusor
Authors:
Hanmo Chen,
Chenghao Xu,
Jiexi Yan,
Cheng Deng
Abstract:
Human motion style transfer allows characters to appear less rigid and more realistic in a specific style. Traditional arbitrary image style transfer typically processes mean and variance, which has proven effective. Similar methods have been adapted for motion style transfer. However, due to the fundamental differences between images and motion, relying on mean and variance is insufficient to fully capture the complex dynamic patterns and spatiotemporal coherence properties of motion data. Building upon this, our key insight is to bring two more statistics, skewness and kurtosis, into the analysis of motion style. Specifically, we propose a novel Adaptive Statistics Fusor (AStF), which consists of a Style Disentanglement Module (SDM) and High-Order Multi-Statistics Attention (HOS-Attn). We train our AStF in conjunction with a Motion Consistency Regularization (MCR) discriminator. Experimental results show that, by providing a more comprehensive model of the spatiotemporal statistical patterns inherent in dynamic styles, our proposed AStF outperforms state-of-the-art methods in motion style transfer. Our code and model are available at https://github.com/CHMimilanlan/AStF.
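To illustrate the four-statistic insight, the sketch below extends AdaIN-style mean/variance matching with an Edgeworth-like nudge toward the style's skewness and kurtosis; this crude stand-in is for illustration only, not the AStF architecture, which fuses the statistics adaptively with attention and a disentanglement module:

```python
import torch

def moment_stats(f, eps=1e-6):
    """Per-channel mean, std, skewness, kurtosis over time.
    f: (batch, channels, time) motion features."""
    mu = f.mean(-1, keepdim=True)
    sd = f.std(-1, keepdim=True) + eps
    z = (f - mu) / sd
    return mu, sd, z.pow(3).mean(-1, keepdim=True), z.pow(4).mean(-1, keepdim=True)

def higher_order_transfer(content, style):
    """Match mean/variance as in AdaIN, then nudge the third and fourth
    moments toward the style's skewness and kurtosis."""
    mu_c, sd_c, sk_c, ku_c = moment_stats(content)
    mu_s, sd_s, sk_s, ku_s = moment_stats(style)
    z = (content - mu_c) / sd_c
    z = (z
         + (sk_s - sk_c) * (z.pow(2) - 1) / 6      # skewness correction
         + (ku_s - ku_c) * (z.pow(3) - 3 * z) / 24)  # kurtosis correction
    return z * sd_s + mu_s
```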
Submitted 6 November, 2025;
originally announced November 2025.
-
Exploring Cosmological Constraints of the Void-Lensing Cross-Correlation in the CSST Photometric Survey
Authors:
Qi Xiong,
Yan Gong,
Junhui Yan,
Furen Deng,
Hengjie Lin,
Xingchen Zhou,
Xuelei Chen,
Qi Guo,
Ming Li,
Yun Liu,
Wenxiang Pei
Abstract:
We investigate the cosmological constraints from the void-lensing cross-correlation assuming the $w$CDM model for the Chinese Space Station Survey Telescope (CSST) photometric survey. Using Jiutian simulations, we construct a mock galaxy catalog to $z=3$ covering 100 deg$^2$, which incorporates the instrumental and observational effects of the CSST. We divide the galaxy sample into seven photometric-redshift (photo-$z$) tomographic bins and identify 2D voids within each bin using the Voronoi tessellation and watershed algorithm. We measure the angular cross-power spectrum between the void distribution and the weak lensing signal, and estimate the covariance matrix via jackknife resampling combined with a pseudo-$C_{\ell}$ approach to account for the partial-sky correction. We employ the Halo Void Dust Model (HVDM) to model the void-matter cross-power spectrum and adopt the Markov Chain Monte Carlo (MCMC) technique to implement the constraints on the cosmological and void parameters. We find that our method can accurately extract the cosmological information, and the constraint accuracies of some cosmological parameters from the void-lensing analysis are comparable to, or even tighter than, those from the weak-lensing-only case. This demonstrates that void-lensing serves as an effective cosmological probe and a valuable complement to galaxy photometric surveys, particularly for Stage-IV surveys targeting the high-redshift Universe.
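As a toy illustration of a void-lensing cross-spectrum measurement, the snippet below computes a full-sky cross-power spectrum with healpy on mock maps; the actual analysis uses masked pseudo-$C_{\ell}$ estimators, tomographic bins, and an analytic covariance, none of which are reproduced here:

```python
import healpy as hp
import numpy as np

# Mock maps: a placeholder void-overdensity map and a convergence map
# that is partially correlated with it.
nside = 256
rng = np.random.default_rng(1)
void_map = rng.normal(size=hp.nside2npix(nside))
kappa_map = 0.3 * void_map + rng.normal(size=hp.nside2npix(nside))

# Full-sky cross-power spectrum between the two fields.
cl_cross = hp.anafast(void_map - void_map.mean(),
                      kappa_map - kappa_map.mean())
print(cl_cross[:5])
```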
Submitted 6 November, 2025;
originally announced November 2025.
-
Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
Authors:
Wenrui Cai,
Chengyu Wang,
Junbing Yan,
Jun Huang,
Xiangzhong Fang
Abstract:
Recently, the demand for small and efficient reasoning models to support real-world applications has driven the development of knowledge distillation techniques that balance reasoning performance and inference speed. In this paper, we further extend the DistilQwen model family, initialized from the Qwen models, by introducing four model series specifically designed to meet industrial requirements. The distilled model collection comprises: (1) slow-thinking models, optimized for reasoning tasks that require high accuracy; (2) two series of adaptive-thinking models, which dynamically adjust reasoning strategies based on input tasks to maximize efficiency across diverse scenarios; and (3) distilled reward models, which enable further reinforcement learning of reasoning models using distilled knowledge. Comprehensive evaluations across multiple benchmarks demonstrate both high inference efficiency and strong reasoning performance for these models, as well as the practical utility of distilled reward models. We further show that these models support industry practitioners by providing scalable training and inference functionalities on the Alibaba Cloud PAI (Platform for Artificial Intelligence) platform.
Submitted 3 November, 2025;
originally announced November 2025.
-
Diffusion Models at the Drug Discovery Frontier: A Review on Generating Small Molecules versus Therapeutic Peptides
Authors:
Yiquan Wang,
Yahui Ma,
Yuhan Chang,
Jiayao Yan,
Jialin Zhang,
Minnuo Cai,
Kai Wei
Abstract:
Diffusion models have emerged as a leading framework in generative modeling, poised to transform the traditionally slow and costly process of drug discovery. This review provides a systematic comparison of their application in designing two principal therapeutic modalities: small molecules and therapeutic peptides. We dissect how the unified framework of iterative denoising is adapted to the distinct molecular representations, chemical spaces, and design objectives of each modality. For small molecules, these models excel at structure-based design, generating novel, pocket-fitting ligands with desired physicochemical properties, yet face the critical hurdle of ensuring chemical synthesizability. Conversely, for therapeutic peptides, the focus shifts to generating functional sequences and designing de novo structures, where the primary challenges are achieving biological stability against proteolysis, ensuring proper folding, and minimizing immunogenicity. Despite these distinct challenges, both domains face shared hurdles: the scarcity of high-quality experimental data, the reliance on inaccurate scoring functions for validation, and the crucial need for experimental validation. We conclude that the full potential of diffusion models will be unlocked by bridging these modality-specific gaps and integrating them into automated, closed-loop Design-Build-Test-Learn (DBTL) platforms, thereby shifting the paradigm from mere chemical exploration to the on-demand engineering of novel therapeutics.
Submitted 26 November, 2025; v1 submitted 31 October, 2025;
originally announced November 2025.