-
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
Authors:
Fei Tang,
Bofan Chen,
Zhengxi Lu,
Tongbo Chen,
Songqin Nong,
Tao Jiang,
Wenhao Xu,
Weiming Lu,
Jun Xiao,
Yueting Zhuang,
Yongliang Shen
Abstract:
GUI grounding, which localizes interface elements from screenshots given natural language queries, remains challenging for small icons and dense layouts. Test-time zoom-in methods improve localization by cropping and re-running inference at higher resolution, but they apply cropping uniformly across all instances with fixed crop sizes, ignoring whether the model is actually uncertain in each case. We propose UI-Zoomer, a training-free adaptive zoom-in framework that treats both the trigger and the scale of zoom-in as a prediction uncertainty quantification problem. A confidence-aware gate fuses spatial consensus among stochastic candidates with token-level generation confidence to trigger zoom-in only when localization is uncertain. When triggered, an uncertainty-driven crop sizing module decomposes prediction variance into inter-sample positional spread and intra-sample box extent, deriving a per-instance crop radius via the law of total variance. Extensive experiments on ScreenSpot-Pro, UI-Vision, and ScreenSpot-v2 demonstrate consistent improvements over strong baselines across multiple model architectures, with gains of up to +13.4%, +10.3%, and +4.2%, respectively, and no additional training required.
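The crop-sizing step can be illustrated with the law of total variance, Var(X) = E[Var(X|S)] + Var(E[X|S]), where S indexes the stochastic samples. The sketch below is not the authors' implementation: it assumes each sample yields an axis-aligned box, models the target location as uniform within each box (so the intra-sample variance along an axis is width²/12), sums the variance over both axes, and introduces a hypothetical scale factor `k` and minimum radius `r_min`.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def crop_radius(boxes, k=3.0, r_min=64.0):
    """Hypothetical per-instance crop radius from N stochastic box predictions.

    Returns (radius, crop_center). `k` and `r_min` are illustrative knobs.
    """
    n = len(boxes)
    cx = [(b.x1 + b.x2) / 2 for b in boxes]
    cy = [(b.y1 + b.y2) / 2 for b in boxes]
    mx, my = sum(cx) / n, sum(cy) / n
    # Inter-sample term Var(E[X|S]): spread of box centers across samples.
    inter = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in zip(cx, cy)) / n
    # Intra-sample term E[Var(X|S)]: a point uniform in a w x h box has
    # per-axis variances w^2/12 and h^2/12 (assumed within-box model).
    intra = sum(((b.x2 - b.x1) ** 2 + (b.y2 - b.y1) ** 2) / 12 for b in boxes) / n
    total = inter + intra  # law of total variance, summed over x and y
    return max(r_min, k * math.sqrt(total)), (mx, my)
```

Under this sketch, tightly clustered, small boxes produce a radius clamped at `r_min`, while dispersed candidates or large boxes enlarge the crop so that the likely target region stays inside it.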
Submitted 15 April, 2026;
originally announced April 2026.
-
Data-driven Learning of Probabilistic Model of Binary Droplet Collision for Spray Simulation
Authors:
Weiming Xu,
Tao Yang,
Peng Zhang
Abstract:
Binary droplet collisions are ubiquitous in dense sprays. Traditional deterministic models cannot adequately represent the transitional and stochastic behavior of binary droplet collisions. To bridge this gap, we developed a probabilistic model using a machine learning approach, the Light Gradient-Boosting Machine (LightGBM). The model was trained on a comprehensive dataset of 33,540 experimental cases covering eight collision regimes across broad ranges of Weber number, Ohnesorge number, impact parameter, size ratio, and ambient pressure. The resulting machine learning classifier captures highly nonlinear regime boundaries with 99.2% accuracy and retains sensitivity in transitional regions. To facilitate its implementation in spray simulation, the model was translated into a probabilistic form, a multinomial logistic regression, which preserves 93.2% accuracy and maps continuous inter-regime transitions. A biased-dice sampling mechanism then converts these probabilities into definite yet stochastic outcomes. This work presents the first probabilistic, high-dimensional droplet collision model derived from experimental data, offering a physically consistent, comprehensive, and user-friendly solution for spray simulation.
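A biased-dice sampler of the kind described can be sketched in two steps: compute regime probabilities with a multinomial logistic (softmax) model, then pick one regime by comparing a uniform random number against the cumulative probabilities. The feature vector, coefficients, and number of regimes below are placeholders for illustration, not the fitted model from this work.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def regime_probabilities(features, weights, biases):
    """Multinomial logistic regression: one linear logit per collision regime.

    `weights[k]` is the (hypothetical) coefficient vector for regime k.
    """
    logits = [b + sum(w_i * x_i for w_i, x_i in zip(w, features))
              for w, b in zip(weights, biases)]
    return softmax(logits)


def roll_biased_die(probs, rng=random):
    """Return regime index k with probability probs[k] (inverse-CDF sampling)."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off
```

In a spray solver, each colliding droplet pair would call `regime_probabilities` on its (nondimensional) features and then `roll_biased_die` once, so the ensemble reproduces the predicted regime frequencies while each individual collision gets a definite outcome.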
Submitted 15 April, 2026;
originally announced April 2026.
-
Visual Preference Optimization with Rubric Rewards
Authors:
Ya-Qi Yu,
Fangyu Hong,
Xiangyang Qu,
Hao Wang,
Gaojie Wu,
Qiaoyu Luo,
Nuo Xu,
Huixin Wang,
Wuheng Xu,
Yongxin Liao,
Zihao Chen,
Haonan Li,
Ziming Li,
Dezhi Peng,
Minghui Liao,
Jihao Wu,
Haoyu Ren,
Dandan Tu
Abstract:
The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality differences that matter in multimodal tasks. Existing pipelines often rely on off-policy perturbations or coarse outcome-based signals, which are not well suited to fine-grained visual reasoning. We propose rDPO, a preference optimization framework based on instance-specific rubrics. For each image-instruction pair, we create a checklist-style rubric of essential and additional criteria that can score responses from any policy. The instruction-rubric pool is built offline and reused during the construction of on-policy data. On public reward modeling benchmarks, rubric-based prompting substantially improves a 30B-A3B judge and brings it close to GPT-5.4. On public downstream benchmarks, rubric-based filtering raises the macro average from 81.14 to 82.69, whereas outcome-based filtering lowers it to 75.82. When evaluating scalability on a comprehensive benchmark, rDPO achieves 61.01, markedly outperforming the style-constrained baseline (52.36) and surpassing the base model (59.48). Together, these results show that visual preference optimization benefits from combining on-policy data construction with instance-specific, criterion-level feedback.
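One way a checklist-style rubric with essential and additional criteria could be turned into a preference pair is sketched below. The scoring rule (any unmet essential criterion zeroes the score; additional criteria add fractional credit), the `margin` for discarding near-ties, and all names are assumptions for illustration; the abstract does not specify how rDPO aggregates criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    essential: list[str]
    additional: list[str] = field(default_factory=list)


def rubric_score(rubric, satisfied):
    """Hypothetical score; `satisfied` is the set of criteria a judge marked as met."""
    if not all(c in satisfied for c in rubric.essential):
        return 0.0  # failing any essential criterion disqualifies the response
    score = 1.0
    if rubric.additional:
        score += sum(c in satisfied for c in rubric.additional) / len(rubric.additional)
    return score


def make_pair(rubric, judged, margin=0.25):
    """Return (chosen, rejected) response ids, or None if the score gap is too small.

    `judged` maps a response id to its set of satisfied criteria.
    """
    ranked = sorted(judged, key=lambda r: rubric_score(rubric, judged[r]), reverse=True)
    best, worst = ranked[0], ranked[-1]
    if rubric_score(rubric, judged[best]) - rubric_score(rubric, judged[worst]) < margin:
        return None
    return best, worst
```

Because the rubric is attached to the image-instruction pair rather than to any particular model, the same rubric can score fresh on-policy samples each round, which matches the abstract's point that the pool is built offline and reused.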
Submitted 14 April, 2026;
originally announced April 2026.
-
Rapid LoRA Aggregation for Wireless Channel Adaptation in Open-Set Radio Frequency Fingerprinting
Authors:
Mingxi Zhang,
Renjie Xie,
Jincheng Wang,
Guyue Li,
Wei Xu
Abstract:
Radio frequency fingerprints (RFFs) enable secure wireless authentication but struggle in open-set scenarios with unknown devices and varying channels. Existing methods face challenges in generalization and incur high computational costs. We propose a lightweight, self-adaptive RFF extraction framework using Low-Rank Adaptation (LoRA). By pretraining LoRA modules per environment, our method enables fast adaptation to unseen channel conditions without full retraining. During inference, a weighted combination of LoRAs dynamically enhances feature extraction. Experimental results demonstrate a 15% reduction in equal error rate (EER) compared to non-finetuned baselines and an 83% decrease in training time relative to full fine-tuning, using the same training dataset. This approach provides a scalable and efficient solution for open-set RFF authentication in dynamic wireless vehicular networks.
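A weighted combination of pretrained LoRA modules can be written as W = W0 + s · Σᵢ αᵢ BᵢAᵢ, where each Bᵢ (d_out × r) and Aᵢ (r × d_in) is the low-rank update for one environment. The sketch below assumes the weights αᵢ come from a softmax over cosine similarities between a query embedding of the current channel and one prototype embedding per environment; that weighting rule, the `temperature`, and the `scale` are hypothetical, since the abstract does not say how the weights are obtained.

```python
import numpy as np


def lora_weights(query, prototypes, temperature=1.0):
    """Hypothetical aggregation weights: softmax over cosine similarities
    between a query embedding and one prototype per pretrained LoRA."""
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ q / temperature
    sims -= sims.max()  # numerical stability before exponentiation
    w = np.exp(sims)
    return w / w.sum()


def merged_weight(w0, loras, alphas, scale=1.0):
    """W = W0 + scale * sum_i alpha_i * (B_i @ A_i).

    `loras` is a list of (B_i, A_i) pairs with shapes (d_out, r) and (r, d_in).
    """
    delta = sum(a * (B @ A) for a, (B, A) in zip(alphas, loras))
    return w0 + scale * delta
```

Only the small Bᵢ, Aᵢ matrices and the weight vector change between channel conditions, which is why this style of adaptation avoids retraining the full backbone.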
Submitted 14 April, 2026;
originally announced April 2026.
-
Nanoscale electrothermal-switch superconducting diode for electrically programmable superconducting circuits
Authors:
Tianyu Li,
Jiong Li,
Chong Li,
Peiyuan Huang,
Nuo-Zhou Yang,
Wuyue Xu,
Wen-Cheng Yue,
Yang-Yang Lyu,
Yihuang Xiong,
Xuecou Tu,
Tao Tao,
Xiaoqing Jia,
Qing-Hu Chen,
Huabing Wang,
Peiheng Wu,
Yong-Lei Wang
Abstract:
Superconducting diodes enable dissipationless directional transport, yet achieving electrical tunability and scalability remains a major challenge for circuit-level integration. Here, we demonstrate an electrothermal-switch superconducting diode in which a gate-controlled nanoscale hotspot dynamically breaks inversion symmetry in a superconducting nanowire. This mechanism gives rise to two coexisting nonreciprocal transport regimes, one associated with a nonreciprocal superconducting-to-normal transition and the other with ratchet-like vortex dynamics, both originating from the same electrothermal-switch process. The diode exhibits efficiencies up to 42% and 60% for the two regimes, respectively, and can be electrically switched on, off, or reversed in polarity in situ by applying a small gate current. These capabilities enable programmable superconducting circuits that realize electrically reconfigurable full-wave and half-wave rectification. The lithography-compatible design, high performance, and gate-controlled functionality establish a scalable platform for programmable superconducting electronics and hybrid quantum systems.
Submitted 14 April, 2026;
originally announced April 2026.
-
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
Authors:
Zhaofen Wu,
Hanrong Zhang,
Fulin Lin,
Wujiang Xu,
Xinran Xu,
Yankai Chen,
Henry Peng Zou,
Shaowen Chen,
Weizhi Zhang,
Xue Liu,
Philip S. Yu,
Hongwei Wang
Abstract:
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to evolving narratives. To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention. By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a graph-guided, multi-factor retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
Submitted 14 April, 2026;
originally announced April 2026.
-
A collaborative agent with two lightweight synergistic models for autonomous crystal materials research
Authors:
Tongyu Shi,
Yutang Li,
Zhanyuan Li,
Qian Liu,
Jie Zhou,
Wenhe Xu,
Yang Li,
Dawei Dai,
Rui He,
Wenhua Zhou,
Jiahong Wang,
Xue-Feng Yu
Abstract:
Current large language models require hundreds of billions of parameters yet struggle with domain-specific reasoning and tool coordination in materials science. Here, we present MatBrain, a lightweight collaborative agent system with two synergistic models specialized for crystal materials research. MatBrain employs a dual-model architecture: Mat-R1 (30B parameters) as the analytical model providing expert-level domain reasoning, and Mat-T1 (14B parameters) as the executive model orchestrating tool-based actions. Entropy analysis confirms that this architecture resolves the conflict between tool planning and analytical reasoning by decoupling their distinct entropy dynamics. Enabled by this dual-model architecture and its structural efficiency, MatBrain significantly outperforms larger general-purpose models while reducing the hardware deployment barrier by over 95%. MatBrain exhibits versatility across structure generation, property prediction, and synthesis planning tasks. Applied to catalyst design, MatBrain generated 30,000 candidate structures and identified 38 promising materials within 48 hours, achieving approximately 100-fold acceleration over traditional approaches. These results demonstrate the potential of lightweight collaborative intelligence for advancing materials research capabilities.
Submitted 13 April, 2026;
originally announced April 2026.
-
Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service
Authors:
Zhimin Chen,
Xiaojie Liang,
Wenbo Xu,
Yuxuan Liu,
Wei Lu
Abstract:
Embedding-as-a-Service (EaaS) has become an important semantic infrastructure for natural language and multimedia applications, but it is highly vulnerable to model stealing and copyright infringement. Existing EaaS watermarking methods face a fundamental robustness-utility-verifiability tension: trigger-based methods are fragile to paraphrasing, transformation-based methods are sensitive to dimensional perturbation, and region-based methods may incur false positives due to coincidental geometric affinity.
To address this problem, we propose GeoMark, a geometry-aware localized watermarking framework for EaaS copyright protection. GeoMark uses a natural in-manifold embedding as a shared watermark target, constructs geometry-separated anchors with explicit target-anchor margins, and activates watermark injection only within adaptive local neighborhoods. This design decouples where watermarking is triggered from what ownership is attributed to, achieving localized triggering and centralized attribution.
Experiments on four benchmark datasets show that GeoMark preserves downstream utility and geometric fidelity while maintaining robust copyright verification under paraphrasing, dimensional perturbation, and CSE (Clustering, Selection, Elimination) attacks, with improved verification stability and low false-positive risk.
Submitted 13 April, 2026;
originally announced April 2026.
-
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing
Authors:
Xi Chen,
Wei Xue,
Yike Guo
Abstract:
Role-playing has garnered rising attention because it provides a strong foundation for human-machine interaction and facilitates sociological research. However, current work is confined to the textual modality and neglects speech, which plays a predominant role in daily life, thus limiting genuine role-playing. To bridge this gap, we conceptualize and benchmark speech role-playing through ActorMindBench, and we present a corresponding reasoning framework called ActorMind. Specifically, (1) Speech Role-Playing enables models to deliver spontaneous responses with personalized verbal traits based on their role, the scene, and the spoken dialogue. (2) ActorMindBench is a hierarchical benchmark comprising Utterance-Level content with 7,653 utterances, Scene-Level content with 313 scenes, and Role-Level content with 6 roles. (3) ActorMind is an off-the-shelf, multi-agent, chain-of-thought-style reasoning framework that emulates how human actors perform in theaters. Concretely, ActorMind first reads its assigned role description via the Eye Agent, then comprehends emotional cues within contextual spoken dialogues through the Ear Agent. Subsequently, the Brain Agent generates a descriptive emotional state, and finally, the Mouth Agent delivers the script infused with the corresponding emotional state. Experimental results demonstrate the effectiveness of ActorMind in enhancing speech role-playing.
Submitted 13 April, 2026;
originally announced April 2026.
-
Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing
Authors:
Zeyue Tian,
Binxin Yang,
Zhaoyang Liu,
Jiexuan Zhang,
Ruibin Yuan,
Hubery Yin,
Qifeng Chen,
Chen Li,
Jing Lv,
Wei Xue,
Yike Guo
Abstract:
Recent progress in multimodal models has spurred rapid advances in audio understanding, generation, and editing. However, these capabilities are typically addressed by specialized models, leaving the development of a truly unified framework that seamlessly integrates all three tasks underexplored. While some pioneering works have explored unifying audio understanding and generation, they often remain confined to specific domains. To address this, we introduce Audio-Omni, the first end-to-end framework to unify generation and editing across general sound, music, and speech domains, with integrated multi-modal understanding capabilities. Our architecture synergizes a frozen Multimodal Large Language Model for high-level reasoning with a trainable Diffusion Transformer for high-fidelity synthesis. To overcome the critical data scarcity in audio editing, we construct AudioEdit, a new large-scale dataset comprising over one million meticulously curated editing pairs. Extensive experiments demonstrate that Audio-Omni achieves state-of-the-art performance across a suite of benchmarks, outperforming prior unified approaches while achieving performance on par with or superior to specialized expert models. Beyond its core capabilities, Audio-Omni exhibits remarkable inherited capabilities, including knowledge-augmented reasoning generation, in-context generation, and zero-shot cross-lingual control for audio generation, highlighting a promising direction toward universal generative audio intelligence. The code, model, and dataset will be publicly released at https://zeyuet.github.io/Audio-Omni.
Submitted 12 April, 2026;
originally announced April 2026.
-
FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness
Authors:
Xiaoning Dong,
Chengyan Wu,
Yajie Wen,
Yu Chen,
Yun Xue,
Jing Zhang,
Wei Xu,
Bolei Ma
Abstract:
Large Language Models (LLMs) can generate factually inaccurate content even when they possess the corresponding knowledge, which critically undermines their reliability. Existing approaches attempt to mitigate this by incorporating uncertainty into QA prompts during training, but these numerical scores lack the semantic richness needed for LLMs to properly understand their internal states of trustworthiness and honestness, leading to insufficient factuality alignment. We introduce FAITH (Factuality Alignment through Integrating Trustworthiness and Honestness), a post-training framework for factuality alignment that integrates natural-language uncertainty signals with external knowledge. Specifically, we augment training datasets by computing confidence scores and semantic entropy from LLM outputs and mapping them into a knowledge state quadrant that describes, in natural language, the model's internal knowledge possession (trustworthiness) and answering behavior (honestness). Based on this enhanced data, we design a reward function that considers both correctness and uncertainty signals, and fine-tune the LLM using the Proximal Policy Optimization (PPO) algorithm. To further mitigate weakly grounded responses, we design a retrieval-augmented module that retrieves relevant external passages, improving the consistency between internal and external knowledge representations. Extensive experiments on four knowledge-intensive benchmarks demonstrate that FAITH enhances the factual accuracy and truthfulness of LLMs.
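The data-augmentation step pairs a confidence score with semantic entropy and maps the result to a natural-language quadrant. Semantic entropy, as commonly defined, clusters sampled answers by meaning (usually via bidirectional entailment) and takes the entropy of the cluster distribution. The sketch below substitutes a naive normalized-string match for the semantic-equivalence check, and its thresholds and quadrant wording are hypothetical; FAITH's actual mapping is not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical quadrant descriptions keyed by (answers_consistent, confident).
QUADRANTS = {
    (True, True): "The model appears to know the answer and states it confidently.",
    (True, False): "The model appears to know the answer but expresses low confidence.",
    (False, True): "The model's sampled answers disagree, yet it expresses high confidence.",
    (False, False): "The model's sampled answers disagree and it expresses low confidence.",
}


def semantic_entropy(answers, normalize=lambda s: s.strip().lower()):
    """Entropy (nats) over meaning clusters. Clustering here is a naive
    normalized-string match standing in for a semantic-equivalence model."""
    clusters = Counter(normalize(a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())


def knowledge_state(confidence, entropy, conf_thr=0.5, ent_thr=0.5):
    """Map (confidence, semantic entropy) to a natural-language quadrant label."""
    consistent = entropy < ent_thr  # low entropy: samples agree in meaning
    confident = confidence >= conf_thr
    return QUADRANTS[(consistent, confident)]
```

The point of emitting text rather than the raw numbers is the abstract's argument that a sentence describing the knowledge state carries more usable signal for the LLM than a bare score.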
Submitted 11 April, 2026;
originally announced April 2026.
-
Joint Observation of SGR J1935+2154 with Insight-HXMT and KM40m during the active episode of October 2022
Authors:
Wang-Chen Xue,
Wen-Jun Tan,
Yu-Xiang Huang,
Xiao-Bo Li,
Long-Fei Hao,
Shao-Lin Xiong,
Ce Cai,
Chen-Wei Wang,
Yue Wang,
Ke-Jia Lee,
Heng Xu,
Peng Zhang,
Ming-Yu Ge,
Hao-Xuan Guo,
Yue Huang,
Cheng-Kui Li,
Jia-Cong Liu,
Yang-Zhao Ren,
Shuo Xiao,
Sheng-Lun Xie,
Shu-Xu Yi,
Zheng-Hang Yu,
Jin-Peng Zhang,
Yan-Qiu Zhang,
Chao Zheng
, et al. (10 additional authors not shown)
Abstract:
SGR J1935+2154 is so far the only magnetar from which fast radio bursts have been detected. In October 2022, it resumed its burst activity, and we carried out a dedicated target-of-opportunity (ToO) observation of it with Insight-HXMT from Oct. 13th to Nov. 1st, 2022 (about 940 ks in total), while the KM40m radio telescope observed the source for about 1400 hours starting on Oct. 15th. We searched the LE, ME, and HE data of Insight-HXMT in the time windows overlapping with the KM40m observations and found 60 magnetar X-ray bursts (MXBs), whereas KM40m detected only 1 radio burst. In particular, we find that an X-ray burst on October 21 (denoted MXB 221021) is temporally associated with this radio burst. Interestingly, this association event shows a morphology very different from the X-ray and radio association events previously reported from this source (e.g., MXB/FRB 200428). Moreover, we systematically analyzed the temporal and spectral properties of the MXB sample from this observation and found that the radio-associated MXB 221021 shows some properties that differ from those of other MXBs without associated radio bursts. These findings shed new light on the physical mechanisms of X-ray and radio burst emission in magnetars.
Submitted 11 April, 2026;
originally announced April 2026.
-
A Framework for Predicting Entanglement Spectra of Gapless Symmetry-Protected Topological States in One Dimension
Authors:
Wen-Tao Xu,
Frank Pollmann,
Michael Knap
Abstract:
The concept of gapped symmetry-protected topological (SPT) states has been generalized to gapless SPT (gSPT) states. Similar to gapped SPT states, gSPT states in one dimension exhibit universal degeneracies in their entanglement spectra. The entanglement spectra of gSPT states are further described by boundary conformal field theories, whose systematic prediction is a key open question. To address this problem, we focus on the class of gSPT states that are obtained by applying unitary SPT entanglers to trivial, critical states in one dimension. We find that the reduced density matrix of a non-trivial gSPT state can be obtained, either exactly or approximately, by applying a quantum channel to the reduced density matrix of the trivial gSPT state. This quantum channel acts only near the entanglement cut and modifies its corresponding conformal boundary condition, allowing us in turn to predict the boundary conformal field theory describing the entanglement spectra. We apply this framework to gSPT states protected by various symmetries and having different central charges, and further analyze the stability of boundary conditions of the entanglement cut. Our work thereby provides a framework for systematically analyzing and understanding the entanglement spectra of gSPT states.
Submitted 11 April, 2026;
originally announced April 2026.
-
Graph-RHO: Critical-path-aware Heterogeneous Graph Network for Long-Horizon Flexible Job-Shop Scheduling
Authors:
Yujie Li,
Jiuniu Wang,
Mugen Peng,
Guangzuo Li,
Wenjia Xu
Abstract:
Long-horizon Flexible Job-Shop Scheduling (FJSP) presents a formidable combinatorial challenge due to complex, interdependent decisions spanning extended time horizons. While learning-based Rolling Horizon Optimization (RHO) has emerged as a promising paradigm to accelerate solving by identifying and fixing invariant operations, its effectiveness is hindered by the structural complexity of FJSP. Existing methods often fail to capture intricate graph-structured dependencies and ignore the asymmetric costs of prediction errors, in which misclassifying critical-path operations is significantly more detrimental than misclassifying non-critical ones. Furthermore, dynamic shifts in predictive confidence during the rolling process make static pruning thresholds inadequate. To address these limitations, we propose Graph-RHO, a novel critical-path-aware graph-based RHO framework. First, we introduce a topology-aware heterogeneous graph network that encodes subproblems as operation-machine graphs with multi-relational edges, leveraging edge-feature-aware message passing to predict operation stability. Second, we incorporate a critical-path-aware mechanism that injects inductive biases during training to distinguish highly sensitive bottleneck operations from robust ones. Third, we devise an adaptive thresholding strategy that dynamically calibrates decision boundaries based on online uncertainty estimation to align model predictions with the solver's search space. Extensive experiments on standard benchmarks demonstrate that Graph-RHO establishes a new state of the art in solution quality and computational efficiency. Remarkably, it exhibits exceptional zero-shot generalization, reducing solve time by over 30% on large-scale instances (2000 operations) while achieving superior solution quality. Our code is available at https://github.com/IntelliSensing/Graph-RHO.
Submitted 11 April, 2026;
originally announced April 2026.
-
Inverse Energy Cascade in Turbulent Taylor-Couette Flows
Authors:
Changquan Zhou,
Hua-Shu Dou,
Lin Niu,
Wenqian Xu
Abstract:
The inverse energy cascade in turbulent Taylor-Couette flow is studied using large eddy simulation. The simulation results show that, at higher Reynolds numbers, the inverse energy cascade first occurs in the core region of the Taylor-Couette flow channel. We find that this phenomenon is induced by pulsed zero shear stress resulting from singularities of the Navier-Stokes equation. In the core area between the two cylinders, the shear stress is nearly zero at higher Reynolds numbers. The turbulence generated there has high turbulent energy due to the discontinuity of the tangential velocity. Since the low shear stress inhibits energy transfer between fluid layers, the turbulent energy cannot be transferred along the radial direction, and small-scale vortices with high turbulent energy are produced. These small-scale vortices coexist with the large-scale vortices and cannot be dissipated owing to the low shear stress. The concentration of these small-scale vortices forms a peak in the energy spectrum at middle frequencies (or wave numbers). As the number of singular points of the Navier-Stokes equation increases with increasing Reynolds number, the zero-shear-stress region expands along the radial direction, intensifying nonlinear instability and energy accumulation. This, in turn, produces more prominent peaks in the energy spectrum and a more pronounced inverse energy cascade.
Submitted 8 April, 2026;
originally announced April 2026.
-
ClawBench: Can AI Agents Complete Everyday Online Tasks?
Authors:
Yuxuan Zhang,
Yubo Wang,
Yipeng Zhu,
Penghui Du,
Junwen Miao,
Xuan Lu,
Wendong Xu,
Yunzhuo Hao,
Songcheng Cai,
Xiaochen Wang,
Huaisong Zhang,
Xian Wu,
Yi Lu,
Minyi Lei,
Kai Zou,
Huifeng Yin,
Ping Nie,
Liang Chen,
Dongfu Jiang,
Wenhu Chen,
Kelsey R. Allen
Abstract:
AI agents may be able to automate your inbox, but can they automate other routine aspects of your life? Everyday online tasks offer a realistic yet unsolved testbed for evaluating the next generation of AI agents. To this end, we introduce ClawBench, an evaluation framework of 153 simple tasks that people need to accomplish regularly in their lives and work, spanning 144 live platforms across 15 categories, from completing purchases and booking appointments to submitting job applications. These tasks demand capabilities beyond those covered by existing benchmarks, such as obtaining relevant information from user-provided documents, navigating multi-step workflows across diverse platforms, and performing write-heavy operations like correctly filling in many detailed forms. Unlike existing benchmarks that evaluate agents in offline sandboxes with static pages, ClawBench operates on production websites, preserving the full complexity, dynamic nature, and challenges of real-world web interaction. A lightweight interception layer captures and blocks only the final submission request, ensuring safe evaluation without real-world side effects. Our evaluations of 7 frontier models show that both proprietary and open-source models can complete only a small portion of these tasks. For example, Claude Sonnet 4.6 achieves only 33.3%. Progress on ClawBench brings us closer to AI agents that can function as reliable general-purpose assistants.
Submitted 9 April, 2026;
originally announced April 2026.
-
GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design
Authors:
Chenchen Xu,
Min Zhou,
Tiezheng Ge,
Weiwei Xu
Abstract:
Layout plays a crucial role in graphic design and poster generation, and deep learning models for layout generation have recently gained significant attention. This paper focuses on using a GAN-based model conditioned on images to generate advertising poster graphic layouts, a task that requires a dataset of paired product images and layouts. To address this need, we introduce the Content-aware Graphic Layout Dataset (CGL-Dataset), consisting of 60,548 paired inpainted posters with annotations and 121,000 clean product images. The inpainting artifacts introduce a domain gap between the inpainted posters and the clean images. To bridge this gap, we design two GAN-based models. The first model, CGL-GAN, applies Gaussian blur to the inpainted regions when generating layouts. The second model, abbreviated as PDA-GAN, incorporates unsupervised domain adaptation through a GAN with a pixel-level discriminator (PD) and generates image-aware layouts based on the visual texture of input images. The PD is connected to shallow-level feature maps and computes the GAN loss for each input-image pixel. Additionally, we propose three novel content-aware metrics to assess the model's ability to capture the intricate relationships between graphic elements and image content. Quantitative and qualitative evaluations demonstrate that PDA-GAN achieves state-of-the-art performance and generates high-quality image-aware layouts.
Submitted 8 April, 2026;
originally announced April 2026.
-
CRB-Based Waveform Optimization for MIMO ISAC Systems With One-Bit ADCs
Authors:
Qi Lin,
Hong Shen,
Wei Xu,
Chunming Zhao
Abstract:
This paper studies transmit waveform optimization for a quantized multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system, where one-bit analog-to-digital converters (ADCs) are employed to enable a low-cost and power-efficient hardware implementation. Focusing on the parameter estimation task, we propose two novel Cramér-Rao bounds (CRBs), one for point-like targets (PT) and one for extended targets (ET), to characterize the impact of quantization distortion on estimation accuracy, and we develop associated estimation methods that approach these theoretical CRBs. Moreover, to jointly enhance sensing and communication performance, we formulate a bi-criterion ISAC waveform optimization problem that minimizes the derived CRB objectives subject to a communication symbol error probability (SEP) constraint and a total power constraint. Owing to the high nonlinearity of the one-bit CRBs, the resulting optimization is extremely nonconvex. To obtain a high-quality suboptimal solution, we develop an efficient alternating direction method of multipliers (ADMM) framework that exploits the majorization-minimization (MM) technique to handle the nonconvexity. Simulation results verify that the one-bit CRBs tightly characterize the quantized estimation performance and that the proposed estimation methods show clear performance advantages over existing benchmark schemes. Furthermore, the developed ADMM framework achieves a flexible trade-off between CRB and SEP performance, demonstrating the effectiveness of the optimized ISAC waveform.
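To illustrate why one-bit quantization changes the bound, consider a deliberately simplified scalar model that is assumed here and is not the paper's MIMO setting: a single observation $y = \operatorname{sign}(\theta + n)$ with $n \sim \mathcal{N}(0, \sigma^2)$. Because $y$ is Bernoulli with $P(y = 1) = \Phi(\theta/\sigma)$, where $\Phi$ and $\phi$ are the standard normal CDF and PDF, its Fisher information and CRB follow directly:

```latex
\begin{aligned}
I_{1\text{-bit}}(\theta)
  &= \frac{\big(\partial_\theta \Phi(\theta/\sigma)\big)^2}{\Phi(\theta/\sigma)\,\big(1 - \Phi(\theta/\sigma)\big)}
   = \frac{\phi(\theta/\sigma)^2}{\sigma^2\,\Phi(\theta/\sigma)\,\big(1 - \Phi(\theta/\sigma)\big)}, \\
\mathrm{CRB}_{1\text{-bit}}(\theta) &= \frac{1}{I_{1\text{-bit}}(\theta)}, \qquad
\frac{I_{1\text{-bit}}(0)}{I_{\text{unquantized}}}
  = \frac{2/(\pi\sigma^2)}{1/\sigma^2} = \frac{2}{\pi} \approx 0.64 .
\end{aligned}
```

At $\theta = 0$ the one-bit observation retains about 64% of the unquantized Fisher information (roughly a 2 dB loss), and the ratio decays toward zero for large $|\theta|/\sigma$. The PT and ET bounds in the paper are more involved; this toy case only conveys the qualitative effect of quantization on the CRB.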
Submitted 8 April, 2026;
originally announced April 2026.
-
Not all tokens contribute equally to diffusion learning
Authors:
Guoqing Zhang,
Lu Shi,
Wanru Xu,
Linna Zhang,
Sen Wang,
Fangfang Wang,
Yigang Cen
Abstract:
With the rapid development of conditional diffusion models, significant progress has been made in text-to-video generation. However, we observe that these models often neglect semantically important tokens during inference, leading to biased or incomplete generations under classifier-free guidance. We attribute this issue to two key factors: distributional bias caused by the long-tailed token frequency in training data, and spatial misalignment in cross-attention where semantically important tokens are overshadowed by less informative ones. To address these issues, we propose Distribution-Aware Rectification and Spatial Ensemble (DARE), a unified framework that improves semantic guidance in diffusion models from the perspectives of distributional debiasing and spatial consistency. First, we introduce Distribution-Rectified Classifier-Free Guidance (DR-CFG), which regularizes the training process by dynamically suppressing dominant tokens with low semantic density, encouraging the model to better capture underrepresented semantic cues and learn a more balanced conditional distribution. This design mitigates the risk of the model distribution overfitting to tokens with low semantic density. Second, we propose Spatial Representation Alignment (SRA), which adaptively reweights cross-attention maps according to token importance and enforces representation consistency, enabling semantically important tokens to exert stronger spatial guidance during generation. This mechanism effectively prevents low semantic-density tokens from dominating the attention allocation, thereby avoiding the dilution of the spatial and distributional guidance provided by high semantic-density tokens. Extensive experiments on multiple benchmark datasets demonstrate that DARE consistently improves generation fidelity and semantic alignment, achieving significant gains over existing approaches.
Submitted 8 April, 2026;
originally announced April 2026.
-
Solitary wave structure of transitional flow in the wake of a sphere
Authors:
Lin Niu,
Hua-Shu Dou,
Changquan Zhou,
Wenqian Xu
Abstract:
The soliton-like coherent structure (SCS) has been verified to exist in both transitional and turbulent boundary layers [1-4], but its formation and behavior remain poorly understood. In our previous study (Niu et al. [5]), the SCS was also found in the transitional wake flow behind a sphere. In the present study, the formation and evolution of the SCS are further investigated at four Reynolds numbers by numerical simulation. The results show that at the early stage of the transition to turbulence, the SCS appears as a wave packet during the Tollmien-Schlichting (T-S) wave stage. As the Reynolds number increases, the SCS reaches its maximum amplitude downstream, where the velocity discontinuity occurs. This position lies downstream of the T-S wave breakdown, where the three-dimensional structure has formed. Thereafter, the SCS maintains its shape and amplitude over a long distance downstream. The relationships among the SCS, the spikes, the vortex structures, and the high-shear layers are further analyzed. The SCS in the wake flow is found to share similarities with phenomena observed in boundary layer flows during the transition to turbulence. The vortex structures and high-shear layers mostly wrap around the border of the SCS. The vortex structures are considered a consequence of the development of the SCS rather than its cause.
Submitted 8 April, 2026;
originally announced April 2026.
-
A Multi-Agent Framework for Automated Exploit Generation with Constraint-Guided Comprehension and Reflection
Authors:
Siyi Chen,
Tianhan Luo,
Shijian Wu,
Xiangyu Liu,
Yilin Zhou,
Qi Li,
Wenyuan Xu
Abstract:
Open-source libraries are widely used in modern software development, but they also introduce significant security vulnerabilities. While static analysis tools can identify potential vulnerabilities at scale, they often generate overwhelming reports with high false positive rates. Automated Exploit Generation (AEG) has emerged as a promising way to confirm whether a vulnerability is genuine by generating an exploit for it. However, traditional AEG approaches based on fuzzing or symbolic execution struggle with path coverage and constraint solving. Although LLMs show great potential for AEG, how to leverage them effectively to comprehend vulnerabilities and generate corresponding exploits remains an open question.
To address these challenges, we propose Vulnsage, a multi-agent framework for AEG. Vulnsage simulates the workflow of human security researchers by decomposing the complex AEG process into multiple specialized sub-agents: a Code Analyzer Agent, a Code Generation Agent, a Validation Agent, and a set of Reflection Agents, orchestrated by a central supervisor through iterative cycles. Given a target program, the Code Analyzer Agent performs static analysis to identify potential vulnerabilities and collects relevant information for each one. The Code Generation Agent then uses an LLM to generate candidate exploits. The Validation Agent and Reflection Agents form a feedback-driven self-refinement loop that uses execution traces and runtime error analysis either to improve the exploit iteratively or to reason about whether the alert is a false positive.
Experimental evaluation shows that Vulnsage generates 34.64% more exploits than state-of-the-art tools such as Explode.js. Furthermore, Vulnsage has discovered and verified 146 zero-day vulnerabilities in real-world scenarios, demonstrating its practical effectiveness for assisting security assessment in software supply chains.
Submitted 6 April, 2026;
originally announced April 2026.
-
This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA
Authors:
Hye Sun Yun,
Geetika Kapoor,
Michael Mackert,
Ramez Kouzy,
Wei Xu,
Junyi Jessy Li,
Byron C. Wallace
Abstract:
Patients are increasingly turning to large language models (LLMs) with medical questions that are complex and difficult to articulate clearly. However, LLMs are sensitive to prompt phrasing and can be influenced by the way questions are worded. Ideally, LLMs should respond consistently regardless of phrasing, particularly when grounded in the same underlying evidence. We investigate this through a systematic evaluation in a controlled retrieval-augmented generation (RAG) setting for medical question answering (QA), where expert-selected documents are used rather than retrieved automatically. We examine two dimensions of patient query variation: question framing (positive vs. negative) and language style (technical vs. plain language). We construct a dataset of 6,614 query pairs grounded in clinical trial abstracts and evaluate response consistency across eight LLMs. Our findings show that pairs with opposite framing (one positive, one negative) are significantly more likely to produce contradictory conclusions than pairs with the same framing. This framing effect is further amplified in multi-turn conversations, where sustained persuasion increases inconsistency. We find no significant interaction between framing and language style. Our results demonstrate that LLM responses in medical QA can be systematically influenced through query phrasing alone, even when grounded in the same evidence, highlighting the importance of phrasing robustness as an evaluation criterion for RAG-based systems in high-stakes settings.
Submitted 6 April, 2026;
originally announced April 2026.
-
Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
Authors:
Junwei Pan,
Wei Xue,
Chao Zhou,
Xing Zhou,
Lunan Fan,
Yanbo Wang,
Haoran Xin,
Zhiyu Hu,
Yaozheng Wang,
Fengye Xu,
Yurong Yang,
Xiaotian Li,
Junbang Huo,
Wentao Ning,
Yuliang Sun,
Chengguo Yin,
Jun Zhang,
Shudong Huang,
Lei Xiao,
Huan Yu,
Irwin King,
Haijie Gu,
Jie Jiang
Abstract:
Generative recommender systems are rapidly emerging as a new paradigm for recommendation, where collaborative identifiers and/or multi-modal content are mapped into discrete token spaces and user behavior is modelled with autoregressive sequence models. Despite progress on multi-modal recommendation datasets, there is still a lack of public benchmarks that jointly offer large-scale, realistic and fully all-modality data designed specifically for generative recommendation (GR) in industrial advertising. To foster research in this direction, we organised the Tencent Advertising Algorithm Challenge 2025, a global competition built on top of two all-modality datasets for GR: TencentGR-1M and TencentGR-10M. Both datasets are constructed from real de-identified Tencent Ads logs and contain rich collaborative IDs and multi-modal representations extracted with state-of-the-art embedding models. The preliminary track (TencentGR-1M) provides 1 million user sequences with up to 100 interacted items each, where each interaction is labeled with exposure and click signals, while the final track (TencentGR-10M) scales this to 10 million users and explicitly distinguishes between click and conversion events at both the sequence and target level. This paper presents the task definition, data construction process, feature schema, baseline GR model, evaluation protocol, and key findings from top-ranked and award-winning solutions. Our datasets focus on multi-modal sequence generation in an advertising setting and introduce weighted evaluation for high-value conversion events. We release our datasets at https://huggingface.co/datasets/TAAC2025 and baseline implementations at https://github.com/TencentAdvertisingAlgorithmCompetition/baseline_2025 to enable future research on all-modality generative recommendation at an industrial scale. The official website is https://algo.qq.com/2025.
Submitted 4 April, 2026;
originally announced April 2026.
-
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
Authors:
Zijin Liu,
Xu Geng,
Wenshuai Xu,
Xiang Zhao,
Yan Xia,
You Song
Abstract:
Accurate crowd simulation is crucial for public safety management, emergency evacuation planning, and intelligent transportation systems. However, existing methods, which typically model crowds as a collection of independent individual trajectories, are limited in their ability to capture macroscopic physical laws. This microscopic approach often leads to error accumulation and compromises simulation stability. Furthermore, deep learning-driven methods tend to suffer from low inference efficiency and high computational overhead, making them impractical for large-scale simulation. To address these challenges, we propose the Spatio-Temporal Decoupled Differential Equation Network (STDDN), a novel framework that guides microscopic trajectory prediction with macroscopic physics. We introduce the continuity equation from fluid dynamics as a strong physical constraint. A Neural Ordinary Differential Equation (Neural ODE) models the macroscopic density evolution driven by individual movements, thereby physically regularizing the microscopic trajectory prediction model. We design a density-velocity coupled dynamic graph learning module to formulate the derivative of the density field within the Neural ODE, effectively mitigating error accumulation. We also propose a differentiable density mapping module to eliminate discontinuous gradients caused by discretization and introduce a cross-grid detection module to accurately model the impact of individual cross-grid movements on local density changes. On long-term tasks across four real-world datasets, STDDN achieves significantly better simulation performance than state-of-the-art methods while substantially reducing inference latency.
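For reference, the continuity equation cited above, written for a crowd density field $\rho(\mathbf{x}, t)$ transported by a velocity field $\mathbf{v}(\mathbf{x}, t)$, takes the standard form on the left below. The right-hand expression uses a hypothetical learned function $f_\theta$ only to indicate how a Neural ODE over the (discretized) density could be regularized toward that equation; the abstract does not specify STDDN's exact parameterization or discretization.

```latex
\frac{\partial \rho}{\partial t} + \nabla \cdot \left( \rho \, \mathbf{v} \right) = 0,
\qquad
\frac{d\rho}{dt} = f_\theta(\rho, \mathbf{v}, t) \approx -\nabla \cdot \left( \rho \, \mathbf{v} \right).
```

Integrated over a closed region with no flux across its boundary, the equation keeps the total number of pedestrians constant, which is one way to see why it acts as a constraint that independent per-trajectory predictions do not satisfy by construction.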
Submitted 3 April, 2026;
originally announced April 2026.
-
The Holographic QCD Axion in Five Dimensions
Authors:
Csaba Csáki,
Eric Kuflik,
Wei Xue,
Taewook Youn
Abstract:
We present a holographic construction of the QCD axion based on a warped 5D model. A key ingredient of our setup is the introduction of a bulk scalar field $θ$, which is holographically dual to the topological operator of QCD. This makes the relation among the axion, the $η'$, and the anomalies transparent. We identify the bulk modes corresponding to the $η'$ and axion states, and show that an adjustment analogous to that of the usual 4D axion takes place. We identify the origin of the axion quality problem in this framework and show that a large degree of axion compositeness is needed to solve it. We also find that, in the limit of a high-quality axion, the physical axion state is predominantly contained in the bulk gauge field.
Submitted 2 April, 2026;
originally announced April 2026.
-
Beyond Fixed Inference: Quantitative Flow Matching for Adaptive Image Denoising
Authors:
Jigang Duan,
Genwei Ma,
Xu Jiang,
Wenfeng Xu,
Ping Yang,
Xing Zhao
Abstract:
Diffusion and flow-based generative models have shown strong potential for image restoration. However, image denoising under unknown and varying noise conditions remains challenging, because the learned vector fields may become inconsistent across noise levels, which degrades restoration quality when the noise at inference does not match the noise seen during training. To address this issue, we propose a quantitative flow matching framework for adaptive image denoising. The method first estimates the input noise level from local pixel statistics, and then uses this quantitative estimate to adapt the inference trajectory, including the starting point, the number of integration steps, and the step-size schedule. In this way, the denoising process is better aligned with the actual corruption level of each input, reducing unnecessary computation for lightly corrupted images while providing sufficient refinement for heavily degraded ones. By coupling quantitative noise estimation with noise-adaptive flow inference, the proposed method improves both restoration accuracy and inference efficiency. Extensive experiments on natural, medical, and microscopy images demonstrate its robustness and strong generalization across diverse noise levels and imaging conditions.
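The abstract does not say which local pixel statistic is used to estimate the noise level, so the sketch below swaps in a known technique, Immerkær's fast noise-variance estimator (a 3x3 Laplacian-difference mask), rather than the authors' method. It pairs the estimate with a hypothetical `plan_inference` helper that derives a starting time and a step count under an assumed rectified-flow convention; the step rule and its `max_steps`/`sigma_max` parameters are illustrative only.

```python
import math
import random

# Immerkaer (1996) mask: zero response on constant and planar patches,
# so mostly fine-scale noise survives the convolution.
_MASK = ((1, -2, 1), (-2, 4, -2), (1, -2, 1))


def estimate_noise_sigma(img):
    """Estimate the Gaussian noise std of a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            resp = sum(
                _MASK[dy][dx] * img[y + dy - 1][x + dx - 1]
                for dy in range(3)
                for dx in range(3)
            )
            total += abs(resp)
    # For pure noise the response has std 6*sigma, and E|N(0, s^2)| = s*sqrt(2/pi).
    return math.sqrt(math.pi / 2) * total / (6.0 * (w - 2) * (h - 2))


def plan_inference(sigma, max_steps=50, sigma_max=1.0):
    """Hypothetical mapping from an estimated noise level to a flow schedule.

    Assumed convention: x_t = (1 - t) * x_clean + t * eps, with t = 0 clean and
    t = 1 pure noise. A noisy input y = x_clean + sigma * eps, rescaled by
    1 / (1 + sigma), equals x_t at t0 = sigma / (1 + sigma), so integration
    would start from that rescaled input at t0 and run toward t = 0.
    """
    t0 = sigma / (1.0 + sigma)
    # Hypothetical rule: allot integration steps in proportion to the noise level.
    frac = min(max(sigma / sigma_max, 0.0), 1.0)
    steps = max(1, math.ceil(frac * max_steps))
    return t0, steps


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.01 * (x + y) for x in range(64)] for y in range(64)]
    noisy = [[v + rng.gauss(0.0, 0.1) for v in row] for row in clean]
    sigma_hat = estimate_noise_sigma(noisy)
    print(round(sigma_hat, 3), plan_inference(sigma_hat))
```

Because the mask cancels planar intensity ramps, a smooth gradient image yields an estimate near zero, while adding Gaussian noise with std 0.1 yields an estimate close to 0.1; lightly corrupted inputs then start closer to $t = 0$ and receive fewer steps.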
Submitted 2 April, 2026;
originally announced April 2026.
-
GECAM discovery of a peculiar magnetar X-ray burst (MXB 221120) from SGR J1935+2154 associated with a fast radio burst
Authors:
Wen-Jun Tan,
Yue Wang,
Chen-Wei Wang,
Shao-Lin Xiong,
Xiao-Bo Li,
Shuang-Nan Zhang,
Ce Cai,
Wang-Chen Xue,
Peng Zhang,
Bo-Bing Wu,
Zheng-Hua An,
Ming Gao,
Ming-Yu Ge,
Ke Gong,
Dong-Ya Guo,
Hao-Xuan Guo,
Long-Fei Hao,
Yue Huang,
Yu-Xiang Huang,
Ke-Jia Lee,
Bing Li,
Kui-Cheng Li,
Xin-Qiao Li,
Jia-Cong Liu,
Xiao-Jing Liu
, et al. (28 additional authors not shown)
Abstract:
Fast radio bursts (FRBs) are enigmatic cosmic transients of millisecond duration observed in the radio band. The identification of FRB-associated magnetar X-ray bursts (MXBs) from the Galactic magnetar SGR J1935+2154 suggests that at least a fraction of FRBs can be produced by magnetar activity. However, the sample of FRB-associated MXBs is still very small. Here we report a bright and peculiar FRB-associated MXB from SGR J1935+2154 detected by GECAM on November 20, 2022, dubbed MXB 221120. We find that both the temporal and spectral properties of MXB 221120 exhibit distinctive features. Its light curve can generally be described by a single fast-rise exponential-decay (FRED) function with several narrow pulses superposed. Interestingly, we identify a possible quasi-periodic oscillation (QPO) feature with a central frequency of ~18 Hz in this MXB. The time-integrated spectrum is best fitted by a blackbody model with temperature kT = 18.6 keV, so MXB 221120 is the first FRB-associated MXB from SGR J1935+2154 with a thermal spectrum. Compared with other MXBs that have a single emission episode, MXB 221120 has a longer duration and a higher blackbody temperature, making it an outlier in the burst sample. These results indicate that MXB 221120 may be produced by a special mechanism under extreme physical conditions.
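The abstract does not state which FRED parameterization or spectral normalization was used. As an illustration only, one widely used FRED pulse shape and the blackbody photon spectrum are written below; $A$, $t_s$, $\tau_1$, and $\tau_2$ are generic pulse parameters (amplitude, start time, rise and decay constants), not values from the paper.

```latex
I(t) = A\,\lambda \exp\!\left(-\frac{\tau_1}{t - t_s} - \frac{t - t_s}{\tau_2}\right),
\quad t > t_s, \quad \lambda = \exp\!\left(2\sqrt{\tau_1/\tau_2}\right),
\qquad
N(E) \propto \frac{E^2}{\exp(E/kT) - 1}.
```

With this normalization the pulse peaks at $t - t_s = \sqrt{\tau_1 \tau_2}$ with amplitude $A$. For an ideal blackbody with $kT = 18.6$ keV, the $\nu F_\nu \propto E^2 N(E)$ spectrum peaks at $E \approx 3.92\,kT \approx 73$ keV; this is a simple consequence of the fitted temperature, not a value reported in the abstract.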
Submitted 2 April, 2026;
originally announced April 2026.
-
Behavior and Sublinear Algorithm for Opinion Disagreement on Noisy Social Networks
Authors:
Wanyue Xu,
Yubo Sun,
Mingzhe Zhu,
Zuobai Zhang,
Zhongzhi Zhang
Abstract:
Opinion disagreement has been empirically observed and reported in the literature, and it is affected by various factors, such as the structure of social networks. An important discovery in network science is that most real-life networks, including social networks, are scale-free and sparse. In this paper, we study noisy opinion dynamics in sparse scale-free social networks to uncover the influence of power-law topology on opinion disagreement. We adopt the popular discrete-time DeGroot model for opinion dynamics in a graph, where nodes' opinions are subject to white noise. We first study opinion disagreement in many realistic and model networks with a scale-free topology and find that it approaches a constant, indicating that scale-free structure is resistant to noise in opinion dynamics. Moreover, existing algorithms for estimating opinion disagreement are computationally impractical for large-scale networks because of their high computational complexity. To address this challenge, we introduce a sublinear-time algorithm that approximates this quantity with a theoretically guaranteed error. The algorithm efficiently simulates truncated random walks starting from a subset of nodes while preserving estimation accuracy. Extensive experiments demonstrate its efficiency, accuracy, and scalability.
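A minimal brute-force simulation of noisy discrete-time DeGroot dynamics may help make the quantity concrete. It assumes an averaging matrix in which each node weights itself and its neighbors equally, i.i.d. Gaussian noise, and a disagreement measure defined as the time-averaged mean squared deviation of opinions from the network average. The paper's exact weights and definition may differ, and this simulation is not the proposed sublinear algorithm; the function name and parameters are hypothetical.

```python
import random


def noisy_degroot_disagreement(adj, steps=2000, burn_in=500, noise_std=1.0, seed=0):
    """Time-averaged disagreement under noisy DeGroot dynamics.

    adj: list where adj[i] lists the neighbors of node i (undirected graph).
    Assumed update: x_i(t+1) = mean of x_i(t) and x_j(t) for j in adj[i],
    plus independent Gaussian noise with std noise_std.
    Assumed disagreement: mean over nodes of (x_i - network average)^2,
    averaged over the steps after burn_in.
    """
    if steps <= burn_in:
        raise ValueError("steps must exceed burn_in")
    rng = random.Random(seed)
    n = len(adj)
    x = [0.0] * n
    total, samples = 0.0, 0
    for t in range(steps):
        # The comprehension reads the previous x before it is rebound.
        x = [
            (x[i] + sum(x[j] for j in adj[i])) / (len(adj[i]) + 1)
            + rng.gauss(0.0, noise_std)
            for i in range(n)
        ]
        if t >= burn_in:
            avg = sum(x) / n
            total += sum((v - avg) ** 2 for v in x) / n
            samples += 1
    return total / samples


if __name__ == "__main__":
    n = 10
    complete = [[j for j in range(n) if j != i] for i in range(n)]
    ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
    print(noisy_degroot_disagreement(complete))  # should be near (n-1)/n = 0.9
    print(noisy_degroot_disagreement(ring))
```

On a complete graph the averaging step collapses every opinion to the mean, so only the fresh noise remains and the disagreement approaches $\sigma^2 (n-1)/n$. A sparse ring of the same size retains correlated deviations and gives a larger value (about 1.65 in theory for $n = 10$, $\sigma = 1$). Any adjacency list, such as one from a preferential-attachment generator, can be passed in the same way.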
Submitted 2 April, 2026;
originally announced April 2026.
-
DOne: Decoupling Structure and Rendering for High-Fidelity Design-to-Code Generation
Authors:
Xinhao Huang,
Jinke Yu,
Wenhao Xu,
Zeyi Wen,
Ying Zhou,
Junzhuo Liu,
Junhao Ji,
Zulong Chen
Abstract:
While Vision Language Models (VLMs) have shown promise in Design-to-Code generation, they suffer from a "holistic bottleneck": they fail to reconcile high-level structural hierarchy with fine-grained visual details, often resulting in layout distortions or generic placeholders. To bridge this gap, we propose DOne, an end-to-end framework that decouples structure understanding from element rendering. DOne introduces (1) a learned layout segmentation module that decomposes complex designs, avoiding the limitations of heuristic cropping; (2) a specialized hybrid element retriever that handles the extreme aspect ratios and densities of UI components; and (3) a schema-guided generation paradigm that bridges layout and code. To rigorously assess performance, we introduce HiFi2Code, a benchmark with significantly higher layout complexity than existing datasets. Extensive evaluations on HiFi2Code demonstrate that DOne outperforms existing methods in both high-level visual similarity (e.g., by over 10% in GPT Score) and fine-grained element alignment. Human evaluations confirm a threefold productivity gain with higher visual fidelity.
Submitted 11 March, 2026;
originally announced April 2026.
-
Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis
Authors:
Xingxing Weng,
Ruifeng Ni,
Chao Pang,
XiangYu Hao,
Yishan Wang,
Xiaokang Zhang,
Wei Xu,
Gui-Song Xia
Abstract:
Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to systematically assess continual adaptation. Extensive benchmarking of diverse vision-language models reveals catastrophic forgetting across all settings. Moreover, representative continual learning methods, when adapted to RS VLMs, exhibit limited effectiveness in handling task, instruction, and modality transitions. Our findings underscore the need for developing continual learning methods tailored to RS VLMs.
Submitted 1 April, 2026;
originally announced April 2026.
-
ASI-Evolve: AI Accelerates AI
Authors:
Weixian Xu,
Tiantian Mi,
Yixiu Liu,
Yang Nan,
Zhimeng Zhou,
Lyumanshan Ye,
Lin Zhang,
Yu Qiao,
Pengfei Liu
Abstract:
Can AI accelerate the development of AI itself? While recent agentic systems have shown strong performance on well-scoped tasks with rapid feedback, it remains unclear whether they can tackle the costly, long-horizon, and weakly supervised research loops that drive real AI progress. We present ASI-Evolve, an agentic framework for AI-for-AI research that closes this loop through a learn-design-experiment-analyze cycle. ASI-Evolve augments standard evolutionary agents with two key components: a cognition base that injects accumulated human priors into each round of exploration, and a dedicated analyzer that distills complex experimental outcomes into reusable insights for future iterations. To our knowledge, ASI-Evolve is the first unified framework to demonstrate AI-driven discovery across three central components of AI development: data, architectures, and learning algorithms. In neural architecture design, it discovered 105 SOTA linear attention architectures, with the best discovered model surpassing DeltaNet by +0.97 points, nearly 3x the gain of recent human-designed improvements. In pretraining data curation, the evolved pipeline improves average benchmark performance by +3.96 points, with gains exceeding 18 points on MMLU. In reinforcement learning algorithm design, discovered algorithms outperform GRPO by up to +12.5 points on AMC32, +11.67 points on AIME24, and +5.04 points on OlympiadBench. We further provide initial evidence that this AI-for-AI paradigm can transfer beyond the AI stack through experiments in mathematics and biomedicine. Together, these results suggest that ASI-Evolve represents a promising step toward enabling AI to accelerate AI across the foundational stages of development, offering early evidence for the feasibility of closed-loop AI research.
Submitted 31 March, 2026;
originally announced March 2026.
-
Causality-inspired Federated Learning for Dynamic Spatio-Temporal Graphs
Authors:
Yuxuan Liu,
Wenchao Xu,
Haozhao Wang,
Zhiming He,
Zhaofeng Shi,
Chongyang Xu,
Peichao Wang,
Boyuan Zhang
Abstract:
Federated Graph Learning (FGL) has emerged as a powerful paradigm for decentralized training of graph neural networks while preserving data privacy. However, existing FGL methods are predominantly designed for static graphs and rely on parameter averaging or distribution alignment, which implicitly assume that all features are equally transferable across clients, overlooking both spatial and temporal heterogeneity and the presence of client-specific knowledge in real-world graphs. In this work, we identify that such assumptions create a vicious cycle of spurious representation entanglement, client-specific interference, and negative transfer, degrading generalization performance in Federated Learning over Dynamic Spatio-Temporal Graphs (FSTG). To address this issue, we propose a novel causality-inspired framework named SC-FSGL, which explicitly decouples transferable causal knowledge from client-specific noise through representation-level interventions. Specifically, we introduce a Conditional Separation Module that simulates soft interventions through client-conditioned masks, enabling the disentanglement of invariant spatio-temporal causal factors from spurious signals and mitigating representation entanglement caused by client heterogeneity. In addition, we propose a Causal Codebook that clusters causal prototypes and aligns local representations via contrastive learning, promoting cross-client consistency and facilitating knowledge sharing across diverse spatio-temporal patterns. Experiments on five Spatio-Temporal Graph (STG) datasets with diverse heterogeneity show that SC-FSGL outperforms state-of-the-art methods.
Submitted 31 March, 2026;
originally announced March 2026.
-
Comprehensive Measurement of Spectral Evolution in a GRB Flare: High Time-Resolution Insights into the "Double-Tracking" Phenomenon
Authors:
Zheng-Hang Yu,
Wen-Jun Tan,
Chen-Wei Wang,
Shao-Lin Xiong,
Chao Zheng,
Peng Zhang,
Hao-Xuan Guo,
Zheng-Hua An,
Ce Cai,
Min Gao,
Ke Gong,
Dong-Ya Guo,
Yue Huang,
Bing Li,
Cheng-Kui Li,
Xiao-Bo Li,
Xin-Qiao Li,
Jia-Cong Liu,
Ya-Qing Liu,
Xiao-Jing Liu,
Xiang Ma,
Wen-Xi Peng,
Rui Qiao,
Yang-Zhao Ren,
Li-Ming Song
, et al. (19 additional authors not shown)
Abstract:
The spectral evolution characteristics of the prompt emission in gamma-ray bursts (GRBs) have been extensively studied, but detailed investigations of spectral evolution in a GRB flare remain lacking. In this work, we present the first analysis of spectral parameter evolution in a GRB flare through high-time-resolution spectral fitting of the Brightest Flare in GRB 221009A. We find that the $α$-Flux, $E_p$-Flux, and $E_p$-$α$ relationships during both the overall phase and the rise phase of the flare can be well described by a simple power-law model, showing positive correlations. We therefore conclude that the Brightest Flare exhibits "double-tracking" behavior. Since the values of $α$ do not exceed the synchrotron "death line" (-2/3), we explain this phenomenon using a magnetic dissipation synchrotron radiation model. In the decay phase of the flare, the $E_p$-Flux and $E_p$-$α$ correlations become notably flatter, with power-law indices significantly smaller than those in the rise phase. This may be because the next flare begins before the Brightest Flare has completely ended, so that the decay phase reflects the combined emission of both flares. Our study of the spectral parameter relations of the Brightest Flare provides new insights into the radiation mechanisms of both GRB prompt emission and flares.
Submitted 30 March, 2026;
originally announced March 2026.
-
Bundle EXTRA for Decentralized Optimization
Authors:
Haijuan Liu,
Zhuoqing Zheng,
Cong Li,
Wenying Xu,
Xuyang Wu
Abstract:
Decentralized primal-dual methods are widely used for solving decentralized optimization problems, but their updates often rely on the potentially crude first-order Taylor approximations of the objective functions, which can limit convergence speed. To overcome this, we replace the first-order Taylor approximation in the primal update of EXTRA, which can be interpreted as a primal-dual method, with a more accurate multi-cut bundle model, resulting in a fully decentralized bundle EXTRA method. The bundle model incorporates historical information to improve the approximation accuracy, potentially leading to faster convergence. Under mild assumptions, we show that a KKT residual converges to zero. Numerical experiments on decentralized least-squares problems demonstrate that, compared to EXTRA, the bundle EXTRA method converges faster and is more robust to step-size choices.
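For readers unfamiliar with the baseline, the following is a minimal sketch of plain EXTRA (not the bundle variant, whose multi-cut model is not reproduced here) on a toy decentralized least-squares problem. Assumptions: scalar local objectives $f_i(x) = \tfrac{1}{2}(a_i x - b_i)^2$, four agents on a ring with Metropolis mixing weights, and the common choice $\tilde W = (I + W)/2$; the function name and data are illustrative only.

```python
def extra_least_squares(a, b, alpha=0.05, iters=3000):
    """Plain EXTRA on f_i(x) = 0.5 * (a_i * x - b_i)**2 over a ring of len(a) >= 3 agents."""
    n = len(a)
    # Metropolis weights on a ring: every node has degree 2, so self-loop and edges get 1/3.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0 / 3.0
        W[i][(i + 1) % n] += 1.0 / 3.0
        W[i][(i - 1) % n] += 1.0 / 3.0
    # W_tilde = (I + W) / 2, the standard EXTRA choice.
    W_tilde = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)] for i in range(n)]

    def grad(x):
        # Stacked local gradients, one entry per agent.
        return [a[i] * (a[i] * x[i] - b[i]) for i in range(n)]

    def mix(M, x):
        # Matrix-vector product: each agent combines its neighbours' iterates.
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    x_prev = [0.0] * n
    g_prev = grad(x_prev)
    # First step: x^1 = W x^0 - alpha * grad f(x^0).
    x_curr = [wx - alpha * g for wx, g in zip(mix(W, x_prev), g_prev)]
    for _ in range(iters):
        g_curr = grad(x_curr)
        wx = mix(W, x_curr)
        wtx = mix(W_tilde, x_prev)
        # x^{k+2} = (I + W) x^{k+1} - W_tilde x^k - alpha * (grad f(x^{k+1}) - grad f(x^k))
        x_next = [x_curr[i] + wx[i] - wtx[i] - alpha * (g_curr[i] - g_prev[i]) for i in range(n)]
        x_prev, x_curr, g_prev = x_curr, x_next, g_curr
    return x_curr
```

With `a = [1, 2, 1, 2]` and `b = [1, 2, 3, 4]`, every agent should approach the centralized minimizer $\sum_i a_i b_i / \sum_i a_i^2 = 1.6$. The step size 0.05 is chosen below the $2\lambda_{\min}(\tilde W)/L$ bound (here $1/6$) under which EXTRA is known to converge; the bundle method replaces the linearized gradient term with a richer model of each $f_i$.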
Submitted 30 March, 2026;
originally announced March 2026.
-
You Only Erase Once: Erasing Anything without Bringing Unexpected Content
Authors:
Yixing Zhu,
Qing Zhang,
Wenju Xu,
Wei-Shi Zheng
Abstract:
We present YOEO, an approach for object erasure. Recent diffusion-based methods struggle to erase target objects without generating unexpected content within the masked regions, owing to a lack of sufficient paired training data and of explicit constraints on content generation. In contrast, our method produces high-quality object erasure results free of unwanted objects or artifacts while faithfully preserving contextual coherence with the surrounding content. We achieve this by training an object erasure diffusion model on unpaired data containing only large-scale real-world images, under the supervision of a sundries detector and a context coherence loss, both built upon an entity segmentation model. To enable more efficient training and inference, a diffusion distillation strategy is employed to train a few-step erasure diffusion model. Extensive experiments show that our method outperforms state-of-the-art object erasure methods. Code will be available at https://zyxunh.github.io/YOEO-ProjectPage/.
Submitted 29 March, 2026;
originally announced March 2026.
-
RAGent: Physics-Aware Agentic Reasoning for Training-Free mmWave Human Activity Recognition
Authors:
Mingda Han,
Huanqi Yang,
Zehua Sun,
Wenhao Li,
Yanni Yang,
Guoming Zhang,
Yetong Cao,
Weitao Xu,
Pengfei Hu
Abstract:
Millimeter-wave (mmWave) radar enables privacy-preserving human activity recognition (HAR), yet real-world deployment remains hindered by costly annotation and poor transferability under domain shift. Although prior efforts partially alleviate these challenges, most still require retraining or adaptation for each new deployment setting. This keeps mmWave HAR in a repeated collect-tune-redeploy cycle, making scalable real-world deployment difficult. In this paper, we present RAGent, a deployment-time training-free framework for mmWave HAR that reformulates recognition as evidence-grounded inference over reusable radar knowledge rather than deployment-specific model optimization. Offline, RAGent constructs a reusable radar knowledge base through constrained cross-modal supervision, where a Vision-Language Model (VLM) transfers activity semantics from synchronized videos to paired radar segments without manual radar annotation. At deployment time, RAGent recognizes activities from radar alone by retrieving physically comparable precedents in an explicit kinematic space and resolving the final label through structured multi-role reasoning. The reasoning protocol is further refined offline through zero-gradient self-evolution. Extensive experiments on a self-collected dataset show that RAGent achieves 93.39% accuracy without per-domain retraining or target-domain adaptation, while generalizing robustly across domains.
Submitted 29 March, 2026;
originally announced March 2026.
-
VoxAnchor: Grounding Speech Authenticity in Throat Vibration via mmWave Radar
Authors:
Mingda Han,
Huanqi Yang,
Chaoqun Li,
Wenhao Li,
Guoming Zhang,
Yanni Yang,
Yetong Cao,
Weitao Xu,
Pengfei Hu
Abstract:
Rapid advances in speech synthesis and audio editing have made realistic forgeries increasingly accessible, yet existing detection methods remain vulnerable to tampering or depend on visual/wearable sensors. In this paper, we present VoxAnchor, a system that physically grounds audio authentication in vocal dynamics by leveraging the inherent coherence between speech acoustics and radar-sensed throat vibrations. VoxAnchor uses contactless millimeter-wave radar to capture fine-grained throat vibrations that are tightly coupled with human speech production, establishing a hard-to-forge anchor rooted in human physiology. The design comprises three main components: (1) a cross-modal framework that uses modality-specific encoders and contrastive learning to detect subtle mismatches at word granularity; (2) a phase-aware pipeline that extracts physically consistent, temporally faithful throat vibrations; and (3) a dual-stage strategy that combines signal-level onset detection and semantic-level coherence to align asynchronous radar and audio streams. Unlike liveness detection, which only confirms whether speech occurred, VoxAnchor verifies what was spoken through word-level content consistency, exposing localized edits that preserve identity and global authenticity cues. Extensive evaluations show that VoxAnchor achieves robust, fine-grained detection across diverse forgeries (editing, splicing, replay, deepfake) and conditions, with an overall EER of 0.017, low latency, and modest computational cost.
Submitted 29 March, 2026;
originally announced March 2026.
-
SGS-Intrinsic: Semantic-Invariant Gaussian Splatting for Sparse-View Indoor Inverse Rendering
Authors:
Jiahao Niu,
Rongjia Zheng,
Wenju Xu,
Wei-Shi Zheng,
Qing Zhang
Abstract:
We present SGS-Intrinsic, an indoor inverse rendering framework that works well for sparse-view images. Unlike existing 3D Gaussian Splatting (3DGS) based methods, which focus on object-centric reconstruction and fail under sparse-view settings, our method achieves high-quality geometry reconstruction and accurate disentanglement of material and illumination. The core idea is to construct a dense and geometry-consistent Gaussian semantic field guided by semantic and geometric priors, providing a reliable foundation for subsequent inverse rendering. Building upon this, we perform material-illumination disentanglement by combining a hybrid illumination model with a material prior to effectively capture illumination-material interactions. To mitigate the impact of cast shadows and enhance the robustness of material recovery, we introduce an illumination-invariant material constraint together with a deshadowing model. Extensive experiments on benchmark datasets show that our method consistently improves both reconstruction fidelity and inverse rendering quality over existing 3DGS-based inverse rendering approaches. Our code is available at https://github.com/GrumpySloths/SGS_Intrinsic.github.io.
Submitted 31 March, 2026; v1 submitted 29 March, 2026;
originally announced March 2026.
-
The First Issue Matters: Linking Task-Level Characteristics to Long-Term Newcomer Retention in OSS
Authors:
Yichen Hao,
Weiwei Xu,
Kai Gao,
Xiaofang Zhang
Abstract:
Sustaining newcomer participation is critical for the long-term health of open-source communities. Although prior research has explored various task recommendation approaches to help newcomers resolve their first issue, these methods overlook how the characteristics of first issues may influence newcomers' long-term retention. This limits our understanding of whether initial success leads to sustained participation and hinders effective onboarding design. In this paper, we conduct a large-scale empirical study to examine how first-issue characteristics affect newcomer retention. We combine predictive analysis, interpretability techniques, and causal inference to estimate the causal effects of issue characteristics on retention outcomes. The predictive analysis, which underpins the interpretability step, shows that interaction-related characteristics are more strongly associated with retention than intrinsic issue attributes. The causal analysis further reveals that issues reported by moderately experienced contributors, with moderate discussion intensity, participation from project members, and neutral or slightly negative comment sentiment, have higher retention potential. These findings provide actionable insights for OSS maintainers on designing issue management practices that better support long-term newcomer retention.
Submitted 28 March, 2026;
originally announced March 2026.
-
Diachronic Modeling of Tonal Coherence on the Tonnetz Across Classical and Popular Repertoires
Authors:
Weilun Xu,
Edward Hall,
Martin Rohrmeier
Abstract:
How do different musical traditions achieve tonal coherence? Most computational measures to date have analysed tonal coherence along a single dimension, whereas multi-dimensional analyses have not been sufficiently explored. We propose a new model drawing on the concept of the Tonnetz, in which we define two partially independent measures: \emph{tonal focus}, the concentration of pitch content near a tonal center; and \emph{tonal connection}, the degree to which pitch content reflects structured intervallic pathways back to that center. Analyzing over 2,800 pieces from Western classical and popular traditions, we find that these traditions occupy overlapping yet distinguishable regions of the two-dimensional space. Popular music shows higher tonal focus, while classical music exhibits higher tonal connection. Our complementary measures ground differences between tonal styles in quantitative evidence and offer interpretable dimensions for computational music analysis and controllable generation.
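The abstract does not give formal definitions, so the following is only a hypothetical proxy for the idea of "concentration of pitch content near a tonal center", not the authors' tonal focus measure. It assumes the pitch-class Tonnetz is treated as a graph whose edges join pitch classes a perfect fifth (7 semitones), major third (4), or minor third (3) apart, and scores a piece by the duration-weighted mean shortest-path distance from the tonal center (lower means more concentrated). The names `tonnetz_distance` and `mean_tonnetz_distance` are illustrative.

```python
from collections import deque

# Hypothetical Tonnetz steps mod 12: perfect fifth, major third, minor third (both directions).
STEPS = (7, -7, 4, -4, 3, -3)


def tonnetz_distance(pc, center):
    """Breadth-first shortest number of Tonnetz steps from center to pc (pitch classes mod 12)."""
    target, start = pc % 12, center % 12
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for step in STEPS:
            nxt = (node + step) % 12
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("unreachable pitch class")  # cannot happen: the steps generate all of Z_12


def mean_tonnetz_distance(weights, center):
    """Hypothetical focus proxy: weighted mean Tonnetz distance of pitch content from center.

    weights maps pitch class (0-11) to a non-negative weight such as total duration.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * tonnetz_distance(pc, center) for pc, w in weights.items()) / total
```

For example, with C (0) as the center, a piece spending half its duration on C and a quarter each on G (7) and E (4) scores 0.5, whereas content on D (2), two fifths away, scores 2. A second, path-based score would be needed to approximate anything like tonal connection.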
Submitted 27 March, 2026;
originally announced March 2026.
-
Beyond Viewpoint Generalization: What Multi-View Demonstrations Offer and How to Synthesize Them for Robot Manipulation?
Authors:
Boyang Cai,
Qiwei Liang,
Jiawei Li,
Shihang Weng,
Zhaoxin Zhang,
Tao Lin,
Xiangyu Chen,
Wenjie Zhang,
Jiaqi Mao,
Weisheng Xu,
Bin Yang,
Jiaming Liang,
Junhao Cai,
Renjing Xu
Abstract:
Does multi-view demonstration truly improve robot manipulation, or merely enhance cross-view robustness? We present a systematic study quantifying the performance gains, scaling behavior, and underlying mechanisms of multi-view data for robot manipulation. Controlled experiments show that, under both fixed and randomized backgrounds, multi-view demonstrations consistently improve single-view policy success and generalization. Performance varies non-monotonically with view coverage, revealing effective regimes rather than a simple "more is better" trend. Notably, multi-view data breaks the scaling limitation of single-view datasets and continues to raise performance ceilings after saturation. Mechanistic analysis shows that multi-view learning promotes manipulation-relevant visual representations, better aligns the action head with the learned feature distribution, and reduces overfitting. Motivated by the importance of multi-view data, its scarcity in large-scale robotic datasets, and the difficulty of collecting additional viewpoints in real-world settings, we propose RoboNVS, a geometry-aware self-supervised framework that synthesizes novel-view videos from monocular inputs. The generated data consistently improves downstream policies in both simulation and real-world environments.
Submitted 23 March, 2026;
originally announced March 2026.
-
Single-Pulse Study of the Pseudo-nulling Pulsar PSR J1820-0509 Based on FAST Observations
Authors:
Zefeng Tu,
Rushuang Zhao,
Hui Liu,
Biping Gong,
D. Li,
P. Wang,
Chenchen Miao,
Q. J. Zhi,
S. J. Dang,
S. D. Wang,
Q. Zhou,
Z. J. Zhang,
Xu Zhu,
R. W. Tian,
H. W. Xu,
Yi Zhou,
D. Y. Yan
Abstract:
Using two observations obtained with the Five-hundred-meter Aperture Spherical radio Telescope (FAST), we present a detailed single-pulse analysis of the high-nulling pulsar PSR J1820-0509. We measure an exceptionally high nulling fraction of approximately 81.78%, significantly exceeding previous estimates from Parkes observations. The single-pulse energy distribution exhibits a clear bimodal structure, consistent with classical nulling behavior. However, stacking the identified null pulses reveals a statistically significant residual profile above the noise level, indicating that the nulls correspond to a very weak emission state rather than a complete cessation of radio emission.
The pulsar shows clustered burst activity spanning several hundred rotation periods, with prominent quasi-periodicities of 1191 ± 81 and 590 ± 15 pulse periods in the two observations, respectively. Based on temporal clustering and integrated profile morphology, we identify three distinct emission modes (A, B, and C) and a pseudo-null state (D). These modes exhibit systematic differences in pulse morphology, polarization, and energy statistics. The pulse width-energy relations reveal clear transitions between low- and high-energy regimes. The energy distributions of Modes A and C are well described by lognormal functions, while Mode B follows a composite Gaussian-lognormal distribution.
These results suggest that the radio emission of PSR J1820-0509 is governed by multiple quasi-stable magnetospheric states. The presence of weak emission during pseudo-nulls, together with systematic mode-dependent variations, supports the interpretation that pulsar nulling reflects transitions between different magnetospheric activity levels rather than a complete shutdown of emission.
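As an illustration of what "well described by lognormal functions" involves, the following is a minimal sketch of a maximum-likelihood lognormal fit to single-pulse energies. It assumes strictly positive, already-normalized pulse energies (real on-pulse energies can scatter below zero because of noise, which this sketch simply rejects) and is not the fitting procedure used in the paper; the function names are hypothetical.

```python
import math


def fit_lognormal(energies):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of ln(E)."""
    if any(e <= 0 for e in energies):
        raise ValueError("lognormal fit requires strictly positive energies")
    logs = [math.log(e) for e in energies]
    n = len(logs)
    mu = sum(logs) / n
    # The maximum-likelihood sigma uses 1/n normalization, not the unbiased 1/(n - 1).
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Density of the fitted lognormal at energy x > 0, for overlaying on a histogram."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )
```

A composite Gaussian-lognormal model such as the one reported for Mode B would need a mixture fit (for example, expectation-maximization or least squares on the histogram) rather than this closed-form estimate.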
Submitted 27 March, 2026;
originally announced March 2026.
-
Joint Sensing and Covert Communications in RIS-NOMA Systems
Authors:
Jiayi Lei,
Xidong Mu,
Tiankui Zhang,
Wenjun Xu,
Ping Zhang
Abstract:
A reconfigurable intelligent surface (RIS)-assisted non-orthogonal multiple access (NOMA) system is investigated, where the transmitter (Alice) is a dual-functional radar-communication (DFRC) base station (BS) that aims to sense the location of a potential warden (Willie) while simultaneously transmitting public and covert signals to the legitimate users, Carol and Bob, respectively. Both known and unknown Willie locations are considered. For the known-location case, assuming perfect channel state information (CSI) at Willie, a covert rate maximization problem is formulated with the joint optimization of active and passive beamforming, which is solved using successive convex approximation (SCA), a penalty method, and semidefinite relaxation (SDR). For the unknown-location case, we propose to estimate Willie's location via radar sensing and develop a sensing-based imperfect CSI model. In particular, the CSI error uncertainty is bounded by the sensing accuracy, which is characterized by the Cramér-Rao bound (CRB). Subsequently, a robust communication rate maximization problem is formulated under constraints on the quality-of-service (QoS) of Carol, the sensing accuracy, and the covertness level. The Schur complement and S-procedure are employed to handle the non-convex constraints. Numerical results compare the system performance in the two cases and demonstrate that the sensing-based imperfect CSI model and NOMA achieve significantly better covert performance than the general norm-bounded imperfect CSI model and orthogonal multiple access, respectively. Furthermore, the dual yet contradictory effects of sensing on covert communications are revealed. It is also found that Alice primarily uses Carol's signal for sensing, while allocating almost all of Bob's signal to communication.
Submitted 27 March, 2026;
originally announced March 2026.
-
$θ$ Angle and Axial Anomaly in Holographic QCD
Authors:
Csaba Csáki,
Eric Kuflik,
Wei Xue,
Taewook Youn
Abstract:
We present a bottom-up holographic description of the QCD $θ$-vacuum and the $U(1)_A$ anomaly in five dimensions. The multi-branched $θ$-vacuum structure emerges geometrically from a higher-dimensional gauge field, while the axial anomaly is realized through a Stückelberg coupling that is dual to a Chern-Simons term. In this framework, the $η'$ meson appears as a zero mode of bulk fluctuations, and its mass arises from the anomaly-induced Stückelberg term. The construction provides a transparent holographic derivation of the anomaly contribution to the $η'$ mass and naturally reproduces the Witten-Veneziano relation between the $η'$ mass and the Yang-Mills topological susceptibility.
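For reference, the Witten-Veneziano relation mentioned above is usually quoted in the four-dimensional form below, with $f_\pi \approx 92$ MeV normalization and the topological charge density written in components; the holographic construction may use different normalization conventions, so this is a textbook statement rather than the paper's own expression.

```latex
m_{\eta'}^2 + m_\eta^2 - 2 m_K^2 \;=\; \frac{2 N_f}{f_\pi^2}\,\chi_{\mathrm{YM}},
\qquad
\chi_{\mathrm{YM}} = \int d^4x\,\langle q(x)\,q(0)\rangle_{\mathrm{YM}},
\qquad
q = \frac{g^2}{32\pi^2}\,F^a_{\mu\nu}\tilde F^{a\,\mu\nu}.
% In the chiral limit the left-hand side reduces to m_{\eta'}^2.
% Worked estimate: N_f = 3, m_{\eta'} \approx 958, m_\eta \approx 548, m_K \approx 495, f_\pi \approx 92 (MeV)
% gives 728018 MeV^2 = (6 / 8464 MeV^2) \chi_{YM}, i.e. \chi_{YM} \approx 1.03 \times 10^9 MeV^4,
% so \chi_{YM}^{1/4} \approx 179 MeV.
```

The resulting $\chi_{\mathrm{YM}}^{1/4} \approx 180$ MeV is the familiar phenomenological value that lattice measurements of the pure Yang-Mills topological susceptibility are compared against.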
Submitted 26 March, 2026;
originally announced March 2026.
-
Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference
Authors:
Sk Miraj Ahmed,
Xi Yu,
Yunqi Li,
Yuewei Lin,
Wei Xu
Abstract:
Accurate biodiversity identification from large-scale field data is a foundational problem with direct impact on ecology, conservation, and environmental monitoring. In practice, the core task is taxonomic prediction: inferring order, family, genus, or species from imperfect inputs such as specimen images, DNA barcodes, or both. Existing multimodal methods often treat taxonomy as a flat label space and therefore fail to encode the hierarchical structure of biological classification, which is critical for robustness under noise and missing modalities. We present two end-to-end variants for hierarchy-aware multimodal learning: CLiBD-HiR, which introduces Hierarchical Information Regularization (HiR) to shape embedding geometry across taxonomic levels, yielding structured and noise-robust representations; and CLiBD-HiR-Fuse, which additionally trains a lightweight fusion predictor that supports image-only, DNA-only, or joint inference and is resilient to modality corruption. Across large-scale biodiversity benchmarks, our approach improves taxonomic classification accuracy by over 14 percent compared to strong multimodal baselines, with particularly large gains under partial and corrupted DNA conditions. These results highlight that explicitly encoding biological hierarchy, together with flexible fusion, is key for practical biodiversity foundation models.
Submitted 26 March, 2026;
originally announced March 2026.
-
RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
Authors:
Tianjun Pan,
Xuan Lin,
Wenyan Yang,
Qianyu He,
Shisong Chen,
Licai Qi,
Wanqing Xu,
Hongwei Feng,
Bo Xu,
Yanghua Xiao
Abstract:
Rubric-based evaluation has become a prevailing paradigm for evaluating instruction following in large language models (LLMs). Despite its widespread use, the reliability of these rubric-level evaluations remains unclear, calling for meta-evaluation. However, prior meta-evaluation efforts largely focus on the response level, failing to assess the fine-grained judgment accuracy that rubric-based evaluation relies on. To bridge this gap, we introduce RubricEval. Our benchmark features: (1) the first rubric-level meta-evaluation benchmark for instruction following, (2) diverse instructions and responses spanning multiple categories and model sources, and (3) a substantial set of 3,486 quality-controlled instances, along with Easy/Hard subsets that better differentiate judge performance. Our experiments reveal that rubric-level judging remains far from solved: even GPT-4o, a widely adopted judge in instruction-following benchmarks, achieves only 55.97% on the Hard subset. Regarding the evaluation paradigm, rubric-level evaluation outperforms checklist-level evaluation, explicit reasoning improves accuracy, and the two together reduce inter-judge variance. Through our established rubric taxonomy, we further identify common failure modes and offer actionable insights for reliable instruction-following evaluation.
Submitted 26 March, 2026;
originally announced March 2026.
-
Beam Test Characterization of Silicon Microstrip Detector Flight-Model Ladders for the AMS-02 Upgrade
Authors:
Dexing Miao,
Giovanni Ambrosi,
Mattia Barbanera,
Baasansuren Batsukh,
Hengyi Cai,
Mengke Cai,
Xudong Cai,
Yuman Cai,
Yuan-Hann Chang,
Shanzhen Chen,
Hsin-Yi Chou,
Xingzhu Cui,
Mingyi Dong,
Matteo Duranti,
Ke Gong,
Mingjie Feng,
Valerio Formato,
Yisheng Fu,
Daojin Hong,
Maria Ionica,
Xiaojie Jiang,
Yaozu Jiang,
Liangchenglong Jin,
Shengjie Jin,
Vladimir Koutsenko
, et al. (34 additional authors not shown)
Abstract:
The AMS-02 experiment plans to install a new silicon microstrip tracker layer (Layer-0) on top of the existing detector, increasing the cosmic-ray acceptance by a factor of 3. Layer-0 employs a design in which multiple silicon microstrip detectors (SSDs) are connected in series to form long detector ladders. We present a detailed performance study of the flight-model ladders using a 350 GeV mixed hadron beam at the CERN SPS. The study focuses on the following aspects: (i) the performance of ladders with different numbers of SSDs, for which the intrinsic spatial resolution at normal incidence varies from $9.5~μ\mathrm{m}$ to $11.4~μ\mathrm{m}$ for ladders composed of 8 to 12 SSDs; (ii) the response consistency for particles impacting on the \emph{Head} and \emph{Tail} regions of the ladder; and (iii) the dependence of the detector performance on the particle incidence angle.
Submitted 26 March, 2026;
originally announced March 2026.
-
A Telescope System for Charge and Position Measurement of High Energy Nuclei
Authors:
Dexing Miao,
Zhiyu Xiang,
Giovanni Ambrosi,
Mattia Barbanera,
Baasansuren Batsukh,
Mengke Cai,
Xudong Cai,
Yuan-Hann Chang,
Shanzhen Chen,
Hsin-Yi Chou,
Xingzhu Cui,
Mingyi Dong,
Matteo Duranti,
Ke Gong,
Mingjie Feng,
Valerio Formato,
Daojin Hong,
Maria Ionica,
Xiaojie Jiang,
Yaozu Jiang,
Liangchenglong Jin,
Shengjie Jin,
Vladimir Koutsenko,
Tiange Li,
Zuhao Li
, et al. (21 additional authors not shown)
Abstract:
A high-granularity telescope system with a large sensitive area and low material budget has been developed for high-energy heavy ion beam tests. The telescope consists of nine layers of silicon microstrip detectors (SSDs), whose performance was validated through a heavy ion beam test at the CERN SPS. A hybrid machine learning algorithm is proposed to address the challenges of nuclear charge measurement with SSDs. The system achieves a spatial resolution of $\mathcal{O}(1)\,\mu\mathrm{m}$ and a charge resolution better than 0.16 charge units for nuclei from $Z = 1$ to $Z = 29$, with a sensitive area of $8 \times 8 \, \mathrm{cm}^2$. To the best of our knowledge, this is the most precise charge and spatial resolution achieved simultaneously by a silicon telescope to date.
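A per-species charge resolution such as the quoted 0.16 charge units is typically the spread of reconstructed charge for events of a single known nuclear species. The sketch below assumes that definition (the abstract does not specify whether a Gaussian fit or a plain standard deviation is used) and uses hypothetical toy inputs; it does not reproduce the paper's hybrid machine learning reconstruction, and `charge_resolution` and `meets_target` are illustrative names.

```python
import statistics
from collections import defaultdict


def charge_resolution(true_z, reco_z):
    """Standard deviation of reconstructed charge, grouped by true species Z.

    Species with fewer than two events are skipped because a sample
    standard deviation is undefined for them.
    """
    groups = defaultdict(list)
    for z, q in zip(true_z, reco_z):
        groups[z].append(q)
    return {z: statistics.stdev(qs) for z, qs in groups.items() if len(qs) >= 2}


def meets_target(resolutions, target=0.16):
    """True if every species is resolved better than `target` charge units."""
    return all(sigma < target for sigma in resolutions.values())


# Toy data (hypothetical): two protons and two helium nuclei.
res = charge_resolution([1, 1, 2, 2], [0.9, 1.1, 1.9, 2.1])
print(res)                # per-Z sigma, about 0.141 for both species
print(meets_target(res))  # True
```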
Submitted 26 March, 2026;
originally announced March 2026.
-
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Authors:
Yicheng Zou,
Dongsheng Zhu,
Lin Zhu,
Tong Zhu,
Yunhua Zhou,
Peiheng Zhou,
Xinyu Zhou,
Dongzhan Zhou,
Zhiwang Zhou,
Yuhao Zhou,
Bowen Zhou,
Zhanping Zhong,
Zhijie Zhong,
Haiteng Zhao,
Penghao Zhao,
Xiaomeng Zhao,
Zhiyuan Zhao,
Yechen Zhang,
Jin Zhang,
Wenwei Zhang,
Hongjie Zhang,
Zhuo Zhang,
Wenlong Zhang,
Bo Zhang,
Chao Zhang
, et al. (152 additional authors not shown)
Abstract:
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model improves comprehensively across both general and scientific domains. Beyond stronger reasoning and image-text understanding, it gains advanced agent capabilities. Its scientific expertise has also been greatly expanded to cover over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. This scale is made possible by the infrastructure support of XTuner and LMDeploy, which enables efficient Reinforcement Learning (RL) training at the trillion-parameter scale while ensuring strict precision consistency between training and inference. By integrating these advances, Intern-S1-Pro further strengthens the fusion of general and specialized intelligence as a Specializable Generalist: it ranks in the top tier of open-source models for general capabilities while outperforming proprietary models in the depth of specialized scientific tasks.
Submitted 2 April, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Unbiased Multimodal Reranking for Long-Tail Short-Video Search
Authors:
Wenyi Xu,
Feiran Zhu,
Songyang Li,
Renzhe Zhou,
Chao Zhang,
Chenglei Dai,
Yuren Mao,
Yunjun Gao,
Yi Zhang
Abstract:
With Kuaishou serving hundreds of millions of searches daily, the quality of short-video search is paramount. However, search suffers from a severe Matthew effect on long-tail queries: sparse user behavior data causes models to amplify low-quality content such as clickbait and shallow videos. Recent advances in Large Language Models (LLMs) offer a new paradigm, as their inherent world knowledge provides a powerful mechanism for assessing content quality independently of sparse user interactions. To this end, we propose an LLM-driven multimodal reranking framework that estimates user experience without real user behavior. The approach involves a two-stage training process: the first stage uses multimodal evidence to construct high-quality annotations for supervised fine-tuning, while the second stage incorporates pairwise preference optimization to help the model learn partial orderings among candidates. At inference time, the resulting experience scores are used to promote high-quality but underexposed videos in reranking and to further guide page-level optimization through reinforcement learning. Experiments show that the proposed method achieves consistent improvements over strong baselines in offline metrics, including AUC, NDCG@K, and human preference judgments. An online A/B test covering 15\% of traffic further demonstrates gains in both user experience and consumption metrics, confirming the practical value of the approach in long-tail video search scenarios.
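NDCG@K, one of the offline metrics reported above, and a pairwise preference objective can be sketched as follows. NDCG@K is shown in its common exponential-gain form; the abstract does not say which gain or discount variant was used. The abstract also does not name the loss behind "pairwise preference optimization", so the Bradley-Terry-style logistic loss here is an assumed stand-in, not the authors' objective, and all function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(rank + 1) discount."""
    return sum((2**rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return 0.0 if ideal == 0 else dcg_at_k(relevances, k) / ideal


def pairwise_logistic_loss(score_preferred, score_other):
    """-log sigmoid(s_preferred - s_other), computed in a numerically stable way."""
    margin = score_preferred - score_other
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # Rewrite log(1 + e^{-m}) as -m + log(1 + e^{m}) to avoid overflow for m << 0.
    return -margin + math.log1p(math.exp(margin))


# Hypothetical graded relevance labels for a reranked result list.
print(ndcg_at_k([3, 2, 3, 0, 1], k=3))
print(pairwise_logistic_loss(2.0, 0.5))  # small loss: preferred item already scored higher
```

The pairwise loss only depends on score differences, which matches the abstract's framing of learning partial orderings among candidates rather than absolute quality values.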
Submitted 30 March, 2026; v1 submitted 25 March, 2026;
originally announced March 2026.