A regularized truncated finite element method for degenerate parabolic stochastic PDE on non-compact graph

Authors: Jianbo Cui, Mihály Kovács, Derui Sheng

Abstract: We study the numerical approximation of a class of degenerate parabolic stochastic partial differential equations on non-compact metric graphs, which naturally arise in the asymptotic analysis of Hamiltonian flows under small noise perturbations. The numerical discretization of these equations faces several challenges, including the non-compactness of the graph, the degeneracy of the differential operator near vertices, and the non-symmetry of the associated bilinear form. To address these issues, we propose a multi-step numerical strategy combining graph truncation, localized coefficient regularization, and finite element spatial discretization. By incorporating localization techniques, tightness arguments, and resolvent estimates, we establish the strong convergence of the proposed scheme in a weighted $L^2$-space. Our results provide a systematic methodology that is potentially extensible to more general non-compact graphs and degenerate operators.

Submitted 13 April, 2026; originally announced April 2026.

arXiv:2604.09054 [pdf, ps, other]

HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation

Authors: Jian Zhu, Jianwei Cui, Shihao Chen, Yubang Zhang, Cheng Luo

Abstract: We present HAFM, a system that generates instrumental music audio to accompany input vocals. Given isolated singing voice, HAFM produces a coherent instrumental accompaniment that can be directly mixed with the input to create complete music. We propose three key innovations over prior work: (1) a dual-rate codec tokenization scheme using HuBERT semantic tokens at 50\,Hz for vocals and EnCodec acoustic tokens at 75\,Hz for instrumentals, enabling time-aligned yet rate-independent modeling; (2) a three-stage hierarchical autoregressive architecture (semantic to coarse acoustic to fine acoustic) with interleaved multi-codebook prediction and classifier-free guidance; and (3) modern Transformer design choices including QK-norm, GEGLU activations, RMSNorm, and T5-style relative position bias for improved training stability and sequence generalization. Experiments on MUSDB18 demonstrate that HAFM achieves a Fréchet Audio Distance (FAD) of 2.08 on isolated vocal inputs, outperforming retrieval baselines and matching prior state-of-the-art systems with fewer parameters. The source code is available at https://github.com/HackerHyper/HAFM.

Submitted 12 April, 2026; v1 submitted 10 April, 2026; originally announced April 2026.

Comments: Music Accompaniment Generation, Music Foundation Model

arXiv:2604.08698 [pdf, ps, other]

EvoLen: Evolution-Guided Tokenization for DNA Language Model

Authors: Nan Huang, Xiaoxiao Zhou, Junxia Cui, Mario Tapia-Pacheco, Tiffany Amariuta, Yang Li, Jingbo Shang

Abstract: Tokens serve as the basic units of representation in DNA language models (DNALMs), yet their design remains underexplored. Unlike natural language, DNA lacks inherent token boundaries or predefined compositional rules, making tokenization a fundamental modeling decision rather than a naturally specified one. While existing approaches like byte-pair encoding (BPE) excel at capturing token structures that reflect human-generated linguistic regularities, DNA is organized by biological function and evolutionary constraint rather than linguistic convention. We argue that DNA tokenization should prioritize functional sequence patterns like regulatory motifs-short, recurring segments under evolutionary constraint and typically preserved across species. We incorporate evolutionary information directly into the tokenization process through EvoLen, a tokenizer that combines evolutionary stratification with length-aware decoding to better preserve motif-scale functional sequence units. EvoLen uses cross-species evolutionary signals to group DNA sequences, trains separate BPE tokenizers on each group, merges the resulting vocabularies via a rule prioritizing preserved patterns, and applies length-aware decoding with dynamic programming. Through controlled experiments, EvoLen improves the preservation of functional sequence patterns, differentiation across genomic contexts, and alignment with evolutionary constraint, while matching or outperforming standard BPE across diverse DNALM benchmarks. These results demonstrate that tokenization introduces a critical inductive bias and that incorporating evolutionary information yields more biologically meaningful and interpretable sequence representations.

Submitted 9 April, 2026; originally announced April 2026.

arXiv:2604.04198 [pdf, ps, other]

DriveVA: Video Action Models are Zero-Shot Drivers

Authors: Mengmeng Liu, Diankun Zhang, Jiuming Liu, Jianfeng Cui, Hongwei Xie, Guang Chen, Hangjun Ye, Michael Ying Yang, Francesco Nex, Hao Cheng

Abstract: Generalization is a central challenge in autonomous driving, as real-world deployment requires robust performance under unseen scenarios, sensor domains, and environmental conditions. Recent world-model-based planning methods have shown strong capabilities in scene understanding and multi-modal future prediction, yet their generalization across datasets and sensor configurations remains limited. In addition, their loosely coupled planning paradigm often leads to poor video-trajectory consistency during visual imagination. To overcome these limitations, we propose DriveVA, a novel autonomous driving world model that jointly decodes future visual forecasts and action sequences in a shared latent generative process. DriveVA inherits rich priors on motion dynamics and physical plausibility from well-pretrained large-scale video generation models to capture continuous spatiotemporal evolution and causal interaction patterns. To this end, DriveVA employs a DiT-based decoder to jointly predict future action sequences (trajectories) and videos, enabling tighter alignment between planning and scene evolution. We also introduce a video continuation strategy to strengthen long-duration rollout consistency. DriveVA achieves an impressive closed-loop performance of 90.9 PDM score on the challenge NAVSIM. Extensive experiments also demonstrate the zero-shot capability and cross-domain generalization of DriveVA, which reduces average L2 error and collision rate by 78.9% and 83.3% on nuScenes and 52.5% and 52.4% on the Bench2drive built on CARLA v2 compared with the state-of-the-art world-model-based planner.

Submitted 5 April, 2026; originally announced April 2026.

arXiv:2604.04135 [pdf, ps, other]

NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results

Authors: Shuhong Liu, Chenyu Bao, Ziteng Cui, Xuangeng Chu, Bin Ren, Lin Gu, Xiang Chen, Mingrui Li, Long Ma, Marcos V. Conde, Radu Timofte, Yun Liu, Ryo Umagami, Tomohiro Hashimoto, Zijian Hu, Yuan Gan, Tianhan Xu, Yusuke Kurose, Tatsuya Harada, Junwei Yuan, Gengjia Chang, Xining Ge, Mache You, Qida Cao, Zeliang Li, et al. (81 additional authors not shown)

Abstract: This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify robust reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.

Submitted 5 April, 2026; originally announced April 2026.

arXiv:2604.04007 [pdf, ps, other]

Tits Alternative in groups with proper product actions on proper Gromov-hyperbolic spaces

Authors: Jiaqi Cui, Renxing Wan

Abstract: In this paper, we study groups with property (PPH), i.e., there exist finitely many proper Gromov-hyperbolic spaces $X_1,\ldots, X_l$ on which $G$ acts cocompactly such that the diagonal action of $G$ on the $\ell^1$-product $\prod_{i=1}^lX_i$ is proper. We show that any finitely generated subgroup of a finitely generated group with property (PPH) either is amenable or contains $F_2$. Furthermore, we study groups with property (PPT), i.e., groups with property (PPH) so that $X_1,\cdots,X_l$ are all proper quasi-trees. We show that any finitely generated subgroup of a finitely generated group with property (PPT) either is virtually (locally-finite)-by-$\mathbb{Z}^n$ or contains $F_2$. Additionally, we establish that for a non-elementary hyperbolic group $G$, $G$ admits a proper diagonal action on a finite product of regular trees if and only if $G$ has property (PPT). This result transforms a question posed by Button \cite{But19} into the problem of whether every non-elementary hyperbolic group has property (PPT).

Submitted 5 April, 2026; originally announced April 2026.

Comments: 17 pages, 1 figure. This paper is based on a revised version of Part I of our previous preprint 2505.09454v1

MSC Class: 20F65

arXiv:2604.03509 [pdf]

Applications of Large Language Models in Radiation Oncology: From Workflow Automation to Clinical Intelligence

Authors: Yuzhen Ding, Jason Holmes, Yuexing Hao, Zhengliang Liu, Peilong Wang, Junjie Cui, Meiyun Cao, Caiwen Jiang, Shuoyang Wei, Lin Zhao, Chenbin Liu, Lian Zhang, Yunze Yang, Tianming Liu, Wei Liu

Abstract: Large language models (LLMs) have emerged as transformative tools in medicine, with strong capabilities in language understanding, reasoning, and structured information extraction. Radiation oncology is particularly well suited for LLM integration due to its data-intensive workflows, reliance on structured guidelines, and documentation burden. This review summarizes recent applications, including domain-specific fine-tuning for decision support, automated nomenclature standardization, registry curation using autonomous LLM agents, and protocol-aware radiotherapy plan evaluation using modular retrieval-augmented generation (RAG). Additional applications include patient safety analysis through incident classification and root cause analysis, electronic health record (EHR)-integrated communication, CT simulation order summarization, daily readiness briefings, and patient education systems. Emerging multimodal approaches enable context-aware contouring, while early studies show LLMs can assist treatment planning by interpreting dosimetric feedback. Together, these advances highlight a shift toward clinically grounded, auditable, and workflow-integrated AI systems that enhance efficiency, safety, and patient engagement.

Submitted 3 April, 2026; originally announced April 2026.

arXiv:2604.02674 [pdf, ps, other]

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems

Authors: Kavana Venkatesh, Jiaming Cui

Abstract: Large Language Model (LLM) multi-agent systems are increasingly deployed as interacting agent societies, yet scaling these systems often yields diminishing or unstable returns, the causes of which remain poorly understood. We present the first large-scale empirical study of coordination dynamics in LLM-based multi-agent systems, introducing an atomic event-level formulation that reconstructs reasoning as cascades of coordination. Analyzing over 1.5 Million interactions across tasks, topologies, and scales, we uncover three coupled laws: coordination follows heavy-tailed cascades, concentrates via preferential attachment into intellectual elites, and produces increasingly frequent extreme events as system size grows. We show that these effects are coupled through a single structural mechanism: an integration bottleneck, in which coordination expansion scales with system size while consolidation does not, producing large but weakly integrated reasoning processes. To test this mechanism, we introduce Deficit-Triggered Integration (DTI), which selectively increases integration under imbalance. DTI improves performance precisely where coordination fails, without suppressing large-scale reasoning. Together, our results establish quantitative laws of collective cognition and identify coordination structure as a fundamental, previously unmeasured axis for understanding and improving scalable multi-agent intelligence.

Submitted 2 April, 2026; originally announced April 2026.

arXiv:2604.00368 [pdf, ps, other]

TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving

Authors: Feng Ren, Ruoyu Qin, Teng Ma, Shangming Cai, Zheng Liu, Chao Lei, Dejiang Zhu, Ke Yang, Zheming Li, Jialei Cui, Weixiao Huang, Yikai Zhao, Yineng Zhang, Hao Wu, Xiang Gao, Yuhao Fu, Jinlei Jiang, Yongwei Wu, Mingxing Zhang

Abstract: Modern GPU clusters are built upon a complex hierarchy of heterogeneous interconnects, ranging from multi-rail RDMA to proprietary fabrics such as Multi-Node NVLink and Ascend UB. Orchestrating these diverse links effectively remains a critical challenge in disaggregated LLM serving. Operating Mooncake TE on thousands of GPUs exposed a critical limitation shared by existing frameworks: imperative, statically bound path selection. This rigidity forces engines to rely on state-blind striping that ignores congestion signals, creating communication silos, wasting multi-rail bandwidth due to head-of-line blocking, and leading to operational fragility where routine faults require manual intervention. We present TENT, a data-movement engine that decouples transfer intent from physical execution. Instead of locking workloads to fixed backends, TENT unifies heterogeneous interconnects into a single dynamic resource pool. Applications simply declare transfer intents, while TENT dynamically decomposes elephant flows into fine-grained slices and "sprays" them across links based on instantaneous link quality. This telemetry-driven orchestration eliminates head-of-line blocking and enables transparent, sub-50 ms self-healing by rerouting slices around failures without application logic. TENT serves as the production data plane for LLM inference and RL pipelines at multiple industrial sites. Our evaluation on H800 HGX clusters shows that TENT outperforms state-of-the-art baselines, including Mooncake TE, NIXL, and UCCL. In LLM inference with SGLang HiCache, TENT achieves up to 1.36x higher throughput and 26% lower P90 TTFT than Mooncake TE. In RL pipelines, TENT accelerates parameter updates in Moonshot Checkpoint Engine by 20-26%.

Submitted 31 March, 2026; originally announced April 2026.

arXiv:2603.24437 [pdf, ps, other]

Search for the decay $B^+ \rightarrow K^+τ^+τ^-$ using data from the Belle and Belle II experiments

Authors: Belle and Belle II Collaborations: M. Abumusabh, I. Adachi, K. Adamczyk, A. Aggarwal, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, et al. (414 additional authors not shown)

Abstract: We report a search for the rare decay $B^{+} \rightarrow K^{+} τ^{+} τ^{-}$ using $1.2 \times 10^9$ $Υ(4S)$ mesons produced near threshold in electron-positron collisions and collected by the Belle and Belle~II experiments. We fully reconstruct the hadronic decay of one $B$ meson produced in the $Υ(4S)\rightarrow B^{+} B^{-}$ decay, and search for $B^{\pm}\rightarrow K^{\pm} τ^{+}τ^{-}$ candidates among the remaining collision products, reconstructing a charged kaon and leptonic decays of the $τ$ leptons. We optimize the selection for best sensitivity and look for an excess over background at low values of the residual energy detected in the calorimeter after full event reconstruction. We observe no significant excess and set the limit $\mathcal{B}(B^{+}\rightarrow K^{+}τ^{+}τ^{-})< 0.56\times 10^{-3}$ at the 90% confidence level, improving on the only previous result by a factor of four.

Submitted 25 March, 2026; originally announced March 2026.

Report number: Belle II preprint 2026-003, KEK preprint 2025-43

arXiv:2603.21579 [pdf]

TERS-ABNet: A Deep Learning Approach for Automated Single-Molecule Structure Reconstruction with Atomic Precision from TERS Mapping

Authors: Jie Cui, Yao Zhang, Yang Zhang, Yi Luo, Zhen-Chao Dong

Abstract: Determining the chemical structure for a single molecule on surface from spectroscopic data represents a challenging high-dimensional inverse problem. Tip-enhanced Raman spectroscopy (TERS) enables chemically specific imaging of single molecules with sub-nanometer spatial resolution, yet reconstructing complete molecular structures from TERS maps remains difficult owing to the ambiguous vibrational signatures and reliance on expert interpretation. Here, we introduce TERS-ABNet, a deep-learning framework that formulates single-molecule structure determination from spectroscopic images as an image-to-graph inference task. Using a "two-track" architecture, the model jointly predicts probabilistic atom and bond maps, enabling direct construction of explicit atom-bond graphs without relying on predefined chemical rules. Trained on simulated datasets, TERS-ABNet achieves about 94% atom-type classification accuracy (with a mean coordinate error of about 0.23 Å), enabling to reliably recovering molecular connectivity and fully reconstruct single-molecule structure from its TERS maps. The framework generalizes across varying spatial resolutions and structural complexity through transfer learning, and successfully reconstructs the atomic structure of a single porphyrin molecule from experimental TERS data. This work establishes a general deep-learning strategy for inferring explicit atom-bond graph representations from high-dimensional spectroscopic imaging data, providing a new pathway towards automated molecular structure determination in nanoscale characterization.

Submitted 23 March, 2026; originally announced March 2026.

arXiv:2603.21521 [pdf]

Ultrafast microwave sensing and automatic recognition of dynamic objects in open world using programmable surface plasmonic neural networks

Authors: Qian Ma, Ze Gu, Zi Rui Feng, Qian Wen Wu, Yu Ming Ning, Zhi Qiao Han, Rui Si Li, Xinxin Gao, Tie Jun Cui

Abstract: The evolution toward next-generation intelligent sensing requires microwave systems to move beyond static detection and achieve high-speed and adaptive perception of dynamic scenes. However, the existing microwave sensing systems have bottlenecks owing to their sequential digital processing chain, limiting the refresh rates to hundreds of hertz, while the existing integrated microwave processors are lack of programmable and scalable capabilities for robust and open-world deployment. To break the bottlenecks, here we report a programmable surface plasmonic neural network (P-SPNN) that enables real-time microwave sensing and automatic recognition of dynamic objects in open-world environment. With a perception latency of 25 ns and a refresh rate exceeding 10 kHz, the P-SPNN system operates more than two orders of magnitude faster than the conventional millimeter-wave sensors, while achieving an energy efficiency of 17 TOPS per W. With 288 programmable phase-modulated neurons, we demonstrate real time and robust classification of persons and cars with 91-97% accuracy in the open road scenarios. By further integrating beam-scanning function, P-SPNN enables multi-dimensional spatial temporal frequency sensing without the digital preprocessing. These results establish P-SPNN as a programmable, scalable, and low-power platform for high-speed perception tasks in realistic world, with broad implications for autonomous driving, intelligent sensing, and next-generation artificial intelligence hardware.

Submitted 22 March, 2026; originally announced March 2026.

arXiv:2603.20193 [pdf, ps, other]

From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

Authors: Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Jing-Hao Xue, Hao Li, Salman Khan, Zhiqiang Shen

Abstract: Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true edit signal: many pixels inside a mask are untouched or only trivially modified, while subtle yet consequential edits outside the mask are treated as natural. We reformulate VLM image tampering from coarse region labels to a pixel-grounded, meaning and language-aware task. First, we introduce a taxonomy spanning edit primitives (replace/remove/splice/inpaint/attribute/colorization, etc.) and their semantic class of tampered object, linking low-level changes to high-level understanding. Second, we release a new benchmark with per-pixel tamper maps and paired category supervision to evaluate detection and classification within a unified protocol. Third, we propose a training framework and evaluation metrics that quantify pixel-level correctness with localization to assess confidence or prediction on true edit intensity, and further measure tamper meaning understanding via semantics-aware classification and natural language descriptions for the predicted regions. We also re-evaluate the existing strong segmentation/localization baselines on recent strong tamper detectors and reveal substantial over- and under-scoring using mask-only metrics, and expose failure modes on micro-edits and off-mask changes. Our framework advances the field from masks to pixels, meanings and language descriptions, establishing a rigorous standard for tamper localization, semantic classification and description. Code and benchmark data are available at https://github.com/VILA-Lab/PIXAR.

Submitted 20 March, 2026; originally announced March 2026.

Comments: Code and data at: https://github.com/VILA-Lab/PIXAR (Accepted in CVPR 2026 Findings, but not opted in)

arXiv:2603.18844 [pdf, ps, other]

The multi-objective portfolio model for oil and gas exploration drilling projects selection and its operator-enhanced NSGA-II based solution

Authors: Chao Min, Junyi Cui, Stanisław Migórski, Yonglan Xie, Qingxia Zhang, Jun Peng

Abstract: Drilling investment is pivotal to operational planning in oil and gas (O\&G) exploration. Conventional deployment relies heavily on fragmented expert assessments of geological and economic factors, with limited integration ability of information. As the tool of portfolio show strong potential for mitigating uncertainty and selecting superior drilling plans, this study develops a multi-objective mean-variance portfolio model that accounts for geological-parameter uncertainty, enabling an effective risk-return trade-off and optimal selection. First, the probabilistic distribution of geological-parameters for prospect-list projects is obtained through expert-elicited priors. And considering the selection of the drilling projects as a portfolio, an optimization model is formulated jointly to describe the return and risk of short-term plan, under different constraints. Second, an improved OE-NSGA-II algorithm is proposed specifically for this model, in which (1) a directional crossover operator is designed to embed improving directions in objective space-derived from dominance and objective differences-into recombination, and (2) a structure-aware mutation operator is designed to prioritize high-utility bit flips via probabilistic sampling with feasibility repair, thus improving the search ability for superior Pareto solutions. Finally, using the case of 2023 exploration drilling deployment for verification, and then apply the validated method to the 2024 deployment to support decision-making. The results indicate that the proposed approach offers a reusable solution for drilling portfolio optimization in O\&G exploration.

Submitted 19 March, 2026; originally announced March 2026.

arXiv:2603.16447 [pdf, ps, other]

ProgressiveAvatars: Progressive Animatable 3D Gaussian Avatars

Authors: Kaiwen Song, Jinkai Cui, Juyong Zhang

Abstract: In practical real-time XR and telepresence applications, network and computing resources fluctuate frequently. Therefore, a progressive 3D representation is needed. To this end, we propose ProgressiveAvatars, a progressive avatar representation built on a hierarchy of 3D Gaussians grown by adaptive implicit subdivision on a template mesh. 3D Gaussians are defined in face-local coordinates to remain animatable under varying expressions and head motion across multiple detail levels. The hierarchy expands when screen-space signals indicate a lack of detail, allocating resources to important areas. Leveraging importance ranking, ProgressiveAvatars supports incremental loading and rendering, adding new Gaussians as they arrive while preserving previous content, thus achieving smooth quality improvements across varying bandwidths. ProgressiveAvatars enables progressive delivery and progressive rendering under fluctuating network bandwidth and varying compute and memory resources.

Submitted 17 March, 2026; originally announced March 2026.

Comments: Accepted to CVPR 2026, Project page: https://ustc3dv.github.io/ProgressiveAvatars/

arXiv:2603.14327 [pdf, ps, other]

OmniClone: Engineering a Robust, All-Rounder Whole-Body Humanoid Teleoperation System

Authors: Yixuan Li, Le Ma, Yutang Lin, Yushi Du, Mengya Liu, Kaizhe Hu, Jieming Cui, Yixin Zhu, Wei Liang, Baoxiong Jia, Siyuan Huang

Abstract: Whole-body humanoid teleoperation enables humans to remotely control humanoid robots, serving as both a real-time operational tool and a scalable engine for collecting demonstrations for autonomous learning. Despite recent advances, existing systems are validated using aggregate metrics that conflate distinct motion regimes, masking critical failure modes. This lack of diagnostic granularity, compounded by tightly coupled and labor-intensive system configurations, hinders robust real-world deployment. A key open challenge is building a teleoperation system that is simultaneously robust, versatile, and affordable for practical use. Here we present OmniClone, a whole-body humanoid teleoperation system that achieves high-fidelity, multi-skill control on a single consumer GPU with modest data requirements. Central to our approach is OmniBench, a diagnostic benchmark that evaluates policies across stratified motion categories and difficulty levels on unseen motions, exposing the narrow specialization of prior systems. Guided by these diagnostics, we identify an optimized training data recipe and integrate system-level improvements, namely subject-agnostic retargeting and robust communication, that collectively reduce Mean Per-Joint Position Error (MPJPE) by over 66% while requiring orders-of-magnitude fewer computational resources than comparable methods.
Crucially, OmniClone is control-source-agnostic: a single unified policy supports real-time teleoperation, generated motion playback, and Vision-Language-Action (VLA) models, while generalizing across operators of vastly different body proportions. By uniting diagnostic evaluation with practical engineering, OmniClone provides an accessible foundation for scalable humanoid teleoperation and autonomous learning.

Submitted 15 March, 2026; originally announced March 2026.

Comments: Website: https://omniclone.github.io/

arXiv:2603.12930 [pdf, ps, other]

Rethinking VLMs for Image Forgery Detection and Localization

Authors: Shaofeng Guo, Jiequan Cui, Richang Hong

Abstract: With the rapid rise of Artificial Intelligence Generated Content (AIGC), image manipulation has become increasingly accessible, posing significant challenges for image forgery detection and localization (IFDL). In this paper, we study how to fully leverage vision-language models (VLMs) to assist the IFDL task. In particular, we observe that priors from VLMs hardly benefit the detection and localization performance and even have negative effects due to their inherent biases toward semantic plausibility rather than authenticity. Additionally, the location masks explicitly encode the forgery concepts, which can serve as extra priors for VLMs to ease their training optimization, thus enhancing the interpretability of detection and localization results. Building on these findings, we propose a new IFDL pipeline named IFDL-VLM. To demonstrate the effectiveness of our method, we conduct experiments on 9 popular benchmarks and assess the model performance under both in-domain and cross-dataset generalization settings. The experimental results show that we consistently achieve new state-of-the-art performance in detection, localization, and interpretability. Code is available at: https://github.com/sha0fengGuo/IFDL-VLM.

Submitted 13 March, 2026; originally announced March 2026.

Comments: 8 pages

MSC Class: 68T45

ACM Class: I.4.8; I.4.9; I.2.10; K.6.5

arXiv:2603.12780 [pdf, ps, other]

Functional CLT for general sample covariance matrices

Authors: Jian Cui, Zhijun Liu, Jiang Hu, Zhidong Bai

Abstract: This paper studies the central limit theorems (CLTs) for linear spectral statistics (LSSs) of general sample covariance matrices, when the test functions belong to $C^3$, the class of functions with continuous third order derivatives. We consider matrices of the form $B_n=(1/n)T_p^{1/2}X_nX_n^{*}T_p^{1/2},$ where $X_n= (x_{i j} ) $ is a $p \times n$ matrix whose entries are independent and identically distributed (i.i.d.) real or complex random variables, and $T_p$ is a $p\times p$ nonrandom Hermitian nonnegative definite matrix with its spectral norm uniformly bounded in $p$. By using Bernstein polynomial approximation, we show that, under $\mathbb{E}|x_{ij}|^{8}<\infty$, the centered LSSs of $B_n$ have Gaussian limits. Under the stronger $\mathbb{E}|x_{ij}|^{10}<\infty$, we further establish a convergence rate of $O(n^{-1/2+κ})$ in the Kolmogorov--Smirnov distance, for any fixed $κ>0$.

Submitted 13 March, 2026; originally announced March 2026.

arXiv:2603.12277 [pdf, ps, other]

Prompt Injection as Role Confusion

Authors: Charles Ye, Jasmine Cui, Dylan Hadfield-Menell

Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer the source of text based on how it sounds, not where it actually comes from. A command hidden in a webpage hijacks an agent simply because it sounds like a user instruction. This is not just behavioral: in the model's internal representations, text that sounds like a trusted source occupies the same space as text that actually is one. We design role probes which measure how models internally perceive "who is speaking", showing that attacker-controllable signals (e.g. syntactic patterns, lexical choice) control role perception. We first test this with CoT Forgery, a zero-shot attack that injects fabricated reasoning into user prompts or ingested webpages. Models mistake the text for their own thoughts, yielding 60% attack success on StrongREJECT across frontier models with near-0% baselines. Strikingly, the degree of role confusion strongly predicts attack success. We then generalize these results to standard agent prompt injections, introducing a unifying framework that reframes prompt injection not as an ad-hoc exploit but as a measurable consequence of how models represent role.

Submitted 15 April, 2026; v1 submitted 22 February, 2026; originally announced March 2026.

arXiv:2603.12240 [pdf, ps, other]

BiGain: Unified Token Compression for Joint Generation and Classification

Authors: Jiacheng Liu, Shengkun Tang, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen

Abstract: Acceleration methods for diffusion models (e.g., token merging or downsampling) typically optimize synthesis quality under reduced compute, yet often ignore discriminative capacity. We revisit token compression with a joint objective and present BiGain, a training-free, plug-and-play framework that preserves generation quality while improving classification in accelerated diffusion models. Our key insight is frequency separation: mapping feature-space signals into a frequency-aware representation disentangles fine detail from global semantics, enabling compression that respects both generative fidelity and discriminative utility. BiGain reflects this principle with two frequency-aware operators: (1) Laplacian-gated token merging, which encourages merges among spectrally smooth tokens while discouraging merges of high-contrast tokens, thereby retaining edges and textures; and (2) Interpolate-Extrapolate KV Downsampling, which downsamples keys/values via a controllable interextrapolation between nearest and average pooling while keeping queries intact, thereby conserving attention precision. Across DiT- and U-Net-based backbones and ImageNet-1K, ImageNet-100, Oxford-IIIT Pets, and COCO-2017, our operators consistently improve the speed-accuracy trade-off for diffusion-based classification, while maintaining or enhancing generation quality under comparable acceleration. For instance, on ImageNet-1K, with 70% token merging on Stable Diffusion 2.0, BiGain increases classification accuracy by 7.15% while improving FID by 0.34 (1.85%).
Our analyses indicate that balanced spectral retention, preserving high-frequency detail and low/mid-frequency semantics, is a reliable design rule for token compression in diffusion models. To our knowledge, BiGain is the first framework to jointly study and advance both generation and classification under accelerated diffusion, supporting lower-cost deployment.

Submitted 12 March, 2026; originally announced March 2026.

Comments: CVPR 2026. Code: https://github.com/Greenoso/BiGain

arXiv:2603.10818 [pdf, ps, other]

Searches for charged-lepton-flavor violation in $χ_{bJ}(1P)$ decays

Authors: Belle, Belle II Collaborations, M. Abumusabh, I. Adachi, A. Aggarwal, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, H. Bae, et al. (394 additional authors not shown)

Abstract: We report the first searches for charged-lepton-flavor violation in decays of $χ_{bJ}(1P)$ ($J=0, 1,$ and $2$) to a pair of charged leptons using 158 million $Υ(2S)$ decays collected with the Belle detector in $e^+e^-$ collisions at the KEKB collider. No significant signal is observed, and we set upper limits on the branching fractions for $χ_{bJ}(1P)$ decays to $e^\pmμ^\mp$ at the level of $10^{-6}$ and to $e^\pmτ^\mp$ or $μ^\pmτ^\mp$ at the level of $10^{-5}$. Limits on $χ_{b0}(1P)$ decays are translated into bounds on the corresponding Wilson coefficients of scalar operators that mediate charged-lepton-flavor violation.

Submitted 12 March, 2026; v1 submitted 11 March, 2026; originally announced March 2026.

Comments: 5 pages, 4 figures, HQL 2025, QWG 2025

Report number: Belle II Preprint 2026-005, KEK Preprint 2026-1

arXiv:2603.06283 [pdf]

Optimizing Complex Health Intervention Packages through the Learn-As-you-GO (LAGO) Design

Authors: Donna Spiegelman, Dong Roman Xu, Ante Bing, Guangyu Tong, Mona Abdo, Jingyu Cui, Charles Goss, John Baptist Kiggundu, Chris T. Longenecker, LaRon Nelson, Drew Cameron, Fred Semitala, Xin Zhou, Judith J. Lok

Abstract: In the face of vast numbers of preventable deaths worldwide and gaping disparities in their distribution, we cannot afford to conduct null and inconclusive effectiveness and implementation trials of evidence-based interventions. The gold standard in biomedical research, the individually randomized clinical trial, is ill-suited as the primary tool for knowledge generation for contextually relevant, scalable, complex public health interventions of multi-component strategies. In this paper, we discuss the new Learn-As-you-GO (LAGO) design. In LAGO trials, the components of a complex intervention package are repeatedly optimized in pre-planned stages, until the package achieves its outcome and power goals with minimized cost and/or other optimization criteria, such as maximizing patient satisfaction. In this paper, the inputs to, and outputs of, LAGO are described, along with its general methodology. The methods are illustrated in the BetterBirth study, a large trial that aimed to reduce maternal and neonatal mortality in Uttar Pradesh, India, using the WHO essential birth practices checklist. Despite its scale, the BetterBirth study failed to demonstrate a significant effect of the intervention package on the primary health endpoint that included maternal mortality. We show how this unfortunate outcome could have been remedied had LAGO been used. LAGO is further illustrated through the discussion of several ongoing LAGO-informed implementation trials of HIV and non-communicable diseases in the United States and Sub-Saharan Africa.
The Learn-As-you-GO (LAGO) design optimizes a complex, multi-level intervention for minimum cost, pre-specified power, and a pre-specified effectiveness goal, by adapting the intervention as the study is conducted, reducing risk of trial failure.

Submitted 6 March, 2026; originally announced March 2026.

arXiv:2603.05836 [pdf, ps, other]

Heterogeneous entanglement between a trapped ion and a solid-state quantum memory

Authors: Chen-Xu Wang, Yi-Yang Wang, Tian-Xiang Zhu, Qing-Quan Yao, Peng-Jun Liang, Yuan-Cong Li, Zi-Peng Liu, Ran He, Yong-Jian Han, Jin-Ming Cui, Zong-Quan Zhou, Yun-Feng Huang, Chuan-Feng Li, Guang-Can Guo

Abstract: Hybrid quantum networks offer a promising architecture for scalable quantum information processing and a future quantum internet, as they can combine the complementary strengths of disparate physical platforms. While single-atom systems provide deterministic quantum logic gates, atomic ensembles enable large-capacity quantum storage. However, generating entanglement between such heterogeneous systems has remained an open challenge, primarily due to fundamental spectral mismatches and system complexity. Here, we demonstrate a hybrid quantum network that entangles a single trapped $\mathrm{^{171}Yb^{+}}$ ion and a quantum memory based on $\rm ^{153}Eu^{3+}\colon\!Y_2SiO_5$ crystal over a 75-m separation. Using polarization-maintaining quantum frequency conversion, we map spin-photon entanglement onto a hybrid entanglement between a single spin qubit and a collective excitation of the quantum memory. The resulting entangled state achieves a fidelity of $(89.21 \pm 2.23)\%$ and violates the CHSH-Bell inequality by 6 standard deviations ($S = 2.328 \pm 0.055$), confirming nonlocality between two heterogeneous nodes. This work establishes entanglement between a quantum processing module and a multiplexed quantum memory node, representing a key step toward a scalable, multifunctional quantum internet.

Submitted 5 March, 2026; originally announced March 2026.

Comments: 27 pages, 16 figures, 2 tables

arXiv:2603.05564 [pdf, ps, other]

Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (700 additional authors not shown)

Abstract: This paper reports the first multi-channel joint analysis to identify the properties of the exotic charmonium-like state $T_{c\bar{c}}(4020)$ via the electron-positron annihilation process $e^{+}e^{-}\toπ^{+}T_{c\bar{c}}(4020)^{-}+c.c$. A partial wave analysis is performed simultaneously in three decay channels $T_{c\bar{c}}(4020)^{-}\to {D}^{*0}D^{*-}$, $π^{-}J/ψ$, and $π^{-}h_{c}$, based on data samples taken at $\sqrt{s}=4.395$ and $4.416\,\mathrm{GeV}$ with an integrated luminosity of $1598.9\,\mathrm{pb}^{-1}$ collected with the BESIII detector operating on the BEPCII collider. For the first time, the spin-parity of the $T_{c\bar{c}}(4020)^{-}$ is determined to be $J^{P}=1^{+}$ with a significance of $11.7σ$. Pole positions are extracted on the Riemann sheets with three branch points in the complex energy plane. Furthermore, the relative branching fractions are obtained as $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}J/ψ]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(3.6\pm0.6\pm1.6)\times10^{-3}$ and $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}h_{c}]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(8.9\pm1.3\pm2.3)\times10^{-2}$, where the first uncertainties are statistical, and the second are systematic.

Submitted 5 March, 2026; originally announced March 2026.

arXiv:2603.05405 [pdf, ps, other]

Bala-Join: An Adaptive Hash Join for Balancing Communication and Computation in Geo-Distributed SQL Databases

Authors: Wenlong Song, Hui Li, Bingying Zhai, Jinxin Yang, Pinghui Wang, Luming Sun, Ming Li, Jiangtao Cui

Abstract: Shared-nothing geo-distributed SQL databases, such as CockroachDB, are increasingly vital for enterprise applications requiring data resilience and locality. However, we encountered significant performance degradation at the customer side, especially when their deployments span multiple data centers over a Wide Area Network (WAN). Our investigation identifies the bottleneck in the performance of the Distributed Hash Join (Dist-HJ) algorithm, which is contingent upon a crucial balance between communication overhead and computational load. This balance is severely disrupted when processing skewed data from real-world customer workloads, leading to the observed performance decline. To tackle this challenge, we introduce Bala-Join, an adaptive solution to balance the computation and network load in Dist-HJ execution. Our approach consists of the Balanced Partition and Partial Replication (BPPR) algorithm and a distributed online skewed join key detector. The former achieves balanced redistribution of skewed data through a multicast mechanism to improve computational performance and reduce network overhead. The latter provides real-time skewed join key information tailored to BPPR. Furthermore, an Active-Signaling and Asynchronous-Pulling (ASAP) mechanism is incorporated to enable efficient, real-time synchronization between the detector and the redistribution process with minimal overhead. Empirical study shows that Bala-Join outperforms the popular Dist-HJ solutions, increasing throughput by 25%-61%.

Submitted 5 March, 2026; originally announced March 2026.

Comments: 14 pages, 8 figures

ACM Class: H.2.4

arXiv:2603.04868 [pdf, ps, other]

K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Authors: Mingxuan Mu, Guo Yang, Lei Chen, Ping Wu, Jianxun Cui

Abstract: Generating realistic and diverse trajectories is a critical challenge in autonomous driving simulation. While Large Language Models (LLMs) show promise, existing methods often rely on structured data like vectorized maps, which fail to capture the rich, unstructured visual context of a scene. To address this, we propose K-Gen, an interpretable keypoint-guided multimodal framework that leverages Multimodal Large Language Models (MLLMs) to unify rasterized BEV map inputs with textual scene descriptions. Instead of directly predicting full trajectories, K-Gen generates interpretable keypoints along with reasoning that reflects agent intentions, which are subsequently refined into accurate trajectories by a refinement module. To further enhance keypoint generation, we apply T-DAPO, a trajectory-aware reinforcement fine-tuning algorithm. Experiments on WOMD and nuPlan demonstrate that K-Gen outperforms existing baselines, highlighting the effectiveness of combining multimodal reasoning with keypoint-guided trajectory generation.

Submitted 5 March, 2026; originally announced March 2026.

arXiv:2603.04071 [pdf, ps, other]

SaFeR: Safety-Critical Scenario Generation for Autonomous Driving Test via Feasibility-Constrained Token Resampling

Authors: Jinlong Cui, Fenghua Liang, Guo Yang, Chengcheng Tang, Jianxun Cui

Abstract: Safety-critical scenario generation is crucial for evaluating autonomous driving systems. However, existing approaches often struggle to balance three conflicting objectives: adversarial criticality, physical feasibility, and behavioral realism. To bridge this gap, we propose SaFeR: safety-critical scenario generation for autonomous driving test via feasibility-constrained token resampling. We first formulate traffic generation as a discrete next token prediction problem, employing a Transformer-based model as a realism prior to capture naturalistic driving distributions. To capture complex interactions while effectively mitigating attention noise, we propose a novel differential attention mechanism within the realism prior. Building on this prior, SaFeR implements a novel resampling strategy that induces adversarial behaviors within a high-probability trust region to maintain naturalism, while enforcing a feasibility constraint derived from the Largest Feasible Region (LFR). By approximating the LFR via offline reinforcement learning, SaFeR effectively prevents the generation of theoretically inevitable collisions. Closed-loop experiments on the Waymo Open Motion Dataset and nuPlan demonstrate that SaFeR significantly outperforms state-of-the-art baselines, achieving a higher solution rate and superior kinematic realism while maintaining strong adversarial effectiveness.

Submitted 4 March, 2026; originally announced March 2026.

arXiv:2603.04055 [pdf, ps, other]

A scalar auxiliary variable-based semi-implicit scheme for stochastic Cahn--Hilliard equation

Authors: Jianbo Cui, Jie Shen, Derui Sheng, Yahong Xiang

Abstract: In this paper, we present a novel semi-implicit numerical scheme for the stochastic Cahn--Hilliard equation driven by multiplicative noise. By reformulating the original equation into an equivalent stochastic scalar auxiliary variable (SSAV) system, our method enables an efficient and stable treatment of polynomial nonlinearities in a semi-implicit fashion. In order to accurately capture the impact of stochastic perturbations, we carefully incorporate Itô correction terms into the SSAV approximation. Leveraging the smoothing properties of the underlying semigroup and the $H^{-1}$-dissipative structure of the nonlinear term, we establish the optimal strong convergence order of one-half for the proposed scheme in the trace-class noise case. Moreover, we show that the modified SAV energy asymptotically preserves the energy evolution law. Finally, numerical experiments are provided to validate the theoretical results and to explore the influence of noise near the sharp-interface limit.

Submitted 4 March, 2026; originally announced March 2026.

arXiv:2603.01928 [pdf, ps, other]

LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving

Authors: Yuechen Luo, Fang Li, Shaoqing Xu, Yang Ji, Zehan Zhang, Bing Wang, Yuannan Shen, Jianwei Cui, Long Chen, Guang Chen, Hangjun Ye, Zhi-Xin Yang, Fuxi Wen

Abstract: While Vision-Language-Action (VLA) models have revolutionized autonomous driving by unifying perception and planning, their reliance on explicit textual Chain-of-Thought (CoT) leads to semantic-perceptual decoupling and perceptual-symbolic conflicts. Recent shifts toward latent reasoning attempt to bypass these bottlenecks by thinking in continuous hidden space. However, without explicit intermediate constraints, standard latent CoT often operates as a physics-agnostic representation. To address this, we propose the Latent Spatio-Temporal VLA (LaST-VLA), a framework shifting the reasoning paradigm from discrete symbolic processing into a physically grounded Latent Spatio-Temporal CoT. By implementing a dual-feature alignment mechanism, we distill geometric constraints from 3D foundation models and dynamic foresight from world models directly into the latent space. Coupled with a progressive SFT training strategy that transitions from feature alignment to trajectory generation, and refined via Reinforcement Learning with Group Relative Policy Optimization (GRPO) to ensure safety and rule compliance, LaST-VLA sets a new record on NAVSIM v1 (91.3 PDMS) and NAVSIM v2 (87.1 EPDMS), while excelling in spatial-temporal reasoning on SURDS and NuDynamics benchmarks.

Submitted 12 March, 2026; v1 submitted 2 March, 2026; originally announced March 2026.

arXiv:2603.00597 [pdf, ps, other]

AI-IO: An Aerodynamics-Inspired Real-Time Inertial Odometry for Quadrotors

Authors: Jiahao Cui, Feng Yu, Linzuo Zhang, Yu Hu, Danping Zou

Abstract: Inertial Odometry (IO) has gained attention in quadrotor applications due to its sole reliance on inertial measurement units (IMUs), attributed to its lightweight design, low cost, and robust performance across diverse environments. However, most existing learning-based inertial odometry systems for quadrotors either use only IMU data or include additional dynamics-related inputs such as thrust, but still lack a principled formulation of the underlying physical model to be learned. This lack of interpretability hampers the model's ability to generalize and often limits its accuracy. In this work, we approach the inertial odometry learning problem from a different perspective. Inspired by the aerodynamics model and IMU measurement model, we identify the key physical quantity--rotor speed measurements required for inertial odometry and design a transformer-based inertial odometry. By incorporating rotor speed measurements, the proposed model improves velocity prediction accuracy by 36.9%. Furthermore, the transformer architecture more effectively exploits temporal dependencies for denoising and aerodynamics modeling, yielding an additional 22.4% accuracy gain over previous results. To support evaluation, we also provide a real-world quadrotor flight dataset capturing IMU measurements and rotor speed for high-speed motion. Finally, combined with an uncertainty-aware extended Kalman filter (EKF), our framework is validated across multiple datasets and real-time systems, demonstrating superior accuracy, generalization, and real-time performance.
We share the code and data to promote further research (https://github.com/SJTU-ViSYS-team/AI-IO).

Submitted 28 February, 2026; originally announced March 2026.

Comments: 8 pages, 8 figures, 2026 IEEE International Conference on Robotics (ICRA 2026)

arXiv:2602.24029 [pdf]

Measurement and Modeling of Structure-Induced Surface Scattering on Terahertz Channel

Authors: Peian Li, Yapeng Ge, Jiacheng Liu, Wenbo Liu, Jiayuan Cui, Jiabiao Zhao, Qiang Niu, Yuping Yang, Xiangzhu Meng, Yiming Zhao, Jianjun Ma

Abstract: As terahertz (THz) frequencies emerge as promising candidates for next-generation wireless networks, accurate characterization of propagation mechanisms in indoor/outdoor environments becomes essential for system design and performance optimization. This article presents an experimental and theoretical investigation of structure-induced indoor surface scattering on THz channels, examining how material properties and structural configurations jointly govern channel power and angular distribution. Six representative indoor surfaces are characterized, revealing that intrinsic structural inhomogeneity -- particularly the quasi-periodic earlywood-latewood arrangement in pine wood -- induces measurable angular scattering whose dominant lobes and angular shifts are reproduced by a beam-propagation modeling (BPM) framework. Material-covered surface configurations are further investigated, demonstrating that thin dielectric covering layers can substantially modify reflection characteristics through thickness- and frequency-dependent thin-film interference effects. Wide-angle bistatic measurements conducted in a conference-room environment reveal that structured indoor elements, such as folded curtains, can enhance angular scattering and extend spatial coverage. These findings establish that structure-induced surface scattering mechanisms offer potential for constructing non-line-of-sight THz links in indoor environments.

Submitted 27 February, 2026; originally announced February 2026.

Comments: Submitted to IEEE Transactions on Antennas and Propagation

arXiv:2602.22801 [pdf, ps, other]

Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

Authors: Yinan Zheng, Tianyi Tan, Bin Huang, Enguang Liu, Ruiming Liang, Jianlin Zhang, Jianwei Cui, Guang Chen, Kun Ma, Hangjun Ye, Long Chen, Ya-Qin Zhang, Xianyuan Zhan, Jingjing Liu

Abstract: Diffusion models have become a popular choice for decision-making tasks in robotics, and more recently, are also being considered for solving autonomous driving tasks. However, their applications and evaluations in autonomous driving remain limited to simulation-based or laboratory settings. The full strength of diffusion models for large-scale, complex real-world settings, such as End-to-End Autonomous Driving (E2E AD), remains underexplored. In this study, we conducted a systematic and large-scale investigation to unleash the potential of diffusion models as planners for E2E AD, based on a tremendous amount of real-vehicle data and road testing. Through comprehensive and carefully controlled studies, we identify key insights into the diffusion loss space, trajectory representation, and data scaling that significantly impact E2E planning performance. Moreover, we also provide an effective reinforcement learning post-training strategy to further enhance the safety of the learned planner. The resulting diffusion-based learning framework, Hyper Diffusion Planner (HDP), is deployed on a real-vehicle platform and evaluated across 6 urban driving scenarios and 200 km of real-world testing, achieving a notable 10x performance improvement over the base model. Our work demonstrates that diffusion models, when properly designed and trained, can serve as effective and scalable E2E AD planners for complex, real-world autonomous driving tasks.

Submitted 26 February, 2026; originally announced February 2026.

arXiv:2602.22660 [pdf, ps, other]

LEDA: Latent Semantic Distribution Alignment for Multi-domain Graph Pre-training

Authors: Lianze Shan, Jitao Zhao, Dongxiao He, Siqi Liu, Jiaxu Cui, Weixiong Zhang

Abstract: Recent advances in generic large models, such as GPT and DeepSeek, have motivated the introduction of universality to graph pre-training, aiming to learn rich and generalizable knowledge across diverse domains using graph representations to improve performance in various downstream applications. However, most existing methods face challenges in learning effective knowledge from generic graphs, primarily due to simplistic data alignment and limited training guidance. The issue of simplistic data alignment arises from the use of a straightforward unification for highly diverse graph data, which fails to align semantics and misleads pre-training models. The problem with limited training guidance lies in the arbitrary application of in-domain pre-training paradigms to cross-domain scenarios. While it is effective in enhancing discriminative representation in one data space, it struggles to capture effective knowledge from many graphs. To address these challenges, we propose a novel Latent sEmantic Distribution Alignment (LEDA) model for universal graph pre-training. Specifically, we first introduce a dimension projection unit to adaptively align diverse domain features into a shared semantic space with minimal information loss. Furthermore, we design a variational semantic inference module to obtain the shared latent distribution. The distribution is then adopted to guide the domain projection, aligning it with shared semantics across domains and ensuring cross-domain semantic learning.
LEDA exhibits strong performance across a broad range of graphs and downstream tasks. Remarkably, in few-shot cross-domain settings, it significantly outperforms in-domain baselines and advanced universal pre-training models.

Submitted 26 February, 2026; originally announced February 2026.

Comments: Accepted by WWW-26, 12 pages, 2 figures

arXiv:2602.21723 [pdf, ps, other]

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations

Authors: Yutang Lin, Jieming Cui, Yixuan Li, Baoxiong Jia, Yixin Zhu, Siyuan Huang

Abstract: Humanoid robots that autonomously interact with physical environments over extended horizons represent a central goal of embodied intelligence. Existing approaches rely on reference motions or task-specific rewards, tightly coupling policies to particular object geometries and precluding multi-skill generalization within a single framework. A unified interaction representation enabling reference-free inference, geometric generalization, and long-horizon skill composition within one policy remains an open challenge. Here we show that Distance Field (DF) provides such a representation: LessMimic conditions a single whole-body policy on DF-derived geometric cues--surface distances, gradients, and velocity decompositions--removing the need for motion references, with interaction latents encoded via a Variational Auto-Encoder (VAE) and post-trained using Adversarial Interaction Priors (AIP) under Reinforcement Learning (RL). Through DAgger-style distillation that aligns DF latents with egocentric depth features, LessMimic further transfers seamlessly to vision-only deployment without motion capture (MoCap) infrastructure. A single LessMimic policy achieves 80--100% success across object scales from 0.4x to 1.6x on PickUp and SitStand where baselines degrade sharply, attains 62.1% success on 5 task instances trajectories, and remains viable up to 40 sequentially composed tasks.
By grounding interaction in local geometry rather than demonstrations, LessMimic offers a scalable path toward humanoid robots that generalize, compose skills, and recover from failures in unstructured environments.

Submitted 25 February, 2026; originally announced February 2026.

arXiv:2602.20952 [pdf, ps, other]

RISK: Efficiently processing rich spatial-keyword queries on encrypted geo-textual data

Authors: Zhen Lv, Cong Cao, Hongwei Huo, Jiangtao Cui, Yanguo Peng, Hui Li, Yingfan Liu

Abstract: Symmetric searchable encryption (SSE) for geo-textual data has attracted significant attention. However, existing schemes rely on task-specific, incompatible indices for isolated specific secure queries (e.g., range or k-nearest neighbor spatial-keyword queries), limiting practicality due to prohibitive multi-index overhead. To address this, we propose RISK, a model for rich spatial-keyword queries on encrypted geo-textual data. In a textual-first-then-spatial manner, RISK is built on a novel k-nearest neighbor quadtree (kQ-tree) that embeds representative and regional nearest neighbors, with the kQ-tree further encrypted using standard cryptographic tools (e.g., keyed hash functions and symmetric encryption). Overall, RISK seamlessly supports both secure range and k-nearest neighbor queries, is provably secure under the IND-CKA2 model, and is extensible to multi-party scenarios and dynamic updates. Experiments on three real-world datasets and one synthetic dataset show that RISK outperforms state-of-the-art methods by at least 0.5 and 4 orders of magnitude in response time for 1% range queries and 10-nearest neighbor queries, respectively.

Submitted 24 February, 2026; originally announced February 2026.

Comments: 15 pages, 10 figures, IEEE ICDE

arXiv:2602.20021 [pdf, ps, other]

Agents of Chaos

Authors: Natalie Shapira, Chris Wendler, Avery Yen, Gabriele Sarti, Koyena Pal, Olivia Floody, Adam Belfki, Alex Loftus, Aditya Ratan Jannali, Nikhil Prakash, Jasmine Cui, Giordano Rogers, Jannik Brinkmann, Can Rager, Amir Zur, Michael Ripa, Aruna Sankaranarayanan, David Atkinson, Rohit Gandikota, Jaden Fiotto-Kaufman, EunJeong Hwang, Hadas Orgad, P Sam Sahil, Negev Taglicht, Tomer Shabtay, et al. (13 additional authors not shown)

Abstract: We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.

Submitted 23 February, 2026; originally announced February 2026.

arXiv:2602.19807 [pdf, ps, other]

Study of $e^+e^- \to π^+π^-Υ(1D)$ at Belle II

Authors: Belle II Collaboration, M. Abumusabh, I. Adachi, A. Aggarwal, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, M. Angelsmark, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, et al. (383 additional authors not shown)

Abstract: The bottomonium spectrum, consisting of bound states of a $b$ quark and an anti-$b$ quark, provides an excellent laboratory for probing quantum chromodynamics in the non-perturbative regime. While $S$- and $P$-wave bottomonium states are well studied experimentally, information on $D$-wave states remains scarce. We search for $D$-wave bottomonium states via the decay of a vector bottomonium-like state $Υ(10753)$ in the reaction $e^+e^- \to π^+π^- Υ(1D)$, using $19.6~\mathrm{fb}^{-1}$ of data collected with the Belle II detector at center-of-mass energies $\sqrt{s} = 10.653, 10.701, 10.745$, and $10.805$ GeV, in the vicinity of the $Υ(10753)$ resonance. No significant signals are observed. Upper limits at the 90% credibility level are set on the products of the cross sections and branching fractions, $σ[e^+e^- \to π^+π^- Υ_2(1D)] \times \mathcal{B}[Υ_2(1D) \to γχ_{b1}]$ and $σ[e^+e^- \to π^+π^- Υ_3(1D)] \times \mathcal{B}[Υ_3(1D) \to γχ_{b2}]$, at each center-of-mass energy.

Submitted 23 February, 2026; originally announced February 2026.

Comments: 22 pages, 4 figures

Report number: Belle II Preprint: 2026-004 KEK Preprint: 2025-44

arXiv:2602.19153 [pdf, ps, other]

Constrained Diffusion for Accelerated Structure Relaxation of Inorganic Solids with Point Defects

Authors: Jingyi Cui, Jacob K. Christopher, Ankita Biswas, Prasanna V. Balachandran, Ferdinando Fioretto

Abstract: Point defects affect material properties by altering electronic states and modifying local bonding environments. However, high-throughput first-principles simulations of point defects are costly due to large simulation cells and complex energy landscapes. To this end, we propose a generative framework for simulating point defects, overcoming the limits of costly first-principles simulators. By leveraging a primal-dual algorithm, we introduce a constraint-aware diffusion model which outperforms existing constrained diffusion approaches in this domain. Across six defect configuration settings for Bi2Te3, the proposed approach provides state-of-the-art performance generating physically grounded structures.

Submitted 22 February, 2026; originally announced February 2026.

Comments: Appeared in the NeurIPS 2025 Workshop on AI for Accelerated Material Design (AI4Mat)

arXiv:2602.17645 [pdf, ps, other]

Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting

Authors: Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen

Abstract: Black-box adversarial attacks on Large Vision-Language Models (LVLMs) are challenging due to missing gradients and complex multimodal boundaries. While prior state-of-the-art transfer-based approaches like M-Attack perform well using local crop-level matching between source and target images, we find this induces high-variance, nearly orthogonal gradients across iterations, violating coherent local alignment and destabilizing optimization. We attribute this to (i) ViT translation sensitivity that yields spike-like gradients and (ii) structural asymmetry between source and target crops. We reformulate local matching as an asymmetric expectation over source transformations and target semantics, and build a gradient-denoising upgrade to M-Attack. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple independently sampled local views per iteration to reduce variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive target augmentation with a small auxiliary set from a semantically correlated distribution, producing a smoother, lower-variance target manifold. We further reinterpret momentum as Patch Momentum, replaying historical crop gradients; combined with a refined patch-size ensemble (PE+), this strengthens transferable directions.
Together these modules form M-Attack-V2, a simple, modular enhancement over M-Attack that substantially improves transfer-based black-box attacks on frontier LVLMs: boosting success rates on Claude-4.0 from 8% to 30%, Gemini-2.5-Pro from 83% to 97%, and GPT-5 from 98% to 100%, outperforming prior black-box LVLM attacks. Code and data are publicly available at: https://github.com/vila-lab/M-Attack-V2.

Submitted 19 February, 2026; originally announced February 2026.

Comments: Code at: https://github.com/vila-lab/M-Attack-V2

arXiv:2602.14349 [pdf, ps, other]

Same Prompt, Different Outcomes: Evaluating the Reproducibility of Data Analysis by LLMs

Authors: Jiaxin Cui, Rohan Alexander

Abstract: We systematically evaluate the reproducibility of data analysis conducted by Large Language Models (LLMs). We evaluate two prompting strategies, six models, and four temperature settings, with ten independent executions per configuration, yielding 480 total attempts. We assess the completion, concordance, validity, and consistency of each attempt and find considerable variation in the analytical results even for consistent configurations. This suggests, as with human data analysis, the data analysis conducted by LLMs can vary, even given the same task, data, and settings. Our results mean that if an LLM is being used to conduct data analysis, then it should be run multiple times independently and the distribution of results considered.
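The count of 480 attempts follows from the full factorial design described in the abstract (2 x 6 x 4 configurations, each run 10 times). A minimal sketch of that enumeration is shown here; the strategy names, model names, and temperature values are placeholders invented for illustration, since the abstract does not list them.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts (2, 6, 4, 10),
# not the actual strategies, models, or temperature values.
prompting_strategies = ["strategy_a", "strategy_b"]
models = [f"model_{i}" for i in range(1, 7)]
temperatures = [0.0, 0.5, 1.0, 1.5]  # assumed values, for illustration only
runs_per_configuration = 10

# Every (strategy, model, temperature) combination is one configuration.
configurations = list(product(prompting_strategies, models, temperatures))

# Each configuration is executed independently several times.
attempts = [
    (strategy, model, temperature, run)
    for (strategy, model, temperature) in configurations
    for run in range(runs_per_configuration)
]

print(len(configurations), len(attempts))
```

Grouping the attempts by configuration is then the natural unit for looking at the distribution of results, which is what the abstract recommends.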

Submitted 15 February, 2026; originally announced February 2026.

arXiv:2602.14010 [pdf, ps, other]

A Deployment-Friendly Foundational Framework for Efficient Computational Pathology

Authors: Yu Cai, Cheng Jin, Jiabo Ma, Fengtao Zhou, Yingxue Xu, Zhengrui Guo, Yihui Wang, Zhengyu Zhang, Ling Liang, Yonghao Tan, Pingcheng Dong, Du Cai, On Ki Tang, Chenglong Zhao, Xi Wang, Can Yang, Yali Xu, Jing Cui, Zhenhui Li, Ronald Cheong Kin Chan, Yueping Liu, Feng Gao, Xiuming Zhang, Li Liang, Hao Chen, et al. (1 additional author not shown)

Abstract: Pathology foundation models (PFMs) have enabled robust generalization in computational pathology through large-scale datasets and expansive architectures, but their substantial computational cost, particularly for gigapixel whole slide images, limits clinical accessibility and scalability. Here, we present LitePath, a deployment-friendly foundational framework designed to mitigate model over-parameterization and patch-level redundancy. LitePath integrates LiteFM, a compact model distilled from three large PFMs (Virchow2, H-Optimus-1 and UNI2) using 190 million patches, and the Adaptive Patch Selector (APS), a lightweight component for task-specific patch selection. The framework reduces model parameters by 28x and lowers FLOPs by 403.5x relative to Virchow2, enabling deployment on low-power edge hardware such as the NVIDIA Jetson Orin Nano Super. On this device, LitePath processes 208 slides per hour, 104.5x faster than Virchow2, and consumes 0.36 kWh per 3,000 slides, 171x lower than Virchow2 on an RTX3090 GPU. We validated accuracy using 37 cohorts across four organs and 26 tasks (26 internal, 9 external, and 2 prospective), comprising 15,672 slides from 9,808 patients disjoint from the pretraining data. LitePath ranks second among 19 evaluated models and outperforms larger models including H-Optimus-1, mSTAR, UNI2 and GPFM, while retaining 99.71% of the AUC of Virchow2 on average.
To quantify the balance between accuracy and efficiency, we propose the Deployability Score (D-Score), defined as the weighted geometric mean of normalized AUC and normalized FLOP, where LitePath achieves the highest value, surpassing Virchow2 by 10.64%. These results demonstrate that LitePath enables rapid, cost-effective and energy-efficient pathology image analysis on accessible hardware while maintaining accuracy comparable to state-of-the-art PFMs and reducing the carbon footprint of AI deployment.
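For readers unfamiliar with the term, a weighted geometric mean of two quantities has the generic form sketched below. The symbols $\hat{A}$ (normalized AUC), $\hat{F}$ (normalized FLOP term) and the weight $w$ are notation assumed here for illustration; the abstract does not state the weights or how either quantity is normalized, so this is only the general shape of such a score, not the paper's exact definition.

```latex
% Generic weighted geometric mean of two normalized quantities.
% \hat{A}, \hat{F}, and w are illustrative symbols, not taken from the paper.
\[
  D \;=\; \hat{A}^{\,w}\,\hat{F}^{\,1-w},
  \qquad 0 \le w \le 1,
  \qquad\text{equivalently}\qquad
  \log D \;=\; w \log \hat{A} + (1-w)\log \hat{F}.
\]
```

Under this form, a score near 1 requires both factors to be near 1, so a model cannot compensate for a very poor efficiency term with accuracy alone (or vice versa).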

Submitted 15 February, 2026; originally announced February 2026.

arXiv:2602.13569 [pdf, ps, other]

Study of $e^{+}e^{-}\to h^{+}h^{-}J/ψ~(h=π,~K,~p)$ via initial-state radiation at Belle II

Authors: Belle II Collaboration, M. Abumusabh, I. Adachi, A. Aggarwal, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, H. Bae, N. K. Baghel, S. Bahinipati, P. Bambade, et al. (396 additional authors not shown)

Abstract: Using a data sample of 427.9 fb$^{-1}$ collected by the Belle II detector at or near the $Υ(4S)$ and $Υ(10753)$ resonances, the cross sections for $e^+e^-\to h^+h^-J/ψ$ $(h=π/K/p)$ at center-of-mass energies ranging from 3.8 GeV or the production threshold to 5.5/6.0/7.0 GeV have been measured via initial-state radiation. The cross sections for the processes $e^+e^-\to π^+π^-J/ψ$ and $e^+e^-\to K^+K^-J/ψ$ are consistent with previously published results. The cross sections for these channels obtained by combining with previous Belle results are also given. The process $e^+e^-\to p\bar p J/ψ$ is investigated for the first time. The yields are small and no significant structure is observed in the cross section versus energy. Searches for vector charmonium-like states in the $h^+h^-J/ψ$ systems, and for associated intermediate states in the $h^{\pm} J/ψ$ systems, are also presented.

Submitted 13 February, 2026; originally announced February 2026.

Comments: Belle II Preprint: 2026-002 KEK Preprint: 2025-42

arXiv:2602.11573 [pdf, ps, other]

Fast Tuning the Index Construction Parameters of Proximity Graphs in Vector Databases

Authors: Wenyang Zhou, Jiadong Xie, Yingfan Liu, Zhihao Yin, Jeffrey Xu Yu, Hui Li, Zhangqian Mu, Xiaotian Qiao, Jiangtao Cui

Abstract: k-approximate nearest neighbor search (k-ANNS) in high-dimensional vector spaces is a fundamental problem across many fields. With the advent of vector databases and retrieval-augmented generation, k-ANNS has garnered increasing attention. Among existing methods, proximity graph (PG) based approaches are the state-of-the-art (SOTA) methods. However, the construction parameters of PGs significantly impact their search performance. Before constructing a PG for a given dataset, it is essential to tune these parameters, a process that first recommends a set of promising parameters and then estimates the quality of each by building the corresponding PG and testing its k-ANNS performance. Given that the construction complexity of PGs is superlinear, building and evaluating graph indexes accounts for the primary cost of parameter tuning. Unfortunately, no existing method considers and optimizes this process. In this paper, we introduce FastPGT, an efficient framework for tuning the PG construction parameters. FastPGT accelerates parameter estimation by building multiple PGs simultaneously, thereby reducing repeated computations. Moreover, we modify the SOTA tuning model to recommend multiple parameters at once, which can be efficiently estimated using our method of building multiple PGs simultaneously. Through extensive experiments on real-world datasets, we demonstrate that FastPGT achieves up to 2.37x speedup over the SOTA method VDTuner, without compromising tuning quality.

Submitted 17 February, 2026; v1 submitted 11 February, 2026; originally announced February 2026.

arXiv:2602.11484 [pdf, ps, other]

Quantifying the effect of graph structure on strong Feller property of SPDEs

Authors: Jianbo Cui, Tonghe Dang, Jialin Hong, Zhengkai Wang

Abstract: This paper investigates how the structure of the underlying graph influences the behavior of stochastic partial differential equations (SPDEs) on finite tree graphs, where each edge is driven by space-time white noise. We first introduce a novel graph-based null decomposition approach to analyzing the strong Feller property of the Markov semigroup generated by SPDEs on tree graphs. By examining the positions of zero entries in eigenfunctions of the graph Laplacian operator, we establish a sharp upper bound on the number of noise-free edges that ensures both the strong Feller property and irreducibility. Interestingly, we find that the addition of noise to any single edge is sufficient for chain graphs, whereas for star graphs, at most one edge can remain noise-free without compromising the system's properties. Furthermore, under a dissipative condition, we prove the existence and exponential ergodicity of a unique invariant measure.

Submitted 11 February, 2026; originally announced February 2026.

Comments: 34 pages

MSC Class: 60H15; 35R02; 47D07; 37L40

arXiv:2602.11150 [pdf, ps, other]

YOR: Your Own Mobile Manipulator for Generalizable Robotics

Authors: Manan H Anjaria, Mehmet Enes Erciyes, Vedant Ghatnekar, Neha Navarkar, Haritheja Etukuru, Xiaole Jiang, Kanad Patel, Dhawal Kabra, Nicholas Wojno, Radhika Ajay Prayage, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah, Zichen Jeff Cui

Abstract: Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, and two arms with grippers to achieve whole-body mobility and manipulation. Our design emphasizes modularity, ease of assembly using off-the-shelf components, and affordability, with a bill-of-materials cost under 10,000 USD. We demonstrate YOR's capability by completing tasks that require coordinated whole-body control, bimanual manipulation, and autonomous navigation. Overall, YOR offers competitive functionality for mobile manipulation research at a fraction of the cost of existing platforms. Project website: https://www.yourownrobot.ai/

Submitted 11 February, 2026; originally announced February 2026.

arXiv:2602.09800 [pdf, ps, other]

Study of $B^+ \to μ^+ ν_μ$ decays at Belle and Belle II

Authors: Belle, Belle II Collaborations, M. Abumusabh, I. Adachi, K. Adamczyk, A. Aggarwal, L. Aggarwal, H. Ahmed, Y. Ahn, H. Aihara, N. Akopov, S. Alghamdi, M. Alhakami, A. Aloisio, N. Althubiti, K. Amos, N. Anh Ky, C. Antonioli, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, V. Babu, H. Bae, et al. (391 additional authors not shown)

Abstract: We report a measurement of the branching fraction for the leptonic decay $B^+\toμ^+ν_μ$. This work presents the first $B^+\toμ^+ν_μ$ result using Belle II data, an updated Belle measurement that supersedes the previous result, and their combination, which yields the most precise search to date. The analysis is based on $1076\,\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of $10.58\,\mathrm{GeV}$ with the Belle and Belle II detectors at the KEKB and SuperKEKB colliders, respectively. We measure $\mathcal{B}(B^+\toμ^+ν_μ)=(4.4\pm1.9\pm 1.0)\times10^{-7}$, where the first uncertainty is statistical and the second systematic. The observed significance relative to the background-only hypothesis is 2.4 standard deviations. We set a 90% confidence level upper limit of $\mathcal{B}(B^+\toμ^+ν_μ)<6.7\times10^{-7}$ using a frequentist approach and a 90% credibility level upper limit of $\mathcal{B}(B^+\toμ^+ν_μ)<7.2\times 10^{-7}$ using a Bayesian approach. These are the most stringent limits to date. The result is interpreted as an exclusion region in the parameter space of type II and type III two-Higgs-doublet models. We search for stable sterile neutrinos with masses $m_N\in[0,1.5]\,\mathrm{GeV}$. No signal is observed, and the resulting exclusion on the squared mixing parameter $|U_{μN}|^2$ provides improvement over previous limits.
We report a measurement of the partial branching fraction of semileptonic $B\to X_u\ellν_\ell$ decays with $p_μ^B>2.2\,\mathrm{GeV}$, obtaining $Δ\mathcal{B}(B\to X_u\ellν_\ell)=(2.72\pm0.05\pm0.29)\times10^{-4}$. We present a model-dependent study of weak annihilation decays using the muon momentum spectrum. We observe a signal of 2.4 standard deviations above the background-only hypothesis in regions where the distribution resembles that of $B\to X_u\ellν_\ell$ decays.

Submitted 10 February, 2026; originally announced February 2026.

Report number: Belle II Preprint: 2026-001 KEK Preprint: 2025-38

arXiv:2602.09017 [pdf, ps, other]

Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

Authors: Zichen Jeff Cui, Omar Rayyan, Haritheja Etukuru, Bowen Tan, Zavier Andrianarivo, Zicheng Teng, Yihang Zhou, Krish Mehta, Nicholas Wojno, Kevin Yuanbo Wu, Manan H Anjaria, Ziyuan Wu, Manrong Mao, Guangxun Zhang, Binit Shah, Yejin Kim, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah

Abstract: The prevalent paradigm in robot learning attempts to generalize across environments, embodiments, and tasks with language prompts at runtime. A fundamental tension limits this approach: language is often too abstract to guide the concrete physical understanding required for robust manipulation. In this work, we introduce Contact-Anchored Policies (CAP), which replace language conditioning with points of physical contact in space. Simultaneously, we structure CAP as a library of modular utility models rather than a monolithic generalist policy. This factorization allows us to implement a real-to-sim iteration cycle: we build EgoGym, a lightweight simulation benchmark, to rapidly identify failure modes and refine our models and datasets prior to real-world deployment. We show that by conditioning on contact and iterating via simulation, CAP generalizes to novel environments and embodiments out of the box on three fundamental manipulation skills while using only 23 hours of demonstration data, and outperforms large, state-of-the-art VLAs in zero-shot evaluations by 56%. All model checkpoints, codebase, hardware, simulation, and datasets will be open-sourced. Project page: https://cap-policy.github.io/

Submitted 9 February, 2026; originally announced February 2026.

arXiv:2602.09012 [pdf, ps, other]

Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense

Authors: Jiacheng Liu, Yaxin Luo, Jiacheng Cui, Xinyi Shang, Xiaohan Zhao, Zhiqiang Shen

Abstract: The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh, have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against advanced agents. Unlike static datasets, our benchmark is built upon a robust data generation pipeline, allowing for large-scale and easily scalable evaluations; notably, for backend-supported types, our system is capable of generating effectively unbounded CAPTCHA instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.

Submitted 9 February, 2026; originally announced February 2026.

Comments: Project page at https://greenoso.github.io/NextGen-CAPTCHAs_webpage/

arXiv:2602.08803 [pdf, ps, other]

A melting mode of frozen sessile droplets with unmelted ice layer deposited at the bottom

Authors: Jiawang Cui, Yugang Zhao, Tianyou Wang, Zhizhao Che

Abstract: Water-repellent properties of superhydrophobic surfaces make them promising for anti-icing and deicing applications. Through experimental visualization of frozen sessile droplets undergoing melting on superhydrophobic surfaces, we identify a melting mode with the unmelted ice layer deposited at the bottom of the melting droplet, even though the density of ice is lower than that of water. In the deposited mode of the melting process, the time required for the frozen droplet to melt completely is much shorter than that in the floating mode. Force analysis shows that the melted fluid flows along the gas-liquid interface toward the top of the melting droplet, thereby exerting force and then suppressing the upward movement of the unmelted ice layer. Moreover, the flow within the liquid film formed between the unmelted ice layer and the heating wall is dominated by the viscous force, which has a lubrication effect and maintains the deposition of the unmelted ice layer. High heating temperature, large contact angle, and low particle concentration are helpful for the occurrence of the deposited mode.

Submitted 9 February, 2026; originally announced February 2026.

arXiv:2602.08100 [pdf, ps, other]

Emergent Search and Backtracking in Latent Reasoning Models

Authors: Jasmine Cui, Charles Ye

Abstract: What happens when a language model thinks without words? Standard reasoning LLMs verbalize intermediate steps as chain-of-thought; latent reasoning transformers (LRTs) instead perform deliberation entirely in continuous hidden space. We investigate an LRT, decoding the model's evolving beliefs at every step on a multiple-choice QA benchmark. We find that the model spontaneously learns a structured search process in latent space. Deliberation follows a consistent trajectory: an exploration phase where probability mass spreads across candidates, tentative commitment to a frontrunner, and either convergence or backtracking. Backtracking is prevalent (32% of instances), beneficial (34% accuracy gain over non-backtracking instances), and predominantly directed away from the semantically closest distractor toward the correct answer. The search is adaptive: replacing distractors with implausible alternatives shortens exploration by 54%. Latent reasoning models achieve in activation space what chain-of-thought achieves through words: the ability to be wrong, notice, and recover.

Submitted 8 February, 2026; originally announced February 2026.
