-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
Submitted 14 April, 2026; v1 submitted 14 April, 2026;
originally announced April 2026.
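The abstract above quotes a relativistic Breit-Wigner function with a mass-dependent width but does not spell out its form. The following is a minimal sketch of one common textbook parametrization, not necessarily the one used in the analysis. It assumes a P-wave ($L=1$) decay to $π^{\pm}η^{\prime}$, omits barrier factors, and uses approximate daughter masses; all function names are hypothetical.

```python
import math

# Approximate daughter masses in GeV/c^2 (assumed values for illustration).
M_PI = 0.13957    # charged pion
M_ETAP = 0.95778  # eta'


def breakup_momentum(m, m1, m2):
    """Daughter momentum in the parent rest frame (GeV/c); 0 at or below threshold."""
    if m <= m1 + m2:
        return 0.0
    term = (m * m - (m1 + m2) ** 2) * (m * m - (m1 - m2) ** 2)
    return math.sqrt(term) / (2.0 * m)


def running_width(m, m0, gamma0, L=1, m1=M_PI, m2=M_ETAP):
    """Mass-dependent width: Gamma(m) = Gamma0 * (m0/m) * (q/q0)^(2L+1)."""
    q = breakup_momentum(m, m1, m2)
    q0 = breakup_momentum(m0, m1, m2)
    return gamma0 * (m0 / m) * (q / q0) ** (2 * L + 1)


def breit_wigner(m, m0, gamma0, L=1):
    """Complex amplitude 1 / (m0^2 - m^2 - i * m0 * Gamma(m))."""
    return 1.0 / complex(m0 * m0 - m * m, -m0 * running_width(m, m0, gamma0, L))


if __name__ == "__main__":
    # Central values quoted in the abstract, converted to GeV.
    m0, gamma0 = 1.828, 0.638
    for m in (1.4, 1.6, 1.828, 2.0):
        print(m, abs(breit_wigner(m, m0, gamma0)) ** 2)
```

By construction the running width equals the nominal width at the pole mass, so the amplitude there is purely imaginary.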
-
From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue
Authors:
Jiarui Zhang,
Xiangyu Liu,
Yong Hu,
Chaoyue Niu,
Hang Zeng,
Shaojie Tang,
Fan Wu,
Guihai Chen
Abstract:
Multi-turn dialogue is the predominant form of interaction with large language models (LLMs). While LLM routing is effective in single-turn settings, existing methods fail to maximize cumulative performance in multi-turn dialogue due to interaction dynamics and delayed rewards. To address this challenge, we move from myopic, single-turn selection to long-horizon sequential routing for multi-turn dialogue. Accordingly, we propose DialRouter, which first performs MCTS to explore dialogue branches induced by different LLM selections and collect trajectories with high cumulative rewards. DialRouter then learns a lightweight routing policy from search-derived data, augmented with retrieval-based future state approximation, enabling multi-turn routing without online search. Experiments on both open-domain and domain-specific dialogue tasks across diverse candidate sets of both open-source and closed-source LLMs demonstrate that DialRouter significantly outperforms single LLMs and existing routing baselines in task success rate, while achieving a superior performance-cost trade-off when combined with a cost-aware reward.
Submitted 14 April, 2026;
originally announced April 2026.
-
21 cm Power Spectrum Analysis of North Celestial Pole Observations with the Tianlai Dish Pathfinder Array
Authors:
Guangzhi He,
Shifan Zuo,
Jixia Li,
Yichao Li,
Furen Deng,
Shijie Sun,
Reza Ansari,
Olivier Perdereau,
Peter Timbie,
Albert Stebbins,
Ayodeji Ibitoye,
Fengquan Wu,
Yougang Wang,
Xuelei Chen
Abstract:
The Tianlai Dish Pathfinder Array (TDPA) is a radio interferometer designed to test techniques for 21 cm intensity mapping in the post-reionization universe as a means of measuring large-scale cosmic structure. Using 9 nights of observations targeting the North Celestial Pole (NCP) field, totaling approximately 107 hours of integration time, we analyze data in the frequency range 700-800 MHz (corresponding to redshift $z \sim 0.9$). We perform data format conversion, radio frequency interference (RFI) flagging, calibration, imaging, point source subtraction, and foreground removal via Singular Value Decomposition (SVD), and obtain the spherically averaged power spectrum $Δ^2(k)$. This work establishes and validates a comprehensive data analysis framework for the TDPA. We identify key improvements, including sky model refinement, increased integration time, and pipeline optimization, that will enable future detection of the 21 cm signal through auto-correlation and cross-correlation with optical galaxy surveys.
Submitted 13 April, 2026;
originally announced April 2026.
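The correspondence between the 700-800 MHz band and $z \sim 0.9$ quoted above follows from the rest frequency of the 21 cm hydrogen line, via $z = ν_{\rm rest}/ν_{\rm obs} - 1$. A minimal sketch of that arithmetic (the function name is hypothetical, not part of any TDPA pipeline):

```python
# Rest frequency of the neutral-hydrogen hyperfine (21 cm) line, in MHz.
HI_REST_MHZ = 1420.405751768


def redshift_from_freq(freq_mhz):
    """Redshift at which the 21 cm line is observed at freq_mhz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return HI_REST_MHZ / freq_mhz - 1.0


if __name__ == "__main__":
    for f in (700.0, 750.0, 800.0):
        print(f"{f:.0f} MHz -> z = {redshift_from_freq(f):.3f}")
```

The band edges map to roughly $z \approx 1.03$ (700 MHz) and $z \approx 0.78$ (800 MHz), with the band center near $z \approx 0.89$.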
-
SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
Authors:
Zikun Liu,
Liang Luo,
Qianru Li,
Zhengyu Zhang,
Wei Ling,
Jingyi Shen,
Zeliang Chen,
Yaning Huang,
Jingxian Huang,
Abdallah Aboelela,
Chonglin Sun,
Feifan Gu,
Fenggang Wu,
Hang Qu,
Huayu Li,
Jill Pan,
Kaidi Pei,
Laming Chen,
Longhao Jin,
Qin Huang,
Tongyi Tang,
Varna Puvvada,
Wenlin Chen,
Xiaohan Wei,
Xu Cao,
et al. (8 additional authors not shown)
Abstract:
Recent advances in recommendation scaling laws have led to foundation models of unprecedented complexity. While these models offer superior performance, their computational demands make real-time serving impractical, often forcing practitioners to rely on knowledge distillation, compromising serving quality for efficiency. To address this challenge, we present SOLARIS (Speculative Offloading of Latent-bAsed Representation for Inference Scaling), a novel framework inspired by speculative decoding. SOLARIS proactively precomputes user-item interaction embeddings by predicting which user-item pairs are likely to appear in future requests and asynchronously generating their foundation model representations ahead of time. This approach decouples the costly foundation model inference from the latency-critical serving path, enabling real-time knowledge transfer from models previously considered too expensive for online use. Deployed across Meta's advertising system serving billions of daily requests, SOLARIS achieves a 0.67% gain in revenue-driving top-line metrics, demonstrating its effectiveness at scale.
Submitted 13 April, 2026;
originally announced April 2026.
-
NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild
Authors:
Aleksandr Gushchin,
Khaled Abud,
Ekaterina Shumitskaya,
Artem Filippov,
Georgii Bychkov,
Sergey Lavrushkin,
Mikhail Erofeev,
Anastasia Antsiferova,
Changsheng Chen,
Shunquan Tan,
Radu Timofte,
Dmitry Vatolin,
Chuanbiao Song,
Zijian Yu,
Hao Tan,
Jun Lan,
Zhiqiang Yang,
Yongwei Tang,
Zhiqiang Wu,
Jia Wen Seow,
Hong Vin Koay,
Haodong Ren,
Feng Xu,
Shuai Chen,
Ruiyang Xia,
et al. (29 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild, held in conjunction with the NTIRE workshop at CVPR 2026. The goal of this challenge was to develop detection models capable of distinguishing real images from generated ones in realistic scenarios: the images are often transformed (cropped, resized, compressed, blurred) for practical usage, and therefore, the detection models should be robust to such transformations. The challenge is based on a novel dataset consisting of 108,750 real and 185,750 AI-generated images from 42 generators comprising a large variety of open-source and closed-source models of various architectures, augmented with 36 image transformations. Methods were evaluated using ROC AUC on the full test set, including both transformed and untransformed images. A total of 511 participants registered, with 20 teams submitting valid final solutions. This report provides a comprehensive overview of the challenge, describes the proposed solutions, and can be used as a valuable reference for researchers and practitioners in increasing the robustness of the detection models to real-world transformations.
Submitted 13 April, 2026;
originally announced April 2026.
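The challenge above ranks methods by ROC AUC. As a reminder of what that metric measures, here is a minimal sketch using the pairwise definition: the probability that a randomly chosen positive (for example, AI-generated) image receives a higher score than a randomly chosen negative (real) one, with ties counted as one half. The function name and the label convention are assumptions for illustration; this is not the challenge's evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: iterable of detector scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The double loop is quadratic in the number of samples; for a test set of the size described above, a rank-based formulation would be the practical choice, but it computes the same quantity.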
-
METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
Authors:
Haofu Yang,
Jiaji Liu,
Chen Huang,
Faguo Wu,
Wenqiang Lei,
See-Kiong Ng
Abstract:
Developing non-collaborative dialogue agents traditionally requires the manual, unscalable codification of expert strategies. We propose METRO, a method that leverages large language models to autonomously induce both strategy actions and planning logic directly from raw transcripts. METRO formalizes expert knowledge into a Strategy Forest, a hierarchical structure that captures both short-term responses (nodes) and long-term strategic foresight (branches). Experimental results across two benchmarks show that METRO demonstrates promising performance, outperforming existing methods by an average of 9%-10%. Our further analysis not only reveals the success behind METRO (strategic behavioral diversity and foresight), but also demonstrates its robust cross-task transferability. This offers new insights into building non-collaborative agents in a cost-effective and scalable way. Our code is available at https://github.com/Humphrey-0125/METRO.
Submitted 16 April, 2026; v1 submitted 13 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of $D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$ in $D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer a novel insight into our comprehension of the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 15 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Brittle-to-ductile fracturing transition: A chemo-mechanical phase-field framework
Authors:
Fanyu Wu,
Chong Liu,
Manolis Veveakis,
Manman Hu
Abstract:
In chemically reactive environments, the mechanical integrity of geomaterials is fundamentally compromised by solid matrix dissolution. In this study, we propose a fully coupled chemo-mechanical phase-field framework to capture the dynamic interplay between mineral dissolution and fracture propagation. A key feature of the proposed model is the dynamic coupling of local mass removal to the fracture length scale, while also incorporating damage-accelerated reaction-diffusion processes. Our results capture the development of an enlarged fracture process zone driven by chemical mass removal. This chemically induced widening blunts the sharp crack tip, alleviating the near-tip stress concentrations and causing a pronounced degradation in material stiffness before failure. Furthermore, we reveal a distinct ductilization effect, characterized by a more gradual accumulation of damage and a delayed onset of macroscopic failure. We show that the transition between brittle and ductile failure modes is dictated by the competing timescales of chemical degradation and mechanical deformation. Highly acidic environments enhance matrix dissolution and promote ductile fracture, whereas rapid mechanical loading limits chemical interaction and preserves the brittle failure mode.
Submitted 11 April, 2026;
originally announced April 2026.
-
Pressure-Induced Superconducting-like Transition in the $\it d$-wave Altermagnet Candidate CsV$_2$Se$_2$O
Authors:
Yuanzhe Li,
Yilin Han,
Liu Yang,
Wanli He,
Pengda Ye,
Wencheng Huang,
Jiabin Qiao,
Yuemei Li,
Xiaodong Sun,
Tingli He,
Jiayi Han,
Yuxiang Chen,
Ruifeng Tian,
Hao Sun,
Yuwei Liu,
Feng Wu,
Baoshan Song,
Zhengtai Liu,
Mao Ye,
Yaobo Huang,
Kenichi Ozawa,
Ji Dai,
Massimo Tallarida,
Shengtao Cui,
Jie Chen,
et al. (7 additional authors not shown)
Abstract:
Altermagnetism generates exchange-type spin splitting without net magnetization and, in its $\it d$-wave form, resembles the angular symmetry of unconventional $\it d$-wave superconductivity. Whether this correspondence bears directly on superconducting instabilities in real correlated materials remains open. Here we study the quasi-two-dimensional vanadium oxychalcogenide CsV$_2$Se$_2$O (CVSO), a square-net $\it d$-wave altermagnet candidate, through combined experimental and theoretical investigation of its lattice structure, electronic structure and transport properties. At ambient pressure, CVSO is a weakly insulating parent state with a density-wave-like anomaly near 100 K, and its bulk properties are most consistent with a G-type compensated antiferromagnetic background. Under compression, the density-wave-like feature is suppressed, the magnetoresistance evolves from predominantly negative to positive, and a superconducting-like resistive downturn emerges below about 3 K. This low-temperature anomaly is reproducible across samples and pressure media, and is suppressed by magnetic field. Room-temperature X-ray diffraction reveals no symmetry lowering but does show a pronounced compressibility anomaly over the same pressure range. CVSO thus reveals a pressure-tuned phase diagram in which a reconstructed weakly insulating parent state gives way to strange-metal-like transport and superconducting-like behavior, echoing broader phenomenology associated with unconventional superconductors, including cuprates and nickelates.
Submitted 10 April, 2026;
originally announced April 2026.
-
Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization
Authors:
Yuqin Lan,
Gen Li,
Yuanze Hu,
Weihao Shen,
Zhaoxin Fan,
Faguo Wu,
Xiao Zhang,
Laurence T. Yang,
Zhiming Zheng
Abstract:
Vision-Language Models (VLMs) are powerful but remain vulnerable to multimodal jailbreak attacks. Existing attacks mainly rely on either explicit visual prompt attacks or gradient-based adversarial optimization. While the former is easier to detect, the latter produces subtle perturbations that are less perceptible, but is usually optimized and evaluated under homogeneous open-source surrogate-target settings, leaving its effectiveness on commercial closed-source VLMs under heterogeneous settings unclear. To examine this issue, we study different surrogate-target settings and observe a consistent gap between homogeneous and heterogeneous settings, a phenomenon we term surrogate dependency. Motivated by this finding, we propose Mosaic, a Multi-view ensemble optimization framework for multimodal jailbreak against closed-source VLMs, which alleviates surrogate dependency under heterogeneous surrogate-target settings by reducing over-reliance on any single surrogate model and visual view. Specifically, Mosaic incorporates three core components: a Text-Side Transformation module, which perturbs refusal-sensitive lexical patterns; a Multi-View Image Optimization module, which updates perturbations under diverse cropped views to avoid overfitting to a single visual view; and a Surrogate Ensemble Guidance module, which aggregates optimization signals from multiple surrogate VLMs to reduce surrogate-specific bias. Extensive experiments on safety benchmarks demonstrate that Mosaic achieves state-of-the-art Attack Success Rate and Average Toxicity against commercial closed-source VLMs.
Submitted 10 April, 2026;
originally announced April 2026.
-
A Closer Look at the Application of Causal Inference in Graph Representation Learning
Authors:
Hang Gao,
Kunyu Li,
Huang Hong,
Baoquan Cui,
Fengge Wu
Abstract:
Modeling causal relationships in graph representation learning remains a fundamental challenge. Existing approaches often draw on theories and methods from causal inference to identify causal subgraphs or mitigate confounders. However, due to the inherent complexity of graph-structured data, these approaches frequently aggregate diverse graph elements into single causal variables, an operation that risks violating the core assumptions of causal inference. In this work, we prove that such aggregation compromises causal validity. Building on this conclusion, we propose a theoretical model grounded in the smallest indivisible units of graph data so that causal validity is guaranteed. With this model, we further analyze the costs of achieving precise causal modeling in graph representation learning and identify the conditions under which the problem can be simplified. To empirically support our theory, we construct a controllable synthetic dataset that reflects real-world causal structures and conduct extensive experiments for validation. Finally, we develop a causal modeling enhancement module that can be seamlessly integrated into existing graph learning pipelines, and we demonstrate its effectiveness through comprehensive comparative experiments.
Submitted 9 April, 2026;
originally announced April 2026.
-
From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI
Authors:
Hongyin Zhu,
Jinming Liang,
Mengjun Hou,
Ruifan Tang,
Xianbin Zhu,
Jingyuan Yang,
Yuanman Mao,
Feng Wu
Abstract:
Existing LLM-based agent systems share a common architectural failure: they answer from the unrestricted knowledge space without first simulating how active business scenarios reshape that space for the event at hand -- producing decisions that are fluent but ungrounded and carrying no audit trail. We present LOM-action, which equips enterprise AI with event-driven ontology simulation: business events trigger scenario conditions encoded in the enterprise ontology (EO), which drive deterministic graph mutations in an isolated sandbox, evolving a working copy of the subgraph into the scenario-valid simulation graph $G_{\text{sim}}$; all decisions are derived exclusively from this evolved graph. The core pipeline is event $\to$ simulation $\to$ decision, realized through a dual-mode architecture -- skill mode and reasoning mode. Every decision produces a fully traceable audit log. LOM-action achieves 93.82% accuracy and 98.74% tool-chain F1 against frontier baselines Doubao-1.8 and DeepSeek-V3.2, which reach only 24--36% F1 despite 80% accuracy -- exposing the illusive accuracy phenomenon. The four-fold F1 advantage confirms that ontology-governed, event-driven simulation, not model scale, is the architectural prerequisite for trustworthy enterprise decision intelligence.
Submitted 8 April, 2026;
originally announced April 2026.
-
Engineering Ferrimagnetic Interactions in Molecular Quantum Systems
Authors:
Elia Turco,
Fupeng Wu,
Annika Bernhardt,
Nils Krane,
Ji Ma,
Roman Fasel,
Michal Juriček,
Xinliang Feng,
Pascal Ruffieux
Abstract:
Achieving long-range ferrimagnetic order in purely organic systems remains a major challenge in molecular magnetism. Here we report the synthesis and characterization of heterospin-coupling motifs, formed by covalently linking spin-1/2 and spin-1 triangular nanographenes. A combined solution-phase and on-surface synthetic strategy yields three distinct compounds, whose structures are elucidated by bond-resolved scanning probe microscopy. Starting from a spin-1/2--spin-1 dimer as the elemental ferrimagnetic unit, we employ inelastic electron tunneling spectroscopy to resolve low-energy magnetic excitations and extract the parameters of the Heisenberg Hamiltonian. Extension to trimeric architectures results in two distinct spin configurations, with compensated ($S=0$) and uncompensated ($S=3/2$) ferrimagnetic ground states. The Heisenberg model accurately describes all magnetic transitions, offering direct insight into increasingly complex spin Hamiltonians. These findings establish a molecular platform for designing tunable heterospin systems with robust exchange interactions, opening routes toward multi-level spin encoding in qudit-based quantum technologies.
Submitted 9 April, 2026;
originally announced April 2026.
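For the spin-1/2--spin-1 dimer described above, the two-spin Heisenberg model can be solved in closed form, which shows how a single exchange parameter fixes the excitation energy seen in tunneling spectroscopy. The sketch below assumes the convention $H = J\,\mathbf{S}_1\cdot\mathbf{S}_2$ with $J>0$ antiferromagnetic (the abstract does not state its convention), giving $E(S) = \tfrac{J}{2}\left[S(S+1) - s_1(s_1+1) - s_2(s_2+1)\right]$. Function names are hypothetical.

```python
from fractions import Fraction


def allowed_total_spins(s1, s2):
    """Total-spin values S = |s1 - s2|, ..., s1 + s2 in integer steps."""
    s1, s2 = Fraction(s1), Fraction(s2)
    spins = []
    S = abs(s1 - s2)
    while S <= s1 + s2:
        spins.append(S)
        S += 1
    return spins


def heisenberg_dimer_energy(s1, s2, S, J=1):
    """Energy of the total-spin-S multiplet for H = J * S1.S2 (assumed convention)."""
    s1, s2, S = Fraction(s1), Fraction(s2), Fraction(S)
    return Fraction(J) * (S * (S + 1) - s1 * (s1 + 1) - s2 * (s2 + 1)) / 2


if __name__ == "__main__":
    half = Fraction(1, 2)
    for S in allowed_total_spins(half, 1):
        print(f"S = {S}: E = {heisenberg_dimer_energy(half, 1, S)} J")
```

Under this convention the $S=1/2$ multiplet lies at $-J$ and the $S=3/2$ multiplet at $+J/2$, so antiferromagnetic coupling yields an uncompensated $S=1/2$ ground state separated from the excited quartet by $3J/2$.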
-
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
Authors:
Xuezhen Tu,
Jingyu Wu,
Fangyu Kang,
Qingpeng Nong,
Kaijin Zhang,
Chaoyue Niu,
Fan Wu
Abstract:
Spatio-Temporal Video Grounding requires jointly localizing target objects across both temporal and spatial dimensions based on natural language queries, posing fundamental challenges for existing Multimodal Large Language Models (MLLMs). We identify two core challenges: entangled spatio-temporal alignment, arising from coupling two heterogeneous sub-tasks within the same autoregressive output space, and dual-domain visual token redundancy, where target objects exhibit simultaneous temporal and spatial sparsity, rendering the overwhelming majority of visual tokens irrelevant to the grounding query. To address these, we propose Bridge-STG, an end-to-end framework that decouples temporal and spatial localization while maintaining semantic coherence. While decoupling is the natural solution to this entanglement, it risks creating a semantic gap between the temporal MLLM and the spatial decoder. Bridge-STG resolves this through two pivotal designs: the Spatio-Temporal Semantic Bridging (STSB) mechanism with Explicit Temporal Alignment (ETA) distills the MLLM's temporal reasoning context into enriched bridging queries as a robust semantic interface; and the Query-Guided Spatial Localization (QGSL) module leverages these queries to drive a purpose-built spatial decoder with multi-layer interactive queries and positive/negative frame sampling, jointly eliminating dual-domain visual token redundancy. Extensive experiments across multiple benchmarks demonstrate that Bridge-STG achieves state-of-the-art performance among MLLM-based methods. Bridge-STG improves average m_vIoU from $26.4$ to $34.3$ on VidSTG and demonstrates strong cross-task transfer across various fine-grained video understanding tasks under a unified multi-task training regime.
Submitted 13 April, 2026; v1 submitted 9 April, 2026;
originally announced April 2026.
-
WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report
Authors:
Yiquan Wu,
Yuhang Liu,
Yifei Liu,
Ang Li,
Siying Zhou,
Kun Kuang,
Fei Wu
Abstract:
Large language models have demonstrated remarkable capabilities across a wide range of natural language processing tasks, yet their application in the legal domain remains challenging due to the specialized terminology, complex reasoning requirements, and rapidly evolving legal knowledge involved. In this paper, we present WisdomInterrogatory (LuWen), an open-source Chinese legal language model built upon the Baichuan foundation model through three key techniques: continual pre-training on a large-scale legal corpus, supervised fine-tuning with carefully curated legal instruction data, and retrieval-augmented generation integrated with a comprehensive legal knowledge base. We evaluate LuWen on five representative legal tasks spanning both prediction and generation settings, including legal judgment prediction, judicial examination, legal text summarization, law article question answering, and judicial decision reasoning. Experimental results show that LuWen outperforms several strong baselines, demonstrating the effectiveness of our approach in adapting general-purpose language models to the legal domain.
Submitted 10 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.
-
Automating Database-Native Function Code Synthesis with LLMs
Authors:
Wei Zhou,
Xuanhe Zhou,
Qikang He,
Guoliang Li,
Bingsheng He,
Quanqing Xu,
Fan Wu
Abstract:
Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database native functions) for scenarios like new application support and business migration. This growth causes an urgent demand for automatic database native function synthesis. While recent advances in LLM-based code generation (e.g., Claude Code) show promise, they are too generic for database-specific development. They often hallucinate or overlook critical context because database function synthesis is inherently complex and error-prone, where synthesizing a single function may involve registering multiple function units, linking internal references, and implementing logic correctly. To this end, we propose DBCooker, an LLM-based system for automatically synthesizing database native functions. It consists of three components. First, the function characterization module aggregates multi-source declarations, identifies function units that require specialized coding, and traces cross-unit dependencies. Second, we design operations to address the main synthesis challenges: (1) a pseudo-code-based coding plan generator that constructs structured implementation skeletons by identifying key elements such as reusable referenced functions; (2) a hybrid fill-in-the-blank model guided by probabilistic priors and component awareness to integrate core logic with reusable routines; and (3) three-level progressive validation, including syntax checking, standards compliance, and LLM-guided semantic verification. Finally, an adaptive orchestration strategy unifies these operations with existing tools and dynamically sequences them via the orchestration history of similar functions. Results show that DBCooker outperforms other methods on SQLite, PostgreSQL, and DuckDB (34.55% higher accuracy on average), and can synthesize new functions absent in the latest SQLite (v3.50).
Submitted 1 April, 2026;
originally announced April 2026.
-
FinReporting: An Agentic Workflow for Localized Reporting of Cross-Jurisdiction Financial Disclosures
Authors:
Fan Zhang,
Mingzi Song,
Rania Elbadry,
Yankai Chen,
Shaobo Wang,
Yixi Zhou,
Xunwen Zheng,
Yueru He,
Yuyang Dai,
Georgi Georgiev,
Ayesha Gull,
Muhammad Usman Safder,
Fan Wu,
Liyuan Meng,
Fengxian Ji,
Junning Zhao,
Xueqing Peng,
Jimin Huang,
Yu Chen,
Xue Liu,
Preslav Nakov,
Zhuohan Xie
Abstract:
Financial reporting systems increasingly use large language models (LLMs) to extract and summarize corporate disclosures. However, most assume a single-market setting and do not address structural differences across jurisdictions. Variations in accounting taxonomies, tagging infrastructures (e.g., XBRL vs. PDF), and aggregation conventions make cross-jurisdiction reporting a semantic alignment and verification challenge. We present FinReporting, an agentic workflow for localized cross-jurisdiction financial reporting. The system builds a unified canonical ontology over Income Statement, Balance Sheet, and Cash Flow, and decomposes reporting into auditable stages including filing acquisition, extraction, canonical mapping, and anomaly logging. Rather than using LLMs as free-form generators, FinReporting deploys them as constrained verifiers under explicit decision rules and evidence grounding. Evaluated on annual filings from the US, Japan, and China, the system improves consistency and reliability under heterogeneous reporting regimes. We release an interactive demo supporting cross-market inspection and structured export of localized financial statements. Our demo is available at https://huggingface.co/spaces/BoomQ/FinReporting-Demo . The video describing our system is available at https://www.youtube.com/watch?v=f65jdEL31Kk
Submitted 7 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
Submitted 7 April, 2026;
originally announced April 2026.
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Authors:
Bin Wang,
Tianyao He,
Linke Ouyang,
Fan Wu,
Zhiyuan Zhao,
Tao Chu,
Yuan Qu,
Zhenjiang Jin,
Weijun Zeng,
Ziyang Miao,
Bangrui Xu,
Junbo Niu,
Mengzhang Cai,
Jiantao Qiu,
Qintong Zhang,
Dongsheng Ma,
Yuefeng Sun,
Hejun Dong,
Wenzheng Zhang,
Jutao Xiao,
Jiayong Shi,
Pengyu Liao,
Xiaomeng Zhao,
Huaping Zhong,
Liqun Wei
, et al. (18 additional authors not shown)
Abstract:
Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than from architectural differences. Building on this finding, we present MinerU2.5-Pro, which advances the state of the art purely through data engineering and training strategy design while retaining the 1.2B-parameter architecture of MinerU2.5 unchanged. At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while mitigating distribution shift; Cross-Model Consistency Verification leverages output consensus among heterogeneous models to assess sample difficulty and generate reliable annotations; the Judge-and-Refine pipeline improves annotation quality for hard samples through render-then-verify iterative correction. A three-stage progressive training strategy--large-scale pre-training, hard sample fine-tuning, and GRPO alignment--sequentially exploits these data at different quality tiers. On the evaluation front, we rectify element-matching biases in OmniDocBench v1.5 and introduce a Hard subset, establishing the more discriminative OmniDocBench v1.6 protocol. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods, including those based on models with over 200x more parameters.
Submitted 9 April, 2026; v1 submitted 6 April, 2026;
originally announced April 2026.
-
CPT: Controllable and Editable Design Variations with Language Models
Authors:
Karthik Suresh,
Amine Ben Khalifa,
Li Zhang,
Wei-ting Hsu,
Fangzheng Wu,
Vinay More,
Asim Kadav
Abstract:
Designing visually diverse and high-quality designs remains a manual, time-consuming process, limiting scalability and personalization in creative workflows. We present a system for generating editable design variations using a decoder-only language model, the Creative Pre-trained Transformer (CPT), trained to predict visual style attributes in design templates. At the core of our approach is a new representation called Creative Markup Language (CML), a compact, machine-learning-friendly format that captures canvas-level structure, page layout, and element-level details (text, images, and vector graphics), including both content and style. We fine-tune CPT on a large corpus of design templates authored by professional designers, enabling it to learn meaningful, context-aware predictions for attributes such as color schemes and font choices. The model produces semantically structured and stylistically coherent outputs, preserving internal consistency across elements. Unlike generative image models, our system yields fully editable design documents rather than pixel-only images, allowing users to iterate and personalize within a design editor. In experiments, our approach generates contextual color and font variations for existing templates and shows promise in adjusting layouts while maintaining design principles.
Submitted 5 April, 2026;
originally announced April 2026.
-
LOGER: Local--Global Ensemble for Robust Deepfake Detection in the Wild
Authors:
Fei Wu,
Dagong Lu,
Mufeng Yao,
Xinlei Xu,
Fengjun Guo
Abstract:
Robust deepfake detection in the wild remains challenging due to the ever-growing variety of manipulation techniques and uncontrolled real-world degradations. Forensic cues for deepfake detection reside at two complementary levels: global-level anomalies in semantics and statistics that require holistic image understanding, and local-level forgery traces concentrated in manipulated regions that are easily diluted by global averaging. Since no single backbone or input scale can effectively cover both levels, we propose LOGER, a LOcal--Global Ensemble framework for Robust deepfake detection. The global branch employs heterogeneous vision foundation model backbones at multiple resolutions to capture holistic anomalies with diverse visual priors. The local branch performs patch-level modeling with a Multiple Instance Learning top-$k$ aggregation strategy that selectively pools only the most suspicious regions, mitigating evidence dilution caused by the dominance of normal patches; dual-level supervision at both the aggregated image level and individual patch level keeps local responses discriminative. Because the two branches differ in both granularity and backbone, their errors are largely decorrelated, a property that logit-space fusion exploits for more robust prediction. LOGER achieves 2nd place in the NTIRE 2026 Robust Deepfake Detection Challenge, and further evaluation on multiple public benchmarks confirms its strong robustness and generalization across diverse manipulation methods and real-world degradation conditions.
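The top-$k$ aggregation and logit-space fusion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain averaging of the top-$k$ patch logits, and the fixed fusion weight `w_global` are all assumptions made for illustration.

```python
import math


def topk_mil_score(patch_logits, k):
    """Top-k MIL pooling (illustrative): average only the k largest patch logits,
    so a few suspicious patches are not diluted by many normal ones."""
    if not patch_logits:
        raise ValueError("need at least one patch logit")
    k = max(1, min(k, len(patch_logits)))
    top = sorted(patch_logits, reverse=True)[:k]
    return sum(top) / k


def fuse_logits(global_logit, local_logit, w_global=0.5):
    """Logit-space fusion (illustrative): weighted average of the two branch
    logits, mapped to a probability with a sigmoid. w_global is hypothetical."""
    fused = w_global * global_logit + (1.0 - w_global) * local_logit
    return 1.0 / (1.0 + math.exp(-fused))


# Hypothetical patch logits: two strongly suspicious patches among normal ones.
patches = [0.1, -2.0, 3.0, 2.0]
local = topk_mil_score(patches, k=2)   # averages 3.0 and 2.0
prob_fake = fuse_logits(global_logit=0.0, local_logit=local)
```

With these hypothetical inputs, averaging all four patches would give a much lower score than the top-2 average, which is the dilution effect the abstract refers to.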
Submitted 3 April, 2026;
originally announced April 2026.
-
HEDGE: Heterogeneous Ensemble for Detection of AI-GEnerated Images in the Wild
Authors:
Fei Wu,
Dagong Lu,
Mufeng Yao,
Xinlei Xu,
Fengjun Guo
Abstract:
Robust detection of AI-generated images in the wild remains challenging due to the rapid evolution of generative models and varied real-world distortions. We argue that relying on a single training regime, resolution, or backbone is insufficient to handle all conditions, and that structured heterogeneity across these dimensions is essential for robust detection. To this end, we propose HEDGE, a Heterogeneous Ensemble for Detection of AI-GEnerated images, that introduces complementary detection routes along three axes: diverse training data with strong augmentation, multi-scale feature extraction, and backbone heterogeneity. Specifically, Route~A progressively constructs DINOv3-based detectors through staged data expansion and augmentation escalation, Route~B incorporates a higher-resolution branch for fine-grained forensic cues, and Route~C adds a MetaCLIP2-based branch for backbone diversity. All outputs are fused via logit-space weighted averaging, refined by a lightweight dual-gating mechanism that handles branch-level outliers and majority-dominated fusion errors. HEDGE achieves 4th place in the NTIRE 2026 Robust AI-Generated Image Detection in the Wild Challenge and attains state-of-the-art performance with strong robustness on multiple AIGC image detection benchmarks.
Submitted 3 April, 2026;
originally announced April 2026.
-
TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing
Authors:
Zhuohang Bian,
Feiyang Wu,
Chengrui Zhang,
Hangcheng Dong,
Yun Liang,
Youwei Zhuo
Abstract:
Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. This All-Gather communication pattern creates massive KV Cache redundancy, because every agent's prompt contains the same shared output blocks, yet existing reuse methods fail to exploit it efficiently. We present TokenDance, a system that scales the number of concurrent agents by exploiting the All-Gather pattern for collective KV Cache sharing. TokenDance's KV Collector performs KV Cache reuse over the full round in one collective step, so the cost of reusing a shared block is paid once regardless of agent count. Its Diff-Aware Storage encodes sibling caches as block-sparse diffs against a single master copy, achieving 11-17x compression on representative workloads. Evaluation on GenerativeAgents and AgentSociety shows that TokenDance supports up to 2.7x more concurrent agents than vLLM with prefix caching under SLO requirement, reduces per-agent KV Cache storage by up to 17.5x, and achieves up to 1.9x prefill speedup over per-request position-independent caching.
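The idea of storing sibling caches as block-sparse diffs against one master copy can be sketched as follows. This is only a toy model under stated assumptions: a cache is represented as a flat list of numbers rather than KV tensors, and `encode_diff`, `decode_diff`, and `block_size` are hypothetical names, not TokenDance's API.

```python
def encode_diff(master, sibling, block_size):
    """Keep only the blocks of `sibling` that differ from `master`.
    Returns {block_index: block_contents}."""
    if len(master) != len(sibling):
        raise ValueError("master and sibling must have the same length")
    diff = {}
    for start in range(0, len(master), block_size):
        block = sibling[start:start + block_size]
        if block != master[start:start + block_size]:
            diff[start // block_size] = block
    return diff


def decode_diff(master, diff, block_size):
    """Rebuild the sibling by overlaying the stored blocks on the master copy."""
    out = list(master)
    for index, block in diff.items():
        start = index * block_size
        out[start:start + len(block)] = block
    return out


master = list(range(8))
sibling = list(master)
sibling[4] = 99                      # one changed entry, inside the second block
diff = encode_diff(master, sibling, block_size=4)
restored = decode_diff(master, diff, block_size=4)
```

In this toy case only one of the two blocks is stored for the sibling; the fewer blocks differ from the master, the greater the saving.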
Submitted 3 April, 2026;
originally announced April 2026.
-
Rethinking Representations for Cross-Domain Infrared Small Target Detection: A Generalizable Perspective from the Frequency Domain
Authors:
Yimin Fu,
Songbo Wang,
Feiyan Wu,
Jialin Lyu,
Zhunga Liu,
Michael K. Ng
Abstract:
The accurate target-background separation in infrared small target detection (IRSTD) highly depends on the discriminability of extracted representations. However, most existing methods are confined to domain-consistent settings, while overlooking whether such discriminability can generalize to unseen domains. In practice, distribution shifts between training and testing data are inevitable due to variations in observational conditions and environmental factors. Meanwhile, the intrinsic indistinctiveness of infrared small targets aggravates overfitting to domain-specific patterns. Consequently, the detection performance of models trained on source domains can be severely degraded when deployed in unseen domains. To address this challenge, we propose a spatial-spectral collaborative perception network (S$^2$CPNet) for cross-domain IRSTD. Moving beyond conventional spatial learning pipelines, we rethink IRSTD representations from a frequency perspective and reveal inconsistencies in spectral phase as the primary manifestation of domain discrepancies. Based on this insight, we develop a phase rectification module (PRM) to derive generalizable target awareness. Then, we employ an orthogonal attention mechanism (OAM) in skip connections to preserve positional information while refining informative representations. Moreover, the bias toward domain-specific patterns is further mitigated through selective style recomposition (SSR). Extensive experiments have been conducted on three IRSTD datasets, and the proposed method consistently achieves state-of-the-art performance under diverse cross-domain settings.
Submitted 2 April, 2026;
originally announced April 2026.
-
Vocal Prognostic Digital Biomarkers in Monitoring Chronic Heart Failure: A Longitudinal Observational Study
Authors:
Fan Wu,
Matthias P. Nägele,
Daryush D. Mehta,
Elgar Fleisch,
Frank Ruschitzka,
Andreas J. Flammer,
Filipe Barata
Abstract:
Objective: This study aimed to evaluate which voice features can predict health deterioration in patients with chronic HF.
Background: Heart failure (HF) is a chronic condition with progressive deterioration and acute decompensations, often requiring hospitalization and imposing substantial healthcare and economic burdens. Current standard-of-care (SoC) home monitoring, such as weight tracking, lacks predictive accuracy and requires high patient engagement. Voice is a promising non-invasive biomarker, though prior studies have mainly focused on acute HF stages.
Methods: In a 2-month longitudinal study, 32 patients with HF collected daily voice recordings and SoC measures of weight and blood pressure at home, with biweekly questionnaires for health status. Acoustic analysis generated detailed vowel and speech features. Time-series features were extracted from aggregated lookback windows (e.g., 7 days) to predict next-day health status. Explainable machine learning with nested cross-validation identified top vocal biomarkers, and a case study illustrated model application.
Results: A total of 21,863 recordings were analyzed. Acoustic vowel features showed strong correlations with health status. Time-series voice features within the lookback window outperformed corresponding standard care measures, achieving peak sensitivity and specificity of 0.826 and 0.782 versus 0.783 and 0.567 for SoC metrics. Key prognostic voice features identifying deterioration included delayed energy shift, low energy variability, and higher shimmer variability in vowels, along with reduced speaking and articulation rate, lower phonation ratio, decreased voice quality, and increased formant variability in speech.
Conclusion: Voice-based monitoring offers a non-invasive approach to detect early health changes in chronic HF, supporting proactive and personalized care.
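The lookback-window features mentioned in Methods and the sensitivity and specificity reported in Results can be illustrated with a short sketch. It does not reproduce the study's pipeline: the window statistics (mean and population standard deviation), the function names, and the toy labels are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def lookback_features(daily_values, window=7):
    """For each day with a full lookback window, summarise the last `window`
    daily values as (mean, std). Each tuple would feed a next-day prediction."""
    feats = []
    for end in range(window, len(daily_values) + 1):
        chunk = daily_values[end - window:end]
        feats.append((mean(chunk), pstdev(chunk)))
    return feats


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels: 1 = deterioration, 0 = stable."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical daily measurements and labels, for illustration only.
feats = lookback_features([1, 2, 3, 4], window=2)
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
```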
Submitted 31 March, 2026;
originally announced April 2026.
-
First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (683 additional authors not shown)
Abstract:
We report the first measurement of the $e^{+}e^{-}\to K^{+}K^{-}$ cross sections around the $ψ(2S)$ resonance using the energy scan method. The analysis is based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 495~pb$^{-1}$ collected with the BESIII detector at BEPCII. By analyzing the cross section line-shape, we extract the relative phase $Φ$ between the strong and electromagnetic amplitudes of the $ψ(2S)$ resonance, a fundamental parameter in charmonium physics, based on the assumption that the relative phase between the electromagnetic amplitude of the $ψ(2S)$ resonance and the continuum is zero. Two distinct solutions for the branching fraction $\mathcal{B}$ of $ψ(2S)\to K^{+}K^{-}$ are observed: a constructive interference solution with $\mathcal{B}=(7.49\pm0.41)\times10^{-5}$ and $Φ=(110.1 \pm6.7)^\circ$, and a destructive interference solution with $\mathcal{B}=(10.94\pm0.48)\times10^{-5}$ and $Φ=(-106.8\pm5.7)^\circ$. A significant correlation between $Φ$ and $\mathcal{B}$ is established, demonstrating that interference effects must be taken into account in the $ψ(2S)$ branching fraction measurements. Additionally, the first results for both the $ψ(2S)$ strong form factor, which characterizes the strong coupling between $ψ(2S)$ and $K^{+}K^{-}$, and the energy-dependent electromagnetic form factor of the charged kaon in this energy region are here reported.
Submitted 31 March, 2026;
originally announced March 2026.
-
SesQ: A Surface Electrostatic Simulator for Precise Energy Participation Ratio Simulation in Superconducting Qubits
Authors:
Ziang Wang,
Shuyuan Guan,
Feng Wu,
Xiaohang Zhang,
Qiong Li,
Jianxin Chen,
Xin Wan,
Tian Xia,
Hui-Hai Zhao
Abstract:
An accurate and efficient numerical electromagnetic model for superconducting qubits is essential for characterizing and minimizing design-dependent dielectric losses. The energy participation ratio (EPR) is the commonly adopted metric used to evaluate these losses, but its calculation presents a severe multiscale computational challenge. Conventional finite element method (FEM) requires 3D volumetric meshing, leading to prohibitive computational costs and memory requirements when attempting to capture singular electric fields at nanometer-thin material interfaces. To address this bottleneck, we propose SesQ, a surface integral equation simulator tailored for the precise simulation of the EPR. By applying discretization on 2D surfaces, deriving a semi-analytical multilayer Green's function, and employing a dedicated non-conformal boundary mesh refinement scheme, SesQ accurately resolves singular edge fields without an explosive growth in the number of unknowns. Validations with analytically solvable models demonstrate that SesQ accelerates capacitance extraction by roughly two orders of magnitude compared to commercial FEM tools. While achieving comparable accuracy for capacitance extraction, SesQ delivers superior precision for EPR calculation. Simulations of practical transmon qubits further reveal that FEM approaches tend to significantly underestimate the EPR. Finally, the high efficiency of SesQ enables rapid iteration in the layout optimization, as demonstrated by minimizing the EPR of the qubit pattern, establishing the simulator as a powerful tool for the automated design of low-loss superconducting quantum circuits.
Submitted 30 March, 2026;
originally announced March 2026.
-
Generalizable Detection of AI Generated Images with Large Models and Fuzzy Decision Tree
Authors:
Fei Wu,
Guanghao Ding,
Zijian Niu,
Zhenrui Wang,
Lei Yang,
Zhuosheng Zhang,
Shilin Wang
Abstract:
The malicious use and widespread dissemination of AI-generated images pose a serious threat to the authenticity of digital content. Existing detection methods exploit low-level artifacts left by common manipulation steps within the generation pipeline, but they often lack generalization due to model-specific overfitting. Recently, researchers have resorted to Multimodal Large Language Models (MLLMs) for AIGC detection, leveraging their high-level semantic reasoning and broad generalization capabilities. While promising, MLLMs lack the fine-grained perceptual sensitivity to subtle generation artifacts, making them inadequate as standalone detectors. To address this issue, we propose a novel AI-generated image detection framework that synergistically integrates lightweight artifact-aware detectors with MLLMs via a fuzzy decision tree. The decision tree treats the outputs of basic detectors as fuzzy membership values, enabling adaptive fusion of complementary cues from semantic and perceptual perspectives. Extensive experiments demonstrate that the proposed method achieves state-of-the-art accuracy and strong generalization across diverse generative models.
Submitted 30 March, 2026;
originally announced March 2026.
-
Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (722 additional authors not shown)
Abstract:
By analysing 6.1 ${\rm fb}^{-1}$ of data collected at center-of-mass energies between $\sqrt{s}=4.600$ and 4.843 $\rm GeV$ with the BESIII detector at the BEPCII collider, we observe the decay $Λ_c^+\to nπ^+η$ for the first time with a statistical significance of $9.5σ$. The ratio of branching fractions $\mathcal{B}(Λ_c^+\to nπ^+η)/\mathcal{B}(Λ_c^+\to Λπ^+η)$ is measured to be $0.155\pm0.031_{\rm stat.}\pm0.012_{\rm syst.}$. Taking the world average of $\mathcal{B}(Λ_c^+\to Λπ^+η)$ as reference, the absolute branching fraction is calculated to be $\mathcal{B}(Λ_c^+\to nπ^+η)=(2.94\pm0.59_{\rm stat.}\pm0.23_{\rm syst.}\pm0.13_{\rm ref.})\times10^{-3}$. The intermediate process $Λ_c^+\to na_0(980)^+$ is also searched for in the $π^+η$ invariant mass spectrum. Since no significant signal is found, the upper limit on $\mathcal{B}(Λ_c^+\to na_0(980)^+)\times\mathcal{B}(a_0(980)^+\toπ^+η)$ is set to $8.4\times10^{-4}$ at 90\% confidence level. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish signals from prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification.
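The quoted ratio and absolute branching fraction are connected by a simple product, which can be checked with rough arithmetic. The symbols $R$ and $\mathcal{B}_{\rm ref}$ are shorthand introduced here, and the reference value below is only what the two quoted central values imply, not a number stated in the abstract.

```latex
\begin{align}
\mathcal{B}(\Lambda_c^+\to n\pi^+\eta) &= R \times \mathcal{B}_{\rm ref},
  \qquad R = 0.155, \\
\mathcal{B}_{\rm ref} &\approx \frac{2.94\times10^{-3}}{0.155}
  \approx 1.90\times10^{-2}, \\
\frac{0.031}{0.155} &\approx 0.20 \approx \frac{0.59}{2.94},
  \qquad
\frac{0.012}{0.155} \approx 0.08 \approx \frac{0.23}{2.94}.
\end{align}
```

The last line shows that the relative statistical and systematic uncertainties of the ratio carry over essentially unchanged to the absolute branching fraction, while the third uncertainty reflects the reference value alone.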
Submitted 30 March, 2026;
originally announced March 2026.
-
Optimization of Laser Irradiation Uniformity for the Double-Cone Ignition Scheme with MULTI-3D simulations
Authors:
Yicheng Wang,
Yiwen Yang,
Fuyuan Wu,
Yuhan Wang,
Rafael Ramis,
Jie Zhang
Abstract:
The double-cone ignition (DCI) scheme shows promise for laser-driven fusion energy and astrophysics. However, optimizing the laser irradiation uniformity under the constraints of a limited number of laser beams and a given cone angle remains to be explored. We used the three-dimensional radiation hydrodynamics program MULTI-3D to simulate the interaction between the laser and the plasma shell. By employing Bayesian optimization for the pointing position of the incident laser beams, we achieved a laser irradiation scheme with nonuniformity less than 5%. This study provides a reference for experiments and offers insights for other laser fusion schemes.
Submitted 28 March, 2026;
originally announced March 2026.
-
ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation
Authors:
Feng Wu,
Wei Zuo,
Wenliang Yang,
Jun Xiao,
Yang Liu,
Xinhua Zeng
Abstract:
Zero-shot object navigation requires agents to locate unseen target objects in unfamiliar environments without prior maps or task-specific training, which remains a significant challenge. Although recent advancements in vision-language models (VLMs) provide promising commonsense reasoning capabilities for this task, these models still suffer from spatial hallucinations, local exploration deadlocks, and a disconnect between high-level semantic intent and low-level control. To address these issues, we propose a novel hierarchical navigation framework named ReMemNav, which integrates panoramic semantic priors and episodic memory with VLMs. We introduce the Recognize Anything Model to anchor the spatial reasoning process of the VLM. We also design an adaptive dual-modal rethinking mechanism based on an episodic semantic buffer queue. The proposed mechanism actively verifies target visibility and corrects decisions using historical memory to prevent deadlocks. For low-level action execution, ReMemNav extracts a sequence of feasible actions using depth masks, allowing the VLM to select the optimal action for mapping into actual spatial movement. Extensive evaluations on HM3D and MP3D demonstrate that ReMemNav outperforms existing training-free zero-shot baselines in both success rate and exploration efficiency. Specifically, we achieve absolute performance improvements, with SR and SPL increasing by 1.7% and 7.0% on HM3D v0.1, 18.2% and 11.1% on HM3D v0.2, and 8.7% and 7.9% on MP3D.
Submitted 7 April, 2026; v1 submitted 25 March, 2026;
originally announced March 2026.
-
Language-Conditioned World Modeling for Visual Navigation
Authors:
Yifei Dong,
Fengyi Wu,
Yilong Dai,
Lingdong Kong,
Guangyu Chen,
Xu Zhu,
Qiyu Hu,
Tianyu Wang,
Johnalbert Garnica,
Feng Liu,
Siyu Huang,
Qi Dai,
Zhi-Qi Cheng
Abstract:
We study language-conditioned visual navigation (LCVN), in which an embodied agent is asked to follow a natural language instruction based only on an initial egocentric observation. Without access to goal images, the agent must rely on language to shape its perception and continuous control, making the grounding problem particularly challenging. We formulate this problem as open-loop trajectory prediction conditioned on linguistic instructions and introduce the LCVN Dataset, a benchmark of 39,016 trajectories and 117,048 human-verified instructions that supports reproducible research across a range of environments and instruction styles. Using this dataset, we develop LCVN frameworks that link language grounding, future-state prediction, and action generation through two complementary model families. The first family combines LCVN-WM, a diffusion-based world model, with LCVN-AC, an actor-critic agent trained in the latent space of the world model. The second family, LCVN-Uni, adopts an autoregressive multimodal architecture that predicts both actions and future observations. Experiments show that these families offer different advantages: the former provides more temporally coherent rollouts, whereas the latter generalizes better to unseen environments. Taken together, these observations point to the value of jointly studying language grounding, imagination, and policy learning in a unified task setting, and LCVN provides a concrete basis for further investigation of language-conditioned world models. The code is available at https://github.com/F1y1113/LCVN.
Submitted 23 March, 2026;
originally announced March 2026.
-
Amplitude analysis and branching fraction measurement of the decay $D^0 \to K^+K^-π^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
M. S. Anderson,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone
, et al. (749 additional authors not shown)
Abstract:
An amplitude analysis of the singly Cabibbo-suppressed decay $D^0 \to K^+ K^- π^0 π^0$ is performed, for the first time, to determine the relative magnitudes and phases of different intermediate processes. The analysis uses $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy 3.773~GeV corresponding to an integrated luminosity of 20.3 $\rm fb^{-1}$. The absolute branching fraction of $D^0 \to K^+ K^- π^0 π^0$ is measured to be \BF. The dominant intermediate process is $D^0 \to K^{*}(892)^+K^{*}(892)^-$, with a branching fraction of $(2.79 \pm 0.13_{\rm{stat.}} \pm 0.11_{\rm{syst.}}) \times 10^{-3}$. Amplitude analysis reveals that the $D^0 \to K^{*}(892)^+K^{*}(892)^-$ decay is S-wave dominant. The longitudinal polarization fraction of $D^0 \to K^{*}(892)^+ K^{*}(892)^-$ is measured to be $0.468\pm0.046_{\rm{stat.}}\pm0.011_{\rm{syst.}}$.
Submitted 30 March, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
Authors:
Keming Ye,
Zhou Zhao,
Fan Wu,
Shengyu Zhang
Abstract:
Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this via a cloud-device collaboration framework, CIAR, which utilizes on-device self-verification to handle two key properties of visual synthesis: the vast token vocabulary required for high-fidelity images, and inherent spatial redundancy, which leads to extreme predictability in homogeneous regions while object boundaries exhibit high uncertainty. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals instead of conventional discrete solution sets to accelerate processing and make it feasible for large visual vocabularies. Additionally, we incorporate an Interval-enhanced decoding module to further speed up decoding while maintaining visual fidelity and semantic consistency via a distribution alignment training strategy. Extensive experiments demonstrate that CIAR achieves a 2.18x speed-up and reduces cloud requests by 70\%, while preserving image quality compared to existing methods.
Submitted 26 March, 2026;
originally announced March 2026.
-
Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers
Authors:
Nanxiang Jiang,
Zhaoxin Fan,
Baisen Wang,
Daiheng Gao,
Junhang Cheng,
Jifeng Guo,
Yalan Qin,
Yeying Jin,
Hongwei Zheng,
Faguo Wu,
Wenjun Wu
Abstract:
Concept erasure serves as a vital safety mechanism for removing unwanted concepts from text-to-image (T2I) models. While extensively studied in U-Net and dual-stream architectures (e.g., Flux), this task remains under-explored in the recent emerging paradigm of single-stream diffusion transformers (e.g., Z-Image). In this new paradigm, text and image tokens are processed as a single unified sequence via shared parameters. Consequently, directly applying prior erasure methods typically leads to generation collapse. To bridge this gap, we introduce Z-Erase, the first concept erasure method tailored for single-stream T2I models. To guarantee stable image generation, Z-Erase first proposes a Stream Disentangled Concept Erasure Framework that decouples updates and enables existing methods on single-stream models. Subsequently, within this framework, we introduce Lagrangian-Guided Adaptive Erasure Modulation, a constrained algorithm that further balances the sensitive erasure-preservation trade-off. Moreover, we provide a rigorous convergence analysis proving that Z-Erase can converge to a Pareto stationary point. Experiments demonstrate that Z-Erase successfully overcomes the generation collapse issue, achieving state-of-the-art performance across a wide range of tasks.
Submitted 26 March, 2026;
originally announced March 2026.
-
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Authors:
Yicheng Zou,
Dongsheng Zhu,
Lin Zhu,
Tong Zhu,
Yunhua Zhou,
Peiheng Zhou,
Xinyu Zhou,
Dongzhan Zhou,
Zhiwang Zhou,
Yuhao Zhou,
Bowen Zhou,
Zhanping Zhong,
Zhijie Zhong,
Haiteng Zhao,
Penghao Zhao,
Xiaomeng Zhao,
Zhiyuan Zhao,
Yechen Zhang,
Jin Zhang,
Wenwei Zhang,
Hongjie Zhang,
Zhuo Zhang,
Wenlong Zhang,
Bo Zhang,
Chao Zhang
, et al. (152 additional authors not shown)
Abstract:
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. Achieving this massive scale is made possible by the robust infrastructure support of XTuner and LMDeploy, which facilitates highly efficient Reinforcement Learning (RL) training at the 1-trillion parameter level while ensuring strict precision consistency between training and inference. By seamlessly integrating these advancements, Intern-S1-Pro further fortifies the fusion of general and specialized intelligence, working as a Specializable Generalist, demonstrating its position in the top tier of open-source models for general capabilities, while outperforming proprietary models in the depth of specialized scientific tasks.
Submitted 2 April, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (737 additional authors not shown)
Abstract:
Based on a novel method for producing antineutrons via $J/ψ$ decays, we report a study of $\bar{n}p$ inelastic scattering into final states containing kaons. The analysis uses $(10087\pm44)\times 10^6$ $J/ψ$ events collected at the BESIII detector operating at the BEPCII storage ring. Antineutrons are produced via $J/ψ\to p π^{-} \bar{n}$ decays and tagged by the detected protons and pions, resulting in antineutron momenta ranging from 0 to 1174~MeV/$c$, while target protons are provided by the hydrogen in the beam-pipe material. The cross sections of the reactions $\bar{n}p \rightarrow K^{+}K^{-}π^{+}$ and $\bar{n}p \rightarrow K^{+}K^{-}π^{+}π^{0}$ are measured to be $0.53^{+0.15}_{-0.12} \pm 0.08$~mb and $1.09^{+0.36}_{-0.30} \pm 0.31$~mb respectively, where the first uncertainties are statistical and the second systematic. Due to limited statistics, the intermediate states in these processes are not investigated. The observation of clean antineutron-proton scattering events indicates the potential of this approach for future investigations of antineutron-proton interactions.
Submitted 25 March, 2026;
originally announced March 2026.
-
FHAvatar: Fast and High-Fidelity Reconstruction of Face-and-Hair Composable 3D Head Avatar from Few Casual Captures
Authors:
Yujie Sun,
Zhuoqiang Cai,
Chaoyue Niu,
Jianchuan Chen,
Zhiwen Chen,
Chengfei Lv,
Fan Wu
Abstract:
We present FHAvatar, a novel framework for reconstructing 3D Gaussian avatars with composable face and hair components from an arbitrary number of views. Unlike previous approaches that couple facial and hair representations within a unified modeling process, we explicitly decouple the two components in texture space by representing the face with planar Gaussians and the hair with strand-based Gaussians. To overcome the limitations of existing methods that rely on dense multi-view captures or costly per-identity optimization, we propose an aggregated transformer backbone to learn geometry-aware cross-view priors and head-hair structural coherence from multi-view datasets, enabling effective and efficient feature extraction and fusion from few casual captures. Extensive quantitative and qualitative experiments demonstrate that FHAvatar achieves state-of-the-art reconstruction quality from only a few observations of new identities within minutes, while supporting real-time animation, convenient hairstyle transfer, and stylized editing, broadening the accessibility and applicability of digital avatar creation.
Submitted 24 March, 2026;
originally announced March 2026.
-
Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. -R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (736 additional authors not shown)
Abstract:
Using $(10087 \pm 44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, we perform the first amplitude analysis of the process $J/ψ\toγηπ^0$. The decay is dominated by the intermediate processes $J/ψ\toπ^0 \bo \left( \toγη\right)$, $J/ψ\toπ^0ρ(1450)^0 \left( \toγη\right)$ and $J/ψ\toηh_1(1170) \left( \toγπ^0\right)$. Contributions from $J/ψ\toγa_0(980)^0(\toηπ^0)$, $J/ψ\toγa_2(1320)^0(\toηπ^0)$ and $J/ψ\toγa_2(1700)^0(\toηπ^0)$ are observed with a statistical significance exceeding $5σ$, constituting the first observation of radiative transitions of $J/ψ$ to isospin-triplet scalar mesons. The total branching fraction of $J/ψ\toγηπ^0$ is measured to be $(25.7\pm0.3\pm1.5)\times10^{-6}$, where the first uncertainty is statistical and the second systematic. This result is consistent with the previous measurement, with the precision improved by more than a factor of two.
Submitted 24 March, 2026;
originally announced March 2026.
-
Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (678 additional authors not shown)
Abstract:
A search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ is conducted using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and upper limits on the branching fractions of $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ at 90\% confidence level are determined to be $7.7\times10^{-4}$ and $3.9\times10^{-5}$, respectively. This represents the first test of the Vector Meson Dominance mechanism in the radiative decays of charmed mesons to axial-vector mesons.
Submitted 24 March, 2026;
originally announced March 2026.
-
Cascade-Free Mandarin Visual Speech Recognition via Semantic-Guided Cross-Representation Alignment
Authors:
Lei Yang,
Yi He,
Fei Wu,
Shilin Wang
Abstract:
Mandarin Chinese visual speech recognition (VSR) has advanced in recent years, yet its performance still lags behind that on non-tonal languages such as English. One primary challenge arises from the tonal nature of Mandarin, which limits the effectiveness of conventional sequence-to-sequence modeling approaches. To alleviate this issue, existing Chinese VSR systems commonly incorporate intermediate representations, most notably pinyin, within cascade architectures to enhance recognition accuracy. While beneficial, these cascaded designs make each stage depend on the output of the preceding stage during inference, leading to error accumulation and increased inference latency. To address these limitations, we propose a cascade-free architecture based on multitask learning that jointly integrates multiple intermediate representations, including phoneme and viseme, to better exploit contextual information. The proposed semantic-guided local contrastive loss temporally aligns the features, enabling on-demand activation during inference, thereby providing a trade-off between inference efficiency and performance while mitigating error accumulation caused by projection and re-embedding. Experiments conducted on publicly available datasets demonstrate that our method achieves superior recognition performance.
Submitted 23 March, 2026;
originally announced March 2026.
-
Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences
Authors:
Chen Gong,
Zhenzhe Zheng,
Yiliu Chen,
Sheng Wang,
Fan Wu,
Guihai Chen
Abstract:
Machine learning models are widely integrated into modern mobile apps to analyze user behaviors and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining high-quality user experiences. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction to formulate the extraction workflows of different input features as one directed acyclic graph; (2) graph optimization to identify and fuse redundant operation nodes across different features within the graph; and (3) efficient caching to minimize operations on overlapping raw data between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning search, video and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x-3.93x during daytime and 1.43x-4.53x at night.
Submitted 22 March, 2026;
originally announced March 2026.
-
RiboSphere: Learning Unified and Efficient Representations of RNA Structures
Authors:
Zhou Zhang,
Hanqun Cao,
Cheng Tan,
Fang Wu,
Pheng Ann Heng,
Tianfan Fu
Abstract:
Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce \emph{RiboSphere}, a framework that learns \emph{discrete} geometric representations of RNA by combining vector quantization with flow matching. Our design is motivated by the modular organization of RNA architecture: complex folds are composed from recurring structural motifs. RiboSphere uses a geometric transformer encoder to produce SE(3)-invariant (rotation/translation-invariant) features, which are discretized with finite scalar quantization (FSQ) into a finite vocabulary of latent codes. Conditioned on these discrete codes, a flow-matching decoder reconstructs atomic coordinates, enabling high-fidelity structure generation. We find that the learned code indices are enriched for specific RNA motifs, suggesting that the model captures motif-level compositional structure rather than acting as a purely compressive bottleneck. Across benchmarks, RiboSphere achieves strong performance in structure reconstruction (RMSD 1.25\,Å, TM-score 0.84), and its pretrained discrete representations transfer effectively to inverse folding and RNA--ligand binding prediction, with robust generalization in data-scarce regimes.
Submitted 20 March, 2026;
originally announced March 2026.
-
OS-Themis: A Scalable Critic Framework for Generalist GUI Rewards
Authors:
Zehao Li,
Zhenyu Wu,
Yibo Zhao,
Bowen Yang,
Jingjing Xie,
Zhaoyang Liu,
Zhoumianze Liu,
Kaiming Jin,
Jianze Liang,
Zonglin Li,
Feng Wu,
Bowen Zhou,
Zun Wang,
Zichen Ding
Abstract:
Reinforcement Learning (RL) has the potential to improve the robustness of GUI agents in stochastic environments, yet training is highly sensitive to the quality of the reward function. Existing reward approaches struggle to achieve both scalability and performance. To address this, we propose OS-Themis, a scalable and accurate multi-agent critic framework. Unlike a single judge, OS-Themis decomposes trajectories into verifiable milestones to isolate critical evidence for decision making and employs a review mechanism to strictly audit the evidence chain before making the final verdict. To facilitate evaluation, we further introduce OmniGUIRewardBench (OGRBench), a holistic cross-platform benchmark for GUI outcome rewards, where all evaluated models achieve their best performance under OS-Themis. Extensive experiments on AndroidWorld show that OS-Themis yields a 10.3% improvement when used to support online RL training, and a 6.9% gain when used for trajectory validation and filtering in the self-training loop, highlighting its potential to drive agent evolution.
Submitted 19 March, 2026;
originally announced March 2026.
-
Observation of $D_s^+ \to a_0(980)^+f_0(500)$ in the Amplitude Analysis of $D_s^+ \to π^+ π^0 π^0 η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (719 additional authors not shown)
Abstract:
We report the first observation of the decay $D_s^+ \to π^+π^0π^0η$ in a data set corresponding to an integrated luminosity of 7.33 fb$^{-1}$, collected in $e^+e^-$ collisions by the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV. An unexpectedly large branching fraction $\mathcal{B}( D_s^+ \to a_0(980)^+ f_0(500), a_0(980)^+ \to π^+η, f_0(500)\to π^0π^0) = (0.98 \pm 0.16_{\rm{stat.}} \pm 0.22_{\rm{syst.}})\%$ is measured with a significance exceeding $10σ$, offering new constraints on the internal structure of light scalar mesons. The dominant intermediate process is $D_s^+ \to a_1(1260)^+η, a_1(1260)^+\to ρ(770)^+π^0$ with a branching fraction of $(1.77 \pm 0.21_{\rm stat.} \pm 0.12_{\rm syst.})\%$. Isospin symmetry is validated in the decays $a_1(1260)^+\to ρ(770)^0π^+$ and $a_1(1260)^+\to ρ(770)^+π^0$. Moreover, the measured $\mathcal{B}(D_s^+\to π^+π^0π^0η|_{\rm{non}-η^\prime})=(2.97 \pm 0.23_{\rm stat.} \pm 0.14_{\rm syst.})\%$ reduces the undetected $D_s^+ \to ηX$ decay branching fractions to (0.1 $\pm$ 3.1)\%.
Submitted 19 March, 2026;
originally announced March 2026.
-
ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation
Authors:
Siqi Song,
Fulin Wu,
Zhong-Qiu Wang
Abstract:
Due to the absence of clean reference signals and spatial cues, monaural unsupervised speech dereverberation is a challenging ill-posed inverse problem. To address it, we propose augmented reverberant-target training (ARTT), which consists of two stages. In the first stage, reverberant-target training (RTT) further reverberates the observed reverberant mixture signal and then trains a deep neural network (DNN) to recover the observed reverberant mixture via discriminative training. Although the target signal to fit is reverberant, we find that the resulting DNN can effectively reduce reverberation. In the second stage, an online self-distillation mechanism based on the mean-teacher algorithm is proposed to further improve dereverberation. Evaluation results demonstrate that ARTT achieves strong unsupervised dereverberation performance, significantly outperforming previous baselines.
Submitted 19 March, 2026;
originally announced March 2026.
-
AppFlow: Memory Scheduling for Cold Launch of Large Apps on Mobile and Vehicle Systems
Authors:
Xiaochen Li,
Sicong Liu,
Bin Guo,
Yu Ouyang,
Fengmin Wu,
Yuan Xu,
Zhiwen Yu
Abstract:
GB-scale large apps like on-device LLMs and rich media editors are becoming the next-generation trend, but their heavy memory and I/O demands, especially during multitasking, cause devices to reclaim or kill processes, turning warm apps into cold launches. The challenge lies not in storing them, but in fast, accurate launching. For users, 1s is the usability cliff, yet our measurements show 86.6\% of GB-scale cold launches exceed it. Also, Android Vitals flags only $\geq$ 5s as slow, exposing a large satisfaction gap. Existing optimizations are designed in isolation and conflict. For example, preloading reduces I/O stalls but consumes scarce memory and is undone by reclamation, while reclamation and killing free memory but sacrifice background survivability, leading to repeated cold relaunches. Our key insight is that, although multitasking makes runtime behavior complex, each app's file access pattern remains predictable. The challenge lies in exploiting this predictability, i.e., preloading without exhausting memory, reclaiming without undoing gains, and killing selectively to preserve background survivability. We introduce AppFlow, a prediction-based system-wide scheduler that integrates a Selective File Preloader, an Adaptive Memory Reclaimer, and a Context-Aware Process Killer. Implemented across the Android framework and Linux kernel without app changes, AppFlow cuts GB-scale cold-launch latency by 66.5\% (e.g., 2s$\rightarrow$690ms) and sustains 95\% of launches within 1s over a 100-day test, significantly improving responsiveness and multitasking experience.
Submitted 17 March, 2026;
originally announced March 2026.
-
EEG-SeeGraph: Interpreting functional connectivity disruptions in dementias via sparse-explanatory dynamic EEG-graph learning
Authors:
Fengcheng Wu,
Zhenxi Song,
Guoyang Xu,
Kaisong Hu,
Zirui Wang,
Yi Guo,
Zhiguo Zhang
Abstract:
Robust and interpretable dementia diagnosis from noisy, non-stationary electroencephalography (EEG) is clinically essential yet remains challenging. To this end, we propose SeeGraph, a Sparse-Explanatory dynamic EEG-graph network that models time-evolving functional connectivity and employs a node-guided sparse edge mask to reveal the connections that drive diagnostic decisions, while remaining robust to noise and cross-site variability. SeeGraph comprises four components: (1) a dual-trajectory temporal encoder that models dynamic EEG with two streams, where node signals capture regional oscillations and edge signals capture interregional coupling; (2) a topology-aware positional encoder that derives graph-spectral Laplacian coordinates from the fused connectivity and augments node embeddings; (3) a node-guided sparse explanatory edge mask that gates the connectivity into a compact subgraph; and (4) a gated graph predictor that operates on the sparsified graph. The framework is trained using cross-entropy loss together with a sparsity regularizer on the mask, yielding noise-robust and interpretable diagnoses. The effectiveness of SeeGraph is validated on public and in-house EEG cohorts, including patients with neurodegenerative dementias and healthy controls, under both raw and noise-perturbed conditions. Its sparse, node-guided explanations highlight disease-relevant connections and align with established clinical findings on functional connectivity alterations, thereby offering transparent cues for neurological evaluation.
Submitted 3 March, 2026;
originally announced March 2026.