-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
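For context, the relativistic Breit-Wigner with a mass-dependent width used in such fits is commonly written as follows; the exact convention (barrier factors $F_L$, orbital angular momentum $L$) is an analysis choice not stated in the abstract:

$$\mathrm{BW}(m) = \frac{1}{m_0^2 - m^2 - i\, m_0\, \Gamma(m)}, \qquad \Gamma(m) = \Gamma_0 \left(\frac{q}{q_0}\right)^{2L+1} \frac{m_0}{m}\, F_L^2(q, q_0),$$

where $m_0$ and $\Gamma_0$ are the quoted mass and width, and $q$ ($q_0$) is the daughter momentum in the resonance rest frame evaluated at mass $m$ ($m_0$).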
Submitted 14 April, 2026;
originally announced April 2026.
-
BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining
Authors:
Haojie Ye,
Arjun Laxman,
Yichao Yuan,
Krisztian Flautner,
Nishil Talati
Abstract:
Money laundering detection faces challenges due to excessive false positives and inadequate adaptation to sophisticated multi-stage schemes that exploit modern financial networks. Graph analytics and AI are promising tools, but they struggle with the fuzziness of laundering patterns, which exhibit structural and temporal variations. Conventional data mining techniques require the detailed enumeration of pattern variants, which not only complicates the analyst's task of specifying them, but also leads to large run-time overheads and difficulty training accurate AI models. The paper presents BlazingAML, a scalable AML system design that introduces: (1) a novel multi-stage framework for expressing fuzzy money laundering patterns, and (2) a domain-specific compiler that transforms high-level pattern descriptions into high-performance code for CPU and GPU back-ends. The multi-stage abstraction decomposes complex laundering schemes into logical stages connected by graph operations, enabling diverse patterns to be expressed using unified primitives while capturing structural and temporal fuzziness. The compiler applies sophisticated optimizations, eliminating manual parallel programming requirements for financial analysts. Evaluation on IBM AML datasets shows BlazingAML achieves the same F1 score as state-of-the-art approaches while delivering 210x and 333x speedups on CPU and GPU, respectively, with superior scalability.
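The abstract's "logical stages connected by graph operations" can be made concrete with a toy sketch. Everything below is hypothetical illustration, not BlazingAML's actual DSL or compiler: two stages (a fuzzy fan-in, then a temporally constrained forward hop) are parameterized by thresholds instead of enumerating pattern variants.

```python
# Hypothetical two-stage laundering pattern in the spirit of the multi-stage
# abstraction described above (names and thresholds are illustrative).
from collections import defaultdict

def find_fan_in_then_forward(edges, min_fan_in=3, max_delay=5):
    """edges: list of (src, dst, t) transactions. Flag accounts that receive
    funds from at least `min_fan_in` distinct sources (stage 1, structural
    fuzziness) and forward money within `max_delay` time units of the last
    inflow (stage 2, temporal fuzziness)."""
    incoming = defaultdict(list)   # dst -> [(src, t)]
    outgoing = defaultdict(list)   # src -> [t]
    for src, dst, t in edges:
        incoming[dst].append((src, t))
        outgoing[src].append(t)

    flagged = set()
    for acct, ins in incoming.items():
        sources = {s for s, _ in ins}
        if len(sources) < min_fan_in:                 # stage 1: fuzzy fan-in
            continue
        last_in = max(t for _, t in ins)
        if any(0 <= t_out - last_in <= max_delay      # stage 2: forward hop
               for t_out in outgoing.get(acct, [])):
            flagged.add(acct)
    return flagged
```

On a toy graph where accounts a, b, c all pay m, which then pays x shortly after, only m is flagged; varying the two thresholds covers structural and temporal variants of the same scheme without new pattern code.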
Submitted 13 April, 2026;
originally announced April 2026.
-
CodeTracer: Towards Traceable Agent States
Authors:
Han Li,
Yifan Yao,
Letian Zhu,
Rili Feng,
Hongyi Ye,
Jiaming Wang,
Yancheng He,
Pengyu Zou,
Lehan Zhang,
Xinping Lei,
Haoyang Huang,
Ken Deng,
Ming Sun,
Zhaoxiang Zhang,
He Ye,
Jiaheng Liu
Abstract:
Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate parallel tool calls and multi-stage workflows over complex tasks, the agent's state transitions and error propagation become hard to observe. In these runs, an early misstep can trap the agent in unproductive loops or even cascade into fundamental errors, forming hidden error chains that make it hard to tell when the agent goes off track and why. Existing agent tracing analyses either focus on simple interactions or rely on small-scale manual inspection, which limits their scalability and usefulness for real coding workflows. We present CodeTracer, a tracing architecture that parses heterogeneous run artifacts through evolving extractors, reconstructs the full state transition history as a hierarchical trace tree with persistent memory, and performs failure onset localization to pinpoint the failure origin and its downstream chain. To enable systematic evaluation, we construct CodeTraceBench from a large collection of executed trajectories generated by four widely used code agent frameworks on diverse code tasks (e.g., bug fixing, refactoring, and terminal interaction), with supervision at both the stage and step levels for failure localization. Experiments show that CodeTracer substantially outperforms direct prompting and lightweight baselines, and that replaying its diagnostic signals consistently recovers originally failed runs under matched budgets. Our code and data are publicly available.
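The notions of a trace tree and "failure onset plus downstream chain" can be sketched minimally. This is an illustrative stand-in, not CodeTracer's implementation: steps form a tree, and the onset is the first failing step in execution order, with every later failure treated as its propagated chain.

```python
# Toy trace tree and failure-onset localization (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    ok: bool
    children: list = field(default_factory=list)

def flatten(step):
    """Depth-first order, used here as a proxy for execution order."""
    yield step
    for child in step.children:
        yield from flatten(child)

def localize_failure(root):
    """Return (onset, downstream_chain): the first failing step and every
    later failing step, i.e. the hidden error chain it may have caused."""
    failed = [s for s in flatten(root) if not s.ok]
    if not failed:
        return None, []
    return failed[0], failed[1:]
```

For a run where planning succeeds but an edit step fails and the subsequent test step fails too, the edit step is reported as the onset and the test failure as its downstream chain.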
Submitted 15 April, 2026; v1 submitted 13 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
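As a quick arithmetic note, the first quoted uncertainty is statistical and the second systematic; when treated as independent they combine in quadrature. A minimal check using the $χ_{c0}$ result above:

```python
import math

def combine(stat, syst):
    """Total uncertainty from independent statistical and systematic components."""
    return math.sqrt(stat**2 + syst**2)

# B(chi_c0 -> pi+ pi- pi0 pi0) = (3.10 +/- 0.01 +/- 0.14) x 10^-2:
total = combine(0.01, 0.14)  # in units of 10^-2; the systematic term dominates
```

The total comes out essentially equal to the systematic uncertainty alone, which is why the abstract's precision claim rests on the systematic error budget rather than the event count.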
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of \boldmath{$D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$} in \boldmath{$D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$} Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer novel insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 11 April, 2026;
originally announced April 2026.
-
From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence
Authors:
Hengzhi Ye,
Minghui Zhou
Abstract:
AI development is embracing the open-source paradigm, but the fundamental distinction between AI models and traditional software artifacts may lead to a divergent open-source development paradigm with different collaborative practices, which remains unexplored. We therefore bridge the knowledge gap by quantifying and characterizing the differences in the collaborative development paradigms of traditional open source software (OSS) and open source AI models (OSM), and investigating the underlying factors that may drive these distinctions. We collect 1,428,792 OSS repositories from GitHub and 1,440,527 OSM repositories from HF Hub, and conduct comprehensive statistical, social network and content analyses to measure and understand the differences in collaboration intensity, collaboration openness, and user innovation across the two development paradigms, complementing these quantitative results with semi-structured interviews. We find that, compared to the OSS development paradigm, the OSM development paradigm exhibits significantly lower collaboration intensity; lower collaboration openness for direct contributions, while preserving relatively open knowledge exchange; and a divergence toward adaptive-utilization user innovation rather than collaborative improvement. Through semi-structured interviews, we further elucidate the socio-technical factors underlying these differences. These findings reveal the paradigmatic divergence in open source development between traditional OSS and OSM across three critical dimensions of open source collaboration and potential underlying factors, shedding light on how to improve collaborative work techniques and practices within the context of AI development.
Submitted 9 April, 2026;
originally announced April 2026.
-
Visually-grounded Humanoid Agents
Authors:
Hang Ye,
Xiaoxuan Ma,
Fan Lu,
Wayne Wu,
Kwan-Yee Lin,
Yizhou Wang
Abstract:
Digital human generation has been studied for decades and supports a wide range of real-world applications. However, most existing systems are passively animated, relying on privileged state or scripted control, which limits scalability to novel environments. We instead ask: how can digital humans actively behave using only visual observations and specified goals in novel scenes? Achieving this would enable populating any 3D environments with digital humans at scale that exhibit spontaneous, natural, goal-directed behaviors. To this end, we introduce Visually-grounded Humanoid Agents, a coupled two-layer (world-agent) paradigm that replicates humans at multiple levels: they look, perceive, reason, and behave like real people in real-world 3D scenes. The World Layer reconstructs semantically rich 3D Gaussian scenes from real-world videos via an occlusion-aware pipeline and accommodates animatable Gaussian-based human avatars. The Agent Layer transforms these avatars into autonomous humanoid agents, equipping them with first-person RGB-D perception and enabling them to perform accurate, embodied planning with spatial awareness and iterative reasoning, which is then executed at the low level as full-body actions to drive their behaviors in the scene. We further introduce a benchmark to evaluate humanoid-scene interaction in diverse reconstructed environments. Experiments show our agents achieve robust autonomous behavior, yielding higher task success rates and fewer collisions than ablations and state-of-the-art planning methods. This work enables active digital human population and advances human-centric embodied AI. Data, code, and models will be open-sourced.
Submitted 9 April, 2026;
originally announced April 2026.
-
TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
Authors:
Hengkai Ye,
Zhechang Zhang,
Jinyuan Jia,
Hong Hu
Abstract:
Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.
Submitted 8 April, 2026;
originally announced April 2026.
-
A Graph Foundation Model for Wireless Resource Allocation
Authors:
Yucheng Sheng,
Jiacheng Wang,
Le Liang,
Hao Ye,
Shi Jin
Abstract:
The aggressive densification of modern wireless networks necessitates judicious resource allocation to mitigate severe mutual interference. However, classical iterative algorithms remain computationally prohibitive for real-time applications requiring rapid responsiveness. While recent deep learning-based methods show promise, they typically function as task-specific solvers lacking the flexibility to adapt to different objectives and scenarios without expensive retraining. To address these limitations, we propose a graph foundation model for resource allocation (GFM-RA) based on a pre-training and fine-tuning paradigm to extract unified representations, thereby enabling rapid adaptation to different objectives and scenarios. Specifically, we introduce an interference-aware Transformer architecture with a bias projector that injects interference topologies into global attention mechanisms. Furthermore, we develop a hybrid self-supervised pre-training strategy that synergizes masked edge prediction with negative-free Teacher-Student contrastive learning, enabling the model to capture transferable structural representations from massive unlabeled datasets. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance and scales effectively with increased model capacity. Crucially, leveraging its unified representations, the foundation model exhibits exceptional sample efficiency, enabling robust few-shot adaptation to diverse and unsupervised downstream objectives in out-of-distribution (OOD) scenarios. These results demonstrate the promise of pre-trained foundation models for adaptable wireless resource allocation and provide a strong foundation for future research on generalizable learning-based wireless optimization.
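The masked-edge-prediction objective mentioned above has a simple data-preparation core, sketched here as a toy (the paper's interference-aware Transformer and loss are not shown; the function name and setup are hypothetical): hide a fraction of a graph's edges and train the model to predict them from the visible remainder.

```python
# Toy masked-edge split for self-supervised pre-training (illustrative only).
import random

def mask_edges(edges, mask_frac=0.25, seed=0):
    """Split an edge list into visible edges (model input) and masked
    edges (prediction targets). Deterministic for a fixed seed."""
    rng = random.Random(seed)
    edges = list(edges)
    n_mask = max(1, int(len(edges) * mask_frac))
    masked = set(rng.sample(range(len(edges)), n_mask))  # indices to hide
    visible = [e for i, e in enumerate(edges) if i not in masked]
    targets = [e for i, e in enumerate(edges) if i in masked]
    return visible, targets
```

The split preserves every edge in exactly one of the two sets, so the pre-training signal needs no labels, which matches the abstract's point about learning from massive unlabeled datasets.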
Submitted 8 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
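The sensitivity to $γ$ in these decays comes from the interference between $b\to c$ and $b\to u$ transition amplitudes over the $D$-decay Dalitz plot. In the standard model-independent (GGSZ-type) formalism, of which this analysis is an unbinned variant, the $B^\mp$ amplitudes take the form (assuming the usual convention)

$$\mathcal{A}(B^{\mp}) \propto A_D(m_-^2, m_+^2) + r_B\, e^{i(\delta_B \mp γ)}\, A_D(m_+^2, m_-^2),$$

where $A_D$ is the $D\to K^0_{\rm S} h'^+ h'^-$ amplitude as a function of the squared invariant masses $m_\pm^2 = m^2(K^0_{\rm S} h'^\pm)$, and $r_B$ and $\delta_B$ are the magnitude ratio and strong-phase difference of the two interfering $B$-decay amplitudes. The BESIII strong-phase measurements constrain the phase of $A_D$ across the phase space, which is what allows $γ$ to be extracted without a $D$-decay amplitude model.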
Submitted 7 April, 2026;
originally announced April 2026.
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
An Iterative Test-and-Repair Framework for Competitive Code Generation
Authors:
Lingxiao Tang,
Muyang Ye,
Zhaoyang Chu,
Xiaoxue Ren,
Zhongxin Liu,
Lingfeng Bao,
He Ye
Abstract:
Large language models (LLMs) have made remarkable progress in code generation, but competitive programming remains a challenge. Recent training-based methods have improved code generation by using reinforcement learning (RL) with execution feedback. The more recent framework CURE further incorporates test generation into the training process, jointly training a Coder and a Tester within a single model. At inference time, the Coder generates many candidate programs, and the Tester generates tests from the problem description. The candidate that passes the most generated tests is selected as the final answer. However, CURE has two critical limitations. First, the Tester never reads any candidate code, so its tests often fail to expose implementation-specific bugs. Second, the Coder generates every candidate from scratch and never learns to fix a buggy program based on a failed test. To address these limitations, we propose FixAudit, which approaches competitive code generation from a new perspective: starting from a single initial candidate, it iteratively improves the candidate through a targeted test-and-repair debugging cycle. The framework trains one shared model with two specialized roles through four stages: the Fixer, which repairs the current candidate based on a failing test, and the Auditor, which reads the candidate code to generate new tests that expose its remaining bugs. We evaluate FixAudit on three benchmarks: APPS, CodeContests, and xCodeEval. Applied to a 7B model, the framework surpasses the average performance of the larger 32B baseline within the same model family under the zero-shot setting. Compared to strong baselines built on the same 7B base model, FixAudit improves average Pass@1 by 35.1% to 36.8% and average AvgPassRatio by 7.1% to 24.5%.
Submitted 7 April, 2026;
originally announced April 2026.
-
Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation
Authors:
Chenxin Yuan,
Shoupeng Chen,
Haojiang Ye,
Yiming Miao,
Limei Peng,
Pin-Han Ho
Abstract:
Accurate segmentation of 3D medical scans is crucial for clinical diagnostics and treatment planning, yet existing methods often fail to achieve both high accuracy and computational efficiency across diverse anatomies and imaging modalities. To address these challenges, we propose GCNV-Net, a novel 3D medical segmentation framework that integrates a Tri-directional Dynamic Nonvoid Voxel Transformer (3DNVT), a Geometrical Cross-Attention module (GCA), and Nonvoid Voxelization. The 3DNVT dynamically partitions relevant voxels along the three orthogonal anatomical planes, namely the transverse, sagittal, and coronal planes, enabling effective modeling of complex 3D spatial dependencies. The GCA mechanism explicitly incorporates geometric positional information during multi-scale feature fusion, significantly enhancing fine-grained anatomical segmentation accuracy. Meanwhile, Nonvoid Voxelization processes only informative regions, greatly reducing redundant computation without compromising segmentation quality, and achieves a 56.13% reduction in FLOPs and a 68.49% reduction in inference latency compared to conventional voxelization. We evaluate GCNV-Net on multiple widely used benchmarks: BraTS2021, ACDC, MSD Prostate, MSD Pancreas, and AMOS2022. Our method achieves state-of-the-art segmentation performance across all datasets, outperforming the best existing methods by 0.65% on Dice, 0.63% on IoU, 1% on NSD, and by a relative 14.5% on HD95. All results demonstrate that GCNV-Net effectively balances accuracy and efficiency, and its robustness across diverse organs, disease conditions, and imaging modalities highlights strong potential for clinical deployment.
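The core idea of nonvoid voxelization, processing only informative voxels and skipping empty space, can be shown with a minimal toy (the function below is an illustrative stand-in, not the paper's implementation):

```python
# Toy nonvoid voxelization: keep only nonzero voxels from a dense 3D volume
# so downstream computation scales with informative content, not volume size.
def nonvoid_voxels(volume):
    """volume: nested lists [depth][height][width]. Return a sparse list of
    ((i, j, k), value) pairs for every nonzero voxel."""
    return [((i, j, k), v)
            for i, plane in enumerate(volume)
            for j, row in enumerate(plane)
            for k, v in enumerate(row)
            if v != 0]
```

In typical medical scans most voxels are background, so the sparse list is far smaller than the dense grid, which is the source of the FLOPs and latency reductions the abstract quotes.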
Submitted 7 April, 2026;
originally announced April 2026.
-
DriveVA: Video Action Models are Zero-Shot Drivers
Authors:
Mengmeng Liu,
Diankun Zhang,
Jiuming Liu,
Jianfeng Cui,
Hongwei Xie,
Guang Chen,
Hangjun Ye,
Michael Ying Yang,
Francesco Nex,
Hao Cheng
Abstract:
Generalization is a central challenge in autonomous driving, as real-world deployment requires robust performance under unseen scenarios, sensor domains, and environmental conditions. Recent world-model-based planning methods have shown strong capabilities in scene understanding and multi-modal future prediction, yet their generalization across datasets and sensor configurations remains limited. In addition, their loosely coupled planning paradigm often leads to poor video-trajectory consistency during visual imagination. To overcome these limitations, we propose DriveVA, a novel autonomous driving world model that jointly decodes future visual forecasts and action sequences in a shared latent generative process. DriveVA inherits rich priors on motion dynamics and physical plausibility from well-pretrained large-scale video generation models to capture continuous spatiotemporal evolution and causal interaction patterns. To this end, DriveVA employs a DiT-based decoder to jointly predict future action sequences (trajectories) and videos, enabling tighter alignment between planning and scene evolution. We also introduce a video continuation strategy to strengthen long-duration rollout consistency. DriveVA achieves an impressive closed-loop performance of 90.9 PDM score on the challenging NAVSIM benchmark. Extensive experiments also demonstrate the zero-shot capability and cross-domain generalization of DriveVA, which reduces average L2 error and collision rate by 78.9% and 83.3% on nuScenes and 52.5% and 52.4% on Bench2Drive, built on CARLA v2, compared with the state-of-the-art world-model-based planner.
Submitted 5 April, 2026;
originally announced April 2026.
-
NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning via Mixture-of-Neurons
Authors:
Haonan Dong,
Kehan Jiang,
Haoran Ye,
Wenhao Zhu,
Zhaolu Kang,
Guojie Song
Abstract:
Large Reasoning Models (LRMs) have recently achieved remarkable success in complex reasoning tasks. However, closer scrutiny reveals persistent failure modes compromising performance and cost: I) Intra-step level, marked by calculation or derivation errors; II) Inter-step level, involving oscillation and stagnation; and III) Instance level, causing maladaptive over-thinking. Existing endeavors target isolated levels without unification, while their black-box nature and reliance on RL hinder explainability and controllability. To bridge these gaps, we conduct an in-depth white-box analysis, identifying key neurons (Mixture of Neurons, MoN) and their fluctuation patterns associated with distinct failures. Building upon these insights, we propose NeuReasoner, an explainable, controllable, and unified reasoning framework driven by MoN. Technically, NeuReasoner integrates lightweight MLPs for failure detection with a special token-triggered self-correction mechanism learned via SFT. During inference, special tokens are inserted upon failure detection to actuate controllable remedial behaviors. Extensive evaluations across six benchmarks and six backbone models (8B to 70B) against nine competitive baselines demonstrate that NeuReasoner achieves performance gains of up to 27.0% while reducing token consumption by 19.6% to 63.3%.
Submitted 3 April, 2026;
originally announced April 2026.
-
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
Authors:
Yongkang Li,
Lijun Zhou,
Sixu Yan,
Bencheng Liao,
Tianyi Yan,
Kaixin Xiong,
Long Chen,
Hongwei Xie,
Bing Wang,
Guang Chen,
Hangjun Ye,
Wenyu Liu,
Haiyang Sun,
Xinggang Wang
Abstract:
Vision-Language-Action (VLA) models have recently emerged in autonomous driving, with the promise of leveraging rich world knowledge to improve the cognitive capabilities of driving systems. However, adapting such models for driving tasks currently faces a critical dilemma between spatial perception and semantic reasoning. Consequently, existing VLA systems are forced into suboptimal compromises: directly adopting 2D Vision-Language Models yields limited spatial perception, whereas enhancing them with 3D spatial representations often impairs the native reasoning capacity of VLMs. We argue that this dilemma largely stems from the coupled optimization of spatial perception and semantic reasoning within shared model parameters. To overcome this, we propose UniDriveVLA, a Unified Driving Vision-Language-Action model based on Mixture-of-Transformers that addresses the perception-reasoning conflict via expert decoupling. Specifically, it comprises three experts for driving understanding, scene perception, and action planning, which are coordinated through masked joint attention. In addition, we combine a sparse perception paradigm with a three-stage progressive training strategy to improve spatial perception while maintaining semantic reasoning capability. Extensive experiments show that UniDriveVLA achieves state-of-the-art performance in open-loop evaluation on nuScenes and closed-loop evaluation on Bench2Drive. Moreover, it demonstrates strong performance across a broad range of perception, prediction, and understanding tasks, including 3D detection, online mapping, motion forecasting, and driving-oriented VQA, highlighting its broad applicability as a unified model for autonomous driving. Code and model have been released at https://github.com/xiaomi-research/unidrivevla
Submitted 2 April, 2026;
originally announced April 2026.
-
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor
Authors:
Yixiao Wang,
Ting Jiang,
Zishan Shao,
Hancheng Ye,
Jingwei Sun,
Mingyuan Ma,
Jianyi Zhang,
Yiran Chen,
Hai Li
Abstract:
Denoising generative models deliver high-fidelity generation but remain bottlenecked by inference latency due to the many iterative denoiser calls required during sampling. Training-free acceleration methods reduce latency by either sparsifying the model architecture or shortening the sampling trajectory. Current training-free acceleration methods are more complex than necessary: higher-order predictors amplify error under aggressive speedups, and architectural modifications hinder deployment. Beyond 2x acceleration, step skipping creates structural scarcity -- at most one fresh evaluation per local window -- leaving the computed output and its backward difference as the only causally grounded information. Based on this, we propose ZEUS, an acceleration method that predicts reduced denoiser evaluations using a second-order predictor, and stabilizes aggressive consecutive skipping with an interleaved scheme that avoids back-to-back extrapolations. ZEUS adds essentially zero overhead, no feature caches, and no architectural modifications, and it is compatible with different backbones, prediction objectives, and solver choices. Across image and video generation, ZEUS consistently improves the speed-fidelity performance over recent training-free baselines, achieving up to 3.2x end-to-end speedup while maintaining perceptual quality. Our code is available at: https://github.com/Ting-Justin-Jiang/ZEUS.
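The abstract says ZEUS predicts skipped denoiser evaluations with a second-order predictor built from the last computed output and its backward difference; the exact formula is not given there, so the following is only a minimal sketch, assuming a generic quadratic backward-difference extrapolation over equally spaced timesteps (the function name and setup are illustrative, not from the paper):

```python
import numpy as np

def extrapolate_second_order(e0, e1, e2):
    """Sketch of a second-order (quadratic) backward-difference predictor.

    e0, e1, e2 are the three most recent denoiser outputs at equally
    spaced timesteps, oldest first. Fitting a quadratic through them and
    evaluating one step ahead gives: e_next = 3*e2 - 3*e1 + e0
    (equivalently, setting the third backward difference to zero).
    """
    return 3.0 * e2 - 3.0 * e1 + e0

# The predictor is exact for any quadratic trajectory, e.g. f(t) = t**2:
history = [np.array(0.0), np.array(1.0), np.array(4.0)]  # f(0), f(1), f(2)
predicted = extrapolate_second_order(*history)           # f(3) = 9
```

During sampling, skipped evaluations would be replaced by such a prediction, with fresh denoiser calls interleaved so that extrapolations are never applied back-to-back, in the spirit of the interleaved scheme described above.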
Submitted 1 April, 2026;
originally announced April 2026.
-
Advanced Capacity Accreditation of Future Energy System Resources with Deep Uncertainties
Authors:
Ethan Cantor,
Yinyin Ge,
Hongxing Ye,
Jie Li
Abstract:
The electric power sector has seen an increased penetration of renewable energy sources (RESs) that could strain the system reliability due to their inherent uncertainties in availability and controllability. Effective load carrying capability (ELCC) is widely used to quantify the reliability contributions of these RESs. However, existing ELCC methods can over- or under-estimate their contributions and often neglect or simplify other critical factors such as transmission constraints and evolving climate trends, leading to inaccurate capacity credit (CC) allocations and inefficient reliability procurement in capacity markets. To address these limitations, this paper proposes TRACED (TRansmission And Climate Enhanced Delta) -- an advanced capacity accreditation approach that integrates transmission constraints and climate-adjusted system conditions into a Delta ELCC evaluation. Case studies on a modified IEEE-118 bus system with high RES and energy storage penetrations demonstrate that TRACED produces portfolio-consistent CC allocations by capturing resource interactions and avoiding the double-counting of shared reliability benefits inherent in marginal ELCC, which may otherwise lead to under-procurement of reliability resources. Results further demonstrate that transmission congestion and evolving climate trends have mutual impacts on CC allocation, justifying their necessary integration into TRACED.
Submitted 31 March, 2026;
originally announced April 2026.
-
Learning Structural-Functional Brain Representations through Multi-Scale Adaptive Graph Attention for Cognitive Insight
Authors:
Badhan Mazumder,
Sir-Lord Wiafe,
Aline Kotoski,
Vince D. Calhoun,
Dong Hye Ye
Abstract:
Understanding how brain structure and function interact is key to explaining intelligence, yet modeling them jointly is challenging because the structural and functional connectomes capture complementary aspects of organization. We introduce the Multi-scale Adaptive Graph Network (MAGNet), a Transformer-style graph neural network framework that adaptively learns structure-function interactions. MAGNet leverages source-based morphometry from structural MRI to extract inter-regional morphological features and fuses them with functional network connectivity from resting-state fMRI. A hybrid graph integrates direct and indirect pathways, while local-global attention refines connectivity importance and a joint loss simultaneously enforces cross-modal coherence and optimizes the prediction objective end-to-end. On the ABCD dataset, MAGNet outperformed relevant baselines, demonstrating effective multimodal integration for advancing our understanding of cognitive function.
Submitted 31 March, 2026;
originally announced March 2026.
-
NeuroBRIDGE: Behavior-Conditioned Koopman Dynamics with Riemannian Alignment for Early Substance Use Initiation Prediction from Longitudinal Functional Connectome
Authors:
Badhan Mazumder,
Sir-Lord Wiafe,
Vince D. Calhoun,
Dong Hye Ye
Abstract:
Early identification of adolescents at risk for substance use initiation (SUI) is vital yet difficult, as most predictors treat connectivity as static or cross-sectional and miss how brain networks change over time and with behavior. We propose NeuroBRIDGE (Behavior-conditioned RIemannian Koopman Dynamics on lonGitudinal connEctomes), a novel graph neural network-based framework that aligns longitudinal functional connectomes in a Riemannian tangent space and couples dual-time attention with behavior-conditioned Koopman dynamics to capture temporal change. Evaluated on ABCD, NeuroBRIDGE improved future SUI prediction over relevant baselines while offering interpretable insights into neural pathways, refining our understanding of neurodevelopmental risk and informing targeted prevention.
Submitted 31 March, 2026;
originally announced March 2026.
-
First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (683 additional authors not shown)
Abstract:
We report the first measurement of the $e^{+}e^{-}\to K^{+}K^{-}$ cross sections around the $ψ(2S)$ resonance using the energy scan method. The analysis is based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 495~pb$^{-1}$ collected with the BESIII detector at BEPCII. By analyzing the cross section line-shape, we extract the relative phase $Φ$ between the strong and electromagnetic amplitudes of the $ψ(2S)$ resonance, a fundamental parameter in charmonium physics, under the assumption that the relative phase between the electromagnetic amplitude of the $ψ(2S)$ resonance and the continuum is zero. Two distinct solutions for the branching fraction $\mathcal{B}$ of $ψ(2S)\to K^{+}K^{-}$ are observed: a constructive interference solution with $\mathcal{B}=(7.49\pm0.41)\times10^{-5}$ and $Φ=(110.1 \pm6.7)^\circ$, and a destructive interference solution with $\mathcal{B}=(10.94\pm0.48)\times10^{-5}$ and $Φ=(-106.8\pm5.7)^\circ$. A significant correlation between $Φ$ and $\mathcal{B}$ is established, demonstrating that interference effects must be taken into account in $ψ(2S)$ branching fraction measurements. Additionally, the first results for both the $ψ(2S)$ strong form factor, which characterizes the strong coupling between $ψ(2S)$ and $K^{+}K^{-}$, and the energy-dependent electromagnetic form factor of the charged kaon in this energy region are reported here.
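For readers unfamiliar with the interference fit described above: the cross-section line shape near the resonance is conventionally modeled as a coherent sum of continuum, electromagnetic-resonance, and strong-resonance amplitudes. A schematic form (not necessarily the collaboration's exact parametrization) is

```latex
\sigma(s) \;\propto\; \left| A_{\mathrm{cont}}(s) \;+\; A_{\gamma}(s) \;+\; e^{i\Phi}\, A_{3g}(s) \right|^{2},
```

where $A_{\mathrm{cont}}$ is the continuum amplitude, $A_{\gamma}$ and $A_{3g}$ are the electromagnetic and strong resonance amplitudes, and $\Phi$ is the relative phase quoted above. The existence of two $(\mathcal{B},\Phi)$ solutions reflects the fact that constructive and destructive interference can reproduce the same measured line shape.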
Submitted 31 March, 2026;
originally announced March 2026.
-
DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing
Authors:
Kailai Feng,
Yuxiang Wei,
Bo Chen,
Yang Pan,
Hu Ye,
Songwei Liu,
Chenqian Yan,
Yuan Gao
Abstract:
Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focus on T2I generation and lack support for image editing. In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. It concatenates images horizontally as input, using a (target | blank) configuration for generation tasks and (target | source) for editing tasks. To stabilize the training of this compact model, we introduce a task-progressive joint pretraining strategy that sequentially targets T2I, editing, and joint tasks. After high-quality SFT and reinforcement learning, DreamLite achieves a GenEval score of 0.72 for image generation and an ImgEdit score of 4.11 for image editing, outperforming existing on-device models and remaining competitive with several server-side models. By employing step distillation, we further reduce the denoising process to just 4 steps, enabling DreamLite to generate or edit a 1024 x 1024 image in less than 1 s on a Xiaomi 14 smartphone. To the best of our knowledge, DreamLite is the first unified on-device diffusion model that supports both image generation and image editing.
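The in-context spatial concatenation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the array names, the (C, H, W) latent layout, and zero-filling for the blank half are illustrative choices, since the abstract does not specify these details.

```python
import numpy as np

def build_incontext_latent(target, source=None):
    """Horizontally concatenate latents in the spirit of DreamLite's
    conditioning: (target | blank) for T2I generation, (target | source)
    for editing. `target` and `source` are latent maps of shape
    (C, H, W); the blank half is zero-filled (an assumption made here
    purely for illustration)."""
    if source is None:
        source = np.zeros_like(target)  # generation: blank right half
    return np.concatenate([target, source], axis=-1)  # shape (C, H, 2W)

t2i_input  = build_incontext_latent(np.ones((4, 8, 8)))
edit_input = build_incontext_latent(np.ones((4, 8, 8)),
                                    np.full((4, 8, 8), 2.0))
```

A single network can then be trained on both tasks, since the input tensor shape is identical and only the content of the right half distinguishes generation from editing.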
Submitted 30 March, 2026;
originally announced March 2026.
-
Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (722 additional authors not shown)
Abstract:
By analysing 6.1 ${\rm fb}^{-1}$ of data collected at center-of-mass energies between $\sqrt{s}=4.600$ and 4.843 $\rm GeV$ with the BESIII detector at the BEPCII collider, we observe the decay $Λ_c^+\to nπ^+η$ for the first time with a statistical significance of $9.5σ$. The ratio of branching fractions $\mathcal{B}(Λ_c^+\to nπ^+η)/\mathcal{B}(Λ_c^+\to Λπ^+η)$ is measured to be $0.155\pm0.031_{\rm stat.}\pm0.012_{\rm syst.}$. Taking the world average of $\mathcal{B}(Λ_c^+\to Λπ^+η)$ as reference, the absolute branching fraction is calculated to be $\mathcal{B}(Λ_c^+\to nπ^+η)=(2.94\pm0.59_{\rm stat.}\pm0.23_{\rm syst.}\pm0.13_{\rm ref.})\times10^{-3}$. The intermediate process $Λ_c^+\to na_0(980)^+$ is also searched for in the $π^+η$ invariant mass spectrum. Since no significant signal is found, the upper limit on $\mathcal{B}(Λ_c^+\to na_0(980)^+)\times\mathcal{B}(a_0(980)^+\toπ^+η)$ is set to $8.4\times10^{-4}$ at 90\% confidence level. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish signals from prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification.
Submitted 30 March, 2026;
originally announced March 2026.
-
Amplitude analysis and branching fraction measurement of the decay $D^0 \to K^+K^-π^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
M. S. Anderson,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone
, et al. (749 additional authors not shown)
Abstract:
An amplitude analysis of the singly Cabibbo-suppressed decay $D^0 \to K^+ K^- π^0 π^0$ is performed, for the first time, to determine the relative magnitudes and phases of different intermediate processes. The analysis uses $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy 3.773~GeV corresponding to an integrated luminosity of 20.3 $\rm fb^{-1}$. The absolute branching fraction of $D^0 \to K^+ K^- π^0 π^0$ is measured to be \BF. The dominant intermediate process is $D^0 \to K^{*}(892)^+K^{*}(892)^-$, with a branching fraction of $(2.79 \pm 0.13_{\rm{stat.}} \pm 0.11_{\rm{syst.}}) \times 10^{-3}$. Amplitude analysis reveals that the $D^0 \to K^{*}(892)^+K^{*}(892)^-$ decay is S-wave dominant. The longitudinal polarization fraction of $D^0 \to K^{*}(892)^+ K^{*}(892)^-$ is measured to be $0.468\pm0.046_{\rm{stat.}}\pm0.011_{\rm{syst.}}$.
Submitted 30 March, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Authors:
Yicheng Zou,
Dongsheng Zhu,
Lin Zhu,
Tong Zhu,
Yunhua Zhou,
Peiheng Zhou,
Xinyu Zhou,
Dongzhan Zhou,
Zhiwang Zhou,
Yuhao Zhou,
Bowen Zhou,
Zhanping Zhong,
Zhijie Zhong,
Haiteng Zhao,
Penghao Zhao,
Xiaomeng Zhao,
Zhiyuan Zhao,
Yechen Zhang,
Jin Zhang,
Wenwei Zhang,
Hongjie Zhang,
Zhuo Zhang,
Wenlong Zhang,
Bo Zhang,
Chao Zhang
, et al. (152 additional authors not shown)
Abstract:
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. Achieving this massive scale is made possible by the robust infrastructure support of XTuner and LMDeploy, which facilitates highly efficient Reinforcement Learning (RL) training at the 1-trillion parameter level while ensuring strict precision consistency between training and inference. By seamlessly integrating these advancements, Intern-S1-Pro further fortifies the fusion of general and specialized intelligence, working as a Specializable Generalist, demonstrating its position in the top tier of open-source models for general capabilities, while outperforming proprietary models in the depth of specialized scientific tasks.
Submitted 2 April, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
Authors:
Haishan Ye
Abstract:
We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback.
In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at two points.
While it is well known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions has remained open, as highlighted by Agarwal et al. (2010). The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult.
In this paper, we resolve this open challenge and provide the first high-probability regret bound of $O(d(\log T + \log(1/δ))/μ)$ for $μ$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
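The two-point gradient estimation mentioned above is standard in the bandit convex optimization literature; the following is a minimal sketch of the classical sphere-sampling estimator, which is not necessarily the paper's exact variant:

```python
import numpy as np

def two_point_gradient(f, x, delta=1e-4, rng=None):
    """Classical two-point bandit gradient estimator:
        g = (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u,
    with u drawn uniformly from the unit sphere. Only two function
    values per round are observed, matching the two-point feedback
    model; in high dimension the factor d makes g heavy-tailed, which
    is the concentration difficulty discussed in the abstract."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# For a linear loss in 1D the estimate is exact: f(x) = 3x gives g = 3.
g = two_point_gradient(lambda v: 3.0 * v[0], np.array([2.0]), rng=0)
```

Plugging such an estimator into online gradient descent yields the in-expectation guarantees of prior work; the paper's contribution is the matching high-probability bound despite the estimator's heavy tails.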
Submitted 5 April, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Toward Physically Consistent Driving Video World Models under Challenging Trajectories
Authors:
Jiawei Zhou,
Zhenxin Zhu,
Lingyi Du,
Linye Lyu,
Lijun Zhou,
Zhanqian Wu,
Hongcheng Luo,
Zhuotao Tian,
Bing Wang,
Guang Chen,
Hangjun Ye,
Haiyang Sun,
Yu Li
Abstract:
Video generation models have shown strong potential as world models for autonomous driving simulation. However, existing approaches are primarily trained on real-world driving datasets, which mostly contain natural and safe driving scenarios. As a result, current models often fail when conditioned on challenging or counterfactual trajectories-such as imperfect trajectories generated by simulators or planning systems-producing videos with severe physical inconsistencies and artifacts. To address this limitation, we propose PhyGenesis, a world model designed to generate driving videos with high visual fidelity and strong physical consistency. Our framework consists of two key components: (1) a physical condition generator that transforms potentially invalid trajectory inputs into physically plausible conditions, and (2) a physics-enhanced video generator that produces high-fidelity multi-view driving videos under these conditions. To effectively train these components, we construct a large-scale, physics-rich heterogeneous dataset. Specifically, in addition to real-world driving videos, we generate diverse challenging driving scenarios using the CARLA simulator, from which we derive supervision signals that guide the model to learn physically grounded dynamics under extreme conditions. This challenging-trajectory learning strategy enables trajectory correction and promotes physically consistent video generation. Extensive experiments demonstrate that PhyGenesis consistently outperforms state-of-the-art methods, especially on challenging trajectories. Our project page is available at: https://wm-research.github.io/PhyGenesis/.
Submitted 1 April, 2026; v1 submitted 25 March, 2026;
originally announced March 2026.
-
Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (737 additional authors not shown)
Abstract:
Based on a novel method for producing antineutrons via $J/ψ$ decays, we report a study of $\bar{n}p$ inelastic scattering into final states containing kaons. The analysis uses $(10087\pm44)\times 10^6$ $J/ψ$ events collected at the BESIII detector operating at the BEPCII storage ring. Antineutrons are produced via $J/ψ\to p π^{-} \bar{n}$ decays and tagged by the detected protons and pions, resulting in antineutron momenta ranging from 0 to 1174~MeV/$c$, while target protons are provided by the hydrogen in the beam-pipe material. The cross sections of the reactions $\bar{n}p \rightarrow K^{+}K^{-}π^{+}$ and $\bar{n}p \rightarrow K^{+}K^{-}π^{+}π^{0}$ are measured to be $0.53^{+0.15}_{-0.12} \pm 0.08$~mb and $1.09^{+0.36}_{-0.30} \pm 0.31$~mb respectively, where the first uncertainties are statistical and the second systematic. Due to limited statistics, the intermediate states in these processes are not investigated. The observation of clean antineutron-proton scattering events indicates the potential of this approach for future investigations of antineutron-proton interactions.
Submitted 25 March, 2026;
originally announced March 2026.
-
Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. -R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (736 additional authors not shown)
Abstract:
Using $(10087 \pm 44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, we perform the first amplitude analysis of the process $J/ψ\toγηπ^{0}$. The decay is dominated by the intermediate processes $J/ψ\toπ^{0} \bo \left( \toγη\right)$, $J/ψ\toπ^{0}ρ(1450)^0 \left( \toγη\right)$ and $J/ψ\toηh_1(1170) \left( \toγπ^{0}\right)$. Contributions from $J/ψ\toγa_0(980)^0(\toηπ^{0})$, $J/ψ\toγa_2(1320)^0(\toηπ^{0})$ and $J/ψ\toγa_2(1700)^0(\toηπ^{0})$ are observed with a statistical significance exceeding $5σ$, constituting the first observation of radiative transitions of $J/ψ$ to isospin-triplet scalar mesons. The total branching fraction of $J/ψ\toγηπ^{0}$ is measured to be $(25.7\pm0.3\pm1.5)\times10^{-6}$, where the first uncertainty is statistical and the second systematic. This result is consistent with the previous measurement, with the precision improved by more than a factor of two.
Submitted 24 March, 2026;
originally announced March 2026.
-
Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (678 additional authors not shown)
Abstract:
A search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ is conducted using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and upper limits on the branching fractions of $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ at 90\% confidence level are determined to be $7.7\times10^{-4}$ and $3.9\times10^{-5}$, respectively. This represents the first test of the Vector Meson Dominance mechanism in the radiative decays of charmed mesons to axial-vector mesons.
Submitted 24 March, 2026;
originally announced March 2026.
-
Adsorption energies and decomposition barrier heights for ethylene carbonate on the surface of lithium from cluster-based quantum chemistry
Authors:
Ethan A. Vo,
Hung T. Vuong,
Zachary K. Goldsmith,
Hong-Zhou Ye,
Yujing Wei,
Sohang Kundu,
Ardavan Farahvash,
Garvit Agarwal,
Richard A. Friesner,
Timothy C. Berkelbach
Abstract:
For ethylene carbonate on the (100) surface of lithium, we calculate the adsorption energy in two binding motifs as well as the barrier height for a ring-opening decomposition reaction. We validate a scheme for producing results in the thermodynamic limit by correcting results obtained on finite lithium clusters containing only 40-100 atoms, which enables the use of hybrid density functionals, the random-phase approximation, and correlated wavefunction theories such as coupled-cluster theory and auxiliary-field quantum Monte Carlo. We find that the high-level theories agree to within 2-5 kcal/mol and can therefore serve as benchmarks for more affordable methods. Using our reference data, we demonstrate that generalized gradient approximation functionals, such as PBE, are not sufficiently accurate for reaction barrier heights, and we identify $ω$B97X-V as an especially promising functional for the interfacial chemistry of electrolyte solvents at lithium metal anodes.
Submitted 23 March, 2026;
originally announced March 2026.
-
The Golden Subspace: Where Efficiency Meets Generalization in Continual Test-Time Adaptation
Authors:
Guannan Lai,
Da-Wei Zhou,
Zhenguo Li,
Han-Jia Ye
Abstract:
Continual Test-Time Adaptation (CTTA) aims to enable models to adapt online to unlabeled data streams under distribution shift without accessing source data. Existing CTTA methods face an efficiency-generalization trade-off: updating more parameters improves adaptation but severely reduces online inference efficiency. An ideal solution is to achieve comparable adaptation with minimal feature updates; we call this minimal subspace the golden subspace. We prove its existence in a single-step adaptation setting and show that it coincides with the row space of the pretrained classifier. To enable online maintenance of this subspace, we introduce the sample-wise Average Gradient Outer Product (AGOP) as an efficient proxy for estimating the classifier weights without retraining. Building on these insights, we propose Guided Online Low-rank Directional adaptation (GOLD), which uses a lightweight adapter to project features onto the golden subspace and learns a compact scaling vector while the subspace is dynamically updated via AGOP. Extensive experiments on classification and segmentation benchmarks, including autonomous-driving scenarios, demonstrate that GOLD attains superior efficiency, stability, and overall performance. Our code is available at https://github.com/AIGNLAI/GOLD.
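The AGOP proxy mentioned in the abstract can be illustrated with a toy linear classifier (a sketch of our own, not the paper's code; the linear setup and all variable names are assumptions): for $f(x)=Wx$ the per-sample Jacobian is $W$ itself, so the sample-wise Average Gradient Outer Product averages to $W^\top W$, whose top eigenvectors span the classifier's row space, i.e., the claimed golden subspace.

```python
import numpy as np

# Illustrative sketch (our construction): for a linear classifier f(x) = W x,
# the sample-wise Average Gradient Outer Product
#   G = (1/N) * sum_i J(x_i)^T J(x_i) = W^T W
# shares its row space with W, so its top eigenvectors recover the
# classifier's row space without access to the weights themselves.
rng = np.random.default_rng(0)
d, k, n = 16, 3, 200            # feature dim, num classes, num samples
W = rng.normal(size=(k, d))     # (hidden) classifier weights

def jacobian(x):
    # Jacobian of f(x) = W x with respect to x; for a nonlinear model this
    # would be computed per sample by automatic differentiation.
    return W

G = np.zeros((d, d))
for _ in range(n):
    x = rng.normal(size=d)
    J = jacobian(x)
    G += J.T @ J
G /= n

# numpy.linalg.eigh returns eigenvalues in ascending order, so the last k
# eigenvectors span the dominant subspace of G.
eigvals, eigvecs = np.linalg.eigh(G)
U = eigvecs[:, -k:]                  # estimated subspace basis
P = U @ U.T                          # projector onto the estimated subspace
residual = np.linalg.norm(W - W @ P) # ~0 when the row space is recovered
print(residual < 1e-8)
```

In the linear case the recovery is exact; the abstract's setting uses the same quantity as an online estimate for a pretrained nonlinear model's classifier.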
Submitted 23 March, 2026;
originally announced March 2026.
-
PlanaReLoc: Camera Relocalization in 3D Planar Primitives via Region-Based Structure Matching
Authors:
Hanqiao Ye,
Yuzhou Liu,
Yangdong Liu,
Shuhan Shen
Abstract:
While structure-based relocalizers have long strived for point correspondences when establishing or regressing query-map associations, in this paper, we pioneer the use of planar primitives and 3D planar maps for lightweight 6-DoF camera relocalization in structured environments. Planar primitives, beyond being fundamental entities in projective geometry, also serve as region-based representations that encapsulate both structural and semantic richness. This motivates us to introduce PlanaReLoc, a streamlined plane-centric paradigm where a deep matcher associates planar primitives across the query image and the map within a learned unified embedding space, after which the 6-DoF pose is solved and refined under a robust framework. Through comprehensive experiments on the ScanNet and 12Scenes datasets across hundreds of scenes, our method demonstrates the superiority of planar primitives in facilitating reliable cross-modal structural correspondences and achieving effective camera relocalization without requiring realistically textured/colored maps, pose priors, or per-scene training. The code and data are available at https://github.com/3dv-casia/PlanaReLoc .
Submitted 21 March, 2026;
originally announced March 2026.
-
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
Authors:
Hengwei Ye,
Yuanting Guan,
Yuxuan Ge,
Tianying Zhu,
Zhenhan Guan,
Yijia Zhong,
Yijing Zhang,
Han Zhang,
Yingna Wu,
Zheng Tian
Abstract:
Multimodal Large Language Models (MLLMs) combine the linguistic strengths of LLMs with the ability to process multimodal data, enabling them to address a broader range of visual tasks. Because MLLMs aim at more general, human-like competence than language-only models, we take inspiration from the Wechsler Intelligence Scales - an established battery for evaluating children by decomposing intelligence into interpretable, testable abilities. We introduce KidGym, a comprehensive 2D grid-based benchmark for assessing five essential capabilities of MLLMs: Execution, Perception Reasoning, Learning, Memory and Planning. The benchmark comprises 12 unique tasks, each targeting at least one core capability, specifically designed to gauge MLLMs' adaptability and developmental potential, mirroring the stages of children's cognitive growth. Additionally, our tasks encompass diverse scenarios and objects with randomly generated layouts, ensuring a more accurate and robust evaluation of MLLM capabilities. KidGym is designed to be fully user-customizable and extensible, allowing researchers to create new evaluation scenarios and adjust difficulty levels to accommodate the rapidly growing MLLM community. Through the evaluation of state-of-the-art MLLMs using KidGym, we identify significant insights into model capabilities and reveal several limitations of current models. We release our benchmark at: https://bobo-ye.github.io/KidGym/.
Submitted 1 April, 2026; v1 submitted 2 March, 2026;
originally announced March 2026.
-
Observation of $D_s^+ \to a_0(980)^+f_0(500)$ in the Amplitude Analysis of $D_s^+ \to π^+ π^0 π^0 η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (719 additional authors not shown)
Abstract:
We report the first observation of the decay $D_s^+ \to π^+π^0π^0η$ in a data set corresponding to an integrated luminosity of 7.33 fb$^{-1}$, collected in $e^+e^-$ collisions by the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV. An unexpectedly large branching fraction $\mathcal{B}( D_s^+ \to a_0(980)^+ f_0(500), a_0(980)^+ \to π^+η, f_0(500)\to π^0π^0) = (0.98 \pm 0.16_{\rm{stat.}} \pm 0.22_{\rm{syst.}})\%$ is measured with a significance exceeding $10σ$, offering new constraints on the internal structure of light scalar mesons. The dominant intermediate process is $D_s^+ \to a_1(1260)^+η, a_1(1260)^+\to ρ(770)^+π^0$, with a branching fraction of $(1.77 \pm 0.21_{\rm stat.} \pm 0.12_{\rm syst.})\%$. Isospin symmetry is validated in the decays $a_1(1260)^+\to ρ(770)^0π^+$ and $a_1(1260)^+\to ρ(770)^+π^0$. Moreover, the measured $\mathcal{B}(D_s^+\to π^+π^0π^0η|_{\rm{non}-η^\prime})=(2.97 \pm 0.23_{\rm stat.} \pm 0.14_{\rm syst.})\%$ reduces the undetected $D_s^+ \to ηX$ decay branching fraction to $(0.1 \pm 3.1)\%$.
Submitted 19 March, 2026;
originally announced March 2026.
-
Deep Tabular Representation Corrector
Authors:
Hangting Ye,
Peng Wang,
Wei Fan,
Xiaozhuang Song,
He Zhao,
Dandan Gun,
Yi Chang
Abstract:
Tabular data play an important role in diverse real-world fields, such as healthcare, engineering, and finance. The recent success of deep learning has fostered many tabular learning methods based on deep networks (e.g., Transformer, ResNet). Generally, existing deep tabular machine learning methods follow one of two paradigms, i.e., in-learning and pre-learning. In-learning methods train networks from scratch or impose extra constraints to regulate the representations, which requires training multiple tasks simultaneously and makes learning more difficult, while pre-learning methods design pretext tasks for pre-training followed by task-specific fine-tuning, which requires considerable extra training effort and prior knowledge. In this paper, we introduce a novel deep Tabular Representation Corrector, TRC, to enhance any trained deep tabular model's representations without altering its parameters, in a model-agnostic manner. Specifically, targeting the representation shift and representation redundancy that hinder prediction, we propose two tasks: (i) Tabular Representation Re-estimation, which trains a shift estimator to calculate the inherent shift of tabular representations and subsequently mitigate it, thereby re-estimating the representations; and (ii) Tabular Space Mapping, which transforms the re-estimated representations into a light-embedding vector space via a coordinate estimator while preserving crucial predictive information to minimize redundancy. The two tasks jointly enhance the representations of deep tabular models without modifying the original models, and thus enjoy high efficiency. Finally, we conduct extensive experiments coupling TRC with state-of-the-art deep tabular machine learning models on various tabular benchmarks, which show consistent superiority.
Submitted 17 March, 2026;
originally announced March 2026.
-
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
Authors:
MiroMind Team,
S. Bai,
L. Bing,
L. Lei,
R. Li,
X. Li,
X. Lin,
E. Min,
L. Su,
B. Wang,
L. Wang,
L. Wang,
S. Wang,
X. Wang,
Y. Zhang,
Z. Zhang,
G. Chen,
L. Chen,
Z. Cheng,
Y. Deng,
Z. Huang,
D. Ng,
J. Ni,
Q. Ren,
X. Tang
, et al. (19 additional authors not shown)
Abstract:
We present MiroThinker-1.7, a new research agent designed for complex long-horizon reasoning tasks. Building on this foundation, we further introduce MiroThinker-H1, which extends the agent with heavy-duty reasoning capabilities for more reliable multi-step problem solving. In particular, MiroThinker-1.7 improves the reliability of each interaction step through an agentic mid-training stage that emphasizes structured planning, contextual reasoning, and tool interaction. This enables more effective multi-step interaction and sustained reasoning across complex tasks. MiroThinker-H1 further incorporates verification directly into the reasoning process at both local and global levels. Intermediate reasoning decisions can be evaluated and refined during inference, while the overall reasoning trajectory is audited to ensure that final answers are supported by coherent chains of evidence. Across benchmarks covering open-web research, scientific reasoning, and financial analysis, MiroThinker-H1 achieves state-of-the-art performance on deep research tasks while maintaining strong results on specialized domains. We also release MiroThinker-1.7 and MiroThinker-1.7-mini as open-source models, providing competitive research-agent capabilities with significantly improved efficiency.
Submitted 16 March, 2026;
originally announced March 2026.
-
Learning from Mistakes: Post-Training for Driving VLA with Takeover Data
Authors:
Yinfeng Gao,
Deqing Liu,
Qichao Zhang,
Yupeng Zheng,
Haochen Tian,
Guang Li,
Hangjun Ye,
Long Chen,
Da-Wei Ding,
Dongbin Zhao
Abstract:
Current Vision-Language-Action (VLA) paradigms in end-to-end autonomous driving rely on offline training from static datasets, leaving them vulnerable to distribution shift. Recent post-training methods use takeover data to mitigate this by augmenting the dataset with high-quality expert takeover samples, yet they suffer from two key limitations: supervision restricted to the period after the takeover moments leads to policies with limited safety margins, and passive preference optimization lacks active exploration for optimal performance. In this paper, we propose TakeVLA, a novel VLA post-training framework that overcomes these shortcomings through two complementary innovations. First, we introduce pre-takeover language supervision, which allows the VLA to learn from mistakes proactively. By explicitly teaching the model about what to do in error-prone situations, we cultivate a precautionary mindset that anticipates hazards early and substantially enlarges safety margins. Second, we propose Scenario Dreaming, a reinforcement fine-tuning paradigm that operates in reconstructed takeover scenarios, encouraging active exploration beyond mere preference fitting. Experiments on the Bench2Drive benchmark demonstrate that TakeVLA achieves state-of-the-art closed-loop performance, surpassing the strong VLA baseline SimLingo by 4.93 in driving score, with an enhanced safety margin as evidenced by an 11.76% increase in average TTC.
Submitted 16 March, 2026;
originally announced March 2026.
-
PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning
Authors:
Yinfeng Gao,
Qichao Zhang,
Deqing Liu,
Zhongpu Xia,
Guang Li,
Kun Ma,
Guang Chen,
Hangjun Ye,
Long Chen,
Da-Wei Ding,
Dongbin Zhao
Abstract:
End-to-end autonomous driving policies based on Imitation Learning (IL) often struggle in closed-loop execution due to the misalignment between inadequate open-loop training objectives and real driving requirements. While Reinforcement Learning (RL) offers a solution by directly optimizing driving goals via reward signals, the rendering-based training environments introduce the rendering gap and are inefficient due to high computational costs. To overcome these challenges, we present a novel Pseudo-simulation-based RL method for closed-loop end-to-end autonomous driving, PerlAD. Based on offline datasets, PerlAD constructs a pseudo-simulation that operates in vector space, enabling efficient, rendering-free trial-and-error training. To bridge the gap between static datasets and dynamic closed-loop environments, PerlAD introduces a prediction world model that generates reactive agent trajectories conditioned on the ego vehicle's plan. Furthermore, to facilitate efficient planning, PerlAD utilizes a hierarchical decoupled planner that combines IL for lateral path generation and RL for longitudinal speed optimization. Comprehensive experimental results demonstrate that PerlAD achieves state-of-the-art performance on the Bench2Drive benchmark, surpassing the previous E2E RL method by 10.29% in Driving Score without requiring expensive online interactions. Additional evaluations on the DOS benchmark further confirm its reliability in handling safety-critical occlusion scenarios.
Submitted 16 March, 2026;
originally announced March 2026.
-
The Python Simulations of Chemistry Framework: 10 years of an open-source quantum chemistry project
Authors:
Qiming Sun,
Matthew R Hermes,
Xiaojie Wu,
Huanchen Zhai,
Xing Zhang,
Abdelrahman M. Ahmed,
Juan José Aucar,
Oliver J. Backhouse,
Samragni Banerjee,
Peng Bao,
Nikolay A. Bogdanov,
Kyle Bystrom,
Frédéric Chapoton,
Ning-Yuan Chen,
Ivan Yu. Chernyshov,
Helen S. Clifford,
Sander Cohen-Janes,
Zhi-Hao Cui,
Yann D. Damour,
Nike Dattani,
Linus Bjarne Dittmer,
Sebastian Ehlert,
Janus Juul Eriksen,
Francesco A. Evangelista,
Simon A. Ewing
, et al. (78 additional authors not shown)
Abstract:
Over the past decade, the Python-based Simulations of Chemistry Framework (PySCF) has developed into a widely used open-source platform for electronic structure theory and quantum chemical method development. This article reviews the major advances since the previous overview in 2020, covering new modules and methodology, infrastructure changes, and performance benchmarks.
Submitted 7 April, 2026; v1 submitted 14 March, 2026;
originally announced March 2026.
-
Continual Learning in Large Language Models: Methods, Challenges, and Opportunities
Authors:
Hongyang Chen,
Zhongwu Sun,
Hongfei Ye,
Kunchi Li,
Xuemin Lin
Abstract:
Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static pre-training paradigm inherent to modern LLMs. This survey presents a comprehensive overview of CL methodologies tailored for LLMs, structured around three core training stages: continual pre-training, continual fine-tuning, and continual alignment. Beyond the canonical taxonomy of rehearsal-, regularization-, and architecture-based methods, we further subdivide each category by its distinct forgetting mitigation mechanisms and conduct a rigorous comparative analysis of the adaptability and critical improvements of traditional CL methods for LLMs. In doing so, we explicitly highlight core distinctions between LLM CL and traditional machine learning, particularly with respect to scale, parameter efficiency, and emergent capabilities. Our analysis covers essential evaluation metrics, including forgetting rates and knowledge transfer efficiency, along with emerging benchmarks for assessing CL performance. This survey reveals that while current methods demonstrate promising results in specific domains, fundamental challenges persist in achieving seamless knowledge integration across diverse tasks and temporal scales. This systematic review contributes to the growing body of knowledge on LLM adaptation, providing researchers and practitioners with a structured framework for understanding current achievements and future opportunities in lifelong learning for language models.
Submitted 13 March, 2026;
originally announced March 2026.
-
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing
Authors:
Baifeng Shi,
Stephanie Fu,
Long Lian,
Hanrong Ye,
David Eigen,
Aaron Reite,
Boyi Li,
Jan Kautz,
Song Han,
David M. Chan,
Pavlo Molchanov,
Trevor Darrell,
Hongxu Yin
Abstract:
Multi-modal large language models (MLLMs) have advanced general-purpose video understanding but struggle with long, high-resolution videos -- they process every pixel equally in their vision transformers (ViTs) or LLMs despite significant spatiotemporal redundancy. We introduce AutoGaze, a lightweight module that removes redundant patches before they are processed by a ViT or an MLLM. Trained with next-token prediction and reinforcement learning, AutoGaze autoregressively selects a minimal set of multi-scale patches that can reconstruct the video within a user-specified error threshold, eliminating redundancy while preserving information. Empirically, AutoGaze reduces visual tokens by 4x-100x and accelerates ViTs and MLLMs by up to 19x, enabling scaling MLLMs to 1K-frame 4K-resolution videos and achieving superior results on video benchmarks (e.g., 67.0% on VideoMME). Furthermore, we introduce HLVid: the first high-resolution, long-form video QA benchmark with 5-minute 4K-resolution videos, where an MLLM scaled with AutoGaze improves over the baseline by 10.1% and outperforms the previous best MLLM by 4.5%. Project page: https://autogaze.github.io/.
Submitted 12 March, 2026;
originally announced March 2026.
-
ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning
Authors:
Lingxiao Tang,
He Ye,
Zhaoyang Chu,
Muyang Ye,
Zhongxin Liu,
Xiaoxue Ren,
Lingfeng Bao
Abstract:
Code LLMs still struggle with code execution reasoning, especially in smaller models. Existing methods rely on supervised fine-tuning (SFT) with teacher-generated explanations, primarily in two forms: (1) input-output (I/O) prediction chains and (2) natural-language descriptions of execution traces. However, intermediate execution steps cannot be explicitly verified during SFT, so the training objective can reduce to merely matching teacher explanations. Moreover, training data is typically collected without explicit control over task difficulty. We introduce ExecVerify, which goes beyond text imitation by incorporating verifiable white-box rewards derived from execution traces, including next-statement prediction and variable value/type prediction. Our work first builds a dataset with multiple difficulty levels via constraint-based program synthesis. Then, we apply reinforcement learning (RL) to reward correct answers about both intermediate execution steps and final outputs, aligning the training objective with semantic correctness at each execution step. Finally, we adopt a two-stage training pipeline that first enhances execution reasoning and then transfers to code generation. Experiments demonstrate that a 7B model trained with ExecVerify achieves performance comparable to 32B models on code reasoning benchmarks and improves pass@1 by up to 5.9\% on code generation tasks over strong post-training baselines.
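The idea of a verifiable white-box reward can be sketched in Python (a minimal construction of our own, not the paper's implementation; all function names are hypothetical): record the ground-truth variable values at each executed line with `sys.settrace`, then score a model's stepwise variable predictions against that trace.

```python
import sys

def record_trace(fn, *args):
    """Run fn(*args) and record (line_offset, locals) at each executed line."""
    trace = []  # list of (line offset within fn, {var: value}) snapshots

    def tracer(frame, event, arg):
        # Only record 'line' events belonging to the traced function.
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno - fn.__code__.co_firstlineno,
                          dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return trace

def stepwise_reward(trace, predictions):
    """Fraction of predicted (step, variable) values matching the real trace.

    predictions: {step_index: {var_name: predicted_value}} as a model might
    emit when asked to reason about intermediate execution state.
    """
    total = correct = 0
    for step, pred_vars in predictions.items():
        _, actual = trace[step]
        for var, val in pred_vars.items():
            total += 1
            correct += int(actual.get(var) == val)
    return correct / total if total else 0.0
```

A reward of this form is verifiable by construction: it is computed against the interpreter's actual state rather than against a teacher's textual explanation, which is the contrast the abstract draws with SFT on trace descriptions.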
Submitted 11 March, 2026;
originally announced March 2026.
-
$V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts
Authors:
Yi-Kai Zhang,
Yueqing Sun,
Hongyan Hao,
Qi Gu,
Xunliang Cai,
De-Chuan Zhan,
Han-Jia Ye
Abstract:
In Reinforcement Learning with Verifiable Rewards (RLVR), constructing a robust advantage baseline is critical for policy gradients, effectively guiding the policy model to reinforce desired behaviors. Recent research has introduced Generalist Value Models (such as $V_0$), which achieve pre-trained value estimation by explicitly encoding model capabilities in-context, eliminating the need to synchronously update the value model alongside the policy model. In this paper, we propose $V_{0.5}$, which adaptively fuses the baseline predicted by such a value model (acting as a prior) with the empirical mean derived from sparse rollouts. This constructs a robust baseline that balances computational efficiency with extremely low variance. Specifically, we introduce real-time statistical testing and dynamic budget allocation, which balance the high variance caused by sparse sampling against the systematic bias (or hallucinations) inherent in the value model's prior. By constructing a hypothesis test to evaluate the prior's reliability in real-time, the system dynamically allocates additional rollout budget on demand. This mechanism minimizes the baseline estimator's Mean Squared Error (MSE), guaranteeing stable policy gradients even under extreme sparsity with a group size of 4. Extensive evaluations across six mathematical reasoning benchmarks demonstrate that $V_{0.5}$ significantly outperforms GRPO and DAPO, achieving faster convergence and a performance improvement of roughly 10%.
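The fuse-or-reallocate logic can be sketched as follows (a hypothetical illustration of the general idea, not the paper's algorithm; the function name, z-threshold, and prior pseudo-count are our assumptions):

```python
import math

def fused_baseline(prior, rewards, reward_var=1.0, z_crit=1.96):
    """Fuse a value-model prior with the empirical mean of sparse rollouts.

    A z-test checks whether the prior is statistically consistent with the
    observed rewards; if not, the prior is distrusted and more rollout
    budget is requested.
    """
    m = len(rewards)
    emp_mean = sum(rewards) / m
    se = math.sqrt(reward_var / m)     # standard error of the empirical mean
    z = abs(emp_mean - prior) / se     # does the prior disagree with data?
    if z > z_crit:
        # Prior looks biased for this prompt: fall back to the empirical
        # mean and signal that additional rollouts should be allocated.
        return emp_mean, "allocate_more_rollouts"
    # Prior is consistent with the rollouts: shrink the noisy empirical
    # mean toward it (inverse-variance-style weighting; the pseudo-count
    # m0 is an assumed hyperparameter).
    m0 = 4
    fused = (m0 * prior + m * emp_mean) / (m0 + m)
    return fused, "use_prior"
```

With a group size of 4, `fused_baseline(0.5, [0.4, 0.6, 0.5, 0.5])` keeps the prior, while `fused_baseline(0.0, [1.0, 1.0, 1.0, 1.0])` rejects it and requests more samples; shrinkage of this kind is what reduces the baseline's MSE relative to the raw sparse-sample mean.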
Submitted 11 March, 2026;
originally announced March 2026.
-
Amplitude Analysis of Singly Cabibbo-Suppressed Decay $Λ^{+}_{c}\to p K^{+} K^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (681 additional authors not shown)
Abstract:
Using a sample of $e^{+}e^{-}$ annihilation data corresponding to an integrated luminosity of 4.4 $\rm{fb}^{-1}$ collected with the BESIII detector at the BEPCII collider and produced at center-of-mass energies from $4600$ to $4698~\rm{MeV}$, an amplitude analysis is performed of the singly Cabibbo-suppressed decay $Λ^{+}_{c}\to pK^{+}K^{-}$. The branching fractions of $Λ^{+}_{c}\to pφ(1020)$, $pf_{0}(980)$, $Λ(1405)K^{+}$, and $Λ(1670)K^{+}$ are measured, where the latter two modes are observed for the first time. Using the detection efficiency derived from the amplitude-analysis results, the branching fraction of $Λ^{+}_{c}\to pK^{+}K^{-}$ is updated to be $(9.94\pm0.65_{\text{stat.}}\pm0.50_{\text{syst.}})\times10^{-4}$, which is consistent with the current world average value within one standard deviation. The result supersedes the previous BESIII measurement with precision improved by approximately a factor of 1.5.
Submitted 9 March, 2026;
originally announced March 2026.
-
An improved measurement of $η^\prime\rightarrow e^{+}e^{-}ω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (751 additional authors not shown)
Abstract:
Using a sample of $(10087 \pm 44) \times 10^{6}$ $J/ψ$ events collected with the BESIII detector, an improved measurement of the decay $η^{\prime}\rightarrow e^{+}e^{-}ω$, with $ω\rightarrowπ^{+}π^{-}π^{0}$ and $π^{0}\rightarrowγγ$ is performed. The branching fraction is determined to be $\mathcal{B}(η^{\prime}\rightarrow e^{+}e^{-}ω) = (1.79 \pm 0.09 \pm 0.12) \times 10^{-4}$, where the first uncertainty is statistical and the second is systematic. This result is consistent with the previous measurement and is obtained with significantly improved precision. Furthermore, the first measurement of the transition form factor cutoff parameter for this decay is reported, with $Λ^{-1} = (2.92 \pm 0.83 \pm 0.15)~\text{GeV}^{-1}$. These measurements provide valuable input for understanding the internal structure of the $η^{\prime}$ meson and testing theoretical models.
Submitted 9 March, 2026;
originally announced March 2026.
-
Echo: Graph-Enhanced Retrieval and Execution Feedback for Issue Reproduction Test Generation
Authors:
Zhiwei Fei,
Yue Pan,
Federica Sarro,
Jidong Ge,
Marc Liu,
Vincent Ng,
He Ye
Abstract:
Identifying the root cause of a bug remains difficult for many developers because bug reports often lack a bug reproducing test case that reliably triggers the failure. Manually writing such test cases is time-consuming and requires substantial effort to understand the codebase and isolate the failing behavior. To address this challenge, we propose Echo, an agent for generating issue reproducing test cases, which advances previous work in several ways. During generation, Echo strengthens context retrieval by leveraging a code graph and a novel automatic query-refinement strategy. Echo also improves upon previous tools by automatically executing generated test cases, a first-of-its-kind feature that seamlessly integrates into practical development workflows. In addition, Echo generates potential patches and uses the patched version to validate whether a candidate test meets the fail-to-pass criterion and to provide actionable feedback for refinement. Unlike prior bug-reproduction agents that sample and rank multiple candidate tests, Echo generates a single test per issue, offering a better cost-performance trade-off. Experiments on SWT-Bench Verified show that Echo establishes a new state of the art among open-source approaches, achieving a 66.28% success rate.
Submitted 7 March, 2026;
originally announced March 2026.
-
Shadows and Polarization Images of a Four-dimensional Gauss-Bonnet Black Hole Irradiated by a Thick Accretion Disk
Authors:
Xiao-Xiong Zeng,
Huan Ye,
Muhammad Israr Aslam,
Rabia Saleem
Abstract:
We adopt a general relativistic ray-tracing approach to study the shadows and polarization images of spherically symmetric Gauss-Bonnet (GB) black holes enveloped by geometrically thick accretion flows. Specifically, we adopt a phenomenological RIAF-like model and an analytical Hou disk model. In the RIAF-like model, increasing the GB coupling parameter $λ$ reduces both the size and brightness of the higher-order image, while increasing the inclination $θ$ alters the shape of the higher-order image and obscures the horizon's outline. The main difference between isotropic and anisotropic emission is that the latter distorts the higher-order image in the vertical direction, producing an elliptical morphology. For the Hou disk model, because specific regions are geometrically thinner under the conical approximation, the higher-order images narrow more strongly with increasing $λ$ than in the RIAF model. Increasing $θ$ enhances the brightness of the direct images outside the higher-order images but hardly changes the size of the higher-order images, in sharp contrast to the RIAF model. Meanwhile, the Hou disk produces polarization patterns that trace the brightness configuration and are affected by $λ$ and $θ$, reflecting the intrinsic structure of spacetime. These results illustrate that intensity and polarization in thick-disk models provide probes of GB black holes and near-horizon accretion dynamics.
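For reference, spherically symmetric four-dimensional Gauss-Bonnet black holes are commonly described by the Glavan-Lin metric function; the abstract does not write the metric explicitly, so assuming that standard form (with $G = c = 1$):

```latex
% Standard spherically symmetric 4D Gauss-Bonnet metric (assumed form;
% the abstract does not state the metric explicitly)
ds^{2} = -f(r)\,dt^{2} + f(r)^{-1}\,dr^{2} + r^{2}\,d\Omega^{2},
\qquad
f(r) = 1 + \frac{r^{2}}{2\lambda}\left(1 - \sqrt{1 + \frac{8\lambda M}{r^{3}}}\right)
```

In the limit $λ\to 0$, expanding the square root gives $f(r)\to 1 - 2M/r$, recovering the Schwarzschild solution, so $λ$ directly controls the deviation of the shadow from the Schwarzschild case.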
Submitted 7 March, 2026;
originally announced March 2026.
-
Multi-channel joint analysis of the exotic charmonium-like state $T_{c\bar{c}}(4020)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (700 additional authors not shown)
Abstract:
This paper reports the first multi-channel joint analysis to identify the properties of the exotic charmonium-like state $T_{c\bar{c}}(4020)$ via the electron-positron annihilation process $e^{+}e^{-}\toπ^{+}T_{c\bar{c}}(4020)^{-}+c.c.$ A partial wave analysis is performed simultaneously in three decay channels, $T_{c\bar{c}}(4020)^{-}\to {D}^{*0}D^{*-}$, $π^{-}J/ψ$, and $π^{-}h_{c}$, based on data samples taken at $\sqrt{s}=4.395$ and $4.416\,\mathrm{GeV}$ with an integrated luminosity of $1598.9\,\mathrm{pb}^{-1}$ collected with the BESIII detector operating at the BEPCII collider. For the first time, the spin-parity of the $T_{c\bar{c}}(4020)^{-}$ is determined to be $J^{P}=1^{+}$, with a significance of $11.7σ$. Pole positions are extracted on the Riemann sheets with three branch points in the complex energy plane. Furthermore, the relative branching fractions are obtained as $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}J/ψ]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(3.6\pm0.6\pm1.6)\times10^{-3}$ and $\mathcal{B}[T_{c\bar{c}}(4020)^{-}\toπ^{-}h_{c}]/\mathcal{B}[T_{c\bar{c}}(4020)^{-}\to{D}^{*0}D^{*-}]=(8.9\pm1.3\pm2.3)\times10^{-2}$, where the first uncertainties are statistical and the second are systematic.
Submitted 5 March, 2026;
originally announced March 2026.
-
A multi-center analysis of deep learning methods for video polyp detection and segmentation
Authors:
Noha Ghatwary,
Pedro Chavarias Solano,
Mohamed Ramzy Ibrahim,
Adrian Krenzer,
Frank Puppe,
Stefano Realdon,
Renato Cannizzaro,
Jiacheng Wang,
Liansheng Wang,
Thuy Nuong Tran,
Lena Maier-Hein,
Amine Yamlahi,
Patrick Godau,
Quan He,
Qiming Wan,
Mariia Kokshaikyna,
Mariia Dobko,
Haili Ye,
Heng Li,
Ragu B,
Antony Raj,
Hanaa Nagdy,
Osama E Salem,
James E. East,
Dominique Lamarque
, et al. (2 additional authors not shown)
Abstract:
Colonic polyps are well-recognized precursors to colorectal cancer (CRC), typically detected during colonoscopy. However, the variability in appearance, location, and size of these polyps complicates their detection and removal, leading to challenges in effective surveillance, intervention, and, consequently, CRC prevention. Colonoscopy surveillance and polyp removal rely heavily on the expertise of gastroenterologists and take place within the complexities of the colonic structure. As a result, there is a high rate of missed detections and incomplete removal of colonic polyps, which can adversely impact patient outcomes. Recently, automated methods based on machine learning have been developed to enhance polyp detection and segmentation, supporting clinical processes and reducing miss rates. These advancements highlight the potential for improving diagnostic accuracy in real-time applications, ultimately facilitating more effective patient management. Furthermore, integrating sequence data and temporal information could significantly enhance the precision of these methods by capturing the dynamic nature of polyp growth and the changes that occur over time. To rigorously investigate these challenges, data scientists and expert gastroenterologists collaborated to compile a comprehensive dataset spanning multiple centers and diverse populations. This initiative underscores the critical importance of incorporating sequence data and temporal information in the development of robust automated detection and segmentation methods. This study evaluates the applicability of deep learning techniques to real-time clinical colonoscopy tasks using sequence data, highlighting the critical role of temporal relationships between frames in improving diagnostic precision.
Submitted 4 March, 2026;
originally announced March 2026.