-
Parcae: Scaling Laws For Stable Looped Language Models
Authors:
Hayden Prairie,
Zachary Novack,
Taylor Berg-Kirkpatrick,
Daniel Y. Fu
Abstract:
Traditional fixed-depth architectures scale quality by increasing training FLOPs, typically through increased parameterization, at the expense of a higher memory footprint, or data. A potential alternative is looped architectures, which instead increase FLOPs by sending activations through a block of layers in a loop. While promising, existing recipes for training looped architectures can be unstable, suffering from residual explosion and loss spikes. We address these challenges by recasting looping as a nonlinear time-variant dynamical system over the residual stream. Via a linear approximation to this system, we find that instability occurs in existing looped architectures as a result of large spectral norms in their injection parameters. To address these instability issues, we propose Parcae, a novel, stable looped architecture that constrains the spectral norm of the injection parameters via discretization of a negative diagonal parameterization. As a result, Parcae achieves up to 6.3% lower validation perplexity over prior large-scale looped models. Using our stable looped architecture, we investigate the scaling properties of looping as a medium to improve quality by increasing FLOPs at training and test time. For training, we derive predictable power laws to scale FLOPs while keeping parameter count fixed. Our initial scaling laws suggest that looping and data should be increased in tandem, given a fixed FLOP budget. At test time, we find that Parcae can use looping to scale compute, following a predictable, saturating exponential decay. When scaled up to 1.3B parameters, we find that Parcae improves CORE and Core-Extended quality by 2.99 and 1.18 points when compared to strong Transformer baselines under a fixed parameter and data budget, achieving a relative quality of up to 87.5% of a Transformer twice the size.
Submitted 14 April, 2026;
originally announced April 2026.
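The saturating exponential decay described for Parcae's test-time scaling can be illustrated with a small numerical sketch. Everything below is hypothetical: the functional form `loss(k) = L_inf + A * exp(-k / tau)` is one plausible reading of "saturating exponential decay", and the loop counts and loss values are synthetic stand-ins, not numbers from the paper.

```python
import numpy as np

def saturating_exp(k, L_inf, A, tau):
    # hypothetical form: loss decays toward a floor L_inf as loop count k grows
    return L_inf + A * np.exp(-k / tau)

loops = np.arange(1, 17)                     # hypothetical test-time loop counts
obs = saturating_exp(loops, 2.1, 0.9, 3.0)   # synthetic "validation loss" curve

# For equally spaced loop counts, three samples give closed-form estimates:
#   (y1 - y2)/(y2 - y3) = exp(d / tau)  and  L_inf = (y1*y3 - y2^2)/(y1 + y3 - 2*y2)
y1, y2, y3 = obs[0], obs[5], obs[10]         # k = 1, 6, 11 (spacing d = 5)
tau_hat = 5.0 / np.log((y1 - y2) / (y2 - y3))
L_inf_hat = (y1 * y3 - y2**2) / (y1 + y3 - 2 * y2)
```

The closed-form recovery of the asymptote `L_inf` is what makes such a law useful in practice: it predicts the quality floor that additional test-time loops cannot improve past.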
-
Distorted or Fabricated? A Survey on Hallucination in Video LLMs
Authors:
Yiyang Huang,
Yitian Zhang,
Yizhou Wang,
Mingyuan Zhang,
Liang Shi,
Huimin Zeng,
Yun Fu
Abstract:
Despite significant progress in video-language modeling, hallucinations remain a persistent challenge in Video Large Language Models (Vid-LLMs), referring to outputs that appear plausible yet contradict the content of the input video. This survey presents a comprehensive analysis of hallucinations in Vid-LLMs and introduces a systematic taxonomy that categorizes them into two core types: dynamic distortion and content fabrication, each comprising two subtypes with representative cases. Building on this taxonomy, we review recent advances in the evaluation and mitigation of hallucinations, covering key benchmarks, metrics, and intervention strategies. We further analyze the root causes of dynamic distortion and content fabrication, which often result from limited capacity for temporal representation and insufficient visual grounding. These insights inform several promising directions for future work, including the development of motion-aware visual encoders and the integration of counterfactual learning techniques. This survey consolidates scattered progress to foster a systematic understanding of hallucinations in Vid-LLMs, laying the groundwork for building robust and reliable video-language systems. An up-to-date curated list of related works is maintained at https://github.com/hukcc/Awesome-Video-Hallucination .
Submitted 14 April, 2026;
originally announced April 2026.
-
Building reliable 3D photonic integrated circuits and cavities at the wafer scale
Authors:
Yuhao Huang,
Yunqi Fu,
Yu Xia,
Yuemin Li,
Zheng Li,
Yaoran Huang,
Zhaoting Geng,
Mingfei Liu,
Chao Xiang
Abstract:
Three-dimensional (3D) photonic integrated circuits (PICs) are emerging as an indispensable scheme for high-density and multifunctional photonic systems. However, the wafer-scale scaling of PICs towards a 3D configuration is constrained by two key factors: (i) the trade-off between inter-layer taper efficiency and footprint, and (ii) wafer-scale uniformity of inter-layer transition loss. In this work, we introduce etch-back assisted chemical mechanical polishing (E-CMP) to achieve high wafer-scale uniformity of the spacer layer. Moreover, we break the efficiency-footprint trade-off by demonstrating a novel $κ$-engineered taper, achieving a reliability metric that is 75% higher than the traditional linearly tapered structure. Building on these design and fabrication developments, we enable reliable 3D PICs with typical loss of 0.077 and 0.068 dB/cm on two silicon nitride (SiN) waveguide layers and typical 3D transition loss as low as 6 mdB. Furthermore, the low 3D transition loss enables the first class of 3D high-Q optical cavities occupying two distinct device layers, providing a new design space for high-Q optical cavities. The scalable fabrication process and design methodology provide routes for wafer-scale reliable 3D PICs that are promising in a series of applications ranging from photonic interconnects and computing networks to high-density photonic sensors and nonlinear photonics.
Submitted 14 April, 2026;
originally announced April 2026.
-
Scalable 3D silicon nitride photonic interposer for high-density optical interconnects
Authors:
Yu Xia,
Yuhao Huang,
Yuemin Li,
Jie Wang,
Yunqi Fu,
Yaoran Huang,
Hongjie Liang,
Hao Fang,
Zheng Li,
Mingfei Liu,
Yitian Tong,
Di Yu,
Chao Xiang
Abstract:
Modern computing workloads demand energy-efficient, high-bandwidth interconnects, motivating photonic interposers as an alternative to electrical links. Here we demonstrate a compact 3D silicon nitride (SiN) photonic interposer prototype comprising two routing layers, with the 3D routing scheme optimized by a global optimization algorithm. The 3D interposer realizes a fully connected 12-node optical network that reduces the total number of intralayer crossings from 495 for all-planar routing to merely 150 (69.7% reduction), below the theoretical lower bound of 153 for all-planar interconnects. Comparing the two schemes, our 3D design achieves a 45.8% reduction experimentally in the average loss per waveguide. The proposed 3D routing architecture also features inherent symmetry and is scalable to higher node counts, flexible node placements, additional routing layers, and other operating wavelengths, enabling denser, lower-loss photonic interposers for next-generation scale-up and high-performance computing (HPC) systems.
Submitted 14 April, 2026;
originally announced April 2026.
-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
Submitted 14 April, 2026;
originally announced April 2026.
-
Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System
Authors:
Yi Hong,
Mingyang Wang,
Yalin Liu,
Yaru Fu,
Kevin Hung
Abstract:
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in search-and-rescue (SAR) missions, yet continuous and reliable victim detection and localization remain challenging due to on-board hardware constraints. This paper designs a UAV-Enabled Victim Sound Detection and Localization System (called ``Sky-Ear'' for brevity) to achieve energy-efficient acoustic sensing and sound detection for SAR. Based on a circular-shaped microphone array, two-stage (Sentinel and Responder) audio processing is developed for energy-efficient and highly reliable sound detection. A masked autoencoder (MAE)-based sound detection method is designed in the Sentinel stage to analyze frequency-time acoustic features. For improved precision, a continuous localization method is designed by optimizing detected directions from multiple observations. Extensive simulation experiments are conducted to validate the system's performance in terms of victim detection accuracy and localization error.
Submitted 14 April, 2026;
originally announced April 2026.
-
$\mathbb{Z}_{2}$ Skin Channels and Scale-Dependent Dynamical Quantum Phase Transitions
Authors:
Yongxu Fu
Abstract:
We analytically describe the dynamically separated $\mathbb{Z}_{2}$ skin channels (wavepacket evolutions) under periodic boundary condition (PBC) in non-Hermitian systems with anomalous time-reversal symmetry (ATRS), by combining the semiclassical worldline perspective with an enhanced understanding of skin effects. These channels, tied to the initial state and relevant symmetries, exhibit individually exponential-dominated time evolution in momentum space, where their amplitude maxima evolve toward the dominant momenta. In real space, they circulate around the one-dimensional (1D) chain, tracing semiclassical worldlines. Such circulations imply quantum revivals and dynamical quantum phase transitions (DQPTs) regardless of any wavepackets' phase interference, with the latter showing scale-dependent behavior, a feature distinct from conventional DQPTs. This work rigorously demonstrates our previous findings on worldline windings and the winding-control mechanism, confirming that the core physics is shared with the ordinary skin effect.
Submitted 14 April, 2026;
originally announced April 2026.
-
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Authors:
NVIDIA,
:,
Aakshita Chandiramani,
Aaron Blakeman,
Abdullahi Olaoye,
Abhibha Gupta,
Abhilash Somasamudramath,
Abhinav Khattar,
Adeola Adesoba,
Adi Renduchintala,
Adil Asif,
Aditya Agrawal,
Aditya Vavre,
Ahmad Kiswani,
Aishwarya Padmakumar,
Ajay Hotchandani,
Akanksha Shukla,
Akhiad Bercovich,
Aleksander Ficek,
Aleksandr Shaposhnikov,
Alex Gronskiy,
Alex Kondratenko,
Alex Neefus,
Alex Steiner,
Alex Yang
, et al. (522 additional authors not shown)
Abstract:
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.
Submitted 14 April, 2026;
originally announced April 2026.
-
The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
Authors:
Xingyu Qiu,
Yuqian Fu,
Jiawei Geng,
Bin Ren,
Jiancheng Pan,
Zongwei Wu,
Hao Tang,
Yanwei Fu,
Radu Timofte,
Nicu Sebe,
Mohamed Elhoseiny,
Lingyi Hong,
Mingxi Cheng,
Xingqi He,
Runze Li,
Xingdong Sheng,
Wenqiang Zhang,
Jiacong Liu,
Shu Luo,
Yikai Qin,
Yaze Zhao,
Yongwei Jiang,
Yixiong Zou,
Zhe Zhang,
Yang Yang
, et al. (49 additional authors not shown)
Abstract:
Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submissions. Among them, 31 teams actively participated, and 19 teams submitted valid final results. Participants explored a wide range of strategies, introducing innovative methods that push the performance frontier under both open-source and closed-source tracks. This report presents a detailed overview of the NTIRE 2026 CD-FSOD Challenge, including a summary of the submitted approaches and an analysis of the final results across all participating teams. Challenge Codes: https://github.com/ohMargin/NTIRE2026_CDFSOD.
Submitted 13 April, 2026;
originally announced April 2026.
-
LoViF 2026: The First Challenge on Weather Removal in Videos
Authors:
Chenghao Qian,
Xin Li,
Yeying Jin,
Shangguan Sun,
Yilian Zhong,
Yuxiang Chen,
Shibo Yin,
Yushun Fang,
Xilei Zhu,
Yahui Wang,
Chen Lu,
Ying Fu,
Jianan Tian,
Jifan Zhang,
Chen Zhou,
Junyang Jiang,
Yuping Sun,
Zhuohang Shi,
Xiaojing Liu,
Jiao Liu,
Yatong Zhou,
Shuai Liu,
Qiang Deng,
Jiajia Mi,
Qianhao Luo
, et al. (1 additional authors not shown)
Abstract:
This paper presents a review of the LoViF 2026 Challenge on Weather Removal in Videos. The challenge encourages the development of methods for restoring clean videos from inputs degraded by adverse weather conditions such as rain and snow, with an emphasis on achieving visually plausible and temporally consistent results while preserving scene structure and motion dynamics. To support this task, we introduce a new short-form WRV dataset tailored for video weather removal. It consists of 18 videos with 1,216 synthesized frames paired with 1,216 real-world ground-truth frames at a resolution of 832 x 480, and is split into training, validation, and test sets with a ratio of 1:1:1. The goal of this challenge is to advance robust and realistic video restoration under real-world weather conditions, with evaluation protocols that jointly consider fidelity and perceptual quality. The challenge attracted 37 participants and received 5 valid final submissions with corresponding fact sheets, contributing to progress in weather removal for videos. The project is publicly available at https://www.codabench.org/competitions/13462/.
Submitted 14 April, 2026; v1 submitted 12 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of \boldmath{$D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$} in \boldmath{$D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$} Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer novel insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 11 April, 2026;
originally announced April 2026.
-
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
Authors:
Chi-Yuan Hsiao,
Ke-Han Lu,
Yu-Kuan Fu,
Guan-Ting Lin,
Hsiao-Tsung Hung,
Hung-yi Lee
Abstract:
End-to-end full-duplex Speech Language Models (SLMs) require precise turn-taking for natural interaction. However, optimizing temporal dynamics via standard raw-token reinforcement learning (RL) degrades semantic quality, causing severe generative collapse and repetition. We propose ASPIRin, an interactivity-optimized RL framework that explicitly decouples when to speak from what to say. Using Action Space Projection, ASPIRin maps the text vocabulary into a coarse-grained binary state (active speech vs. inactive silence). By applying Group Relative Policy Optimization (GRPO) with rule-based rewards, it balances user interruption and response latency. Empirical evaluations show ASPIRin optimizes interactivity across turn-taking, backchanneling, and pause handling. Crucially, isolating timing from token selection preserves semantic coherence and reduces the proportion of duplicate n-grams by over 50% compared to standard GRPO, effectively eliminating degenerative repetition.
Submitted 11 April, 2026;
originally announced April 2026.
-
StaRPO: Stability-Augmented Reinforcement Policy Optimization
Authors:
Jinghan Zhang,
Fengran Mo,
Tharindu Cyril Weerasooriya,
Ruimin Dai,
Xiaoyan Han,
Yanjie Fu,
Dakuo Wang,
Kunpeng Liu
Abstract:
Reinforcement learning (RL) is effective in enhancing the accuracy of large language models in complex reasoning tasks. Existing RL policy optimization frameworks rely on final-answer correctness as feedback signals and rarely capture the internal logical structure of the reasoning process. Consequently, models may generate responses that are fluent and semantically relevant yet logically inconsistent, structurally erratic, or redundant. To this end, we propose StaRPO, a stability-augmented reinforcement learning framework that explicitly incorporates reasoning stability into the optimization objective. StaRPO decomposes stability into two computable, lightweight metrics: the Autocorrelation Function (ACF), which evaluates local step-to-step coherence, and Path Efficiency (PE), which evaluates the global goal-directedness of the reasoning trajectory. These stability rewards are combined with task rewards to provide complementary, process-aware feedback. We validate the effectiveness of the ACF and PE rewards by showing their correlation with logic errors on two backbone models. Experiments on four reasoning benchmarks show that StaRPO consistently outperforms the compared baselines and enhances both final-answer accuracy and logical stability.
Submitted 9 April, 2026;
originally announced April 2026.
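The two stability metrics named in the StaRPO abstract can be sketched numerically. These are plausible illustrative instantiations only, not the paper's exact definitions: `lag1_acf` computes the lag-1 autocorrelation of a sequence of per-step scores (one way to measure step-to-step coherence), and `path_efficiency` is the standard ratio of net displacement to total path length of a trajectory of step embeddings (one way to measure goal-directedness). The function names and inputs are hypothetical.

```python
import numpy as np

def lag1_acf(x):
    # lag-1 autocorrelation of a 1-D sequence of per-step scores:
    # near 1 when consecutive steps are coherent, near 0 when they are erratic
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = (xc * xc).sum()
    if denom == 0:
        return 0.0
    return float((xc[1:] * xc[:-1]).sum() / denom)

def path_efficiency(steps):
    # steps: (T, d) array of step embeddings along a reasoning trajectory;
    # returns net displacement / total path length: 1.0 for a perfectly
    # direct path, approaching 0 for a wandering or redundant one
    steps = np.asarray(steps, dtype=float)
    seg = np.linalg.norm(np.diff(steps, axis=0), axis=1).sum()
    if seg == 0:
        return 1.0
    return float(np.linalg.norm(steps[-1] - steps[0]) / seg)
```

A trajectory that revisits earlier states scores low on `path_efficiency`, which is the kind of redundancy a PE-style reward would penalize.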
-
Hard-constrained Physics-informed Neural Networks for Interface Problems
Authors:
Seung Whan Chung,
Stephen Castonguay,
Sumanta Roy,
Michael Penwarden,
Yucheng Fu,
Pratanu Roy
Abstract:
Physics-informed neural networks (PINNs) have emerged as a flexible framework for solving partial differential equations, but their performance on interface problems remains challenging because continuity and flux conditions are typically imposed through soft penalty terms. The standard soft-constraint formulation leads to imperfect interface enforcement and degraded accuracy near interfaces. We introduce two ansatz-based hard-constrained PINN formulations for interface problems that embed the interface physics into the solution representation and thereby decouple interface enforcement from PDE residual minimization. The first, termed the windowing approach, constructs the trial space from compactly supported windowed subnetworks so that interface continuity and flux balance are satisfied by design. The second, called the buffer approach, augments unrestricted subnetworks with auxiliary buffer functions that enforce boundary and interface constraints at discrete points through a lightweight correction. We study these formulations on one- and two-dimensional elliptic interface benchmarks and compare them with soft-constrained baselines. In one-dimensional problems, hard constraints consistently improve interface fidelity and remove the need for loss-weight tuning; the windowing approach attains very high accuracy (as low as $O(10^{-9})$) on simple structured cases, whereas the buffer approach remains accurate ($\sim O(10^{-5})$) across a wider range of source terms and interface configurations. In two dimensions, the buffer formulation is shown to be more robust because it enforces constraints through a discrete buffer correction, as the windowing construction becomes more sensitive to overlap and corner effects and over-constrains the problem. This positions the buffer method as a straightforward and geometrically flexible approach to complex interface problems.
Submitted 9 April, 2026;
originally announced April 2026.
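The windowing idea in the PINN abstract, blending compactly supported subdomain networks so that interface continuity holds by construction, can be sketched in 1-D. This is an illustrative partition-of-unity blend under assumed details (the sigmoid window, the interface at x = 0.5, and the simple stand-in functions are all hypothetical, not the paper's construction):

```python
import numpy as np

def window(x, x_if=0.5, width=0.1):
    # smooth window: ~1 left of the interface x_if, ~0 right of it;
    # together with (1 - window) it forms a partition of unity
    return 1.0 / (1.0 + np.exp((x - x_if) / width))

def u_left(x):   # stand-in for the left subdomain subnetwork
    return np.sin(2 * np.pi * x)

def u_right(x):  # stand-in for the right subdomain subnetwork
    return x**2

def u(x):
    # blended trial function: smooth (hence continuous) across the
    # interface by construction, with no interface penalty term needed
    w = window(x)
    return w * u_left(x) + (1.0 - w) * u_right(x)

x = np.linspace(0.0, 1.0, 2001)
vals = u(x)
max_jump = np.abs(np.diff(vals)).max()  # small: no discontinuity at x = 0.5
```

Because the blend is smooth everywhere, the PDE residual loss is free to focus on the interior physics, which is the decoupling the hard-constrained formulations aim for.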
-
Resolving the 2024 Outburst of Magnetar 1E 1841-045 from its host Supernova Remnant with EP-FXT
Authors:
Yu-Cong Fu,
Lin Lin,
Yu-Jia Zheng,
Ming-Yu Ge,
Han-Long Peng,
Dong-Ming Li,
Francesco Coti Zelati,
Ersin Göǧüş,
Nanda Rea,
Bing Zhang,
Wei-Wei Zhu,
Ke-Jia Lee,
Teruaki Enoto,
Chryssa Kouveliotou
Abstract:
The magnetar 1E 1841-045 exhibited a new active episode starting on August 20, 2024, marked by X-ray bursts and enhanced persistent emission. Using data from the Einstein Probe (EP), we report on the timing and spectral results following the onset of this outburst. The pulse profile displays a multi-peaked structure, with notable phase shifts in the secondary peak. Energy-resolved pulse profile analysis indicates a transition in the dominant peak of the pulse profile above 5.8 keV. The 0.5-10 keV X-ray spectrum is well-modeled by a combined blackbody and power-law (BB+PL) model, showing a $\sim 20\%$ flux increase following the outburst. Phase-resolved spectroscopy indicates a correlation between BB temperature and pulse profile intensity, along with spectral hardening at a specific pulse phase. The high spatial resolution of EP enables effective separation of the supernova remnant emission, which is crucial for measuring the intrinsic pulse emission of the source. These findings underscore the intricate relationship between magnetar outbursts, pulse profile evolution, and spectral characteristics.
Submitted 8 April, 2026;
originally announced April 2026.
-
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Authors:
Chengyue Wu,
Shiyi Lan,
Yonggan Fu,
Sensen Gao,
Jin Wang,
Jincheng Yu,
Jose M. Alvarez,
Pavlo Molchanov,
Ping Luo,
Song Han,
Ligeng Zhu,
Enze Xie
Abstract:
Vision-language models (VLMs) predominantly rely on autoregressive decoding, which generates tokens one at a time and fundamentally limits inference throughput. This limitation is especially acute in physical AI scenarios such as robotics and autonomous driving, where VLMs are deployed on edge devices at batch size one, making AR decoding memory-bandwidth-bound and leaving hardware parallelism underutilized. While block-wise discrete diffusion has shown promise for parallel text generation, extending it to VLMs remains challenging due to the need to jointly handle continuous visual representations and discrete text tokens while preserving pretrained multimodal capabilities. We present Fast-dVLM, a block-diffusion-based VLM that enables KV-cache-compatible parallel decoding and speculative block decoding for inference acceleration. We systematically compare two AR-to-diffusion conversion strategies: a two-stage approach that first adapts the LLM backbone with text-only diffusion fine-tuning before multimodal training, and a direct approach that converts the full AR VLM in one stage. Under comparable training budgets, direct conversion proves substantially more efficient by leveraging the already multimodally aligned VLM; we therefore adopt it as our recommended recipe. We introduce a suite of multimodal diffusion adaptations, block size annealing, causal context attention, auto-truncation masking, and vision efficient concatenation, that collectively enable effective block diffusion in the VLM setting. Extensive experiments across 11 multimodal benchmarks show Fast-dVLM matches its autoregressive counterpart in generation quality. With SGLang integration and FP8 quantization, Fast-dVLM achieves over 6x end-to-end inference speedup over the AR baseline.
△ Less
Submitted 10 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions collected by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
Submitted 7 April, 2026;
originally announced April 2026.
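For background, the sensitivity to $γ$ in such measurements comes from the interference of $b \to c$ and $b \to u$ amplitudes across the $D \to K^{0}_{\rm S} h^{\prime+}h^{\prime-}$ Dalitz plot. The schematic below is the standard model-independent formalism for this decay family, not the Letter's novel unbinned weighting itself:

```latex
A\bigl(B^{\mp}\rightarrow D h^{\mp}\bigr) \;\propto\;
A_{D}\bigl(m_{-}^{2},\,m_{+}^{2}\bigr)
\;+\; r_{B}\, e^{\,i(\delta_{B}\mp\gamma)}\,
A_{D}\bigl(m_{+}^{2},\,m_{-}^{2}\bigr)
```

Here $A_D$ is the $D^0$ decay amplitude, $m_{\pm}^{2}$ are the Dalitz-plot coordinates, and $r_B$, $\delta_B$ are the magnitude ratio and strong-phase difference of the interfering $B$ amplitudes; the strong-phase information from quantum-correlated $D\overline{D}$ pairs at BESIII is what makes the extraction of $\gamma$ model-independent.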
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
Precision QCD with the Electron-Ion Collider
Authors:
C. Alexandrou,
M. Arratia,
E. C. Aschenauer,
A. Avkhadiev,
P. V. Balachandran,
V. Bertone,
I. Borsa,
M. Cerutti,
X. Chu,
W. Cosyn,
D. de Florian,
A. Dumitru,
M. Engelhardt,
R. Fatemi,
S. Forte,
Y. Fu,
L. Gamberg,
H. Gao,
T. Gehrmann,
A. Gehrmann-De Ridder,
Y. Go,
Y. Guo,
Y. Hatta,
J. Haug,
T. J. Hobbs
, et al. (44 additional authors not shown)
Abstract:
This document summarizes the discussions at the program "Precision QCD with the Electron Ion Collider", held from May to June 2025 at the Institute for Nuclear Theory (INT) at the University of Washington. The program was co-sponsored by the INT and by the Center for Frontiers in Nuclear Science (CFNS, Stony Brook University). Over its five-week duration it brought together about 70 theorists, experimentalists, and computer scientists interested in the physics program at the future Electron-Ion Collider in preparation at Brookhaven National Laboratory. Key topics at the program were: higher-order perturbative-QCD calculations and techniques; nuclear structure and tomography; comparisons of phenomenological and lattice determinations of parton distribution functions; identification of signature observables for saturated gluons; and assessment of the importance of AI techniques for EIC studies and detector development.
Submitted 6 April, 2026;
originally announced April 2026.
-
3D Gaussian Splatting for Annular Dark Field Scanning Transmission Electron Microscopy Tomography Reconstruction
Authors:
Beiyuan Zhang,
Hesong Li,
Ruiwen Shao,
Ying Fu
Abstract:
Annular Dark Field Scanning Transmission Electron Microscopy (ADF-STEM) tomography reconstructs nanoscale materials in 3D by integrating multi-view tilt-series images, enabling precise analysis of their structural and compositional features. Although integrating more tilt views improves 3D reconstruction, it requires extended electron exposure that risks damaging dose-sensitive materials and introduces drift and misalignment, making it difficult to balance reconstruction fidelity with sample preservation. In practice, sparse-view acquisition is frequently required, yet conventional ADF-STEM methods degrade under limited views, exhibiting artifacts and reduced structural fidelity. To resolve these issues, in this paper, we adapt 3D Gaussian Splatting (3DGS) to this domain with three key components. We first model the local scattering strength as a learnable scalar field, DenZa, to address the mismatch between 3DGS and ADF-STEM imaging physics. Then we introduce a coefficient $γ$ to stabilize scattering across tilt angles, ensuring consistent DenZa via scattering view normalization. Finally, we incorporate a loss function that includes a 2D Fourier amplitude term to suppress missing-wedge artifacts in sparse-view reconstruction. Experiments on 45-view and 15-view tilt series show that DenZa-Gaussian produces high-fidelity reconstructions and 2D projections that align more closely with original tilts, demonstrating superior robustness under sparse-view conditions.
Submitted 6 April, 2026;
originally announced April 2026.
-
The Indra Representation Hypothesis for Multimodal Alignment
Authors:
Jianglin Lu,
Hailing Wang,
Kuo Yang,
Yitian Zhang,
Simon Jenni,
Yun Fu
Abstract:
Recent studies have uncovered an interesting phenomenon: unimodal foundation models tend to learn convergent representations, regardless of differences in architecture, training objectives, or data modalities. However, these representations are essentially internal abstractions of samples that characterize samples independently, leading to limited expressiveness. In this paper, we propose The Indra Representation Hypothesis, inspired by the philosophical metaphor of Indra's Net. We argue that representations from unimodal foundation models are converging to implicitly reflect a shared relational structure underlying reality, akin to the relational ontology of Indra's Net. We formalize this hypothesis using the V-enriched Yoneda embedding from category theory, defining the Indra representation as a relational profile of each sample with respect to others. This formulation is shown to be unique, complete, and structure-preserving under a given cost function. We instantiate the Indra representation using angular distance and evaluate it in cross-model and cross-modal scenarios involving vision, language, and audio. Extensive experiments demonstrate that Indra representations consistently enhance robustness and alignment across architectures and modalities, providing a theoretically grounded and practical framework for training-free alignment of unimodal foundation models. Our code is available at https://github.com/Jianglin954/Indra.
Submitted 6 April, 2026;
originally announced April 2026.
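The core idea of a relational profile built from angular distances can be sketched in a few lines (a hypothetical toy, not the paper's implementation): describe each sample by its angular distances to the other samples, and note that this profile is invariant to any rotation of the embedding space, which is what makes cross-model, training-free comparison plausible.

```python
import numpy as np

def indra_profile(z):
    """Relational profile: row i holds sample i's angular distances
    (normalized to [0, 1]) to every sample, including itself (0 on the diagonal).
    A sketch of the idea only, not the paper's code."""
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    cos = np.clip(zn @ zn.T, -1.0, 1.0)                 # pairwise cosine similarity
    return np.arccos(cos) / np.pi                       # angular distance

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 16))                 # embeddings from "model A"
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
z_rot = z @ q                                # an orthogonally transformed copy, as from "model B"

# The relational profiles agree even though raw coordinates differ entirely.
assert np.allclose(indra_profile(z), indra_profile(z_rot))
```

The invariance holds because orthogonal maps preserve norms and inner products, so every pairwise angle, and hence the whole profile, is unchanged.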
-
NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results
Authors:
Shuhong Liu,
Chenyu Bao,
Ziteng Cui,
Xuangeng Chu,
Bin Ren,
Lin Gu,
Xiang Chen,
Mingrui Li,
Long Ma,
Marcos V. Conde,
Radu Timofte,
Yun Liu,
Ryo Umagami,
Tomohiro Hashimoto,
Zijian Hu,
Yuan Gan,
Tianhan Xu,
Yusuke Kurose,
Tatsuya Harada,
Junwei Yuan,
Gengjia Chang,
Xining Ge,
Mache You,
Qida Cao,
Zeliang Li
, et al. (81 additional authors not shown)
Abstract:
This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.
Submitted 5 April, 2026;
originally announced April 2026.
-
Towards Trans-Exponential O-minimal Expansion of $(\mathbb{R},+,\cdot,0,1,<)$
Authors:
Yayi Fu
Abstract:
We add an analytic trans-exponential function $\varphi$ to $\mathbb{R}_{an,\exp}$. We reduce the o-minimality of $\mathbb{R}_{an,\exp,\varphi}$ to the existence of "many" regular values for some definable systems of functions, which is a necessary condition for the o-minimality of $\mathbb{R}_{an,\exp,\varphi}$.
Submitted 3 April, 2026;
originally announced April 2026.
-
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors
Authors:
Haz Sameen Shahgir,
Xiaofu Chen,
Yu Fu,
Erfan Shayegani,
Nael Abu-Ghazaleh,
Yova Kementchedjhieva,
Yue Dong
Abstract:
Vision-language models (VLMs) have achieved impressive performance across a wide range of multimodal tasks. However, they often fail on tasks that require fine-grained visual perception, even when the required information is still present in their internal representations. Prior work has attributed this ``hidden-in-plain-sight'' gap to the language model, but the cause remains unexplained. In this work, we demonstrate that this gap arises from the language model's lack of semantic labels for fine-grained visual details: when visual entities can be mapped to known concepts, VLMs bypass visual comparison and reason through language; when they cannot, VLMs resort to brittle and hallucinated descriptions. We verify this across semantic correspondence, synthetic shape matching, and face matching, and find that VLMs perform much better when the relevant entities are nameable than when they are unnameable. Mechanistically, Logit Lens analysis confirms that VLMs explicitly recover semantic labels for nameable entities and surface more unique tokens than for unnameable entities. Furthermore, we show that this limitation can be addressed: teaching completely arbitrary names for unknown entities improves performance. More importantly, task-specific finetuning yields even stronger generalization without relying on language priors, i.e. through real visual perception. Our findings suggest that current VLM failures on visual tasks reflect a learned shortcut rather than a fundamental limitation of multimodal reasoning.
Submitted 15 April, 2026; v1 submitted 2 April, 2026;
originally announced April 2026.
-
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
Authors:
Xinlei Yu,
Zhangquan Chen,
Yongbo He,
Tianyu Fu,
Cheng Yang,
Chengming Xu,
Yue Ma,
Xiaobin Hu,
Zhe Cao,
Jie Xu,
Guibin Zhang,
Jiale Tao,
Jiayi Zhang,
Siyuan Ma,
Kaituo Feng,
Haojie Huang,
Youxing Li,
Ronghao Chen,
Huacan Wang,
Chenglin Wu,
Zikun Su,
Xiaogang Xu,
Kelu Yao,
Kun Wang,
Chen Gao
, et al. (12 additional authors not shown)
Abstract:
Latent space is rapidly emerging as a native substrate for language-based models. While modern systems are still commonly understood through explicit token-level generation, an increasing body of work shows that many critical internal processes are more naturally carried out in continuous latent space than in human-readable verbal traces. This shift is driven by the structural limitations of explicit-space computation, including linguistic redundancy, discretization bottlenecks, sequential inefficiency, and semantic loss. This survey aims to provide a unified and up-to-date landscape of latent space in language-based models. We organize the survey into five sequential perspectives: Foundation, Evolution, Mechanism, Ability, and Outlook. We begin by delineating the scope of latent space, distinguishing it from explicit or verbal space and from the latent spaces commonly studied in generative visual models. We then trace the field's evolution from early exploratory efforts to the current large-scale expansion. To organize the technical landscape, we examine existing work through the complementary lenses of mechanism and ability. From the perspective of Mechanism, we identify four major lines of development: Architecture, Representation, Computation, and Optimization. From the perspective of Ability, we show how latent space supports a broad capability spectrum spanning Reasoning, Planning, Modeling, Perception, Memory, Collaboration, and Embodiment. Beyond consolidation, we discuss the key open challenges, and outline promising directions for future research. We hope this survey serves not only as a reference for existing work, but also as a foundation for understanding latent space as a general computational and systems paradigm for next-generation intelligence.
Submitted 2 April, 2026;
originally announced April 2026.
-
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
Authors:
Yu Li,
Haoyu Luo,
Yuejin Xie,
Yuqian Fu,
Zhonghao Yang,
Shuai Shao,
Qihan Ren,
Wanying Qu,
Yanwei Fu,
Yujiu Yang,
Jing Shao,
Xia Hu,
Dongrui Liu
Abstract:
Evaluating the safety of LLM-based agents is increasingly important because risks in realistic deployments often emerge over multi-step interactions rather than isolated prompts or final responses. Existing trajectory-level benchmarks remain limited by insufficient interaction diversity, coarse observability of safety failures, and weak long-horizon realism. We introduce ATBench, a trajectory-level benchmark for structured, diverse, and realistic evaluation of agent safety. ATBench organizes agentic risk along three dimensions: risk source, failure mode, and real-world harm. Based on this taxonomy, we construct trajectories with heterogeneous tool pools and a long-context delayed-trigger protocol that captures realistic risk emergence across multiple stages. The benchmark contains 1,000 trajectories (503 safe and 497 unsafe), averaging 9.01 turns and 3.95k tokens, with 1,954 invoked tools drawn from pools spanning 2,084 available tools. Data quality is supported by rule-based and LLM-based filtering plus full human audit. Experiments on frontier LLMs, open-source models, and specialized guard systems show that ATBench is challenging even for strong evaluators, while enabling taxonomy-stratified analysis, cross-benchmark comparison, and diagnosis of long-horizon failure patterns.
Submitted 8 April, 2026; v1 submitted 2 April, 2026;
originally announced April 2026.
-
Rethinking Representations for Cross-Domain Infrared Small Target Detection: A Generalizable Perspective from the Frequency Domain
Authors:
Yimin Fu,
Songbo Wang,
Feiyan Wu,
Jialin Lyu,
Zhunga Liu,
Michael K. Ng
Abstract:
Accurate target-background separation in infrared small target detection (IRSTD) depends heavily on the discriminability of extracted representations. However, most existing methods are confined to domain-consistent settings, while overlooking whether such discriminability can generalize to unseen domains. In practice, distribution shifts between training and testing data are inevitable due to variations in observational conditions and environmental factors. Meanwhile, the intrinsic indistinctiveness of infrared small targets aggravates overfitting to domain-specific patterns. Consequently, the detection performance of models trained on source domains can be severely degraded when deployed in unseen domains. To address this challenge, we propose a spatial-spectral collaborative perception network (S$^2$CPNet) for cross-domain IRSTD. Moving beyond conventional spatial learning pipelines, we rethink IRSTD representations from a frequency perspective and reveal inconsistencies in spectral phase as the primary manifestation of domain discrepancies. Based on this insight, we develop a phase rectification module (PRM) to derive generalizable target awareness. Then, we employ an orthogonal attention mechanism (OAM) in skip connections to preserve positional information while refining informative representations. Moreover, the bias toward domain-specific patterns is further mitigated through selective style recomposition (SSR). Extensive experiments have been conducted on three IRSTD datasets, and the proposed method consistently achieves state-of-the-art performance under diverse cross-domain settings.
Submitted 2 April, 2026;
originally announced April 2026.
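The premise that spectral phase carries the generalizable structure echoes a classic frequency-domain fact, sketched below. This is an illustration of the general principle only, not the paper's PRM or SSR modules: an image rebuilt from its own phase and a foreign amplitude spectrum still correlates positively with the original content.

```python
import numpy as np

def swap_amplitude(a, b):
    """Rebuild image `a` keeping its spectral PHASE but borrowing `b`'s
    spectral AMPLITUDE (a textbook decomposition, not the paper's method)."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    return np.real(np.fft.ifft2(np.abs(Fb) * np.exp(1j * np.angle(Fa))))

rng = np.random.default_rng(1)
img = np.zeros((32, 32)); img[14:18, 14:18] = 1.0   # a bright "small target"
style = rng.normal(size=(32, 32))                   # a different "domain" texture
mixed = swap_amplitude(img, style)

# Swapping an image's amplitude with its own is the identity...
assert np.allclose(swap_amplitude(img, img), img)
# ...and by Parseval, <mixed, img> = (1/N) * sum(|F_style||F_img|) >= 0,
# so the phase-carrying reconstruction stays positively correlated with
# the original target even under a foreign amplitude spectrum.
assert np.sum(mixed * img) > 0
```

The second assertion is exact, not empirical: the cross-term in the Fourier domain reduces to a product of nonnegative amplitude spectra, which is why phase is often said to dominate perceptual structure.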
-
In-vivo entropy production of A. subaru
Authors:
Yu Fu,
Emmy Dobson,
Benjamin B. Machta,
Michael C. Abbott
Abstract:
Entropy production is often used as a proxy for energy consumption of a non-equilibrium system. Lower bounds can be estimated from coarse-grained observations, and this has been done for various biological systems. Here, we apply these tools to a more macroscopic system whose true energy consumption is also known. We find that while entropy production does give a lower bound, it is some 25 orders of magnitude away from being saturated. To be certain of this result, we survey different methods of estimating irreversibility, and write down a novel kNN estimator.
Submitted 1 April, 2026;
originally announced April 2026.
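The kNN route mentioned in the abstract can be illustrated with a standard sample-based divergence estimator (Wang-Kulkarni-Verdú style; this is a generic sketch, not the authors' novel estimator): entropy production is lower-bounded by the KL divergence between forward and time-reversed trajectory statistics, and that divergence can be estimated from nearest-neighbor distances alone.

```python
import numpy as np

def knn_kl(x, y, k=1):
    """k-nearest-neighbor estimate of D(P||Q) in nats, from samples
    x ~ P with shape (n, d) and y ~ Q with shape (m, d).
    Generic textbook estimator, not the paper's."""
    n, d = x.shape
    m = y.shape[0]
    # distance of each x_i to its k-th nearest neighbor among the other x's
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)              # exclude self-distance
    rho = np.sort(dxx, axis=1)[:, k - 1]
    # distance of each x_i to its k-th nearest neighbor among the y's
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    nu = np.sort(dxy, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Toy check: KL(N(0,1) || N(1,1)) is exactly 0.5 nats.
rng = np.random.default_rng(0)
fwd = rng.normal(0.0, 1.0, size=(2000, 1))   # stand-in "forward" statistics
rev = rng.normal(1.0, 1.0, size=(2000, 1))   # stand-in "time-reversed" statistics
print(knn_kl(fwd, rev, k=5))                 # close to 0.5 for this sample size
```

Any such estimate only lower-bounds the true entropy production of the underlying system, which is exactly the gap the abstract quantifies.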
-
TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving
Authors:
Feng Ren,
Ruoyu Qin,
Teng Ma,
Shangming Cai,
Zheng Liu,
Chao Lei,
Dejiang Zhu,
Ke Yang,
Zheming Li,
Jialei Cui,
Weixiao Huang,
Yikai Zhao,
Yineng Zhang,
Hao Wu,
Xiang Gao,
Yuhao Fu,
Jinlei Jiang,
Yongwei Wu,
Mingxing Zhang
Abstract:
Modern GPU clusters are built upon a complex hierarchy of heterogeneous interconnects, ranging from multi-rail RDMA to proprietary fabrics such as Multi-Node NVLink and Ascend UB. Orchestrating these diverse links effectively remains a critical challenge in disaggregated LLM serving. Operating Mooncake TE on thousands of GPUs exposed a critical limitation shared by existing frameworks: imperative, statically bound path selection. This rigidity forces engines to rely on state-blind striping that ignores congestion signals, creating communication silos, wasting multi-rail bandwidth due to head-of-line blocking, and leading to operational fragility where routine faults require manual intervention. We present TENT, a data-movement engine that decouples transfer intent from physical execution. Instead of locking workloads to fixed backends, TENT unifies heterogeneous interconnects into a single dynamic resource pool. Applications simply declare transfer intents, while TENT dynamically decomposes elephant flows into fine-grained slices and "sprays" them across links based on instantaneous link quality. This telemetry-driven orchestration eliminates head-of-line blocking and enables transparent, sub-50 ms self-healing by rerouting slices around failures without application logic. TENT serves as the production data plane for LLM inference and RL pipelines at multiple industrial sites. Our evaluation on H800 HGX clusters shows that TENT outperforms state-of-the-art baselines, including Mooncake TE, NIXL, and UCCL. In LLM inference with SGLang HiCache, TENT achieves up to 1.36x higher throughput and 26% lower P90 TTFT than Mooncake TE. In RL pipelines, TENT accelerates parameter updates in Moonshot Checkpoint Engine by 20-26%.
Submitted 31 March, 2026;
originally announced April 2026.
-
First energy scan measurement of $e^{+}e^{-}\to K^{+}K^{-}$ around the $ψ(2S)$ resonance
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (683 additional authors not shown)
Abstract:
We report the first measurement of the $e^{+}e^{-}\to K^{+}K^{-}$ cross sections around the $ψ(2S)$ resonance using the energy scan method. The analysis is based on $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 495~pb$^{-1}$ collected with the BESIII detector at BEPCII. By analyzing the cross section line-shape, we extract the relative phase $Φ$ between the strong and electromagnetic amplitudes of the $ψ(2S)$ resonance, a fundamental parameter in charmonium physics, based on the assumption that the relative phase between the electromagnetic amplitude of the $ψ(2S)$ resonance and the continuum is zero. Two distinct solutions for the branching fraction $\mathcal{B}$ of $ψ(2S)\to K^{+}K^{-}$ are observed: a constructive interference solution with $\mathcal{B}=(7.49\pm0.41)\times10^{-5}$ and $Φ=(110.1 \pm6.7)^\circ$, and a destructive interference solution with $\mathcal{B}=(10.94\pm0.48)\times10^{-5}$ and $Φ=(-106.8\pm5.7)^\circ$. A significant correlation between $Φ$ and $\mathcal{B}$ is established, demonstrating that interference effects must be taken into account in the $ψ(2S)$ branching fraction measurements. Additionally, the first results for both the $ψ(2S)$ strong form factor, which characterizes the strong coupling between $ψ(2S)$ and $K^{+}K^{-}$, and the energy-dependent electromagnetic form factor of the charged kaon in this energy region are reported here.
Submitted 31 March, 2026;
originally announced March 2026.
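The two-solution ambiguity follows from the generic interference form of a resonance on a continuum. The schematic below uses symbols as commonly defined in such line-shape analyses and is not the collaboration's exact parameterization:

```latex
\sigma(s) \;\propto\;
\bigl|\, A_{\rm cont}(s) \;+\; e^{i\Phi}\, A_{\psi(2S)}(s) \,\bigr|^{2}
\;=\; |A_{\rm cont}|^{2} + |A_{\psi(2S)}|^{2}
\;+\; 2\,|A_{\rm cont}|\,|A_{\psi(2S)}|
\cos\!\bigl(\Phi + \phi_{\rm BW}(s)\bigr)
```

Here $\phi_{\rm BW}(s)$ is the energy-dependent Breit-Wigner phase of the resonance. Only the cosine term carries the phase, so a fit to the line shape admits a constructive and a destructive solution, each paired with its own branching fraction, as quoted in the abstract.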
-
Observation of $Λ^+_c\to nπ^+η$ and search for $Λ^+_c\to na_0(980)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (722 additional authors not shown)
Abstract:
By analysing 6.1 ${\rm fb}^{-1}$ of data collected at center-of-mass energies between $\sqrt{s}=4.600$ and 4.843 $\rm GeV$ with the BESIII detector at the BEPCII collider, we observe the decay $Λ_c^+\to nπ^+η$ for the first time with a statistical significance of $9.5σ$. The ratio of branching fractions $\mathcal{B}(Λ_c^+\to nπ^+η)/\mathcal{B}(Λ_c^+\to Λπ^+η)$ is measured to be $0.155\pm0.031_{\rm stat.}\pm0.012_{\rm syst.}$. Taking the world average of $\mathcal{B}(Λ_c^+\to Λπ^+η)$ as reference, the absolute branching fraction is calculated to be $\mathcal{B}(Λ_c^+\to nπ^+η)=(2.94\pm0.59_{\rm stat.}\pm0.23_{\rm syst.}\pm0.13_{\rm ref.})\times10^{-3}$. The intermediate process $Λ_c^+\to na_0(980)^+$ is also searched for in the $π^+η$ invariant mass spectrum. Since no significant signal is found, the upper limit on $\mathcal{B}(Λ_c^+\to na_0(980)^+)\times\mathcal{B}(a_0(980)^+\toπ^+η)$ is set to $8.4\times10^{-4}$ at 90\% confidence level. A sophisticated deep learning approach using a Transformer-based architecture is employed to distinguish signals from prevalent hadronic backgrounds, complemented by thorough validation and systematic uncertainty quantification.
Submitted 30 March, 2026;
originally announced March 2026.
-
Physically Inspired Gaussian Splatting for HDR Novel View Synthesis
Authors:
Huimin Zeng,
Yue Bai,
Hailing Wang,
Yun Fu
Abstract:
High dynamic range novel view synthesis (HDR-NVS) reconstructs scenes with dynamic details by fusing multi-exposure low dynamic range (LDR) views, yet it struggles to capture ambient illumination-dependent appearance. Implicitly supervising HDR content by constraining tone-mapped results fails in correcting abnormal HDR values, and results in limited gradients for Gaussians in under/over-exposed r…
▽ More
High dynamic range novel view synthesis (HDR-NVS) reconstructs scenes with dynamic details by fusing multi-exposure low dynamic range (LDR) views, yet it struggles to capture ambient illumination-dependent appearance. Implicitly supervising HDR content by constraining tone-mapped results fails in correcting abnormal HDR values, and results in limited gradients for Gaussians in under/over-exposed regions. To this end, we introduce PhysHDR-GS, a physically inspired HDR-NVS framework that models scene appearance via intrinsic reflectance and adjustable ambient illumination. PhysHDR-GS employs a complementary image-exposure (IE) branch and Gaussian-illumination (GI) branch to faithfully reproduce standard camera observations and capture illumination-dependent appearance changes, respectively. During training, the proposed cross-branch HDR consistency loss provides explicit supervision for HDR content, while an illumination-guided gradient scaling strategy mitigates exposure-biased gradient starvation and reduces under-densified representations. Experimental results across realistic and synthetic datasets demonstrate the superiority of our method in reconstructing HDR details (e.g., a PSNR gain of 2.04 dB over HDR-GS), while maintaining real-time rendering speed (up to 76 FPS). Code and models are available at https://huimin-zeng.github.io/PhysHDR-GS/.
△ Less
Submitted 30 March, 2026;
originally announced March 2026.
-
ExFusion: Efficient Transformer Training via Multi-Experts Fusion
Authors:
Jiacheng Ruan,
Daize Dong,
Xiaoye Qu,
Tong Zhu,
Ting Liu,
Yuzhuo Fu,
Yu Cheng,
Suncheng Xiang
Abstract:
Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and introduces extra overhead in parameter storage and deployment. Therefore, it is critical to develop an approach that leverages the multi-expert capability of MoE to enhance performance while incu…
▽ More
Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and introduces extra overhead in parameter storage and deployment. Therefore, it is critical to develop an approach that leverages the multi-expert capability of MoE to enhance performance while incurring minimal additional cost. To this end, we propose a novel pre-training approach, termed ExFusion, which improves the efficiency of Transformer training through multi-expert fusion. Specifically, during the initialization phase, ExFusion upcycles the feed-forward network (FFN) of the Transformer into a multi-expert configuration, where each expert is assigned a weight for later parameter fusion. During training, these weights allow multiple experts to be fused into a single unified expert equivalent to the original FFN, which is subsequently used for forward computation. As a result, ExFusion introduces multi-expert characteristics into the training process while incurring only marginal computational cost compared to standard dense training. After training, the learned weights are used to integrate the multiple experts into a single unified expert, thereby eliminating additional overhead in storage and deployment. Extensive experiments on a variety of computer vision and natural language processing tasks demonstrate the effectiveness of the proposed method.
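The fusion step described above rests on a simple identity: for linear layers, a convex combination of expert weight matrices is itself a single weight matrix whose output equals the weighted mixture of the experts' outputs. A minimal pure-Python sketch of this idea (illustrative names; not the authors' implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fuse_experts(expert_weights, fusion_logits):
    """Collapse several expert weight matrices into one via softmax-normalized
    fusion weights. For linear maps this single matrix reproduces the weighted
    mixture of the experts' outputs exactly."""
    alphas = softmax(fusion_logits)
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [[sum(a * W[i][j] for a, W in zip(alphas, expert_weights))
             for j in range(cols)] for i in range(rows)]

# Two toy 2x2 "experts" with equal fusion logits (-> weights 0.5/0.5).
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.0, 1.0], [1.0, 0.0]]
fused = fuse_experts([W1, W2], [0.0, 0.0])

x = [2.0, 4.0]
y_fused = matvec(fused, x)
y_mix = [0.5 * a + 0.5 * b for a, b in zip(matvec(W1, x), matvec(W2, x))]
assert y_fused == y_mix  # fused layer == mixture of expert outputs
```

Because the fused matrix is computed once, forward passes cost the same as a single dense FFN, which is the efficiency argument the abstract makes.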
△ Less
Submitted 29 March, 2026;
originally announced March 2026.
-
Notes on Diagrammatic Coaction for Cosmological Wavefunction Coefficients: A Two-Site Prelude
Authors:
Yuhan Fu,
Jiahao Liu
Abstract:
We study the coaction of cosmological wavefunction coefficients of conformally coupled scalars in FRW background of a two-site example, which turns out to have an elegant diagrammatic interpretation. We show how the coaction acts on the twisted integrals for wavefunction coefficients, decomposing them into contributions associated with subtopologies and cuts, with the subtopologies admitting an in…
▽ More
We study the coaction of cosmological wavefunction coefficients of conformally coupled scalars in FRW background of a two-site example, which turns out to have an elegant diagrammatic interpretation. We show how the coaction acts on the twisted integrals for wavefunction coefficients, decomposing them into contributions associated with subtopologies and cuts, with the subtopologies admitting an interpretation as time-ordered integrals. This provides a clear interpretation of their analytic structure and suggests a broader applicability to more general cosmological diagrams.
△ Less
Submitted 26 March, 2026;
originally announced March 2026.
-
Amplitude analysis and branching fraction measurement of the decay $D^0 \to K^+K^-π^0π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
M. S. Anderson,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone
, et al. (749 additional authors not shown)
Abstract:
An amplitude analysis of the singly Cabibbo-suppressed decay $D^0 \to K^+ K^- π^0 π^0$ is performed, for the first time, to determine the relative magnitudes and phases of different intermediate processes. The analysis uses $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy 3.773~GeV corresponding to an integrated luminosity of 20.3 $\rm fb^{-1}$. The absolute…
▽ More
An amplitude analysis of the singly Cabibbo-suppressed decay $D^0 \to K^+ K^- π^0 π^0$ is performed, for the first time, to determine the relative magnitudes and phases of different intermediate processes. The analysis uses $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy 3.773~GeV corresponding to an integrated luminosity of 20.3 $\rm fb^{-1}$. The absolute branching fraction of $D^0 \to K^+ K^- π^0 π^0$ is measured to be \BF. The dominant intermediate process is $D^0 \to K^{*}(892)^+K^{*}(892)^-$, with a branching fraction of $(2.79 \pm 0.13_{\rm{stat.}} \pm 0.11_{\rm{syst.}}) \times 10^{-3}$. Amplitude analysis reveals that the $D^0 \to K^{*}(892)^+K^{*}(892)^-$ decay is S-wave dominant. The longitudinal polarization fraction of $D^0 \to K^{*}(892)^+ K^{*}(892)^-$ is measured to be $0.468\pm0.046_{\rm{stat.}}\pm0.011_{\rm{syst.}}$.
△ Less
Submitted 30 March, 2026; v1 submitted 26 March, 2026;
originally announced March 2026.
-
Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Authors:
Liang Zhang,
Yu Fu,
Xinyi Jin
Abstract:
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship using the GSM8K and MATH subsets of PROCESSBENCH, a human-annotated benchmark for…
▽ More
Large Language Models (LLMs) are increasingly used in math education not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger math problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship using the GSM8K and MATH subsets of PROCESSBENCH, a human-annotated benchmark for identifying the earliest erroneous step in mathematical reasoning. We evaluate two LLM-based math tutor agent settings, instantiated with GPT-4 and GPT-5, in two independent tasks on the same math problems: solving the original problem and assessing a benchmark-provided solution by predicting the earliest erroneous step. Results show a consistent within-model pattern: assessment accuracy is substantially higher on math problem items the same model solved correctly than on items it solved incorrectly, with statistically significant associations across both models and datasets. At the same time, assessment remains more difficult than direct problem solving, especially on error-present solutions. These findings suggest that math problem-solving expertise supports stronger assessment performance, but reliable step-level diagnosis also requires additional capabilities such as step tracking, monitoring, and precise error localization. The results have implications for the design and evaluation of AI-supported Adaptive Instructional Systems (AISs) for formative assessment in math education.
△ Less
Submitted 26 March, 2026;
originally announced March 2026.
-
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Authors:
Yuqian Fu,
Haohuan Huang,
Kaiwen Jiang,
Yuanheng Zhu,
Dongbin Zhao
Abstract:
On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token signal and becomes increasingly unreliable as rollouts drift away from prefixes the…
▽ More
On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback on student-generated rollouts rather than fixed teacher traces. In long-horizon settings, however, the common sampled-token variant is fragile: it reduces distribution matching to a one-token signal and becomes increasingly unreliable as rollouts drift away from prefixes the teacher commonly visits. We revisit OPD from the estimator and implementation sides. Theoretically, token-level OPD is biased relative to sequence-level reverse-KL, but it has a much tighter worst-case variance bound; our toy study shows the same tradeoff empirically, with stronger future-reward coupling producing higher gradient variance and less stable learning. Empirically, we identify three failure modes of sampled-token OPD: an imbalanced one-token signal, unreliable teacher guidance on student-generated prefixes, and distortions caused by tokenizer or special-token mismatch. We address these issues with teacher top-K local support matching, implemented as truncated reverse-KL with top-p rollout sampling and special-token masking. Across single-task math reasoning and multi-task agentic-plus-math training, this objective yields more stable optimization and better downstream performance than sampled-token OPD.
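The proposed objective restricts distribution matching to the teacher's high-probability tokens. A toy sketch of a truncated reverse-KL over the teacher's top-K support, which is one plausible reading of "teacher top-K local support matching" (the paper's exact masking and truncation details may differ):

```python
import math

def topk_reverse_kl(student_logp, teacher_logp, k):
    """Reverse KL restricted to the teacher's top-k tokens, with both
    distributions renormalized on that support."""
    support = sorted(teacher_logp, key=teacher_logp.get, reverse=True)[:k]

    def renorm(logp):
        z = sum(math.exp(logp[t]) for t in support)
        return {t: math.exp(logp[t]) / z for t in support}

    ps, pt = renorm(student_logp), renorm(teacher_logp)
    return sum(ps[t] * math.log(ps[t] / pt[t]) for t in support)

# Toy next-token distributions (probabilities are invented).
student = {"a": math.log(0.7), "b": math.log(0.2), "c": math.log(0.1)}
teacher = {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)}
loss = topk_reverse_kl(student, teacher, k=2)  # ignores the low-mass tail "c"
assert loss >= 0.0  # KL on the renormalized support is non-negative
```

Unlike the sampled-token variant, this loss sees the teacher's local distribution rather than a one-token signal, while the truncation keeps the gradient away from tokens outside the teacher's support (e.g., tokenizer or special-token mismatches).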
△ Less
Submitted 26 March, 2026;
originally announced March 2026.
-
Beam Test Characterization of Silicon Microstrip Detector Flight-Model Ladders for the AMS-02 Upgrade
Authors:
Dexing Miao,
Giovanni Ambrosi,
Mattia Barbanera,
Baasansuren Batsukh,
Hengyi Cai,
Mengke Cai,
Xudong Cai,
Yuman Cai,
Yuan-Hann Chang,
Shanzhen Chen,
Hsin-Yi Chou,
Xingzhu Cui,
Mingyi Dong,
Matteo Duranti,
Ke Gong,
Mingjie Feng,
Valerio Formato,
Yisheng Fu,
Daojin Hong,
Maria Ionica,
Xiaojie Jiang,
Yaozu Jiang,
Liangchenglong Jin,
Shengjie Jin,
Vladimir Koutsenko
, et al. (34 additional authors not shown)
Abstract:
The AMS-02 experiment plans to install a new silicon microstrip tracker layer (Layer-0) on top of the existing detector, increasing the cosmic-ray acceptance by a factor of 3. Layer-0 employs a design in which multiple silicon microstrip detectors (SSDs) are connected in series to form long detector ladders. We present a detailed performance study of the flight-model ladders using a 350~GeV mixed…
▽ More
The AMS-02 experiment plans to install a new silicon microstrip tracker layer (Layer-0) on top of the existing detector, increasing the cosmic-ray acceptance by a factor of 3. Layer-0 employs a design in which multiple silicon microstrip detectors (SSDs) are connected in series to form long detector ladders. We present a detailed performance study of the flight-model ladders using a 350~GeV mixed hadron beam at the CERN SPS. The study focuses on the following aspects: (i) the performance of ladders with different numbers of SSDs, for which the intrinsic spatial resolution at normal incidence varies from $9.5~μ\mathrm{m}$ to $11.4~μ\mathrm{m}$ for ladders composed of 8 to 12 SSDs; (ii) the response consistency for particles impacting on the \emph{Head} and \emph{Tail} regions of the ladder; and (iii) the dependence of the detector performance on the particle incidence angle.
△ Less
Submitted 26 March, 2026;
originally announced March 2026.
-
Demystifying When Pruning Works via Representation Hierarchies
Authors:
Shwai He,
Guoheng Sun,
Haichao Zhang,
Yun Fu,
Ang Li
Abstract:
Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation…
▽ More
Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequential spaces: embedding (hidden representations), logit (pre-softmax outputs), and probability (post-softmax distributions). We find that representations in the embedding and logit spaces are largely robust to pruning-induced perturbations. However, the nonlinear transformation from logits to probabilities amplifies these deviations, which accumulate across time steps and lead to substantial degradation during generation. In contrast, the stability of the categorical-token probability subspace, together with the robustness of the embedding space, supports the effectiveness of pruning for non-generative tasks such as retrieval and multiple-choice selection. Our analysis disentangles the effects of pruning across tasks and provides practical guidance for its application. Code is available at https://github.com/CASE-Lab-UMD/Pruning-on-Representations
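The amplification the authors describe is easy to reproduce in isolation: a sub-unit perturbation to logits can flip the argmax after softmax, and in autoregressive decoding the flipped token becomes the next step's context. A toy illustration (all numbers invented):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Clean logits vs. logits perturbed by pruning.
clean = [5.0, 4.0, 1.0]
pruned = [4.4, 4.6, 1.0]  # no logit moved by more than 0.6

p_clean, p_pruned = softmax(clean), softmax(pruned)

# The argmax token flips despite the small logit deviation; in autoregressive
# generation the flipped token enters the context and the error compounds.
assert p_clean.index(max(p_clean)) == 0
assert p_pruned.index(max(p_pruned)) == 1
```

Non-generative tasks that read off embeddings or a fixed candidate set never pass through this repeated logit-to-probability nonlinearity, which matches the paper's explanation of why pruning holds up there.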
△ Less
Submitted 6 April, 2026; v1 submitted 25 March, 2026;
originally announced March 2026.
-
Cross Section Measurements of $\bar{n}p \rightarrow K^{+}K^{-}π^{+}(π^{0})$ via Antineutrons Produced by $J/ψ\to p π^{-} \bar{n}$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (737 additional authors not shown)
Abstract:
Based on a novel method for producing antineutrons via $J/ψ$ decays, we report a study of $\bar{n}p$ inelastic scattering into final states containing kaons. The analysis uses $(10087\pm44)\times 10^6$ $J/ψ$ events collected at the BESIII detector operating at the BEPCII storage ring. Antineutrons are produced via $J/ψ\to p π^{-} \bar{n}$ decays and tagged by the detected protons and pions, result…
▽ More
Based on a novel method for producing antineutrons via $J/ψ$ decays, we report a study of $\bar{n}p$ inelastic scattering into final states containing kaons. The analysis uses $(10087\pm44)\times 10^6$ $J/ψ$ events collected at the BESIII detector operating at the BEPCII storage ring. Antineutrons are produced via $J/ψ\to p π^{-} \bar{n}$ decays and tagged by the detected protons and pions, resulting in antineutron momenta ranging from 0 to 1174~MeV/$c$, while target protons are provided by the hydrogen in the beam-pipe material. The cross sections of the reactions $\bar{n}p \rightarrow K^{+}K^{-}π^{+}$ and $\bar{n}p \rightarrow K^{+}K^{-}π^{+}π^{0}$ are measured to be $0.53^{+0.15}_{-0.12} \pm 0.08$~mb and $1.09^{+0.36}_{-0.30} \pm 0.31$~mb respectively, where the first uncertainties are statistical and the second systematic. Due to limited statistics, the intermediate states in these processes are not investigated. The observation of clean antineutron-proton scattering events indicates the potential of this approach for future investigations of antineutron-proton interactions.
△ Less
Submitted 25 March, 2026;
originally announced March 2026.
-
Modeling Spatiotemporal Neural Frames for High Resolution Brain Dynamics
Authors:
Wanying Qu,
Jianxiong Gao,
Wei Wang,
Yanwei Fu
Abstract:
Capturing dynamic spatiotemporal neural activity is essential for understanding large-scale brain mechanisms. Functional magnetic resonance imaging (fMRI) provides high-resolution cortical representations that form a strong basis for characterizing fine-grained brain activity patterns. The high acquisition cost of fMRI limits large-scale applications, therefore making high-quality fMRI reconstruct…
▽ More
Capturing dynamic spatiotemporal neural activity is essential for understanding large-scale brain mechanisms. Functional magnetic resonance imaging (fMRI) provides high-resolution cortical representations that form a strong basis for characterizing fine-grained brain activity patterns. However, the high acquisition cost of fMRI limits large-scale applications, making high-quality fMRI reconstruction a crucial task. Electroencephalography (EEG) offers millisecond-level temporal cues that complement fMRI. Leveraging this complementarity, we present an EEG-conditioned framework for reconstructing dynamic fMRI as continuous neural sequences with high spatial fidelity and strong temporal coherence at the cortical-vertex level. To address sampling irregularities common in real fMRI acquisitions, we incorporate a null-space intermediate-frame reconstruction, enabling measurement-consistent completion of arbitrary intermediate frames and improving sequence continuity and practical applicability. Experiments on the CineBrain dataset demonstrate superior voxel-wise reconstruction quality and robust temporal consistency across whole-brain and functionally specific regions. The reconstructed fMRI also preserves essential functional information, supporting downstream visual decoding tasks. This work provides a new pathway for estimating high-resolution fMRI dynamics from EEG and advances multimodal neuroimaging toward richer modeling of dynamic brain activity.
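Null-space intermediate-frame reconstruction is commonly realized through the range/null-space decomposition used in measurement-consistent solvers (a standard construction; the paper's exact operator may differ):

```latex
\hat{x} \;=\; A^{\dagger} y \;+\; \left(I - A^{\dagger} A\right) f_{\theta}(\hat{x})
```

where $A$ is the frame-sampling operator, $A^{\dagger}$ its pseudoinverse, $y$ the acquired frames, and $f_{\theta}$ the network prediction: the first term pins the reconstruction to the measurements (range space), while the second lets the model fill in only the unobserved components (null space), so completed frames never contradict the acquired ones.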
△ Less
Submitted 31 March, 2026; v1 submitted 25 March, 2026;
originally announced March 2026.
-
Amplitude Analysis of the Isospin-Violating Decay $J/ψ\rightarrowγηπ^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. -R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (736 additional authors not shown)
Abstract:
Using $(10087 \pm 44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, we perform the first amplitude analysis of the process $J/ψ\toγηπ^0$. The decay is dominated by the intermediate processes $J/ψ\toπ^0 b_1(1235)^0\left(\toγη\right)$, $J/ψ\toπ^0ρ(1450)^0\left(\toγη\right)$ and $J/ψ\toηh_1(1170)\left(\toγπ^0\right)$. Contributions from $J/ψ\toγa_0(980)^0(\toηπ^0)$,…
▽ More
Using $(10087 \pm 44)\times 10^{6}$ $J/ψ$ events collected with the BESIII detector, we perform the first amplitude analysis of the process $J/ψ\toγηπ^0$. The decay is dominated by the intermediate processes $J/ψ\toπ^0 b_1(1235)^0\left(\toγη\right)$, $J/ψ\toπ^0ρ(1450)^0\left(\toγη\right)$ and $J/ψ\toηh_1(1170)\left(\toγπ^0\right)$. Contributions from $J/ψ\toγa_0(980)^0(\toηπ^0)$, $J/ψ\toγa_2(1320)^0(\toηπ^0)$ and $J/ψ\toγa_2(1700)^0(\toηπ^0)$ are observed with a statistical significance exceeding $5σ$, constituting the first observation of radiative transitions of $J/ψ$ to isospin-triplet scalar mesons. The total branching fraction of $J/ψ\toγηπ^0$ is measured to be $(25.7\pm0.3\pm1.5)\times10^{-6}$, where the first uncertainty is statistical and the second systematic. This result is consistent with the previous measurement, with the precision improved by more than a factor of two.
△ Less
Submitted 24 March, 2026;
originally announced March 2026.
-
Search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (678 additional authors not shown)
Abstract:
A search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ is conducted using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and upper limits on the branching fractions of $D^0\to γ\bar K_1(1270)^0$ and…
▽ More
A search for the radiative decays $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ is conducted using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at the center-of-mass energy $\sqrt{s}=3.773$ GeV by the BESIII detector operating at the BEPCII collider. No significant signals are observed, and upper limits on the branching fractions of $D^0\to γ\bar K_1(1270)^0$ and $D^+\to γK_1(1270)^+$ at 90\% confidence level are determined to be $7.7\times10^{-4}$ and $3.9\times10^{-5}$, respectively. This represents the first test of the Vector Meson Dominance mechanism in the radiative decays of charmed mesons to axial-vector mesons.
△ Less
Submitted 24 March, 2026;
originally announced March 2026.
-
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
Authors:
Shoubin Yu,
Lei Shu,
Antoine Yang,
Yao Fu,
Srinivas Sunkara,
Maria Wang,
Jindong Chen,
Mohit Bansal,
Boqing Gong
Abstract:
Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This limitation prevents evaluation in crucial scenarios, such as when an agent must us…
▽ More
Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This limitation prevents evaluation in crucial scenarios, such as when an agent must use egocentric visual perception (e.g., via AR glasses) to recognize an object in the user's surroundings and then complete a related task online. To address this gap, we introduce Ego2Web, the first benchmark designed to bridge egocentric video perception and web agent execution. Ego2Web pairs real-world first-person video recordings with web tasks that require visual understanding, web task planning, and interaction in an online environment for successful completion. We utilize an automatic data-generation pipeline combined with human verification and refinement to curate well-constructed, high-quality video-task pairs across diverse web task types, including e-commerce, media retrieval, knowledge lookup, etc. To facilitate accurate and scalable evaluation for our benchmark, we also develop a novel LLM-as-a-Judge automatic evaluation method, Ego2WebJudge, which achieves approximately 84% agreement with human judgment, substantially higher than existing evaluation methods. Experiments with diverse SoTA agents on our Ego2Web show that their performance is weak, with substantial headroom across all task categories. We also conduct a comprehensive ablation study on task design, highlighting the necessity of accurate video understanding in the proposed task and the limitations of current agents. We hope Ego2Web can be a critical new resource for developing truly capable AI assistants that can seamlessly see, understand, and act across the physical and digital worlds.
△ Less
Submitted 23 March, 2026;
originally announced March 2026.
-
TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
Authors:
Yutao Xie,
Nathaniel Thomas,
Nicklas Hansen,
Yang Fu,
Li Erran Li,
Xiaolong Wang
Abstract:
Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieved strong results on open-domain question answering (QA), but training still remains a significant challenge. The optimization is often unstable due to sparse rewards and difficult credit assignments across reasoning and tool calls. To address this, we introduce Turn-Level Information Potential Reward…
▽ More
Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieved strong results on open-domain question answering (QA), but training still remains a significant challenge. The optimization is often unstable due to sparse rewards and difficult credit assignment across reasoning and tool calls. To address this, we introduce Turn-Level Information-Potential Reward Shaping (TIPS), a simple framework that assigns dense, turn-level rewards to each reasoning + tool-call segment based on the increased likelihood of the correct answer under a teacher model. By leveraging the potential-based reward shaping, TIPS offers fine-grained and policy-invariant guidance that overcomes the limitations of outcome-only optimization. Evaluated on seven QA benchmarks, TIPS consistently outperforms GRPO/PPO baselines and substantially improves training stability. For instance, with a Qwen-2.5 7B Instruct model, TIPS improves the average Exact Match score by 11.8% and F1 by 13.6% relative to PPO. Our results demonstrate that turn-level information-potential reward shaping provides an effective and general solution to sparse-reward credit assignment for multi-turn LLM reasoning.
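Potential-based shaping has a standard form: add the discounted change in a potential function to each base reward, which provably leaves the optimal policy unchanged. A toy sketch with the potential playing the role TIPS assigns to the teacher's answer likelihood (all values invented):

```python
def shaped_rewards(base_rewards, potentials, gamma=1.0):
    """Potential-based shaping: r'_t = r_t + gamma * Phi(s_{t+1}) - Phi(s_t).
    Here Phi stands in for the teacher's log-likelihood of the correct answer
    after each turn (our assumption about TIPS's potential)."""
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(base_rewards)]

# Sparse outcome reward (only the final turn is rewarded) ...
base = [0.0, 0.0, 1.0]
# ... densified by a potential that grows as the answer becomes more likely.
phi = [-2.3, -1.6, -0.9, 0.0]

dense = shaped_rewards(base, phi)
# With gamma = 1 the shaping telescopes: the total return shifts by a constant
# phi[-1] - phi[0], so the optimization target is unchanged.
assert abs(sum(dense) - (sum(base) + phi[-1] - phi[0])) < 1e-9
```

Every turn that makes the correct answer more likely under the teacher now receives positive feedback, which is the dense, turn-level signal the abstract credits for the improved stability.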
△ Less
Submitted 11 March, 2026;
originally announced March 2026.
-
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
Authors:
Haichao Zhang,
Yijiang Li,
Shwai He,
Tushar Nagarajan,
Mingfei Chen,
Jianglin Lu,
Ang Li,
Yun Fu
Abstract:
Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision--language…
▽ More
Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision--language models (VLMs), in contrast, provide strong semantic grounding and general knowledge by reasoning over uniformly sampled frames, but they are not ideal as standalone dense predictors due to compute-driven sparse sampling, a language-output bottleneck that compresses fine-grained interaction states into text-oriented representations, and a data-regime mismatch when adapting to small action-conditioned datasets. We propose a VLM-guided JEPA-style latent world modeling framework that combines dense-frame dynamics modeling with long-horizon semantic guidance via a dual-temporal pathway: a dense JEPA branch for fine-grained motion and interaction cues, and a uniformly sampled VLM \emph{thinker} branch with a larger temporal stride for knowledge-rich guidance. To transfer the VLM's progressive reasoning signals effectively, we introduce a hierarchical pyramid representation extraction module that aggregates multi-layer VLM representations into guidance features compatible with latent prediction. Experiments on hand-manipulation trajectory prediction show that our method outperforms both a strong VLM-only baseline and a JEPA-predictor baseline, and yields more robust long-horizon rollout behavior.
△ Less
Submitted 23 March, 2026;
originally announced March 2026.
-
A Fast Method for Correlated Updates of Proton PDFs and the Strong Coupling $α_s$
Authors:
Yao Fu,
Carl Schmidt,
C.-P. Yuan
Abstract:
We present an extended version of the \texttt{ePump} framework that enables the simultaneous profiling of proton parton distribution functions (PDFs) and the strong coupling $α_s$ using new experimental data. By promoting $α_s$ to a fit parameter within the Hessian updating formalism, the method performs coherent updates of $\{\text{PDFs},α_s\}$ while preserving parameter correlations and the full…
▽ More
We present an extended version of the \texttt{ePump} framework that enables the simultaneous profiling of proton parton distribution functions (PDFs) and the strong coupling $α_s$ using new experimental data. By promoting $α_s$ to a fit parameter within the Hessian updating formalism, the method performs coherent updates of $\{\text{PDFs},α_s\}$ while preserving parameter correlations and the full covariance structure. Validation studies based on CTEQ-TEA analyses with collider data demonstrate that the upgraded \texttt{ePump} accurately reproduces the shifts in PDFs, the preferred $α_s(m_Z)$, and the associated uncertainty reductions obtained in full global fits, including those inferred from Lagrange--Multiplier scans; small deviations arise only for data sets whose $χ^2$ profiles exhibit nonlinear behavior. Applications to representative collider measurements illustrate the impact on the gluon distribution and on precision observables such as the Higgs boson production cross section via gluon fusion. This enhanced framework provides a fast and reliable tool for assessing the effects of new data on the global QCD parameter space, offering near-global-fit accuracy at a fraction of the computational cost.
△ Less
Submitted 23 March, 2026;
originally announced March 2026.
-
Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective
Authors:
Yuehu Gong,
Zeyuan Wang,
Yulin Chen,
Yanwei Fu
Abstract:
On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space for…
▽ More
On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space formulation of generative PPO inspired by the Generalized Schrödinger Bridge (GSB). Our framework lifts PPO-style proximal updates from terminal actions to full generation trajectories, yielding a unified view of on-policy optimization for generative policies. Within this framework, we develop two concrete objectives: a clipping-based objective, GSB-PPO-Clip, and a penalty-based objective, GSB-PPO-Penalty. Experimental results show that while both objectives are compatible with on-policy training, the penalty formulation consistently delivers better stability and performance than the clipping counterpart. Overall, our results highlight path-space proximal regularization as an effective principle for training generative policies with PPO.
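The clip-versus-penalty distinction above mirrors PPO's two classical surrogates, evaluated on a trajectory-level probability ratio rather than a per-action one. A minimal scalar sketch (not the paper's objective; names illustrative):

```python
import math

def ppo_clip(ratio, advantage, eps=0.2):
    """Clipped surrogate, applied here to a trajectory-level probability ratio."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)

def ppo_penalty(ratio, advantage, beta=1.0):
    """Penalty surrogate: raw objective minus a beta-weighted KL-style penalty.
    The estimator r*log(r) - r + 1 is pointwise non-negative and zero at r = 1;
    this is an illustrative stand-in for GSB-PPO-Penalty's path-space term."""
    kl = ratio * math.log(ratio) - ratio + 1.0
    return ratio * advantage - beta * kl

# A trajectory whose probability rose 50% under the new policy, advantage +2.
r, A = 1.5, 2.0
assert ppo_clip(r, A) == (1.0 + 0.2) * A   # gradient is cut off at 1 + eps
assert ppo_penalty(r, A) < r * A           # penalty softly discounts the gain
```

The clip zeroes the gradient outside the trust region, while the penalty keeps a smooth restoring gradient everywhere, one plausible reading of why the penalty form trains more stably for generative policies.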
Submitted 23 March, 2026;
originally announced March 2026.
-
Causally-Guided Diffusion for Stable Feature Selection
Authors:
Arun Vignesh Malarkkan,
Xinyuan Wang,
Kunpeng Liu,
Denghui Zhang,
Yanjie Fu
Abstract:
Feature selection is fundamental to robust data-centric AI, but most existing methods optimize predictive performance under a single data distribution. This often selects spurious features that fail under distribution shifts. Motivated by principles from causal invariance, we study feature selection from a stability perspective and introduce Causally-Guided Diffusion for Stable Feature Selection (CGDFS). In CGDFS, we formalize feature selection as approximate posterior inference over feature subsets, whose posterior mass favors low prediction error and low cross-environment variance. Our framework combines three key insights: First, we formulate feature selection as stability-aware posterior sampling. Here, causal invariance serves as a soft inductive bias rather than explicit causal discovery. Second, we train a diffusion model as a learned prior over plausible continuous selection masks, combined with a stability-aware likelihood that rewards invariance across environments. This diffusion prior captures structural dependencies among features and enables scalable exploration of the combinatorially large selection space. Third, we perform guided annealed Langevin sampling that combines the diffusion prior with the stability objective, yielding tractable, uncertainty-aware posterior inference that avoids discrete optimization and produces robust feature selections. We evaluate CGDFS on open-source real-world datasets exhibiting distribution shifts. Across both classification and regression tasks, CGDFS consistently selects more stable and transferable feature subsets, which leads to improved out-of-distribution performance and greater selection robustness compared to sparsity-based, tree-based, and stability-selection baselines.
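The guided Langevin step that combines a learned prior with a likelihood gradient has a standard generic form: the mask drifts along the prior score plus a weighted guidance gradient, with Gaussian noise scaled by the step size. The sketch below is that generic step, not the paper's implementation; `score_prior` stands in for the learned diffusion prior's score and `grad_stability` for the gradient of the stability-aware likelihood, both hypothetical callables.

```python
import numpy as np

def guided_langevin_step(mask, score_prior, grad_stability,
                         step_size, guidance, rng):
    """One guided (annealed) Langevin update on a continuous selection mask.

    Illustrative sketch: the drift combines the prior score with a
    guidance-weighted stability gradient; annealing would shrink
    step_size (and the prior's noise level) across iterations.
    """
    noise = rng.standard_normal(mask.shape)
    drift = score_prior(mask) + guidance * grad_stability(mask)
    return mask + step_size * drift + np.sqrt(2.0 * step_size) * noise
```

Sampling in the continuous mask space and thresholding at the end is what lets this approach sidestep discrete subset optimization while still yielding uncertainty estimates from the posterior samples.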
Submitted 21 March, 2026;
originally announced March 2026.