-
High-temperature charge-4e superconductivity in SU(4) interacting fermions
Authors:
Shao-Hang Shi,
Zhengzhi Wu,
Jiangping Hu,
Zi-Xiang Li
Abstract:
The condensation of electron quartets, known as charge-4e superconductivity (SC), represents a novel quantum state of matter beyond the standard paradigm of Cooper pairing. However, concrete microscopic models realizing this phase in two dimensions remain a central challenge. Here, we introduce a non-engineered and sign-problem-free model, unambiguously demonstrating the emergence of a robust and high-temperature charge-4e SC phase using unbiased quantum Monte Carlo simulations. At zero temperature, the phase diagram reveals that charge-4e SC is the primary ground state in the strong-coupling regime. At finite temperature in the absence of charge-2e SC, we identify charge-4e SC through a Berezinskii-Kosterlitz-Thouless transition, marked by a universal jump in the superfluid stiffness consistent with a condensate of charge 4e. Remarkably, the transition temperature Tc increases nearly linearly with interaction strength, providing a robust mechanism for high-Tc quartet superconductivity. Furthermore, spectral analysis reveals a prominent pseudogap above Tc arising from strong phase fluctuations. Our results establish a canonical and numerically exact model system for charge-4e superconductivity, offering crucial guidance for its realization in experimental platforms such as moiré materials and ultracold atomic systems.
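For reference, the Berezinskii-Kosterlitz-Thouless criterion behind such a universal jump is the Nelson-Kosterlitz relation; the charge-dependent rescaling below is the generic textbook form, stated here only as an assumption about conventions and not as the paper's own definition:
$$ k_B T_{\rm BKT} = \frac{π}{2}\, ρ_s^{θ}(T_{\rm BKT}^-), \qquad D_s \propto q^2\, ρ_s^{θ}, $$
where $ρ_s^{θ}$ is the stiffness of the order-parameter phase and $D_s$ the superfluid weight seen in the current response of a condensate of charge $q$; at a given $T_{\rm BKT}$ a charge-4e condensate therefore exhibits a jump in $D_s$ four times larger than a charge-2e one, which is how the jump discriminates between the two scenarios.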
Submitted 16 April, 2026;
originally announced April 2026.
-
Study of the $B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0$ decay
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1111 additional authors not shown)
Abstract:
The decay $B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0$ is studied for the first time using proton-proton collision data recorded by the LHCb experiment at a center-of-mass energy of $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. The branching ratio relative to the decay $B^+ \to Λ_c^+ \bar{Λ}_c^- K^+$ is measured to be
$$ \frac{{\cal B}(B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0)}{{\cal B}(B^+ \to Λ_c^+ \bar{Λ}_c^- K^+)} = 0.53 \pm 0.05 \pm 0.05, $$ where the first uncertainty is statistical and the second is systematic. Evidence is found for contributions from two resonant states, $Ξ_c(2923)^+$ and $Ξ_c(2939)^+$, in the $Λ_c^+ K_S^0$ system. The two states show a significance of $3.9σ$ relative to the nonresonant hypothesis. These two $Ξ_c^+$ states are consistent with being the isospin partners of the states observed in the $Λ_c^+ K^-$ system.
Submitted 16 April, 2026;
originally announced April 2026.
-
DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation
Authors:
Ziyu Shan,
Yuheng Zhou,
Gaoyuan Wu,
Ziheng Ji,
Zhenyu Wu,
Ziwei Wang
Abstract:
Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from the view generalization problem due to shifts of docking points. To address this issue, we propose a novel low-cost demonstration generation framework named DockAnywhere, which improves viewpoint generalization under docking variability by lifting a single demonstration to diverse feasible docking configurations. Specifically, DockAnywhere lifts a trajectory to any feasible docking points by decoupling docking-dependent base motions from contact-rich manipulation skills that remain invariant across viewpoints. Feasible docking proposals are sampled under feasibility constraints, and corresponding trajectories are generated via structure-preserving augmentation. Visual observations are synthesized in 3D space by representing the robot and objects as point clouds and applying point-level spatial editing to ensure the consistency of observation and action across viewpoints. Extensive experiments on ManiSkill and real-world platforms demonstrate that DockAnywhere substantially improves policy success rates and easily generalizes to novel viewpoints from unseen docking points during training, significantly enhancing the generalization capability of mobile manipulation policy in real-world deployment.
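To make the lifting step concrete, the sketch below re-expresses a recorded planar trajectory in the original base frame and replays it from a newly sampled docking pose; the SE(2)-only treatment, function names, and numbers are illustrative assumptions rather than the authors' implementation.

import numpy as np

def se2(x, y, yaw):
    # Homogeneous 3x3 transform for a planar base pose.
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def lift_demo(traj_world, base_old, base_new):
    # Re-express a trajectory (N x 2, world frame) recorded from base_old so that the
    # same base-relative motion is replayed from the newly sampled pose base_new.
    T_old, T_new = se2(*base_old), se2(*base_new)
    pts = np.c_[traj_world, np.ones(len(traj_world))]   # homogeneous coordinates
    in_base = (np.linalg.inv(T_old) @ pts.T).T          # world -> old base frame
    return (T_new @ in_base.T).T[:, :2]                 # old base frame -> new world pose

# Hypothetical usage: one waypoint, base shifted to (2, 1) and rotated by 90 degrees.
print(lift_demo(np.array([[1.0, 0.0]]), (0.0, 0.0, 0.0), (2.0, 1.0, np.pi / 2)))  # [[2., 2.]]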
Submitted 16 April, 2026;
originally announced April 2026.
-
MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry
Authors:
Meng-Xun Li,
Wen-Hui Deng,
Zhi-Xing Wu,
Chun-Xiao Jin,
Jia-Min Wu,
Yue Han,
James Kit Hon Tsoi,
Gui-Song Xia,
Cui Huang
Abstract:
Vision-Language Models (VLMs) have demonstrated significant potential in medical image analysis, yet their application in intraoral photography remains largely underexplored due to the lack of fine-grained, annotated datasets and comprehensive benchmarks. To address this, we present MetaDent, a comprehensive resource that includes (1) a novel and large-scale dentistry image dataset collected from clinical, public, and web sources; (2) a semi-structured annotation framework designed to capture the hierarchical and clinically nuanced nature of dental photography; and (3) comprehensive benchmark suites for evaluating state-of-the-art VLMs on clinical image understanding. Our labeling approach combines a high-level image summary with point-by-point, free-text descriptions of abnormalities. This method enables rich, scalable, and task-agnostic representations. We curated 60,669 dental images from diverse sources and annotated a representative subset of 2,588 images using this meta-labeling scheme. Leveraging Large Language Models (LLMs), we derive standardized benchmarks: approximately 15K Visual Question Answering (VQA) pairs and an 18-class multi-label classification dataset, which we validated with human review and error analysis to justify that the LLM-driven transition reliably preserves fidelity and semantic accuracy. We then evaluate state-of-the-art VLMs across VQA, classification, and image captioning tasks. Quantitative results reveal that even the most advanced models struggle with a fine-grained understanding of intraoral scenes, achieving moderate accuracy and producing inconsistent or incomplete descriptions in image captioning. We publicly release our dataset, annotations, and tools to foster reproducible research and accelerate the development of vision-language systems for dental applications.
Submitted 16 April, 2026;
originally announced April 2026.
-
Coherence dynamics in quantum algorithm for linear systems of equations
Authors:
Linlin Ye,
Zhaoqi Wu,
Shao-Ming Fei
Abstract:
Quantum coherence is a fundamental issue in quantum mechanics and quantum information processing. We explore the coherence dynamics of the evolved states in the HHL quantum algorithm for solving the linear system of equations $A\overrightarrow{x}=\overrightarrow{b}$. By using the Tsallis relative $α$ entropy of coherence and the $l_{1,p}$ norm of coherence, we show that the operator coherence of the phase estimation $P$ relies on the coefficients $β_{i}$ obtained by decomposing $|b\rangle$ in the eigenbasis of $A$. We prove that the operator coherence of the inverse phase estimation $\widetilde{P}$ relies on the coefficients $β_{i}$, the eigenvalues of $A$ and the success probability $P_{s}$, and that it decreases as the success probability increases when $α\in(1,2]$. Moreover, the variations of coherence diminish as the success probability increases and depend on the eigenvalues of $A$ as well as on the success probability.
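For context, the generic textbook structure of HHL that these quantities refer to (the paper's normalizations may differ) reads
$$ |b\rangle = \sum_i β_i |u_i\rangle, \qquad A|u_i\rangle = λ_i |u_i\rangle, \qquad |x\rangle \propto \sum_i \frac{β_i}{λ_i}\,|u_i\rangle, \qquad P_{s} = \sum_i \frac{C^2 |β_i|^2}{λ_i^2}, $$
with $C$ the constant of the controlled ancilla rotation, so the coefficients $β_i$, the eigenvalues $λ_i$ and the success probability $P_s$ are exactly the quantities in which the coherence results above are expressed.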
Submitted 16 April, 2026;
originally announced April 2026.
-
Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
Authors:
Zhixuan Wu,
Quanxing Zha,
Teng Wang,
Genbao Xu,
Wenyuan Gu,
Wei Rao,
Nan Ma,
Bo Cheng,
Soujanya Poria
Abstract:
Video understanding requires identifying and reasoning over semantically discriminative visual objects across frames, yet existing object-agnostic solutions struggle to effectively handle substantial object variations over time. To address this, we introduce Chain-of-Glimpse, a search-guided progressive object-grounded reasoning framework that explicitly anchors each reasoning step to specific visual evidence regions, enabling compositional and multi-step decision-making. Formally, Chain-of-Glimpse formulates video reasoning as a step-by-step process that incrementally builds spatially grounded traces around task-relevant visual objects, thereby mitigating over-reliance on saliency-driven cues. Specifically, Chain-of-Glimpse features a search-guided controller, optimized via reinforcement learning with a format reward that incentivizes grounding capability, to iteratively ground visual evidence regions and form reliable reasoning trajectories, yielding accurate and interpretable multi-step decisions. Extensive evaluations on both the in-domain NExTQA benchmark and the out-of-domain Video-Holmes, CG-Bench Reasoning, and VRBench benchmarks demonstrate consistent performance gains, robustness, and generalization of Chain-of-Glimpse across diverse video reasoning tasks.
Submitted 16 April, 2026;
originally announced April 2026.
-
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
Authors:
Jianhao Su,
Zhanwei Wu,
ShengTing Huang,
Weidong Feng
Abstract:
Edge AI model deployment is a multi-stage engineering process involving model conversion, operator compatibility handling, quantization calibration, runtime integration, and accuracy validation. In practice, this workflow is long, failure-prone, and heavily dependent on deployment expertise, particularly when targeting hardware-specific inference runtimes. This technical report presents AIPC (AI Porting Conversion), an AI agent-driven approach for constrained automation of AI model deployment. AIPC decomposes deployment into standardized, verifiable stages and injects deployment-domain knowledge into agent execution through Agent Skills, helper scripts, and a stage-wise validation loop. This design reduces both the expertise barrier and the engineering time required for hardware deployment. Using Qualcomm AI Runtime (QAIRT) as the primary scenario, this report examines automated deployment across representative vision, multimodal, and speech models. In the cases covered here, AIPC can complete deployment from PyTorch to runnable QNN/SNPE inference within 7-20 minutes for structurally regular vision models, with indicative API costs roughly in the range of USD 0.7-10. For more complex models involving less-supported operators, dynamic shapes, or autoregressive decoding structures, fully automated deployment may still require further advances, but AIPC already provides practical support for execution, failure localization, and bounded repair.
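A minimal sketch of a stage-wise validation loop with bounded repair is given below; the stage names, the agent interface, and the retry policy are illustrative assumptions and are not taken from the AIPC report.

from dataclasses import dataclass

@dataclass
class Report:
    passed: bool
    error: str = ""

# Hypothetical stage names standing in for the standardized, verifiable stages.
STAGES = ["convert_model", "patch_operators", "calibrate_quantization",
          "integrate_runtime", "validate_accuracy"]

def run_pipeline(model, agent, max_repairs=3):
    # Execute each stage, validate its output, and attempt a bounded number of repairs.
    artifact = model
    for stage in STAGES:
        for attempt in range(max_repairs + 1):
            artifact, report = agent.execute(stage, artifact)
            if report.passed:
                break
            if attempt == max_repairs:
                raise RuntimeError(f"{stage} failed after {max_repairs} repairs: {report.error}")
            artifact = agent.repair(stage, artifact, report)  # localized fix guided by the failure report
    return artifact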
Submitted 16 April, 2026;
originally announced April 2026.
-
Closing the Observational Gap in Cosmic Dynamics: AI-Enabled Reconstruction of the Universe's Vorticity and Rotational Flow Morphology
Authors:
Ziyong Wu,
Xu Xiao,
Fuyu Dong,
Juhan Kim,
Yan-Chuan Cai,
Yang Wang,
Xi Kang,
Le Zhang,
Xin Wang,
Xiao-Dong Li
Abstract:
The cosmic vorticity field, an essential tracer of nonlinear structure formation, has remained observationally inaccessible because transverse galaxy motions are difficult to measure and analytic models struggle to capture shell-crossing. Here we report an empirical reconstruction of this field by applying an artificial intelligence framework trained on simulations of the concordance LambdaCDM model to Sloan Digital Sky Survey galaxies. The recovered three-dimensional velocity and vorticity fields reveal coherent vortical structures, including spiral flows in clusters, filaments, and voids, and the cosmic web inferred from vorticity closely matches that derived from density segmentation. The power spectra of the reconstructed velocity and vorticity fields agree statistically with LambdaCDM predictions, and the inferred velocity field effectively removes redshift-space distortions, yielding an almost isotropic clustering signal. These converging lines of evidence, obtained from an independent perspective, reinforce the concordance cosmological model. By closing a long-standing observational gap, our results highlight the potential of AI-driven reconstruction to access otherwise unobservable quantities and to address fundamental questions in cosmology and galaxy formation.
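As a reminder of the quantity being reconstructed, the vorticity is the curl of the velocity field, $ω = \nabla \times v$; a minimal finite-difference evaluation on a gridded velocity field (uniform spacing and x, y, z axis ordering are assumptions) is:

import numpy as np

def vorticity(vx, vy, vz, dx=1.0):
    # Curl of a velocity field sampled on a regular 3D grid with axes ordered (x, y, z).
    wx = np.gradient(vz, dx, axis=1) - np.gradient(vy, dx, axis=2)
    wy = np.gradient(vx, dx, axis=2) - np.gradient(vz, dx, axis=0)
    wz = np.gradient(vy, dx, axis=0) - np.gradient(vx, dx, axis=1)
    return wx, wy, wz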
Submitted 16 April, 2026;
originally announced April 2026.
-
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
Authors:
Yuxiang Wang,
Hongyu Liu,
Yijiang Xu,
Qinke Ni,
Li Wang,
Wan Lin,
Kunyu Feng,
Dekun Chen,
Xu Tan,
Lei Wang,
Jie Shi,
Zhizheng Wu
Abstract:
As speech language models (SLMs) transition from personal devices into shared, multi-user environments, their responses must account for far more than the words alone. Who is speaking, how they sound, and where the conversation takes place can each turn an otherwise benign request into one that is unsafe, unfair, or privacy-violating. Existing benchmarks, however, largely focus on basic audio comprehension, study individual risks in isolation, or conflate content that is inherently harmful with content that only becomes problematic due to its acoustic context. We introduce VoxSafeBench, among the first benchmarks to jointly evaluate social alignment in SLMs across three dimensions: safety, fairness, and privacy. VoxSafeBench adopts a Two-Tier design: Tier1 evaluates content-centric risks using matched text and audio inputs, while Tier2 targets audio-conditioned risks in which the transcript is benign but the appropriate response hinges on the speaker, paralinguistic cues, or the surrounding environment. To validate Tier2, we include intermediate perception probes and confirm that frontier SLMs can successfully detect these acoustic cues yet still fail to act on them appropriately. Across 22 tasks with bilingual coverage, we find that safeguards appearing robust on text often degrade in speech: safety awareness drops for speaker- and scene-conditioned risks, fairness erodes when demographic differences are conveyed vocally, and privacy protections falter when contextual cues arrive acoustically. Together, these results expose a pervasive speech grounding gap: current SLMs frequently recognize the relevant social norm in text but fail to apply it when the decisive cue must be grounded in speech. Code and data are publicly available at: https://amphionteam.github.io/VoxSafeBench_demopage/
Submitted 15 April, 2026;
originally announced April 2026.
-
RoSLAC: Robust Simultaneous Localization and Calibration of Multiple Magnetometers
Authors:
Qiyang Lyu,
Zhenyu Wu,
Wei Wang,
Hongming Shen,
Danwei Wang
Abstract:
Localization of autonomous mobile robots (AMRs) in enclosed or semi-enclosed environments such as offices, hotels, hospitals, indoor parking facilities, and underground spaces where GPS signals are weak or unavailable remains a major obstacle to the deployment of fully autonomous systems. Infrastructure-based localization approaches, such as QR codes and RFID, are constrained by high installation and maintenance costs as well as limited flexibility, while onboard sensor-based methods, including LiDAR- and vision-based solutions, are affected by ambiguous geometric features and frequent occlusions caused by dynamic obstacles such as pedestrians. Ambient magnetic field (AMF)-based localization has therefore attracted growing interest in recent years because it does not rely on external infrastructure or geometric features, making it well-suited for AMR applications such as service robots and security robots. However, magnetometer measurements are often corrupted by distortions caused by ferromagnetic materials present on the sensor platform, which bias the AMF and degrade localization reliability. As a result, accurate magnetometer calibration to estimate distortion parameters becomes essential. Conventional calibration methods that rely on rotating the magnetometer are impractical for large and heavy platforms. To address this limitation, this paper proposes a robust simultaneous localization and calibration (RoSLAC) approach based on alternating optimization, which iteratively and efficiently estimates both the platform pose and magnetometer calibration parameters. Extensive evaluations conducted in high-fidelity simulation and real-world environments demonstrate that the proposed RoSLAC method achieves high localization accuracy while maintaining low computational cost compared with state-of-the-art magnetometer calibration techniques.
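For reference, the distortion model usually assumed in this setting is the hard-iron/soft-iron form written below; the exact parameterization RoSLAC estimates is not spelled out in the abstract, so this is only the generic version:
$$ \mathbf{m}_{\rm meas} = \mathbf{S}\,\mathbf{R}(\mathbf{q})^{\top} \mathbf{m}_{\rm AMF}(\mathbf{p}) + \mathbf{b} + \mathbf{n}, $$
where $\mathbf{S}$ is the soft-iron matrix, $\mathbf{b}$ the hard-iron offset, $\mathbf{R}(\mathbf{q})$ the platform orientation, $\mathbf{m}_{\rm AMF}(\mathbf{p})$ the ambient field at position $\mathbf{p}$, and $\mathbf{n}$ sensor noise; an alternating optimization then iterates between estimating the pose $(\mathbf{p}, \mathbf{q})$ with the calibration fixed and re-estimating $(\mathbf{S}, \mathbf{b})$ with the pose fixed.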
Submitted 15 April, 2026;
originally announced April 2026.
-
Exact Toda Black Holes of Rank-2 Lie Groups
Authors:
H. Lu,
Peng-Yu Wu,
Ze-Hua Wu,
Weicheng Zhao
Abstract:
We consider Einstein gravity coupled to two Maxwell fields and one dilatonic scalar, and construct spherically-symmetric and static black holes that are charged under both Maxwell fields in general $D$ dimensions. We find that for suitable dilaton couplings, the equations of motion can be cast into one-dimensional Toda equations of all rank-2 Lie groups. We devise a brute-force approach to obtain the most general but remarkably elegant solutions to the Toda equations. This allows us to construct exact black holes associated with all the rank-2 Lie groups. The $B_2$ and $G_2$ Toda black holes are new. We study their thermodynamics and verify explicitly an earlier claim in the literature that all these thermodynamic quantities can be derived without having to solve for these black hole solutions.
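For orientation, the one-dimensional Toda equations of a rank-2 Lie group take the generic form below, with the Cartan matrix distinguishing the $A_2$, $B_2$ and $G_2$ cases; signs, normalizations and the choice between a Cartan matrix and its transpose are convention-dependent and may differ from the paper:
$$ \frac{d^2 q_i}{dρ^2} = \exp\Big(\sum_{j=1}^{2} K_{ij}\, q_j\Big), \qquad K_{A_2} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad K_{B_2} = \begin{pmatrix} 2 & -1 \\ -2 & 2 \end{pmatrix}, \quad K_{G_2} = \begin{pmatrix} 2 & -1 \\ -3 & 2 \end{pmatrix}. $$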
Submitted 15 April, 2026;
originally announced April 2026.
-
Tsallis relative $α$ entropy of coherence dynamics in Grover's search algorithm
Authors:
Linlin Ye,
Zhaoqi Wu,
Shao-Ming Fei
Abstract:
Quantum coherence plays a central role in Grover's search algorithm. We study the Tsallis relative $α$ entropy of coherence dynamics of the evolved state in Grover's search algorithm. We prove that the Tsallis relative $α$ entropy of coherence decreases with the increase of the success probability, and derive the complementarity relations between the coherence and the success probability. We show that the operator coherence of the first $H^{\otimes n}$ relies on the size of the database $N$, the success probability and the target states. Moreover, we illustrate the relationships between coherence and entanglement of the superposition state of targets, as well as the production and deletion of coherence in Grover iterations.
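For context, the standard form of the Grover-evolved state that these statements refer to (textbook notation; the paper's conventions may differ) is
$$ |ψ_k\rangle = \sin\big((2k+1)θ\big)\,|t\rangle + \cos\big((2k+1)θ\big)\,|t^{\perp}\rangle, \qquad \sinθ = \sqrt{M/N}, \qquad P_{\rm succ}(k) = \sin^2\big((2k+1)θ\big), $$
where $|t\rangle$ is the equal superposition of the $M$ target states in a database of size $N$, so the computational-basis coherence of $|ψ_k\rangle$ and the success probability are controlled by the same rotation angle.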
Submitted 16 April, 2026; v1 submitted 15 April, 2026;
originally announced April 2026.
-
Realistic Detector Geometry Modeling and Its Impact on Event Reconstruction in JUNO
Authors:
Zhaoxiang Wu,
Miao He,
Wuming Luo,
Ziyan Deng,
Wei He,
Yuekun Heng,
Xiaoping Jing,
Bo Li,
Xiaoyan Ma,
Xiaohui Qian,
Zhonghua Qin,
Yifang Wang,
Peidong Yu
Abstract:
JUNO is designed to determine the neutrino mass ordering with an energy resolution of 3% at 1 MeV. In the real detector, however, deformations of the central stainless-steel structure during installation lead to deviations of the photomultiplier tube (PMT) positions from their design values. Based on the limited survey data of the PMTs and the stainless-steel truss, we perform a correlation analysis of the measured points and propose a method to predict the positions of all PMTs. Using the resulting realistic geometry, we demonstrate that the detector deformation has a negligible effect on the energy reconstruction. In contrast, inaccuracies in the assumed geometry can introduce vertex biases of up to 40 mm. Incorporating the realistic geometry into the calibration-based PMT response model removes this bias and preserves the stability of the reconstruction algorithms.
Submitted 15 April, 2026;
originally announced April 2026.
-
Measurement of the $W$-boson production cross-sections in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the $W$-boson production cross-section is performed using the $W \to μν$ decay channel, based on a sample of proton-proton collision data collected by the LHCb experiment at $\sqrt{s}$ = 13 TeV and corresponding to an integrated luminosity of 5.1 fb$^{-1}$. The cross-section is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. The integrated production cross-sections of $W$ bosons are measured to be $$ \begin{array}{lcl} σ_{W^+ \to μ^+ν} &=& 1754.2 \pm 1.5 \pm 11.9 \pm 35.1\text{ pb} \\ σ_{W^- \to μ^-\bar{ν}} &=& 1178.1 \pm 1.3 \pm 9.7 \pm 23.6\text{ pb} \end{array} $$ where uncertainties are statistical, systematic, and due to the luminosity determination, respectively. Results are in good agreement with theoretical predictions at next-to-next-to-leading order in perturbative quantum chromodynamics. This measurement is significantly more precise than previous results in this kinematic regime.
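Purely as an illustrative check, and not a number quoted in the text, the three uncertainty components on each cross-section can be combined in quadrature:

import math

def total_uncertainty(stat, syst, lumi):
    # Combine independent uncertainty components in quadrature.
    return math.sqrt(stat**2 + syst**2 + lumi**2)

print(total_uncertainty(1.5, 11.9, 35.1))  # W+ -> mu+ nu: roughly 37 pb total
print(total_uncertainty(1.3, 9.7, 23.6))   # W- -> mu- nubar: roughly 26 pb total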
Submitted 14 April, 2026;
originally announced April 2026.
-
Precision measurement of the muon charge asymmetry from $W$-boson decays in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the muon charge asymmetry from $W$-boson decays in proton-proton collisions at $\sqrt{s}$ = 13 TeV is presented. The analysis utilizes data corresponding to an integrated luminosity of 5.1 fb$^{-1}$, recorded by the LHCb detector during 2016, 2017 and 2018. The asymmetry is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. This result represents the most precise determination of the muon charge asymmetry in the forward region to date, exhibiting excellent agreement with next-to-next-to-leading-order predictions in perturbative quantum chromodynamics.
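For reference, the muon charge asymmetry is conventionally defined as the normalized difference of the charge-separated differential cross-sections; this is the standard definition, assumed rather than quoted here:
$$ A_{μ}(η) = \frac{dσ_{W^+ \to μ^+ν}/dη - dσ_{W^- \to μ^-\bar{ν}}/dη}{dσ_{W^+ \to μ^+ν}/dη + dσ_{W^- \to μ^-\bar{ν}}/dη}. $$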
Submitted 14 April, 2026;
originally announced April 2026.
-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
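The lineshape referred to above, a relativistic Breit-Wigner with a mass-dependent width, has the generic form (the barrier factors and the exact parameterization of $Γ(m)$ used in the analysis are not given in the abstract):
$$ {\rm BW}(m) \propto \frac{1}{m_0^2 - m^2 - i\, m_0\, Γ(m)}, \qquad Γ(m) = Γ_0\, \frac{m_0}{m} \left(\frac{q(m)}{q(m_0)}\right)^{2L+1} \left(\frac{F_L(q(m))}{F_L(q(m_0))}\right)^{2}, $$
where $m_0$ and $Γ_0$ are the quoted mass and width, $q(m)$ the breakup momentum of the $π^{\pm}η^{\prime}$ pair, $L$ the orbital angular momentum, and $F_L$ a Blatt-Weisskopf barrier factor.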
Submitted 14 April, 2026; v1 submitted 14 April, 2026;
originally announced April 2026.
-
HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models
Authors:
Zixing Chen,
Yifeng Gao,
Li Wang,
Yunhan Zhao,
Yi Liu,
Jiayu Li,
Xiang Zheng,
Zuxuan Wu,
Cong Wang,
Xingjun Ma,
Yu-Gang Jiang
Abstract:
Vision-Language-Action (VLA) models inherit rich world knowledge from vision-language backbones and acquire executable skills via action demonstrations. However, existing evaluations largely focus on action execution success, leaving action policies loosely coupled with visual-linguistic semantics. This decoupling exposes a systematic vulnerability whereby correct action execution may induce unsafe outcomes under semantic risk. To expose this vulnerability, we introduce HazardArena, a benchmark designed to evaluate semantic safety in VLAs under controlled yet risk-bearing contexts. HazardArena is constructed from safe/unsafe twin scenarios that share matched objects, layouts, and action requirements, differing only in the semantic context that determines whether an action is unsafe. We find that VLA models trained exclusively on safe scenarios often fail to behave safely when evaluated in their corresponding unsafe counterparts. HazardArena includes over 2,000 assets and 40 risk-sensitive tasks spanning 7 real-world risk categories grounded in established robotic safety standards. To mitigate this vulnerability, we propose a training-free Safety Option Layer that constrains action execution using semantic attributes or a vision-language judge, substantially reducing unsafe behaviors with minimal impact on task performance. We hope that HazardArena highlights the need to rethink how semantic safety is evaluated and enforced in VLAs as they scale toward real-world deployment.
Submitted 14 April, 2026;
originally announced April 2026.
-
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
Authors:
Zhaofen Wu,
Hanrong Zhang,
Fulin Lin,
Wujiang Xu,
Xinran Xu,
Yankai Chen,
Henry Peng Zou,
Shaowen Chen,
Weizhi Zhang,
Xue Liu,
Philip S. Yu,
Hongwei Wang
Abstract:
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to evolving narratives. To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention. By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a graph-guided, multi-factor retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
Submitted 14 April, 2026;
originally announced April 2026.
-
ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models
Authors:
Xinliang Wang,
Yifeng Shi,
Zhenyu Wu
Abstract:
3D Gaussian Splatting (3DGS) delivers high-fidelity real-time rendering but suffers from geometric and photometric degradations under sparse-view constraints. Current generative restoration approaches are often limited by insufficient temporal coherence, a lack of explicit spatial constraints, and a lack of large-scale training data, resulting in multi-view inconsistencies, erroneous geometric hallucinations, and limited generalization to diverse real-world artifact distributions. In this paper, we present ArtifactWorld, a framework that resolves 3DGS artifact repair through systematic data expansion and a homogeneous dual-model paradigm. To address the data bottleneck, we establish a fine-grained phenomenological taxonomy of 3DGS artifacts and construct a comprehensive training set of 107.5K diverse paired video clips to enhance model robustness. Architecturally, we unify the restoration process within a video diffusion backbone, utilizing an isomorphic predictor to localize structural defects via an artifact heatmap. This heatmap then guides the restoration through an Artifact-Aware Triplet Fusion mechanism, enabling precise, intensity-guided spatio-temporal repair within native self-attention. Extensive experiments demonstrate that ArtifactWorld achieves state-of-the-art performance in sparse novel view synthesis and robust 3D reconstruction. Code and dataset will be made public.
Submitted 13 April, 2026;
originally announced April 2026.
-
The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
Authors:
Xingyu Qiu,
Yuqian Fu,
Jiawei Geng,
Bin Ren,
Jiancheng Pan,
Zongwei Wu,
Hao Tang,
Yanwei Fu,
Radu Timofte,
Nicu Sebe,
Mohamed Elhoseiny,
Lingyi Hong,
Mingxi Cheng,
Xingqi He,
Runze Li,
Xingdong Sheng,
Wenqiang Zhang,
Jiacong Liu,
Shu Luo,
Yikai Qin,
Yaze Zhao,
Yongwei Jiang,
Yixiong Zou,
Zhe Zhang,
Yang Yang
, et al. (49 additional authors not shown)
Abstract:
Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submissions. Among them, 31 teams actively participated, and 19 teams submitted valid final results. Participants explored a wide range of strategies, introducing innovative methods that push the performance frontier under both open-source and closed-source tracks. This report presents a detailed overview of the NTIRE 2026 CD-FSOD Challenge, including a summary of the submitted approaches and an analysis of the final results across all participating teams. Challenge Codes: https://github.com/ohMargin/NTIRE2026_CDFSOD.
Submitted 13 April, 2026;
originally announced April 2026.
-
Intense and extended CIII] emission suggests a strong outflow in JADES-GS-z14-0
Authors:
Stefano Carniani,
Peter Jakobsen,
Giacomo Venturi,
Francesco D'Eugenio,
Tobias J. Looser,
Joris Witstok,
Christopher N. A. Willmer,
Andrea Ferrara,
Zihao Wu,
Santiago Arribas,
Andrew J. Bunker,
Stéphane Charlot,
Jacopo Chevallard,
Mirko Curti,
Emma Curtis-Lake,
Daniel J. Eisenstein,
Kevin Hainline,
Jakob M. Helton,
Zhiyuan Ji,
Xihan Ji,
Benjamin D. Johnson,
Mahsa Kohandel,
Nimisha Kumari,
Roberto Maiolino,
Andrea Pallottini
, et al. (9 additional authors not shown)
Abstract:
JWST has revealed an overabundance of very bright, blue galaxies at z>10, raising fundamental questions about how star formation and feedback operate at Cosmic Dawn. We present new JWST/NIRSpec MSA PRISM/CLEAR spectroscopy of JADES-GS-z14-0 (z=14.18) obtained with the JADES and OASIS programmes. While the rest-frame UV continuum flux level and shape are consistent between the two datasets, the OASIS spectrum shows a 10$σ$ detection of the CIII]$λλ1907,1909$ emission line, with a luminosity three times higher than that measured in the JADES data. This difference is naturally explained by the offset in shutter placement between OASIS and JADES, implying that the CIII] emission is spatially displaced by $\sim400$ pc from the stellar continuum. The non-detection of CIII] in NIRCam medium-band imaging indicates that the emitting region is extended on scales $\gtrsim165$ pc, with a surface brightness below the detection threshold. Interpreting this diffuse, carbon-enriched gas as the result of ongoing or past outflows, we infer a mass outflow rate of $\dot{M}_{\rm out}\sim160~{\rm M_\odot\,yr^{-1}}$. We compare it with the star-formation rate (SFR) and derive a mass-loading factor of $η= \dot{M}_{\rm out}/{\rm SFR} = 4-15$, suggesting highly efficient feedback at very early times. Finally, we show that, if outflows are one of the mechanisms regulating star formation in JADES-GS-z14-0, the instantaneous star-formation efficiency in massive haloes is constrained to $ε_\star\lesssim0.08$. These results support a scenario in which outflows play a crucial role during the earliest phases of galaxy formation. Comparing our results with the current theoretical galaxy formation model, we conclude that a combination of moderate star-formation efficiency and reduced dust attenuation can account for the emergence of luminous galaxies at the highest redshifts.
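As a purely illustrative consistency check, not a value quoted in the abstract, the outflow rate and mass-loading factor above together imply a star-formation rate of order
$$ {\rm SFR} = \frac{\dot{M}_{\rm out}}{η} \approx \frac{160~{\rm M_\odot\,yr^{-1}}}{4-15} \approx 11-40~{\rm M_\odot\,yr^{-1}}. $$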
Submitted 13 April, 2026;
originally announced April 2026.
-
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
Authors:
Yijuan Liang,
Xinghao Chen,
Yifan Ge,
Ziyi Wu,
Hao Wu,
Changyu Zeng,
Wei Xing,
Xiaoyu Shen
Abstract:
Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, largely overlooks the structural distribution of tool-use trajectories, and relies on incompatible evaluation benchmarks. We present UniToolCall, a unified framework for tool learning that standardizes the entire pipeline from toolset construction and dataset generation to evaluation. The framework curates a large tool pool of 22k+ tools and constructs a hybrid training corpus of 390k+ instances by combining 10 standardized public datasets with structurally controlled synthetic trajectories. It explicitly models diverse interaction patterns, including single-hop vs. multi-hop and single-turn vs. multi-turn, while capturing both serial and parallel execution structures. To support coherent multi-turn reasoning, we further introduce an Anchor Linkage mechanism that enforces cross-turn dependencies. Furthermore, we convert 7 public benchmarks into a unified Query-Action-Observation-Answer (QAOA) representation with fine-grained evaluation at the function-call, turn, and conversation levels. Experiments show that fine-tuning Qwen3-8B on our dataset substantially improves tool-use performance. Under the distractor-heavy Hybrid-20 setting, the fine-tuned model achieves 93.0% single-turn Strict Precision, outperforming commercial models including GPT, Gemini, and Claude.
Submitted 13 April, 2026;
originally announced April 2026.
-
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora
Authors:
Tao Feng,
Yuxiang Wang,
Yuancheng Wang,
Xueyao Zhang,
Dekun Chen,
Chaoren Wang,
Xun Guan,
Zhizheng Wu
Abstract:
Voice imitation aims to transform source speech to match a reference speaker's timbre and speaking style while preserving linguistic content. A straightforward approach is to train on triplets of (source, reference, target), where source and target share the same content but target matches the reference's voice characteristics, yet such data is extremely scarce. Existing approaches either employ carefully designed disentanglement architectures to bypass this data scarcity or leverage external systems to synthesize pseudo-parallel training data. However, the former requires intricate model design, and the latter faces a quality ceiling when synthetic speech is used as training targets. To address these limitations, we propose MimicLM, which takes a novel approach by using synthetic speech as training sources while retaining real recordings as targets. This design enables the model to learn directly from real speech distributions, breaking the synthetic quality ceiling. Building on this data construction approach, we incorporate interleaved text-audio modeling to guide the generation of content-accurate speech and apply post-training with preference alignment to mitigate the inherent distributional mismatch when training on synthetic data. Experiments demonstrate that MimicLM achieves superior voice imitation quality with a simple yet effective architecture, significantly outperforming existing methods in naturalness while maintaining competitive similarity scores across speaker identity, accent, and emotion dimensions.
Submitted 13 April, 2026;
originally announced April 2026.
-
NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild
Authors:
Aleksandr Gushchin,
Khaled Abud,
Ekaterina Shumitskaya,
Artem Filippov,
Georgii Bychkov,
Sergey Lavrushkin,
Mikhail Erofeev,
Anastasia Antsiferova,
Changsheng Chen,
Shunquan Tan,
Radu Timofte,
Dmitry Vatolin,
Chuanbiao Song,
Zijian Yu,
Hao Tan,
Jun Lan,
Zhiqiang Yang,
Yongwei Tang,
Zhiqiang Wu,
Jia Wen Seow,
Hong Vin Koay,
Haodong Ren,
Feng Xu,
Shuai Chen,
Ruiyang Xia
, et al. (29 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild, held in conjunction with the NTIRE workshop at CVPR 2026. The goal of this challenge was to develop detection models capable of distinguishing real images from generated ones in realistic scenarios: the images are often transformed (cropped, resized, compressed, blurred) for practical usage, and therefore, the detection models should be robust to such transformations. The challenge is based on a novel dataset consisting of 108,750 real and 185,750 AI-generated images from 42 generators comprising a large variety of open-source and closed-source models of various architectures, augmented with 36 image transformations. Methods were evaluated using ROC AUC on the full test set, including both transformed and untransformed images. A total of 511 participants registered, with 20 teams submitting valid final solutions. This report provides a comprehensive overview of the challenge, describes the proposed solutions, and can be used as a valuable reference for researchers and practitioners in increasing the robustness of the detection models to real-world transformations.
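A minimal sketch of the stated evaluation protocol is shown below; the detector interface and label convention are assumptions, while the single ROC AUC over the full transformed-plus-untransformed test set follows the description above.

import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(detector, images, labels):
    # labels: 1 for AI-generated, 0 for real; detector returns a 'generated' score per image.
    scores = np.array([detector(img) for img in images])
    return roc_auc_score(labels, scores)  # one ROC AUC over transformed and untransformed images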
Submitted 13 April, 2026;
originally announced April 2026.
-
PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery
Authors:
Weidong Tang,
Bohan Zhang,
Zhixiang Chi,
ZiZhang Wu,
Yang Wang,
Yanan Wu
Abstract:
On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed as OCD is not a static classification problem, but a dynamic process. The model must continuously decide 1) whether a sample belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues. With properly calibrated and adaptive thresholds, substantial improvements can be achieved, even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, we simulate the proxy discovery process to initialize the thresholds during offline training to align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.
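A schematic version of the hierarchical decision process described above is sketched below; the distance measure, fixed thresholds, and prototype update rule are simplifications chosen for illustration and are not the actual PACO procedure.

import numpy as np

def ocd_step(feat, known_protos, novel_protos, tau_known, tau_novel):
    # Route one streaming sample: known class, existing novel prototype, or a new category.
    d_known = [np.linalg.norm(feat - p) for p in known_protos]
    if min(d_known) < tau_known:                           # 1) known-class routing
        return "known", int(np.argmin(d_known))
    d_novel = [np.linalg.norm(feat - p) for p in novel_protos] or [np.inf]
    if min(d_novel) < tau_novel:                           # 2) attach to an existing novel prototype
        j = int(np.argmin(d_novel))
        novel_protos[j] = 0.9 * novel_protos[j] + 0.1 * feat  # running prototype update
        return "novel", j
    novel_protos.append(feat.copy())                       # 3) create (birth) a new novel category
    return "novel", len(novel_protos) - 1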
Submitted 13 April, 2026;
originally announced April 2026.
-
RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering
Authors:
Zhuoyu Wu,
Wenhui Ou,
Pei-Sze Tan,
Wenqi Fang,
Sailaja Rajanala,
Raphaël C. -W. Phan
Abstract:
Retrieving procedure-oriented evidence from materials science papers is difficult because key synthesis details are often scattered across long, context-heavy documents and are not well captured by paragraph-only dense retrieval. We present RECIPER, a dual-view retrieval pipeline that indexes both paragraph-level context and compact large language model-extracted procedural summaries, then combines the two candidate streams with lightweight lexical reranking. Across four dense retrieval backbones, RECIPER consistently improves early-rank retrieval over paragraph-only dense retrieval, achieving average gains of +3.73 in Recall@1, +2.85 in nDCG@10, and +3.13 in MRR. With BGE-large-en-v1.5, it reaches 86.82%, 97.07%, and 97.85% on Recall@1, Recall@5, and Recall@10, respectively. We further observe improved downstream question answering under automatic metrics, suggesting that procedural summaries can serve as a useful complementary retrieval signal for procedure-oriented materials question answering. Code and data are available at https://github.com/ReaganWu/RECIPER.
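A rough sketch of the dual-view candidate merging is given below; the index interfaces, the max-union strategy, and the use of a lexical scorer for reranking are assumptions based only on the description above.

def dual_view_retrieve(query, para_index, summary_index, lexical_score, k=10):
    # Retrieve from the paragraph view and the procedural-summary view, then rerank the union.
    cands = {}
    for doc_id, dense in para_index.search(query, k):      # paragraph-level dense retrieval
        cands[doc_id] = max(cands.get(doc_id, 0.0), dense)
    for doc_id, dense in summary_index.search(query, k):   # summary-level dense retrieval
        cands[doc_id] = max(cands.get(doc_id, 0.0), dense)
    reranked = sorted(cands, key=lambda d: lexical_score(query, d), reverse=True)
    return reranked[:k]                                    # lightweight lexical reranking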
Submitted 13 April, 2026;
originally announced April 2026.
-
Measurement of inclusive production of charmonium states in $b$-hadron decays via their decay into $φφ$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An
, et al. (1173 additional authors not shown)
Abstract:
The inclusive production of the $η_c(1S)$, $η_c(2S)$ and $χ_{c}$ charmonium states in $b$-hadron decays is studied with LHCb Run 2 data, corresponding to an integrated luminosity of $5.9~\text{fb}^{-1}$, using charmonia decays to $φφ$ pairs. The production branching fractions of the $χ_{c}(1P)$ states in $b$-hadron decays are measured, using $b \to η_c(1S) (\to φφ) X$ as a normalisation channel, with $X$ indicating any additional particles. The results are \begin{align*}
&{\cal{B}} (b \to χ_{c0} X) = (1.34 \pm 0.13 \pm 0.06 \pm 0.37) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c1} X) = (1.58 \pm 0.12 \pm 0.09 \pm 0.44) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c2} X) = (0.55 \pm 0.08 \pm 0.05 \pm 0.15) \times 10^{-3}, \end{align*} where the first uncertainty is statistical, the second systematic, and the last is due to the limited knowledge of externally measured branching fractions. The production branching fraction of $η_c(2S)$ times the branching fraction of its decay into $φφ$ is measured as ${\cal{B}} (b \to η_c(2S) X) \times {\cal{B}} (η_c(2S) \to φφ) = (4.0 \pm 0.6 \pm 0.6 \pm 1.1) \times 10^{-7}$. Furthermore, the mass of the $η_c(1S)$ state is measured to be $M_{η_c(1S)} = 2984.1 \pm 0.5 \pm 0.5$ MeV with the best precision to date.
Submitted 13 April, 2026;
originally announced April 2026.
-
Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning
Authors:
Linjie Li,
Huiyu Xiao,
Jiarui Cao,
Zhenyu Wu,
Yang Ji
Abstract:
Class-incremental learning (CIL) aims to continuously accumulate knowledge from a stream of tasks and construct a unified classifier over all seen classes. Although pretrained models (PTMs) have shown promising performance in CIL, they still struggle with the entanglement of multi-task subspaces, leading to catastrophic forgetting when task routing parameters are poorly calibrated or task-level representations are rigidly fixed. To address this issue, we propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation mechanism to model the relational dependencies among task embeddings, dynamically capturing sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation from old to new adapters, weighted by these task-embedding-level correlations, enabling the model to bridge the representation gaps between independent task subspaces. Extensive experiments demonstrate that QKD effectively mitigates forgetting and achieves state-of-the-art performance.
Submitted 13 April, 2026;
originally announced April 2026.
-
LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
Authors:
Linjie Li,
Zhenyu Wu,
Huiyu Xiao,
Yang Ji
Abstract:
Prompt-based class-incremental learning methods typically construct a prompt pool consisting of multiple trainable key-prompts and perform instance-level matching to select the most suitable prompt embeddings, which has shown promising results. However, existing approaches face several limitations, including fixed prompt pools, manual selection of prompt embeddings, and strong reliance on the pretrained backbone for prompt selection. To address these issues, we propose a \textbf{L}ayer-importance guided \textbf{D}ual \textbf{E}xpandable \textbf{P}rompt Pool (\textbf{LDEPrompt}), which enables adaptive layer selection as well as dynamic freezing and expansion of the prompt pool. Extensive experiments on widely used class-incremental learning benchmarks demonstrate that LDEPrompt achieves state-of-the-art performance, validating its effectiveness and scalability.
Submitted 13 April, 2026;
originally announced April 2026.
-
RCBSF: A Multi-Agent Framework for Automated Contract Revision via Stackelberg Game
Authors:
Shijia Xu,
Yu Wang,
Xiaolong Jia,
Zhou Wu,
Kai Liu,
April Xiaowen Dong
Abstract:
Despite the widespread adoption of Large Language Models (LLMs) in Legal AI, their utility for automated contract revision remains impeded by hallucinated safety and a lack of rigorous behavioral constraints. To address these limitations, we propose the Risk-Constrained Bilevel Stackelberg Framework (RCBSF), which formulates revision as a non-cooperative Stackelberg game. RCBSF establishes a hierarchical Leader-Follower structure where a Global Prescriptive Agent (GPA) imposes risk budgets upon a follower system constituted by a Constrained Revision Agent (CRA) and a Local Verification Agent (LVA) to iteratively optimize the output. We provide theoretical guarantees that this bilevel formulation converges to an equilibrium yielding strictly superior utility over unguided configurations. Empirical validation on a unified benchmark demonstrates that RCBSF achieves state-of-the-art performance, surpassing iterative baselines with an average Risk Resolution Rate (RRR) of 84.21\% while enhancing token efficiency. Our code is available at https://github.com/xjiacs/RCBSF .
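As a rough illustration of what a risk-constrained bilevel Stackelberg formulation can look like (the symbols below are ours, introduced only for exposition, and are not taken from the paper), the leader fixes a risk budget and the follower system best-responds under it:
$$ \max_{b \in \mathcal{B}} \; U_{\mathrm{GPA}}\big(b, r^{*}(b)\big) \quad \text{s.t.} \quad r^{*}(b) \in \arg\max_{r} \; U_{\mathrm{CRA,LVA}}(b, r) \;\; \text{subject to} \;\; \mathrm{Risk}(r) \le b, $$
where $b$ is the risk budget chosen by the leader (GPA), $r$ is a candidate revision produced and verified by the follower agents (CRA and LVA), and $U$ denotes the respective utilities. The Stackelberg equilibrium is the pair $(b, r^{*}(b))$ solving this nested problem.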
Submitted 12 April, 2026;
originally announced April 2026.
-
Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS
Authors:
Shijia Xu,
Zhou Wu,
Xiaolong Jia,
Yu Wang,
Kai Liu,
April Xiaowen Dong
Abstract:
Retrieval-augmented generation (RAG) substantially extends the knowledge boundary of large language models. However, it still faces two major challenges when handling complex reasoning tasks: low context utilization and frequent hallucinations. To address these issues, we propose Self-Correcting RAG, a unified framework that reformulates retrieval and generation as constrained optimization and path planning. On the input side, we move beyond traditional greedy retrieval and, for the first time, formalize context selection as a multi-dimensional multiple-choice knapsack problem (MMKP), thereby maximizing information density and removing redundancy under a strict token budget. On the output side, we introduce a natural language inference (NLI)-guided Monte Carlo Tree Search (MCTS) mechanism, which leverages test-time compute to dynamically explore reasoning trajectories and validate the faithfulness of generated answers. Experiments on six multi-hop question answering and fact-checking datasets demonstrate that our method significantly improves reasoning accuracy on complex queries while effectively reducing hallucinations, outperforming strong existing baselines. Our code is available at https://github.com/xjiacs/Self-Correcting-RAG .
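To illustrate the knapsack view of context selection, here is a small hedged sketch in Python. It is our own simplification, not the paper's formulation: a single token dimension stands in for the multi-dimensional constraints, at most one passage variant is chosen per retrieval group, and exhaustive search replaces a proper MMKP solver.

```python
from itertools import product

def select_context(groups, token_budget):
    """Toy multiple-choice knapsack: pick at most one variant per group,
    maximizing total relevance while staying within the token budget.

    groups: list of groups, each a list of (passage, relevance, tokens).
    Exhaustive search is fine for a handful of groups; a real system
    would use dynamic programming or an ILP solver.
    """
    best_score, best_pick = 0.0, []
    padded = [g + [(None, 0.0, 0)] for g in groups]   # allow skipping a group
    for choice in product(*padded):
        tokens = sum(c[2] for c in choice)
        score = sum(c[1] for c in choice)
        if tokens <= token_budget and score > best_score:
            best_score, best_pick = score, [c[0] for c in choice if c[0]]
    return best_pick, best_score

# Toy example: two retrieval groups, each with a long and a short variant.
groups = [
    [("doc1 full", 0.9, 300), ("doc1 summary", 0.6, 80)],
    [("doc2 full", 0.7, 250), ("doc2 summary", 0.5, 60)],
]
print(select_context(groups, token_budget=200))
```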
Submitted 12 April, 2026;
originally announced April 2026.
-
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Suhang Yao,
Beibei Lin,
Zhaoxin Fan,
Wending Yan,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Peishu Shi,
Yufei Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Runzhe Li,
Kui Jiang,
Zhaocheng Yu,
Yiang Chen,
Junjun Jiang,
Xianming Liu,
Hongde Gu,
Zeliang Li,
Mache You
, et al. (73 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset~\cite{jin2024raindrop}. For this edition, we adjust the dataset to comprise 14,139 images for training, 407 images for validation, and 593 images for testing. The primary goal of this challenge is to establish a strong and practical benchmark for the removal of raindrops under various illumination and focus conditions. In total, 168 teams registered for the competition, and 17 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the Raindrop Clarity dataset, demonstrating the growing progress in this challenging task.
Submitted 12 April, 2026;
originally announced April 2026.
-
The Second Challenge on Real-World Face Restoration at NTIRE 2026: Methods and Results
Authors:
Jingkai Wang,
Jue Gong,
Zheng Chen,
Kai Liu,
Jiatong Li,
Yulun Zhang,
Radu Timofte,
Jiachen Tu,
Yaokun Shi,
Guoyi Xu,
Yaoxin Jiang,
Jiajia Liu,
Yingsi Chen,
Yijiao Liu,
Hui Li,
Yu Wang,
Congchao Zhu,
Alexandru-Gabriel Lefterache,
Anamaria Radoi,
Chuanyue Yan,
Tao Lu,
Yanduo Zhang,
Kanghui Zhao,
Jiaming Wang,
Yuqi Li
, et al. (28 additional authors not shown)
Abstract:
This paper provides a review of the NTIRE 2026 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural and realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or training data. Performance is evaluated using a weighted image quality assessment (IQA) score, with the AdaFace model employed as an identity checker. The competition attracted 96 registrants, with 10 teams submitting valid models; ultimately, 9 teams achieved valid scores in the final ranking. This collaborative effort advances the performance of real-world face restoration while offering an in-depth overview of the latest trends in the field.
Submitted 15 April, 2026; v1 submitted 12 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of \boldmath{$D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$} in \boldmath{$D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$} Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively, both achieving three times better precision than the current PDG values. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer novel insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 15 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Radiology Report Generation for Low-Quality X-Ray Images
Authors:
Hongze Zhu,
Chen Hu,
Jiaxuan Jiang,
Hong Liu,
Yawen Huang,
Ming Hu,
Tianyu Wang,
Zhijian Wu,
Yefeng Zheng
Abstract:
Vision-Language Models (VLMs) have significantly advanced automated Radiology Report Generation (RRG). However, existing methods implicitly assume high-quality inputs, overlooking the noise and artifacts prevalent in real-world clinical environments. Consequently, current models exhibit severe performance degradation when processing suboptimal images. To bridge this gap, we propose a robust report generation framework explicitly designed for image quality variations. We first introduce an Automated Quality Assessment Agent (AQAA) to identify low-quality samples within the MIMIC-CXR dataset and establish the Low-quality Radiology Report Generation (LRRG) benchmark. To tackle degradation-induced shifts, we propose a novel Dual-loop Training Strategy leveraging bi-level optimization and gradient consistency. This approach ensures the model learns quality-agnostic diagnostic features by aligning gradient directions across varying quality regimes. Extensive experiments demonstrate that our approach effectively mitigates model performance degradation caused by image quality deterioration. The code and data will be released upon acceptance.
Submitted 11 April, 2026;
originally announced April 2026.
-
Non-solvable groups whose non-linear character degrees have the same number of different prime divisors
Authors:
Junying Guo,
Yanjun Liu,
Ziyi Wu,
Di Xiao
Abstract:
By a result of Noritzsch, a finite solvable group whose non-linear character degrees have the same set of prime divisors is meta-abelian. In this note we investigate finite non-solvable groups whose non-linear character degrees have the same number of different prime divisors, and show that, up to an abelian direct factor, such groups are exactly $L_2(4), L_2(8), A_7, S_7$, the central product of a cyclic $3$-group with $3.A_7$, or the semi-direct product of $A_7$ by a cyclic $2$-group $\langle a\rangle$ such that $a$ acts non-trivially on $A_7$ by conjugation. As a consequence, we show that only the primes $2,3,5,7$ may occur as prime divisors of their irreducible character degrees, and that Huppert's $ρ$-$σ$ conjecture holds for them.
Submitted 11 April, 2026;
originally announced April 2026.
-
DINO_4D: Semantic-Aware 4D Reconstruction
Authors:
Yiru Yang,
Zhuojie Wu,
Quentin Marguet,
Nishant Kumar Singh,
Max Schulthess
Abstract:
At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge connecting low-level geometric sensing with high-level semantic understanding. We present DINO\_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to effectively suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks demonstrate that our method maintains the linear time complexity $O(T)$ of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO\_4D establishes a new paradigm for constructing 4D World Models that possess both geometric precision and semantic understanding.
Submitted 10 April, 2026;
originally announced April 2026.
-
Galactic Archaeology with the Subaru `Ōnohi`ula Prime Focus Spectrograph Strategic Program
Authors:
Masashi Chiba,
Rosemary F. G. Wyse,
Evan N. Kirby,
Judith G. Cohen,
László Dobos,
Roman Gerasimov,
Miho N. Ishigaki,
Kohei Hayashi,
Carrie Filion,
Magda Arnaboldi,
Souradeep Bhattacharya,
Yutaka Hirai,
Chiaki Kobayashi,
Yutaka Komiyama,
Pete B. Kuzma,
Itsuki Ogami,
Ana L. Chies-Santos,
Nicole L. Klock-Miranda,
Federico Sestito,
Tamás Budavári,
Andrew P. Cooper,
Keyi Ding,
Ivanna Escala,
Elisa G. M. Ferreira,
Ortwin Gerhard
, et al. (24 additional authors not shown)
Abstract:
The recently commissioned Subaru `Ōnohi`ula Prime Focus Spectrograph (PFS) will obtain spectra from nearly 2,400 fibers that cover 1.24 square degrees. The 360 night Subaru Strategic Program for PFS is dedicating approximately one-third of its allocation (130 nights) to study the structure and evolution of galaxies in the Local Group. This Galactic Archaeological survey has three pillars. (1) We will determine whether the mass density profiles of dwarf galaxies are consistent with cusps, as expected for cold dark matter, or cores, as expected from alternative dark matter theories or baryonic feedback. We will deduce the density profiles as a function of radius from modeling of the full line-of-sight velocity and abundance distributions for six dwarf galaxies. Our total sample will consist of 18,000 member stars to beyond the nominal tidal radius of each system. (2) From measurements of the [alpha/Fe] abundance ratio, we will learn the difference in assembly history of the two most massive galaxies in the Local Group: M31 and the Milky Way. We will observe 30,000 member stars over 45 square degrees of M31's halo and outer disk. (3) We will uncover how the most fragile (outer) part of the Milky Way responded to accretion events both in the distant past (such as Gaia-Sausage Enceladus) and in more recent history (such as the Sagittarius dwarf spheroidal galaxy). To support this study, PFS will provide velocities and metallicities--from which, in combination with photometry, we will deduce ages--for tens of thousands of main-sequence stars out to a Galactocentric distance of ~30 kpc.
Submitted 15 April, 2026; v1 submitted 10 April, 2026;
originally announced April 2026.
-
Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis
Authors:
Xinkai Zou,
Yiming Huang,
Zhuohang Wu,
Jian Sha,
Nan Huang,
Longfei Yun,
Jingbo Shang,
Letian Peng
Abstract:
Simulating how organized groups (e.g., corporations) make decisions (e.g., responding to a competitor's move) is essential for understanding real-world dynamics and could benefit relevant applications (e.g., market prediction). In this paper, we formalize this problem as a concrete research platform for group behavior understanding, providing: (1) a task definition with benchmark and evaluation criteria, (2) a structured analytical framework with a corresponding algorithm, and (3) detailed temporal and cross-group analysis. Specifically, we propose Organized Group Behavior Simulation, a task that models organized groups as collective entities from a practical perspective: given a group facing a particular situation (e.g., AI Boom), predict the decision it would take. To support this task, we present GROVE (GRoup Organizational BehaVior Evaluation), a benchmark covering 44 entities with 8,052 real-world context-decision pairs collected from Wikipedia and TechCrunch across 9 domains, with an end-to-end evaluation protocol assessing consistency, initiative, scope, magnitude, and horizon. Beyond straightforward prompting pipelines, we propose a structured analytical framework that converts collective decision-making events into an interpretable, adaptive, and traceable behavioral model, achieving stronger performance than summarization- and retrieval-based baselines. It further introduces an adapter mechanism for time-aware evolution and group-aware transfer, and traceable evidence nodes grounding each decision rule in originating historical events. Our analysis reveals temporal behavioral drift within individual groups, which the time-aware adapter effectively captures for stronger prediction, and structured cross-group similarity that enables knowledge transfer for data-scarce organizations.
Submitted 10 April, 2026;
originally announced April 2026.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
Authors:
Jiayuan Lu,
Rengan Xie,
Xuancheng Jin,
Zhizhen Wu,
Qi Ye,
Tian Xie,
Hujun Bao,
Rui Wang,
Yuchi Huo
Abstract:
For decades, Physically-Based Rendering (PBR) has been the foundation of synthesizing photorealistic images, and is therefore sometimes roughly referred to as Photorealistic Rendering (PRR). While PBR is indeed a mathematical simulation of light transport that guarantees physical reality, photorealism additionally relies on realistic digital models of the geometry and appearance of the real world, leaving a barely explored gap from PBR to PRR (P2P). Consequently, the path toward photorealism faces a critical dilemma: explicit simulation of PRR is encumbered by unattainable realistic digital models of real-world content, while implicit generative models sacrifice controllability and geometric consistency. Based on this insight, this paper presents the problem, data, and approach for mitigating the P2P gap, followed by the first multi-modal generative rendering model, dubbed GeRM, to unify PBR and PRR. GeRM integrates physical attributes such as G-buffers with text prompts, and uses progressive incremental injection to generate controllable photorealistic images, allowing users to fluidly navigate the continuum between strict physical fidelity and perceptual photorealism. Technically, we model the transition between PBR and PRR images as a distribution transfer and aim to learn a distribution transfer vector field (DTV Field) to guide this process. To define the learning objective, we first leverage a multi-agent VLM framework to construct an expert-guided pairwise P2P transfer dataset, named P2P-50K, where each paired sample in the dataset corresponds to a transfer vector in the DTV Field. Subsequently, we propose a multi-condition ControlNet to learn the DTV Field, which synthesizes PBR images and progressively transitions them into PRR images, guided by G-buffers, text prompts, and cues for enhanced regions.
Submitted 10 April, 2026;
originally announced April 2026.
-
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation
Authors:
Haoyu Zhao,
Zihao Zhang,
Jiaxi Gu,
Haoran Chen,
Qingping Zheng,
Pin Tang,
Yeyin Jin,
Yuang Zhang,
Junqi Cheng,
Zenghui Lu,
Peng Shu,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Camera-controllable video generation aims to synthesize videos with flexible and physically plausible camera movements. However, existing methods either provide imprecise camera control from text prompts or rely on labor-intensive manual camera trajectory parameters, limiting their use in automated scenarios. To address these issues, we propose a novel Vision-Language-Camera model, termed CT-1 (Camera Transformer 1), a specialized model designed to transfer spatial reasoning knowledge to video generation by accurately estimating camera trajectories. Built upon vision-language modules and a Diffusion Transformer model, CT-1 employs a Wavelet-based Regularization Loss in the frequency domain to effectively learn complex camera trajectory distributions. These trajectories are integrated into a video diffusion model to enable spatially aware camera control that aligns with user intentions. To facilitate the training of CT-1, we design a dedicated data curation pipeline and construct CT-200K, a large-scale dataset containing over 47M frames. Experimental results demonstrate that our framework successfully bridges the gap between spatial reasoning and video synthesis, yielding faithful and high-quality camera-controllable videos and improving camera control accuracy by 25.7% over prior methods.
Submitted 10 April, 2026;
originally announced April 2026.
-
Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios
Authors:
Yu Shi,
Yu Liu,
Zhong-Cheng Wu,
Juan Cheng,
Huafeng Li,
Xun Chen
Abstract:
Complex degradations like noise, blur, and low resolution are typical challenges in real-world image fusion tasks, limiting the performance and practicality of existing methods. End-to-end, neural-network-based approaches are generally simple to design and highly efficient in inference, but their black-box nature leads to limited interpretability. Diffusion-based methods alleviate this to some extent by providing powerful generative priors and a more structured inference process. However, they are trained to learn a single-domain target distribution, whereas fusion lacks natural fused data and relies on modeling complementary information from multiple sources, making diffusion hard to apply directly in practice. To address these challenges, this paper proposes an efficient degradation-aware diffusion framework for image fusion under arbitrary degradation scenarios. Specifically, instead of explicitly predicting noise as in conventional diffusion models, our method performs implicit denoising by directly regressing the fused image, enabling flexible adaptation to diverse fusion tasks under complex degradations with a limited number of steps. Moreover, we design a joint observation model correction mechanism that simultaneously imposes degradation and fusion constraints during sampling to ensure high reconstruction accuracy. Experiments on diverse fusion tasks and degradation configurations demonstrate the superiority of the proposed method under complex degradation scenarios.
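The difference between the conventional noise-prediction objective and directly regressing the fused image can be shown with a toy training step. This is a generic sketch under our own assumptions (a stand-in network and a single noise-schedule value), not the paper's architecture or loss.

```python
import torch
import torch.nn.functional as F

def diffusion_step_losses(model, fused_target, conditions, alpha_bar_t, t):
    """Compute both objectives on one noisy sample.

    fused_target: the (pseudo) fused image the network should recover
    conditions:   conditioning inputs, e.g. the degraded source modalities
    alpha_bar_t:  cumulative noise-schedule value at step t
    """
    noise = torch.randn_like(fused_target)
    noisy = alpha_bar_t.sqrt() * fused_target + (1 - alpha_bar_t).sqrt() * noise
    pred = model(noisy, conditions, t)

    loss_eps = F.mse_loss(pred, noise)          # conventional: predict the noise
    loss_x0 = F.mse_loss(pred, fused_target)    # used here: regress the fused image
    return loss_eps, loss_x0

# Toy usage with a trivial stand-in "model" that ignores its conditioning.
model = lambda x, cond, t: x
fused = torch.randn(1, 1, 8, 8)
cond = torch.randn(1, 2, 8, 8)
print(diffusion_step_losses(model, fused, cond, torch.tensor(0.5), t=10))
```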
Submitted 9 April, 2026;
originally announced April 2026.
-
Test of lepton flavour universality with $B^0\to K^{*0}\ell^+\ell^-$ decays at large dilepton invariant mass
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1113 additional authors not shown)
Abstract:
Muon-electron universality is tested in $B^0 \to K^{*0} \ \ell^+ \ell^-$ decays, in the dilepton-invariant-mass region above the $ψ(2S)$ resonance. The analysis uses beauty mesons produced in proton-proton collisions recorded by the LHCb detector at center-of-mass energies of 7, 8, and 13 $\text{TeV}$, corresponding to an integrated luminosity of 9 $\text{fb}^{-1}$. The ratio of branching fractions between the muon and electron channels, $R_{K^{*0}}$, is measured to be $1.08\,^{+0.14}_{-0.12}\text{(stat)} \ \pm 0.07\text{(syst)}$ for a dilepton-invariant-mass squared above 14.0 $\text{GeV}^{2}/\text{c}^{4}$, consistent with the Standard Model prediction. This result represents the most precise measurement of $R_{K^{*0}}$ in this region and the first such measurement performed at a hadron collider.
Submitted 9 April, 2026;
originally announced April 2026.
-
Bridging the Gap between Micro-scale Traffic Simulation and 4D Digital Cityscapes
Authors:
Longxiang Jiao,
Lukas Hofmann,
Yiru Yang,
Zhanyi Wu,
Jonas Egeler
Abstract:
While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic simulator with a photorealistic, geospatially accurate VR representation of Zurich in Unreal Engine 5. Our architecture implements a robust C++ data pipeline for synchronized vehicle visualization and features an Open Sound Control (OSC) interface to support external auralization engines. We validate the framework through a user study assessing the correlation between simulated traffic dynamics and human perception. Results demonstrate a high degree of perceptual alignment, where users correctly interpret safety risks from the 4D simulation. Furthermore, our findings indicate that the inclusion of spatialized audio alters the user's sense of safety, underscoring the importance of multimodality in traffic simulations.
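For readers unfamiliar with OSC, an external auralization engine could be driven by messages like those in the following Python sketch (using the python-osc package); the address pattern and argument layout are illustrative assumptions, not the framework's actual schema.

```python
# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical endpoint of an external auralization engine.
client = SimpleUDPClient("127.0.0.1", 9001)

# Illustrative per-vehicle update: id, position (x, y, z) in metres, speed in m/s.
client.send_message("/vehicle/update", ["veh_042", 12.5, -3.2, 0.0, 8.3])
```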
Submitted 9 April, 2026;
originally announced April 2026.
-
Search for the lepton-flavour violating decays $B^+ \to π^+ μ^\pm e^\mp$
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An,
L. Anderlini
, et al. (1105 additional authors not shown)
Abstract:
The first search for the lepton-flavour violating decays $B^+ \to π^+ μ^{\pm} e^{\mp}$ in proton-proton collisions is presented, using data collected by the LHCb experiment between 2011 and 2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. No significant signal is observed and an upper limit on the branching fraction is set at $\mathcal{B}(B^+ \to π^+ μ^{\pm} e^{\mp}) < 1.8 \times 10^{-9}$ at the $90\%$ confidence level, two orders of magnitude more restrictive than the current world average. This is the first constraint on lepton-flavour violating $b \to d$ quark transitions at the LHC and also sets the most stringent upper limits to date on $b \to d μ^{\pm} e^{\mp}$ transitions. Limits on left-handed and scalar scenarios beyond the Standard Model are also reported.
Submitted 9 April, 2026;
originally announced April 2026.
-
Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models
Authors:
Weiwei Qi,
Zefeng Wu,
Tianhang Zheng,
Zikang Zhang,
Xiaojun Jia,
Zhan Qin,
Kui Ren
Abstract:
Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of precise and reliable methodologies for safety intervention across diverse tasks. To better understand and control LLM safety, we propose the Expected Safety Impact (ESI) framework for quantifying how different parameters affect LLM safety. Based on ESI, we reveal distinct safety-critical patterns across different LLM architectures: In dense LLMs, many safety-critical parameters are located in value matrices (V) and MLPs in middle layers, whereas in Mixture-of-Experts (MoE) models, they shift to the late-layer MLPs. Leveraging ESI, we further introduce two targeted intervention paradigms for safety enhancement and preservation, i.e., Safety Enhancement Tuning (SET) and Safety Preserving Adaptation (SPA). SET can align unsafe LLMs by updating only a few safety-critical parameters, effectively enhancing safety while preserving original performance. SPA safeguards well-aligned LLMs during capability-oriented intervention (e.g., instruction tuning) by preventing disruption of safety-critical weights, allowing the LLM to acquire new abilities and maintain safety capabilities. Extensive evaluations on different LLMs demonstrate that SET can reduce the attack success rates of unaligned LLMs by over 50% with only a 100-iteration update on 1% of model weights. SPA can limit the safety degradation of aligned LLMs within 1% after a 1,000-iteration instruction fine-tuning on different tasks. Our code is available at: https://github.com/ZJU-LLM-Safety/SafeWeights-ACL.
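As a hedged illustration of what updating only a small set of safety-critical parameters might look like in code (our own sketch: the ESI scoring itself is not reproduced, and the boolean masks are simply assumed to be given):

```python
import torch

def apply_masked_update(model, masks, lr=1e-5):
    """Take one gradient step only on entries flagged as safety-critical.

    masks: dict mapping parameter name -> boolean tensor of the same shape,
           True where the entry was identified as safety-critical; all other
           entries are left untouched.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is None or name not in masks:
                continue
            param -= lr * param.grad * masks[name]

# Toy usage on a tiny model with a random ~1% mask.
model = torch.nn.Linear(16, 4)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
masks = {n: (torch.rand_like(p) < 0.01) for n, p in model.named_parameters()}
apply_masked_update(model, masks)
```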
Submitted 9 April, 2026;
originally announced April 2026.
-
ADAG: Automatically Describing Attribution Graphs
Authors:
Aryaman Arora,
Zhengxuan Wu,
Jacob Steinhardt,
Sarah Schwettmann
Abstract:
In language model interpretability research, \textbf{circuit tracing} aims to identify which internal features causally contributed to a particular output and how they affected each other, with the goal of explaining the computations underlying some behaviour. However, all prior circuit tracing work has relied on ad-hoc human interpretation of the role that each feature in the circuit plays, via manual inspection of data artifacts such as the dataset examples the component activates on. We introduce \textbf{ADAG}, an end-to-end pipeline for describing these attribution graphs which is fully automated. To achieve this, we introduce \textit{attribution profiles} which quantify the functional role of a feature via its input and output gradient effects. We then introduce a novel clustering algorithm for grouping features, and an LLM explainer--simulator setup which generates and scores natural-language explanations of the functional role of these feature groups. We run our system on known human-analysed circuit-tracing tasks and recover interpretable circuits, and further show ADAG can find steerable clusters which are responsible for a harmful advice jailbreak in Llama 3.1 8B Instruct.
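To give a feel for what an attribution profile and the subsequent grouping might look like, here is a small NumPy sketch. It is our own construction for illustration only: profiles are just concatenated outgoing and incoming edge weights of an attribution graph, and a greedy cosine-similarity rule stands in for the paper's clustering algorithm.

```python
import numpy as np

def attribution_profiles(adj):
    """Profile per feature from an attribution graph.

    adj[i, j] is the attribution (edge weight) from feature i to feature j;
    the profile concatenates each feature's outgoing and incoming effects.
    """
    return np.concatenate([adj, adj.T], axis=1)

def group_by_cosine(profiles, threshold=0.8):
    """Greedy grouping: a feature joins the first group whose seed profile
    it matches with cosine similarity above the threshold."""
    normed = profiles / (np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-9)
    groups, seeds = [], []
    for i, v in enumerate(normed):
        for members, seed in zip(groups, seeds):
            if float(v @ seed) >= threshold:
                members.append(i)
                break
        else:
            groups.append([i])
            seeds.append(v)
    return groups

# Toy attribution graph over 5 features.
adj = np.random.default_rng(0).random((5, 5))
print(group_by_cosine(attribution_profiles(adj)))
```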
Submitted 8 April, 2026;
originally announced April 2026.
-
Improving Search Suggestions for Alphanumeric Queries
Authors:
Samarth Agrawal,
Jayanth Yetukuri,
Diptesh Kanojia,
Qunzhi Zhou,
Zhe Wu
Abstract:
Alphanumeric identifiers such as manufacturer part numbers (MPNs), SKUs, and model codes are ubiquitous in e-commerce catalogs and search. These identifiers are sparse, non-linguistic, and highly sensitive to tokenization and typographical variation, rendering conventional lexical and embedding-based retrieval methods ineffective. We propose a training-free, character-level retrieval framework that encodes each alphanumeric sequence as a fixed-length binary vector. This representation enables efficient similarity computation via Hamming distance and supports nearest-neighbor retrieval over large identifier corpora. An optional re-ranking stage using edit distance refines precision while preserving latency guarantees. The method offers a practical and interpretable alternative to learned dense retrieval models, making it suitable for production deployment in search suggestion generation systems. Significant gains in business metrics in an A/B test further demonstrate the utility of our approach.
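A minimal sketch of this retrieval scheme, with our own choice of hashed character trigrams as the binary encoding (the paper's exact encoding and index structure may differ):

```python
import hashlib

def encode(identifier, n=3, bits=256):
    """Hash character n-grams of an identifier into a fixed-length binary
    vector, represented compactly as a Python int bitmask."""
    s = f"^{identifier.upper()}$"
    vec = 0
    for i in range(len(s) - n + 1):
        h = int.from_bytes(hashlib.md5(s[i:i + n].encode()).digest(), "big")
        vec |= 1 << (h % bits)
    return vec

def hamming(a, b):
    return bin(a ^ b).count("1")

def edit_distance(a, b):
    """Plain Levenshtein distance for the optional re-ranking stage."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Index a toy catalog of part numbers and retrieve suggestions for a query.
catalog = ["WH16X26753", "WH16X26754", "DA97-07603B", "W10295370A"]
index = {mpn: encode(mpn) for mpn in catalog}

query = "WH16X2675"  # partial, typo-prone user input
q = encode(query)
shortlist = sorted(catalog, key=lambda m: hamming(q, index[m]))[:3]   # Hamming NN
print(sorted(shortlist, key=lambda m: edit_distance(query, m)))       # re-rank
```

In a production setting the bitmasks would typically live in a packed binary index so that Hamming distances can be computed with hardware popcounts, keeping the latency guarantees the abstract mentions.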
Submitted 1 April, 2026;
originally announced April 2026.
-
Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
Authors:
Zhuohong Chen,
Zhenxian Wu,
Yunyao Yu,
Hangrui Xu,
Zirui Liao,
Zhifang Liu,
Xiangwen Deng,
Pen Jiao,
Haoqian Wang
Abstract:
Knowledge-based visual question answering (KB-VQA) requires vision-language models to understand images and use external knowledge, especially for rare entities and long-tail facts. Most existing retrieval-augmented generation (RAG) methods adopt a fixed pipeline that sequentially retrieves information, filters it, and then produces an answer. Such a design makes it difficult to adapt to diverse question types. Moreover, it separates retrieval from reasoning, making it hard for the model to decide when to search, how to refine queries, or when to stop. As a result, the retrieved evidence is often poorly aligned with the question. To address these limitations, we reformulate KB-VQA as a search-agent problem and model the solving process as a multi-step decision-making procedure. At each step, the agent selects one of four actions (Answer, Image Retrieval, Text Retrieval, or Caption) based on its current information state. We further design an automated pipeline to collect multi-step trajectories that record the agent's reasoning process, tool usage, and intermediate decisions. These trajectories are then used as supervision for fine-tuning. Experiments on InfoSeek and E-VQA demonstrate that our method achieves state-of-the-art performance, consistently outperforming prior baselines and confirming the effectiveness of our framework.
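The multi-step decision loop can be pictured with a short, purely illustrative sketch; the four action names come from the abstract, while the policy and tool functions below are hypothetical placeholders.

```python
def run_agent(question, image, policy, tools, max_steps=6):
    """Iteratively choose one of four actions until the agent answers.

    policy(state) -> (action, argument); tools maps the three non-terminal
    actions to callables returning new evidence appended to the state.
    """
    state = {"question": question, "image": image, "evidence": []}
    for _ in range(max_steps):
        action, arg = policy(state)
        if action == "answer":
            return arg                              # terminal action
        state["evidence"].append(tools[action](arg, state))
    return "unanswered"                             # step budget exhausted

# Hypothetical usage with trivial stand-ins for the policy and the tools.
tools = {
    "image_retrieval": lambda q, s: f"images similar to {s['image']}",
    "text_retrieval":  lambda q, s: f"passages about {q}",
    "caption":         lambda q, s: "a caption of the input image",
}
script = iter([("caption", None), ("text_retrieval", "landmark"), ("answer", "Eiffel Tower")])
policy = lambda state: next(script)
print(run_agent("Which landmark is shown?", "photo.jpg", policy, tools))
```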
Submitted 9 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.