-
High-temperature charge-4e superconductivity in SU(4) interacting fermions
Authors:
Shao-Hang Shi,
Zhengzhi Wu,
Jiangping Hu,
Zi-Xiang Li
Abstract:
The condensation of electron quartets, known as charge-4e superconductivity (SC), represents a novel quantum state of matter beyond the standard paradigm of Cooper pairing. However, concrete microscopic models realizing this phase in two dimensions remain a central challenge. Here, we introduce a non-engineered and sign-problem-free model, unambiguously demonstrating the emergence of a robust and high-temperature charge-4e SC phase using unbiased quantum Monte Carlo simulations. At zero temperature, the phase diagram reveals that charge-4e SC is the primary ground state in the strong-coupling regime. At finite temperature in the absence of charge-2e SC, we identify charge-4e SC through a Berezinskii-Kosterlitz-Thouless transition, marked by a universal jump in the superfluid stiffness consistent with a condensate of charge 4e. Remarkably, the transition temperature Tc increases nearly linearly with interaction strength, providing a robust mechanism for high-Tc quartet superconductivity. Furthermore, spectral analysis reveals a prominent pseudogap above Tc arising from strong phase fluctuations. Our results establish a canonical and numerically exact model system for charge-4e superconductivity, offering crucial guidance for its realization in experimental platforms such as moiré materials and ultracold atomic systems.
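For reference, the Berezinskii-Kosterlitz-Thouless criterion behind such a universal jump is the Nelson-Kosterlitz relation; the charge-dependent rescaling below is the generic textbook form, stated here only as an assumption about conventions and not as the paper's own definition:
$$ k_B T_{\rm BKT} = \frac{π}{2}\, ρ_s^{θ}(T_{\rm BKT}^-), \qquad D_s \propto q^2\, ρ_s^{θ}, $$
where $ρ_s^{θ}$ is the stiffness of the order-parameter phase and $D_s$ the superfluid weight seen in the current response of a condensate of charge $q$; at a given $T_{\rm BKT}$ a charge-4e condensate therefore exhibits a jump in $D_s$ four times larger than a charge-2e one, which is how the jump discriminates between the two scenarios.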
Submitted 16 April, 2026;
originally announced April 2026.
-
Study of the $B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0$ decay
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1111 additional authors not shown)
Abstract:
The decay $B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0$ is studied for the first time using proton-proton collision data recorded by the LHCb experiment at a center-of-mass energy of $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. The branching ratio relative to the decay $B^+ \to Λ_c^+ \bar{Λ}_c^- K^+$ is measured to be
$$ \frac{{\cal B}(B^0 \to Λ_c^+ \bar{Λ}_c^- K_S^0)}{{\cal B}(B^+ \to Λ_c^+ \bar{Λ}_c^- K^+)} = 0.53 \pm 0.05 \pm 0.05, $$ where the first uncertainty is statistical and the second is systematic. Evidence is found for contributions from two resonant states, $Ξ_c(2923)^+$ and $Ξ_c(2939)^+$, in the $Λ_c^+ K_S^0$ system. The two states show a significance of $3.9σ$ relative to the nonresonant hypothesis. These two $Ξ_c^+$ states are consistent with being the isospin partners of the states observed in the $Λ_c^+ K^-$ system.
Submitted 16 April, 2026;
originally announced April 2026.
-
DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation
Authors:
Ziyu Shan,
Yuheng Zhou,
Gaoyuan Wu,
Ziheng Ji,
Zhenyu Wu,
Ziwei Wang
Abstract:
Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from the view generalization problem due to shifts of docking points. To address this issue, we propose a novel low-cost demonstration generation framework named DockAnywhere, which improves viewpoint generalization under docking variability by lifting a single demonstration to diverse feasible docking configurations. Specifically, DockAnywhere lifts a trajectory to any feasible docking points by decoupling docking-dependent base motions from contact-rich manipulation skills that remain invariant across viewpoints. Feasible docking proposals are sampled under feasibility constraints, and corresponding trajectories are generated via structure-preserving augmentation. Visual observations are synthesized in 3D space by representing the robot and objects as point clouds and applying point-level spatial editing to ensure the consistency of observation and action across viewpoints. Extensive experiments on ManiSkill and real-world platforms demonstrate that DockAnywhere substantially improves policy success rates and easily generalizes to novel viewpoints from unseen docking points during training, significantly enhancing the generalization capability of mobile manipulation policy in real-world deployment.
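To make the lifting step concrete, the sketch below re-expresses a recorded planar trajectory in the original base frame and replays it from a newly sampled docking pose; the SE(2)-only treatment, function names, and numbers are illustrative assumptions rather than the authors' implementation.

import numpy as np

def se2(x, y, yaw):
    # Homogeneous 3x3 transform for a planar base pose.
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def lift_demo(traj_world, base_old, base_new):
    # Re-express a trajectory (N x 2, world frame) recorded from base_old so that the
    # same base-relative motion is replayed from the newly sampled pose base_new.
    T_old, T_new = se2(*base_old), se2(*base_new)
    pts = np.c_[traj_world, np.ones(len(traj_world))]   # homogeneous coordinates
    in_base = (np.linalg.inv(T_old) @ pts.T).T          # world -> old base frame
    return (T_new @ in_base.T).T[:, :2]                 # old base frame -> new world pose

# Hypothetical usage: one waypoint, base shifted to (2, 1) and rotated by 90 degrees.
print(lift_demo(np.array([[1.0, 0.0]]), (0.0, 0.0, 0.0), (2.0, 1.0, np.pi / 2)))  # [[2., 2.]]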
Submitted 16 April, 2026;
originally announced April 2026.
-
MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry
Authors:
Meng-Xun Li,
Wen-Hui Deng,
Zhi-Xing Wu,
Chun-Xiao Jin,
Jia-Min Wu,
Yue Han,
James Kit Hon Tsoi,
Gui-Song Xia,
Cui Huang
Abstract:
Vision-Language Models (VLMs) have demonstrated significant potential in medical image analysis, yet their application in intraoral photography remains largely underexplored due to the lack of fine-grained, annotated datasets and comprehensive benchmarks. To address this, we present MetaDent, a comprehensive resource that includes (1) a novel and large-scale dentistry image dataset collected from clinical, public, and web sources; (2) a semi-structured annotation framework designed to capture the hierarchical and clinically nuanced nature of dental photography; and (3) comprehensive benchmark suites for evaluating state-of-the-art VLMs on clinical image understanding. Our labeling approach combines a high-level image summary with point-by-point, free-text descriptions of abnormalities. This method enables rich, scalable, and task-agnostic representations. We curated 60,669 dental images from diverse sources and annotated a representative subset of 2,588 images using this meta-labeling scheme. Leveraging Large Language Models (LLMs), we derive standardized benchmarks: approximately 15K Visual Question Answering (VQA) pairs and an 18-class multi-label classification dataset, which we validated with human review and error analysis to justify that the LLM-driven transition reliably preserves fidelity and semantic accuracy. We then evaluate state-of-the-art VLMs across VQA, classification, and image captioning tasks. Quantitative results reveal that even the most advanced models struggle with a fine-grained understanding of intraoral scenes, achieving moderate accuracy and producing inconsistent or incomplete descriptions in image captioning. We publicly release our dataset, annotations, and tools to foster reproducible research and accelerate the development of vision-language systems for dental applications.
Submitted 16 April, 2026;
originally announced April 2026.
-
Coherence dynamics in quantum algorithm for linear systems of equations
Authors:
Linlin Ye,
Zhaoqi Wu,
Shao-Ming Fei
Abstract:
Quantum coherence is a fundamental issue in quantum mechanics and quantum information processing. We explore the coherence dynamics of the evolved states in the HHL quantum algorithm for solving the linear system of equations $A\overrightarrow{x}=\overrightarrow{b}$. By using the Tsallis relative $α$ entropy of coherence and the $l_{1,p}$ norm of coherence, we show that the operator coherence of the phase estimation $P$ relies on the coefficients $β_{i}$ obtained by decomposing $|b\rangle$ in the eigenbasis of $A$. We prove that the operator coherence of the inverse phase estimation $\widetilde{P}$ relies on the coefficients $β_{i}$, the eigenvalues of $A$ and the success probability $P_{s}$, and that it decreases as the success probability increases when $α\in(1,2]$. Moreover, the variations of coherence diminish as the success probability increases and depend on the eigenvalues of $A$ as well as on the success probability.
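For context, the generic textbook structure of HHL that these quantities refer to (the paper's normalizations may differ) reads
$$ |b\rangle = \sum_i β_i |u_i\rangle, \qquad A|u_i\rangle = λ_i |u_i\rangle, \qquad |x\rangle \propto \sum_i \frac{β_i}{λ_i}\,|u_i\rangle, \qquad P_{s} = \sum_i \frac{C^2 |β_i|^2}{λ_i^2}, $$
with $C$ the constant of the controlled ancilla rotation, so the coefficients $β_i$, the eigenvalues $λ_i$ and the success probability $P_s$ are exactly the quantities in which the coherence results above are expressed.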
Submitted 16 April, 2026;
originally announced April 2026.
-
Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
Authors:
Zhixuan Wu,
Quanxing Zha,
Teng Wang,
Genbao Xu,
Wenyuan Gu,
Wei Rao,
Nan Ma,
Bo Cheng,
Soujanya Poria
Abstract:
Video understanding requires identifying and reasoning over semantically discriminative visual objects across frames, yet existing object-agnostic solutions struggle to effectively handle substantial object variations over time. To address this, we introduce Chain-of-Glimpse, a search-guided progressive object-grounded reasoning framework that explicitly anchors each reasoning step to specific visual evidence regions, enabling compositional and multi-step decision-making. Formally, Chain-of-Glimpse formulates video reasoning as a step-by-step process that incrementally builds spatially grounded traces around task-relevant visual objects, thereby mitigating over-reliance on saliency-driven cues. Specifically, Chain-of-Glimpse features a search-guided controller, optimized via reinforcement learning with a format reward that incentivizes grounding capability, to iteratively ground visual evidence regions and form reliable reasoning trajectories, yielding accurate and interpretable multi-step decisions. Extensive evaluations on both the in-domain NExTQA benchmark and the out-of-domain Video-Holmes, CG-Bench Reasoning, and VRBench benchmarks demonstrate consistent performance gains, robustness, and generalization of Chain-of-Glimpse across diverse video reasoning tasks.
Submitted 16 April, 2026;
originally announced April 2026.
-
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
Authors:
Jianhao Su,
Zhanwei Wu,
ShengTing Huang,
Weidong Feng
Abstract:
Edge AI model deployment is a multi-stage engineering process involving model conversion, operator compatibility handling, quantization calibration, runtime integration, and accuracy validation. In practice, this workflow is long, failure-prone, and heavily dependent on deployment expertise, particularly when targeting hardware-specific inference runtimes. This technical report presents AIPC (AI Porting Conversion), an AI agent-driven approach for constrained automation of AI model deployment. AIPC decomposes deployment into standardized, verifiable stages and injects deployment-domain knowledge into agent execution through Agent Skills, helper scripts, and a stage-wise validation loop. This design reduces both the expertise barrier and the engineering time required for hardware deployment. Using Qualcomm AI Runtime (QAIRT) as the primary scenario, this report examines automated deployment across representative vision, multimodal, and speech models. In the cases covered here, AIPC can complete deployment from PyTorch to runnable QNN/SNPE inference within 7-20 minutes for structurally regular vision models, with indicative API costs roughly in the range of USD 0.7-10. For more complex models involving less-supported operators, dynamic shapes, or autoregressive decoding structures, fully automated deployment may still require further advances, but AIPC already provides practical support for execution, failure localization, and bounded repair.
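A minimal sketch of a stage-wise validation loop with bounded repair is given below; the stage names, the agent interface, and the retry policy are illustrative assumptions and are not taken from the AIPC report.

from dataclasses import dataclass

@dataclass
class Report:
    passed: bool
    error: str = ""

# Hypothetical stage names standing in for the standardized, verifiable stages.
STAGES = ["convert_model", "patch_operators", "calibrate_quantization",
          "integrate_runtime", "validate_accuracy"]

def run_pipeline(model, agent, max_repairs=3):
    # Execute each stage, validate its output, and attempt a bounded number of repairs.
    artifact = model
    for stage in STAGES:
        for attempt in range(max_repairs + 1):
            artifact, report = agent.execute(stage, artifact)
            if report.passed:
                break
            if attempt == max_repairs:
                raise RuntimeError(f"{stage} failed after {max_repairs} repairs: {report.error}")
            artifact = agent.repair(stage, artifact, report)  # localized fix guided by the failure report
    return artifact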
Submitted 16 April, 2026;
originally announced April 2026.
-
Closing the Observational Gap in Cosmic Dynamics: AI-Enabled Reconstruction of the Universe's Vorticity and Rotational Flow Morphology
Authors:
Ziyong Wu,
Xu Xiao,
Fuyu Dong,
Juhan Kim,
Yan-Chuan Cai,
Yang Wang,
Xi Kang,
Le Zhang,
Xin Wang,
Xiao-Dong Li
Abstract:
The cosmic vorticity field, an essential tracer of nonlinear structure formation, has remained observationally inaccessible because transverse galaxy motions are difficult to measure and analytic models struggle to capture shell-crossing. Here we report an empirical reconstruction of this field by applying an artificial intelligence framework trained on simulations of the concordance LambdaCDM model to Sloan Digital Sky Survey galaxies. The recovered three-dimensional velocity and vorticity fields reveal coherent vortical structures, including spiral flows in clusters, filaments, and voids, and the cosmic web inferred from vorticity closely matches that derived from density segmentation. The power spectra of the reconstructed velocity and vorticity fields agree statistically with LambdaCDM predictions, and the inferred velocity field effectively removes redshift-space distortions, yielding an almost isotropic clustering signal. These converging lines of evidence, obtained from an independent perspective, reinforce the concordance cosmological model. By closing a long-standing observational gap, our results highlight the potential of AI-driven reconstruction to access otherwise unobservable quantities and to address fundamental questions in cosmology and galaxy formation.
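As a reminder of the quantity being reconstructed, the vorticity is the curl of the velocity field, $ω = \nabla \times v$; a minimal finite-difference evaluation on a gridded velocity field (uniform spacing and x, y, z axis ordering are assumptions) is:

import numpy as np

def vorticity(vx, vy, vz, dx=1.0):
    # Curl of a velocity field sampled on a regular 3D grid with axes ordered (x, y, z).
    wx = np.gradient(vz, dx, axis=1) - np.gradient(vy, dx, axis=2)
    wy = np.gradient(vx, dx, axis=2) - np.gradient(vz, dx, axis=0)
    wz = np.gradient(vy, dx, axis=0) - np.gradient(vx, dx, axis=1)
    return wx, wy, wz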
Submitted 16 April, 2026;
originally announced April 2026.
-
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
Authors:
Yuxiang Wang,
Hongyu Liu,
Yijiang Xu,
Qinke Ni,
Li Wang,
Wan Lin,
Kunyu Feng,
Dekun Chen,
Xu Tan,
Lei Wang,
Jie Shi,
Zhizheng Wu
Abstract:
As speech language models (SLMs) transition from personal devices into shared, multi-user environments, their responses must account for far more than the words alone. Who is speaking, how they sound, and where the conversation takes place can each turn an otherwise benign request into one that is unsafe, unfair, or privacy-violating. Existing benchmarks, however, largely focus on basic audio comprehension, study individual risks in isolation, or conflate content that is inherently harmful with content that only becomes problematic due to its acoustic context. We introduce VoxSafeBench, among the first benchmarks to jointly evaluate social alignment in SLMs across three dimensions: safety, fairness, and privacy. VoxSafeBench adopts a Two-Tier design: Tier1 evaluates content-centric risks using matched text and audio inputs, while Tier2 targets audio-conditioned risks in which the transcript is benign but the appropriate response hinges on the speaker, paralinguistic cues, or the surrounding environment. To validate Tier2, we include intermediate perception probes and confirm that frontier SLMs can successfully detect these acoustic cues yet still fail to act on them appropriately. Across 22 tasks with bilingual coverage, we find that safeguards appearing robust on text often degrade in speech: safety awareness drops for speaker- and scene-conditioned risks, fairness erodes when demographic differences are conveyed vocally, and privacy protections falter when contextual cues arrive acoustically. Together, these results expose a pervasive speech grounding gap: current SLMs frequently recognize the relevant social norm in text but fail to apply it when the decisive cue must be grounded in speech. Code and data are publicly available at: https://amphionteam.github.io/VoxSafeBench_demopage/
Submitted 15 April, 2026;
originally announced April 2026.
-
RoSLAC: Robust Simultaneous Localization and Calibration of Multiple Magnetometers
Authors:
Qiyang Lyu,
Zhenyu Wu,
Wei Wang,
Hongming Shen,
Danwei Wang
Abstract:
Localization of autonomous mobile robots (AMRs) in enclosed or semi-enclosed environments such as offices, hotels, hospitals, indoor parking facilities, and underground spaces where GPS signals are weak or unavailable remains a major obstacle to the deployment of fully autonomous systems. Infrastructure-based localization approaches, such as QR codes and RFID, are constrained by high installation and maintenance costs as well as limited flexibility, while onboard sensor-based methods, including LiDAR- and vision-based solutions, are affected by ambiguous geometric features and frequent occlusions caused by dynamic obstacles such as pedestrians. Ambient magnetic field (AMF)-based localization has therefore attracted growing interest in recent years because it does not rely on external infrastructure or geometric features, making it well-suited for AMR applications such as service robots and security robots. However, magnetometer measurements are often corrupted by distortions caused by ferromagnetic materials present on the sensor platform, which bias the AMF and degrade localization reliability. As a result, accurate magnetometer calibration to estimate distortion parameters becomes essential. Conventional calibration methods that rely on rotating the magnetometer are impractical for large and heavy platforms. To address this limitation, this paper proposes a robust simultaneous localization and calibration (RoSLAC) approach based on alternating optimization, which iteratively and efficiently estimates both the platform pose and magnetometer calibration parameters. Extensive evaluations conducted in high-fidelity simulation and real-world environments demonstrate that the proposed RoSLAC method achieves high localization accuracy while maintaining low computational cost compared with state-of-the-art magnetometer calibration techniques.
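For reference, the distortion model usually assumed in this setting is the hard-iron/soft-iron form written below; the exact parameterization RoSLAC estimates is not spelled out in the abstract, so this is only the generic version:
$$ \mathbf{m}_{\rm meas} = \mathbf{S}\,\mathbf{R}(\mathbf{q})^{\top} \mathbf{m}_{\rm AMF}(\mathbf{p}) + \mathbf{b} + \mathbf{n}, $$
where $\mathbf{S}$ is the soft-iron matrix, $\mathbf{b}$ the hard-iron offset, $\mathbf{R}(\mathbf{q})$ the platform orientation, $\mathbf{m}_{\rm AMF}(\mathbf{p})$ the ambient field at position $\mathbf{p}$, and $\mathbf{n}$ sensor noise; an alternating optimization then iterates between estimating the pose $(\mathbf{p}, \mathbf{q})$ with the calibration fixed and re-estimating $(\mathbf{S}, \mathbf{b})$ with the pose fixed.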
Submitted 15 April, 2026;
originally announced April 2026.
-
Exact Toda Black Holes of Rank-2 Lie Groups
Authors:
H. Lu,
Peng-Yu Wu,
Ze-Hua Wu,
Weicheng Zhao
Abstract:
We consider Einstein gravity coupled to two Maxwell fields and one dilatonic scalar, and construct spherically-symmetric and static black holes that are charged under both Maxwell fields in general $D$ dimensions. We find that for suitable dilaton couplings, the equations of motion can be cast into one-dimensional Toda equations of all rank-2 Lie groups. We devise a brute-force approach to obtain the most general but remarkably elegant solutions to the Toda equations. This allows us to construct exact black holes associated with all the rank-2 Lie groups. The $B_2$ and $G_2$ Toda black holes are new. We study their thermodynamics and verify explicitly an earlier claim in the literature that all these thermodynamic quantities can be derived without having to solve for these black hole solutions.
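For orientation, the one-dimensional Toda equations of a rank-2 Lie group take the generic form below, with the Cartan matrix distinguishing the $A_2$, $B_2$ and $G_2$ cases; signs, normalizations and the choice between a Cartan matrix and its transpose are convention-dependent and may differ from the paper:
$$ \frac{d^2 q_i}{dρ^2} = \exp\Big(\sum_{j=1}^{2} K_{ij}\, q_j\Big), \qquad K_{A_2} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad K_{B_2} = \begin{pmatrix} 2 & -1 \\ -2 & 2 \end{pmatrix}, \quad K_{G_2} = \begin{pmatrix} 2 & -1 \\ -3 & 2 \end{pmatrix}. $$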
Submitted 15 April, 2026;
originally announced April 2026.
-
Tsallis relative $α$ entropy of coherence dynamics in Grover's search algorithm
Authors:
Linlin Ye,
Zhaoqi Wu,
Shao-Ming Fei
Abstract:
Quantum coherence plays a central role in Grover's search algorithm. We study the Tsallis relative $α$ entropy of coherence dynamics of the evolved state in Grover's search algorithm. We prove that the Tsallis relative $α$ entropy of coherence decreases with the increase of the success probability, and derive the complementarity relations between the coherence and the success probability. We show that the operator coherence of the first $H^{\otimes n}$ relies on the size of the database $N$, the success probability and the target states. Moreover, we illustrate the relationships between coherence and entanglement of the superposition state of targets, as well as the production and deletion of coherence in Grover iterations.
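For context, the standard form of the Grover-evolved state that these statements refer to (textbook notation; the paper's conventions may differ) is
$$ |ψ_k\rangle = \sin\big((2k+1)θ\big)\,|t\rangle + \cos\big((2k+1)θ\big)\,|t^{\perp}\rangle, \qquad \sinθ = \sqrt{M/N}, \qquad P_{\rm succ}(k) = \sin^2\big((2k+1)θ\big), $$
where $|t\rangle$ is the equal superposition of the $M$ target states in a database of size $N$, so the computational-basis coherence of $|ψ_k\rangle$ and the success probability are controlled by the same rotation angle.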
Submitted 16 April, 2026; v1 submitted 15 April, 2026;
originally announced April 2026.
-
Realistic Detector Geometry Modeling and Its Impact on Event Reconstruction in JUNO
Authors:
Zhaoxiang Wu,
Miao He,
Wuming Luo,
Ziyan Deng,
Wei He,
Yuekun Heng,
Xiaoping Jing,
Bo Li,
Xiaoyan Ma,
Xiaohui Qian,
Zhonghua Qin,
Yifang Wang,
Peidong Yu
Abstract:
JUNO is designed to determine the neutrino mass ordering with an energy resolution of 3% at 1 MeV. In the real detector, however, deformations of the central stainless-steel structure during installation lead to deviations of the photomultiplier tube (PMT) positions from their design values. Based on the limited survey data of the PMTs and the stainless-steel truss, we perform a correlation analysis of the measured points and propose a method to predict the positions of all PMTs. Using the resulting realistic geometry, we demonstrate that the detector deformation has a negligible effect on the energy reconstruction. In contrast, inaccuracies in the assumed geometry can introduce vertex biases of up to 40 mm. Incorporating the realistic geometry into the calibration-based PMT response model removes this bias and preserves the stability of the reconstruction algorithms.
Submitted 15 April, 2026;
originally announced April 2026.
-
Measurement of the $W$-boson production cross-sections in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the $W$-boson production cross-section is performed using the $W \to μν$ decay channel, based on a sample of proton-proton collision data collected by the LHCb experiment at $\sqrt{s}$ = 13 TeV and corresponding to an integrated luminosity of 5.1 fb$^{-1}$. The cross-section is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. The integrated production cross-sections of $W$ bosons are measured to be $$ \begin{array}{lcl} σ_{W^+ \to μ^+ν} &=& 1754.2 \pm 1.5 \pm 11.9 \pm 35.1\text{ pb} \\ σ_{W^- \to μ^-\bar{ν}} &=& 1178.1 \pm 1.3 \pm 9.7 \pm 23.6\text{ pb} \end{array} $$ where uncertainties are statistical, systematic, and due to the luminosity determination, respectively. Results are in good agreement with theoretical predictions at next-to-next-to-leading order in perturbative quantum chromodynamics. This measurement is significantly more precise than previous results in this kinematic regime.
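Purely as an illustrative check, and not a number quoted in the text, the three uncertainty components on each cross-section can be combined in quadrature:

import math

def total_uncertainty(stat, syst, lumi):
    # Combine independent uncertainty components in quadrature.
    return math.sqrt(stat**2 + syst**2 + lumi**2)

print(total_uncertainty(1.5, 11.9, 35.1))  # W+ -> mu+ nu: roughly 37 pb total
print(total_uncertainty(1.3, 9.7, 23.6))   # W- -> mu- nubar: roughly 26 pb total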
Submitted 14 April, 2026;
originally announced April 2026.
-
Precision measurement of the muon charge asymmetry from $W$-boson decays in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the muon charge asymmetry from $W$-boson decays in proton-proton collisions at $\sqrt{s}$ = 13 TeV is presented. The analysis utilizes data corresponding to an integrated luminosity of 5.1 fb$^{-1}$, recorded by the LHCb detector during 2016, 2017 and 2018. The asymmetry is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. This result represents the most precise determination of the muon charge asymmetry in the forward region to date, exhibiting excellent agreement with next-to-next-to-leading-order predictions in perturbative quantum chromodynamics.
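For reference, the muon charge asymmetry is conventionally defined as the normalized difference of the charge-separated differential cross-sections; this is the standard definition, assumed rather than quoted here:
$$ A_{μ}(η) = \frac{dσ_{W^+ \to μ^+ν}/dη - dσ_{W^- \to μ^-\bar{ν}}/dη}{dσ_{W^+ \to μ^+ν}/dη + dσ_{W^- \to μ^-\bar{ν}}/dη}. $$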
Submitted 14 April, 2026;
originally announced April 2026.
-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
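The lineshape referred to above, a relativistic Breit-Wigner with a mass-dependent width, has the generic form (the barrier factors and the exact parameterization of $Γ(m)$ used in the analysis are not given in the abstract):
$$ {\rm BW}(m) \propto \frac{1}{m_0^2 - m^2 - i\, m_0\, Γ(m)}, \qquad Γ(m) = Γ_0\, \frac{m_0}{m} \left(\frac{q(m)}{q(m_0)}\right)^{2L+1} \left(\frac{F_L(q(m))}{F_L(q(m_0))}\right)^{2}, $$
where $m_0$ and $Γ_0$ are the quoted mass and width, $q(m)$ the breakup momentum of the $π^{\pm}η^{\prime}$ pair, $L$ the orbital angular momentum, and $F_L$ a Blatt-Weisskopf barrier factor.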
Submitted 14 April, 2026; v1 submitted 14 April, 2026;
originally announced April 2026.
-
HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models
Authors:
Zixing Chen,
Yifeng Gao,
Li Wang,
Yunhan Zhao,
Yi Liu,
Jiayu Li,
Xiang Zheng,
Zuxuan Wu,
Cong Wang,
Xingjun Ma,
Yu-Gang Jiang
Abstract:
Vision-Language-Action (VLA) models inherit rich world knowledge from vision-language backbones and acquire executable skills via action demonstrations. However, existing evaluations largely focus on action execution success, leaving action policies loosely coupled with visual-linguistic semantics. This decoupling exposes a systematic vulnerability whereby correct action execution may induce unsafe outcomes under semantic risk. To expose this vulnerability, we introduce HazardArena, a benchmark designed to evaluate semantic safety in VLAs under controlled yet risk-bearing contexts. HazardArena is constructed from safe/unsafe twin scenarios that share matched objects, layouts, and action requirements, differing only in the semantic context that determines whether an action is unsafe. We find that VLA models trained exclusively on safe scenarios often fail to behave safely when evaluated in their corresponding unsafe counterparts. HazardArena includes over 2,000 assets and 40 risk-sensitive tasks spanning 7 real-world risk categories grounded in established robotic safety standards. To mitigate this vulnerability, we propose a training-free Safety Option Layer that constrains action execution using semantic attributes or a vision-language judge, substantially reducing unsafe behaviors with minimal impact on task performance. We hope that HazardArena highlights the need to rethink how semantic safety is evaluated and enforced in VLAs as they scale toward real-world deployment.
Submitted 14 April, 2026;
originally announced April 2026.
-
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
Authors:
Zhaofen Wu,
Hanrong Zhang,
Fulin Lin,
Wujiang Xu,
Xinran Xu,
Yankai Chen,
Henry Peng Zou,
Shaowen Chen,
Weizhi Zhang,
Xue Liu,
Philip S. Yu,
Hongwei Wang
Abstract:
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to evolving narratives. To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention. By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a graph-guided, multi-factor retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
Submitted 14 April, 2026;
originally announced April 2026.
-
ArtifactWorld: Scaling 3D Gaussian Splatting Artifact Restoration via Video Generation Models
Authors:
Xinliang Wang,
Yifeng Shi,
Zhenyu Wu
Abstract:
3D Gaussian Splatting (3DGS) delivers high-fidelity real-time rendering but suffers from geometric and photometric degradations under sparse-view constraints. Current generative restoration approaches are often limited by insufficient temporal coherence, a lack of explicit spatial constraints, and a lack of large-scale training data, resulting in multi-view inconsistencies, erroneous geometric hallucinations, and limited generalization to diverse real-world artifact distributions. In this paper, we present ArtifactWorld, a framework that resolves 3DGS artifact repair through systematic data expansion and a homogeneous dual-model paradigm. To address the data bottleneck, we establish a fine-grained phenomenological taxonomy of 3DGS artifacts and construct a comprehensive training set of 107.5K diverse paired video clips to enhance model robustness. Architecturally, we unify the restoration process within a video diffusion backbone, utilizing an isomorphic predictor to localize structural defects via an artifact heatmap. This heatmap then guides the restoration through an Artifact-Aware Triplet Fusion mechanism, enabling precise, intensity-guided spatio-temporal repair within native self-attention. Extensive experiments demonstrate that ArtifactWorld achieves state-of-the-art performance in sparse novel view synthesis and robust 3D reconstruction. Code and dataset will be made public.
Submitted 13 April, 2026;
originally announced April 2026.
-
The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
Authors:
Xingyu Qiu,
Yuqian Fu,
Jiawei Geng,
Bin Ren,
Jiancheng Pan,
Zongwei Wu,
Hao Tang,
Yanwei Fu,
Radu Timofte,
Nicu Sebe,
Mohamed Elhoseiny,
Lingyi Hong,
Mingxi Cheng,
Xingqi He,
Runze Li,
Xingdong Sheng,
Wenqiang Zhang,
Jiacong Liu,
Shu Luo,
Yikai Qin,
Yaze Zhao,
Yongwei Jiang,
Yixiong Zou,
Zhe Zhang,
Yang Yang
, et al. (49 additional authors not shown)
Abstract:
Cross-domain few-shot object detection (CD-FSOD) remains a challenging problem for existing object detectors and few-shot learning approaches, particularly when generalizing across distinct domains. As part of NTIRE 2026, we hosted the second CD-FSOD Challenge to systematically evaluate and promote progress in detecting objects in unseen target domains under limited annotation conditions. The challenge received strong community interest, with 128 registered participants and a total of 696 submissions. Among them, 31 teams actively participated, and 19 teams submitted valid final results. Participants explored a wide range of strategies, introducing innovative methods that push the performance frontier under both open-source and closed-source tracks. This report presents a detailed overview of the NTIRE 2026 CD-FSOD Challenge, including a summary of the submitted approaches and an analysis of the final results across all participating teams. Challenge Codes: https://github.com/ohMargin/NTIRE2026_CDFSOD.
Submitted 13 April, 2026;
originally announced April 2026.
-
Intense and extended CIII] emission suggests a strong outflow in JADES-GS-z14-0
Authors:
Stefano Carniani,
Peter Jakobsen,
Giacomo Venturi,
Francesco D'Eugenio,
Tobias J. Looser,
Joris Witstok,
Christopher N. A. Willmer,
Andrea Ferrara,
Zihao Wu,
Santiago Arribas,
Andrew J. Bunker,
Stéphane Charlot,
Jacopo Chevallard,
Mirko Curti,
Emma Curtis-Lake,
Daniel J. Eisenstein,
Kevin Hainline,
Jakob M. Helton,
Zhiyuan Ji,
Xihan Ji,
Benjamin D. Johnson,
Mahsa Kohandel,
Nimisha Kumari,
Roberto Maiolino,
Andrea Pallottini
, et al. (9 additional authors not shown)
Abstract:
JWST has revealed an overabundance of very bright, blue galaxies at z>10, raising fundamental questions about how star formation and feedback operate at Cosmic Dawn. We present new JWST/NIRSpec MSA PRISM/CLEAR spectroscopy of JADES-GS-z14-0 (z=14.18) obtained with the JADES and OASIS programmes. While the rest-frame UV continuum flux level and shape are consistent between the two datasets, the OASIS spectrum shows a 10$σ$ detection of the CIII]$λλ1907,1909$ emission line, with a luminosity three times higher than that measured in the JADES data. This difference is naturally explained by the offset in shutter placement between OASIS and JADES, implying that the CIII] emission is spatially displaced by $\sim400$ pc from the stellar continuum. The non-detection of CIII] in NIRCam medium-band imaging indicates that the emitting region is extended on scales $\gtrsim165$ pc, with a surface brightness below the detection threshold. Interpreting this diffuse, carbon-enriched gas as the result of ongoing or past outflows, we infer a mass outflow rate of $\dot{M}_{\rm out}\sim160~{\rm M_\odot\,yr^{-1}}$. We compare it with the star-formation rate (SFR) and derive a mass-loading factor of $η= \dot{M}_{\rm out}/{\rm SFR} = 4-15$, suggesting highly efficient feedback at very early times. Finally, we show that, if outflows are one of the mechanisms regulating star formation in JADES-GS-z14-0, the instantaneous star-formation efficiency in massive haloes is constrained to $ε_\star\lesssim0.08$. These results support a scenario in which outflows play a crucial role during the earliest phases of galaxy formation. Comparing our results with the current theoretical galaxy formation model, we conclude that a combination of moderate star-formation efficiency and reduced dust attenuation can account for the emergence of luminous galaxies at the highest redshifts.
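As a purely illustrative consistency check, not a value quoted in the abstract, the outflow rate and mass-loading factor above together imply a star-formation rate of order
$$ {\rm SFR} = \frac{\dot{M}_{\rm out}}{η} \approx \frac{160~{\rm M_\odot\,yr^{-1}}}{4-15} \approx 11-40~{\rm M_\odot\,yr^{-1}}. $$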
Submitted 13 April, 2026;
originally announced April 2026.
-
UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents
Authors:
Yijuan Liang,
Xinghao Chen,
Yifan Ge,
Ziyi Wu,
Hao Wu,
Changyu Zeng,
Wei Xing,
Xiaoyu Shen
Abstract:
Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems through structured function calls. However, existing research exhibits inconsistent interaction representations, largely overlooks the structural distribution of tool-use trajectories, and relies on incompatible evaluation benchmarks. We present UniToolCall, a unified framework for tool learning that standardizes the entire pipeline from toolset construction and dataset generation to evaluation. The framework curates a large tool pool of 22k+ tools and constructs a hybrid training corpus of 390k+ instances by combining 10 standardized public datasets with structurally controlled synthetic trajectories. It explicitly models diverse interaction patterns, including single-hop vs. multi-hop and single-turn vs. multi-turn, while capturing both serial and parallel execution structures. To support coherent multi-turn reasoning, we further introduce an Anchor Linkage mechanism that enforces cross-turn dependencies. Furthermore, we convert 7 public benchmarks into a unified Query-Action-Observation-Answer (QAOA) representation with fine-grained evaluation at the function-call, turn, and conversation levels. Experiments show that fine-tuning Qwen3-8B on our dataset substantially improves tool-use performance. Under the distractor-heavy Hybrid-20 setting, the fine-tuned model achieves 93.0% single-turn Strict Precision, outperforming commercial models including GPT, Gemini, and Claude.
Submitted 13 April, 2026;
originally announced April 2026.
-
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora
Authors:
Tao Feng,
Yuxiang Wang,
Yuancheng Wang,
Xueyao Zhang,
Dekun Chen,
Chaoren Wang,
Xun Guan,
Zhizheng Wu
Abstract:
Voice imitation aims to transform source speech to match a reference speaker's timbre and speaking style while preserving linguistic content. A straightforward approach is to train on triplets of (source, reference, target), where source and target share the same content but target matches the reference's voice characteristics, yet such data is extremely scarce. Existing approaches either employ carefully designed disentanglement architectures to bypass this data scarcity or leverage external systems to synthesize pseudo-parallel training data. However, the former requires intricate model design, and the latter faces a quality ceiling when synthetic speech is used as training targets. To address these limitations, we propose MimicLM, which takes a novel approach by using synthetic speech as training sources while retaining real recordings as targets. This design enables the model to learn directly from real speech distributions, breaking the synthetic quality ceiling. Building on this data construction approach, we incorporate interleaved text-audio modeling to guide the generation of content-accurate speech and apply post-training with preference alignment to mitigate the inherent distributional mismatch when training on synthetic data. Experiments demonstrate that MimicLM achieves superior voice imitation quality with a simple yet effective architecture, significantly outperforming existing methods in naturalness while maintaining competitive similarity scores across speaker identity, accent, and emotion dimensions.
Submitted 13 April, 2026;
originally announced April 2026.
-
NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild
Authors:
Aleksandr Gushchin,
Khaled Abud,
Ekaterina Shumitskaya,
Artem Filippov,
Georgii Bychkov,
Sergey Lavrushkin,
Mikhail Erofeev,
Anastasia Antsiferova,
Changsheng Chen,
Shunquan Tan,
Radu Timofte,
Dmitry Vatolin,
Chuanbiao Song,
Zijian Yu,
Hao Tan,
Jun Lan,
Zhiqiang Yang,
Yongwei Tang,
Zhiqiang Wu,
Jia Wen Seow,
Hong Vin Koay,
Haodong Ren,
Feng Xu,
Shuai Chen,
Ruiyang Xia
, et al. (29 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild, held in conjunction with the NTIRE workshop at CVPR 2026. The goal of this challenge was to develop detection models capable of distinguishing real images from generated ones in realistic scenarios: the images are often transformed (cropped, resized, compressed, blurred) for practical usage, and therefore, the detection models should be robust to such transformations. The challenge is based on a novel dataset consisting of 108,750 real and 185,750 AI-generated images from 42 generators comprising a large variety of open-source and closed-source models of various architectures, augmented with 36 image transformations. Methods were evaluated using ROC AUC on the full test set, including both transformed and untransformed images. A total of 511 participants registered, with 20 teams submitting valid final solutions. This report provides a comprehensive overview of the challenge, describes the proposed solutions, and can be used as a valuable reference for researchers and practitioners in increasing the robustness of the detection models to real-world transformations.
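A minimal sketch of the stated evaluation protocol is shown below; the detector interface and label convention are assumptions, while the single ROC AUC over the full transformed-plus-untransformed test set follows the description above.

import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(detector, images, labels):
    # labels: 1 for AI-generated, 0 for real; detector returns a 'generated' score per image.
    scores = np.array([detector(img) for img in images])
    return roc_auc_score(labels, scores)  # one ROC AUC over transformed and untransformed images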
Submitted 13 April, 2026;
originally announced April 2026.
-
PACO: Proxy-Task Alignment and Online Calibration for On-the-Fly Category Discovery
Authors:
Weidong Tang,
Bohan Zhang,
Zhixiang Chi,
ZiZhang Wu,
Yang Wang,
Yanan Wu
Abstract:
On-the-Fly Category Discovery (OCD) requires a model, trained on an offline support set, to recognize known classes while discovering new ones from an online streaming sequence. Existing methods focus heavily on offline training. They aim to learn discriminative representations on the support set so that novel classes can be separated at test time. However, their discovery mechanism at inference is typically reduced to a single threshold. We argue that this paradigm is fundamentally flawed as OCD is not a static classification problem, but a dynamic process. The model must continuously decide 1) whether a sample belongs to a known class, 2) matches an existing novel category, or 3) should initiate a new one. Moreover, prior methods treat the support set as fixed knowledge. They do not update their decision boundaries as new evidence arrives during inference. This leads to unstable and inconsistent category formation. Our experiments confirm these issues. With properly calibrated and adaptive thresholds, substantial improvements can be achieved, even without changing the representation. Motivated by this, we propose PACO, a support-set-calibrated, tree-structured online decision framework. The framework models inference as a sequence of hierarchical decisions, including known-class routing, birth-aware novel assignment, and attach-versus-create operations over a dynamic prototype memory. Furthermore, we simulate the proxy discovery process to initialize the thresholds during offline training to align with inference. Thresholds are continuously updated during inference using mature novel prototypes. Importantly, PACO requires no heavy training and no dataset-specific tuning. It can be directly integrated into existing OCD pipelines as an inference-time module. Extensive experiments show significant improvements over SOTA baselines across seven benchmarks.
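A schematic version of the hierarchical decision process described above is sketched below; the distance measure, fixed thresholds, and prototype update rule are simplifications chosen for illustration and are not the actual PACO procedure.

import numpy as np

def ocd_step(feat, known_protos, novel_protos, tau_known, tau_novel):
    # Route one streaming sample: known class, existing novel prototype, or a new category.
    d_known = [np.linalg.norm(feat - p) for p in known_protos]
    if min(d_known) < tau_known:                           # 1) known-class routing
        return "known", int(np.argmin(d_known))
    d_novel = [np.linalg.norm(feat - p) for p in novel_protos] or [np.inf]
    if min(d_novel) < tau_novel:                           # 2) attach to an existing novel prototype
        j = int(np.argmin(d_novel))
        novel_protos[j] = 0.9 * novel_protos[j] + 0.1 * feat  # running prototype update
        return "novel", j
    novel_protos.append(feat.copy())                       # 3) create (birth) a new novel category
    return "novel", len(novel_protos) - 1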
Submitted 13 April, 2026;
originally announced April 2026.
-
RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering
Authors:
Zhuoyu Wu,
Wenhui Ou,
Pei-Sze Tan,
Wenqi Fang,
Sailaja Rajanala,
Raphaël C. -W. Phan
Abstract:
Retrieving procedure-oriented evidence from materials science papers is difficult because key synthesis details are often scattered across long, context-heavy documents and are not well captured by paragraph-only dense retrieval. We present RECIPER, a dual-view retrieval pipeline that indexes both paragraph-level context and compact large language model-extracted procedural summaries, then combines the two candidate streams with lightweight lexical reranking. Across four dense retrieval backbones, RECIPER consistently improves early-rank retrieval over paragraph-only dense retrieval, achieving average gains of +3.73 in Recall@1, +2.85 in nDCG@10, and +3.13 in MRR. With BGE-large-en-v1.5, it reaches 86.82%, 97.07%, and 97.85% on Recall@1, Recall@5, and Recall@10, respectively. We further observe improved downstream question answering under automatic metrics, suggesting that procedural summaries can serve as a useful complementary retrieval signal for procedure-oriented materials question answering. Code and data are available at https://github.com/ReaganWu/RECIPER.
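A rough sketch of the dual-view candidate merging is given below; the index interfaces, the max-union strategy, and the use of a lexical scorer for reranking are assumptions based only on the description above.

def dual_view_retrieve(query, para_index, summary_index, lexical_score, k=10):
    # Retrieve from the paragraph view and the procedural-summary view, then rerank the union.
    cands = {}
    for doc_id, dense in para_index.search(query, k):      # paragraph-level dense retrieval
        cands[doc_id] = max(cands.get(doc_id, 0.0), dense)
    for doc_id, dense in summary_index.search(query, k):   # summary-level dense retrieval
        cands[doc_id] = max(cands.get(doc_id, 0.0), dense)
    reranked = sorted(cands, key=lambda d: lexical_score(query, d), reverse=True)
    return reranked[:k]                                    # lightweight lexical reranking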
Submitted 13 April, 2026;
originally announced April 2026.
-
Measurement of inclusive production of charmonium states in $b$-hadron decays via their decay into $φφ$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An
, et al. (1173 additional authors not shown)
Abstract:
The inclusive production of the $η_c(1S)$, $η_c(2S)$ and $χ_{c}$ charmonium states in $b$-hadron decays is studied with LHCb Run 2 data, corresponding to an integrated luminosity of $5.9~\text{fb}^{-1}$, using charmonia decays to $φφ$ pairs. The production branching fractions of the $χ_{c}(1P)$ states in $b$-hadron decays are measured, using $b \to η_c(1S) (\to φφ) X$ as a normalisation channel, with $X$ indicating any additional particles. The results are \begin{align*}
&{\cal{B}} (b \to χ_{c0} X) = (1.34 \pm 0.13 \pm 0.06 \pm 0.37) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c1} X) = (1.58 \pm 0.12 \pm 0.09 \pm 0.44) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c2} X) = (0.55 \pm 0.08 \pm 0.05 \pm 0.15) \times 10^{-3}, \end{align*} where the first uncertainty is statistical, the second systematic, and the last is due to the limited knowledge of externally measured branching fractions. The production branching fraction of $η_c(2S)$ times the branching fraction of its decay into $φφ$ is measured as ${\cal{B}} (b \to η_c(2S) X) \times {\cal{B}} (η_c(2S) \to φφ) = (4.0 \pm 0.6 \pm 0.6 \pm 1.1) \times 10^{-7}$. Furthermore, the mass of the $η_c(1S)$ state is measured to be $M_{η_c(1S)} = 2984.1 \pm 0.5 \pm 0.5$ MeV with the best precision to date.
Submitted 13 April, 2026;
originally announced April 2026.
-
Quantum-Gated Task-interaction Knowledge Distillation for Pre-trained Model-based Class-Incremental Learning
Authors:
Linjie Li,
Huiyu Xiao,
Jiarui Cao,
Zhenyu Wu,
Yang Ji
Abstract:
Class-incremental learning (CIL) aims to continuously accumulate knowledge from a stream of tasks and construct a unified classifier over all seen classes. Although pretrained models (PTMs) have shown promising performance in CIL, they still struggle with the entanglement of multi-task subspaces, leading to catastrophic forgetting when task routing parameters are poorly calibrated or task-level representations are rigidly fixed. To address this issue, we propose a novel Quantum-Gated Task-interaction Knowledge Distillation (QKD) framework that leverages quantum gating to guide inter-task knowledge transfer. Specifically, we introduce a quantum-gated task modulation mechanism to model the relational dependencies among task embeddings, dynamically capturing sample-to-task relevance for both joint training and inference across streaming tasks. Guided by the quantum gating outputs, we perform task-interaction knowledge distillation from old to new adapters, weighted by these task-embedding-level correlations, enabling the model to bridge the representation gaps between independent task subspaces. Extensive experiments demonstrate that QKD effectively mitigates forgetting and achieves state-of-the-art performance.
Submitted 13 April, 2026;
originally announced April 2026.
-
LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
Authors:
Linjie Li,
Zhenyu Wu,
Huiyu Xiao,
Yang Ji
Abstract:
Prompt-based class-incremental learning methods typically construct a prompt pool consisting of multiple trainable key-prompts and perform instance-level matching to select the most suitable prompt embeddings, which has shown promising results. However, existing approaches face several limitations, including fixed prompt pools, manual selection of prompt embeddings, and strong reliance on the pretrained backbone for prompt selection. To address these issues, we propose a \textbf{L}ayer-importance guided \textbf{D}ual \textbf{E}xpandable \textbf{P}rompt Pool (\textbf{LDEPrompt}), which enables adaptive layer selection as well as dynamic freezing and expansion of the prompt pool. Extensive experiments on widely used class-incremental learning benchmarks demonstrate that LDEPrompt achieves state-of-the-art performance, validating its effectiveness and scalability.
Submitted 13 April, 2026;
originally announced April 2026.
-
RCBSF: A Multi-Agent Framework for Automated Contract Revision via Stackelberg Game
Authors:
Shijia Xu,
Yu Wang,
Xiaolong Jia,
Zhou Wu,
Kai Liu,
April Xiaowen Dong
Abstract:
Despite the widespread adoption of Large Language Models (LLMs) in Legal AI, their utility for automated contract revision remains impeded by hallucinated safety and a lack of rigorous behavioral constraints. To address these limitations, we propose the Risk-Constrained Bilevel Stackelberg Framework (RCBSF), which formulates revision as a non-cooperative Stackelberg game. RCBSF establishes a hierarchical Leader-Follower structure where a Global Prescriptive Agent (GPA) imposes risk budgets upon a follower system constituted by a Constrained Revision Agent (CRA) and a Local Verification Agent (LVA) to iteratively optimize the output. We provide theoretical guarantees that this bilevel formulation converges to an equilibrium yielding strictly superior utility over unguided configurations. Empirical validation on a unified benchmark demonstrates that RCBSF achieves state-of-the-art performance, surpassing iterative baselines with an average Risk Resolution Rate (RRR) of 84.21\% while enhancing token efficiency. Our code is available at https://github.com/xjiacs/RCBSF .
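As a rough illustration of what a risk-constrained bilevel Stackelberg formulation can look like (the symbols below are ours, introduced only for exposition, and are not taken from the paper), the leader fixes a risk budget and the follower system best-responds under it:
$$ \max_{b \in \mathcal{B}} \; U_{\mathrm{GPA}}\big(b, r^{*}(b)\big) \quad \text{s.t.} \quad r^{*}(b) \in \arg\max_{r} \; U_{\mathrm{CRA,LVA}}(b, r) \;\; \text{subject to} \;\; \mathrm{Risk}(r) \le b, $$
where $b$ is the risk budget chosen by the leader (GPA), $r$ is a candidate revision produced and verified by the follower agents (CRA and LVA), and $U$ denotes the respective utilities. The Stackelberg equilibrium is the pair $(b, r^{*}(b))$ solving this nested problem.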
Submitted 12 April, 2026;
originally announced April 2026.
-
Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS
Authors:
Shijia Xu,
Zhou Wu,
Xiaolong Jia,
Yu Wang,
Kai Liu,
April Xiaowen Dong
Abstract:
Retrieval-augmented generation (RAG) substantially extends the knowledge boundary of large language models. However, it still faces two major challenges when handling complex reasoning tasks: low context utilization and frequent hallucinations. To address these issues, we propose Self-Correcting RAG, a unified framework that reformulates retrieval and generation as constrained optimization and path planning. On the input side, we move beyond traditional greedy retrieval and, for the first time, formalize context selection as a multi-dimensional multiple-choice knapsack problem (MMKP), thereby maximizing information density and removing redundancy under a strict token budget. On the output side, we introduce a natural language inference (NLI)-guided Monte Carlo Tree Search (MCTS) mechanism, which leverages test-time compute to dynamically explore reasoning trajectories and validate the faithfulness of generated answers. Experiments on six multi-hop question answering and fact-checking datasets demonstrate that our method significantly improves reasoning accuracy on complex queries while effectively reducing hallucinations, outperforming strong existing baselines. Our code is available at https://github.com/xjiacs/Self-Correcting-RAG .
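To illustrate the knapsack view of context selection, here is a small hedged sketch in Python. It is our own simplification, not the paper's formulation: a single token dimension stands in for the multi-dimensional constraints, at most one passage variant is chosen per retrieval group, and exhaustive search replaces a proper MMKP solver.

```python
from itertools import product

def select_context(groups, token_budget):
    """Toy multiple-choice knapsack: pick at most one variant per group,
    maximizing total relevance while staying within the token budget.

    groups: list of groups, each a list of (passage, relevance, tokens).
    Exhaustive search is fine for a handful of groups; a real system
    would use dynamic programming or an ILP solver.
    """
    best_score, best_pick = 0.0, []
    padded = [g + [(None, 0.0, 0)] for g in groups]   # allow skipping a group
    for choice in product(*padded):
        tokens = sum(c[2] for c in choice)
        score = sum(c[1] for c in choice)
        if tokens <= token_budget and score > best_score:
            best_score, best_pick = score, [c[0] for c in choice if c[0]]
    return best_pick, best_score

# Toy example: two retrieval groups, each with a long and a short variant.
groups = [
    [("doc1 full", 0.9, 300), ("doc1 summary", 0.6, 80)],
    [("doc2 full", 0.7, 250), ("doc2 summary", 0.5, 60)],
]
print(select_context(groups, token_budget=200))
```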
Submitted 12 April, 2026;
originally announced April 2026.
-
NTIRE 2026 The Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Suhang Yao,
Beibei Lin,
Zhaoxin Fan,
Wending Yan,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Peishu Shi,
Yufei Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Runzhe Li,
Kui Jiang,
Zhaocheng Yu,
Yiang Chen,
Junjun Jiang,
Xianming Liu,
Hongde Gu,
Zeliang Li,
Mache You
, et al. (73 additional authors not shown)
Abstract:
This paper presents an overview of the NTIRE 2026 Second Challenge on Day and Night Raindrop Removal for Dual-Focused Images. Building upon the success of the first edition, this challenge attracted a wide range of impressive solutions, all developed and evaluated on our real-world Raindrop Clarity dataset~\cite{jin2024raindrop}. For this edition, we adjust the dataset to comprise 14,139 images for training, 407 images for validation, and 593 images for testing. The primary goal of this challenge is to establish a strong and practical benchmark for the removal of raindrops under various illumination and focus conditions. In total, 168 teams registered for the competition, and 17 teams submitted valid final solutions and fact sheets for the testing phase. The submitted methods achieved strong performance on the Raindrop Clarity dataset, demonstrating the growing progress in this challenging task.
Submitted 12 April, 2026;
originally announced April 2026.
-
The Second Challenge on Real-World Face Restoration at NTIRE 2026: Methods and Results
Authors:
Jingkai Wang,
Jue Gong,
Zheng Chen,
Kai Liu,
Jiatong Li,
Yulun Zhang,
Radu Timofte,
Jiachen Tu,
Yaokun Shi,
Guoyi Xu,
Yaoxin Jiang,
Jiajia Liu,
Yingsi Chen,
Yijiao Liu,
Hui Li,
Yu Wang,
Congchao Zhu,
Alexandru-Gabriel Lefterache,
Anamaria Radoi,
Chuanyue Yan,
Tao Lu,
Yanduo Zhang,
Kanghui Zhao,
Jiaming Wang,
Yuqi Li
, et al. (28 additional authors not shown)
Abstract:
This paper provides a review of the NTIRE 2026 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural and realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or training data. Performance is evaluated using a weighted image quality assessment (IQA) score, with the AdaFace model employed as an identity checker. The competition attracted 96 registrants, with 10 teams submitting valid models; ultimately, 9 teams achieved valid scores in the final ranking. This collaborative effort advances the performance of real-world face restoration while offering an in-depth overview of the latest trends in the field.
Submitted 15 April, 2026; v1 submitted 12 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of \boldmath{$D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$} in \boldmath{$D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$} Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively, both achieving three times better precision than the current PDG values. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer novel insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 15 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Radiology Report Generation for Low-Quality X-Ray Images
Authors:
Hongze Zhu,
Chen Hu,
Jiaxuan Jiang,
Hong Liu,
Yawen Huang,
Ming Hu,
Tianyu Wang,
Zhijian Wu,
Yefeng Zheng
Abstract:
Vision-Language Models (VLMs) have significantly advanced automated Radiology Report Generation (RRG). However, existing methods implicitly assume high-quality inputs, overlooking the noise and artifacts prevalent in real-world clinical environments. Consequently, current models exhibit severe performance degradation when processing suboptimal images. To bridge this gap, we propose a robust report generation framework explicitly designed for image quality variations. We first introduce an Automated Quality Assessment Agent (AQAA) to identify low-quality samples within the MIMIC-CXR dataset and establish the Low-quality Radiology Report Generation (LRRG) benchmark. To tackle degradation-induced shifts, we propose a novel Dual-loop Training Strategy leveraging bi-level optimization and gradient consistency. This approach ensures the model learns quality-agnostic diagnostic features by aligning gradient directions across varying quality regimes. Extensive experiments demonstrate that our approach effectively mitigates model performance degradation caused by image quality deterioration. The code and data will be released upon acceptance.
Submitted 11 April, 2026;
originally announced April 2026.
-
Non-solvable groups whose non-linear character degrees have the same number of different prime divisors
Authors:
Junying Guo,
Yanjun Liu,
Ziyi Wu,
Di Xiao
Abstract:
By a result of Noritzsch, a finite solvable group whose non-linear character degrees have the same set of prime divisors is meta-abelian. In this note we investigate finite non-solvable groups whose non-linear character degrees have the same number of different prime divisors, and show that, up to an abelian direct factor, such groups are exactly $L_2(4), L_2(8), A_7, S_7$, the central product of a cyclic $3$-group with $3.A_7$, or the semi-direct product of $A_7$ by a cyclic $2$-group $\langle a\rangle$ such that $a$ acts non-trivially on $A_7$ by conjugation. As a consequence, we show that only the primes $2,3,5,7$ may occur as prime divisors of their irreducible character degrees, and that Huppert's $ρ$-$σ$ conjecture holds for them.
Submitted 11 April, 2026;
originally announced April 2026.
-
DINO_4D: Semantic-Aware 4D Reconstruction
Authors:
Yiru Yang,
Zhuojie Wu,
Quentin Marguet,
Nishant Kumar Singh,
Max Schulthess
Abstract:
At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge connecting low-level geometric sensing with high-level semantic understanding. We present DINO\_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to effectively suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks demonstrate that our method maintains the linear time complexity $O(T)$ of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO\_4D establishes a new paradigm for constructing 4D World Models that possess both geometric precision and semantic understanding.
Submitted 10 April, 2026;
originally announced April 2026.
-
Galactic Archaeology with the Subaru `Ōnohi`ula Prime Focus Spectrograph Strategic Program
Authors:
Masashi Chiba,
Rosemary F. G. Wyse,
Evan N. Kirby,
Judith G. Cohen,
László Dobos,
Roman Gerasimov,
Miho N. Ishigaki,
Kohei Hayashi,
Carrie Filion,
Magda Arnaboldi,
Souradeep Bhattacharya,
Yutaka Hirai,
Chiaki Kobayashi,
Yutaka Komiyama,
Pete B. Kuzma,
Itsuki Ogami,
Ana L. Chies-Santos,
Nicole L. Klock-Miranda,
Federico Sestito,
Tamás Budavári,
Andrew P. Cooper,
Keyi Ding,
Ivanna Escala,
Elisa G. M. Ferreira,
Ortwin Gerhard
, et al. (24 additional authors not shown)
Abstract:
The recently commissioned Subaru `Ōnohi`ula Prime Focus Spectrograph (PFS) will obtain spectra from nearly 2,400 fibers that cover 1.24 square degrees. The 360 night Subaru Strategic Program for PFS is dedicating approximately one-third of its allocation (130 nights) to study the structure and evolution of galaxies in the Local Group. This Galactic Archaeological survey has three pillars. (1) We will determine whether the mass density profiles of dwarf galaxies are consistent with cusps, as expected for cold dark matter, or cores, as expected from alternative dark matter theories or baryonic feedback. We will deduce the density profiles as a function of radius from modeling of the full line-of-sight velocity and abundance distributions for six dwarf galaxies. Our total sample will consist of 18,000 member stars to beyond the nominal tidal radius of each system. (2) From measurements of the [alpha/Fe] abundance ratio, we will learn the difference in assembly history of the two most massive galaxies in the Local Group: M31 and the Milky Way. We will observe 30,000 member stars over 45 square degrees of M31's halo and outer disk. (3) We will uncover how the most fragile (outer) part of the Milky Way responded to accretion events both in the distant past (such as Gaia-Sausage Enceladus) and in more recent history (such as the Sagittarius dwarf spheroidal galaxy). To support this study, PFS will provide velocities and metallicities--from which, in combination with photometry, we will deduce ages--for tens of thousands of main-sequence stars out to a Galactocentric distance of ~30 kpc.
Submitted 15 April, 2026; v1 submitted 10 April, 2026;
originally announced April 2026.
-
Simulating Organized Group Behavior: New Framework, Benchmark, and Analysis
Authors:
Xinkai Zou,
Yiming Huang,
Zhuohang Wu,
Jian Sha,
Nan Huang,
Longfei Yun,
Jingbo Shang,
Letian Peng
Abstract:
Simulating how organized groups (e.g., corporations) make decisions (e.g., responding to a competitor's move) is essential for understanding real-world dynamics and could benefit relevant applications (e.g., market prediction). In this paper, we formalize this problem as a concrete research platform for group behavior understanding, providing: (1) a task definition with benchmark and evaluation criteria, (2) a structured analytical framework with a corresponding algorithm, and (3) detailed temporal and cross-group analysis. Specifically, we propose Organized Group Behavior Simulation, a task that models organized groups as collective entities from a practical perspective: given a group facing a particular situation (e.g., AI Boom), predict the decision it would take. To support this task, we present GROVE (GRoup Organizational BehaVior Evaluation), a benchmark covering 44 entities with 8,052 real-world context-decision pairs collected from Wikipedia and TechCrunch across 9 domains, with an end-to-end evaluation protocol assessing consistency, initiative, scope, magnitude, and horizon. Beyond straightforward prompting pipelines, we propose a structured analytical framework that converts collective decision-making events into an interpretable, adaptive, and traceable behavioral model, achieving stronger performance than summarization- and retrieval-based baselines. It further introduces an adapter mechanism for time-aware evolution and group-aware transfer, and traceable evidence nodes grounding each decision rule in originating historical events. Our analysis reveals temporal behavioral drift within individual groups, which the time-aware adapter effectively captures for stronger prediction, and structured cross-group similarity that enables knowledge transfer for data-scarce organizations.
Submitted 10 April, 2026;
originally announced April 2026.
-
GeRM: A Generative Rendering Model From Physically Realistic to Photorealistic
Authors:
Jiayuan Lu,
Rengan Xie,
Xuancheng Jin,
Zhizhen Wu,
Qi Ye,
Tian Xie,
Hujun Bao,
Rui Wang,
Yuchi Huo
Abstract:
For decades, Physically-Based Rendering (PBR) has been the foundation of synthesizing photorealistic images, and is therefore sometimes roughly referred to as Photorealistic Rendering (PRR). While PBR is indeed a mathematical simulation of light transport that guarantees physical reality, photorealism additionally relies on realistic digital models of the geometry and appearance of the real world, leaving a barely explored gap from PBR to PRR (P2P). Consequently, the path toward photorealism faces a critical dilemma: explicit simulation of PRR is encumbered by unattainable realistic digital models of real-world content, while implicit generative models sacrifice controllability and geometric consistency. Based on this insight, this paper presents the problem, data, and approach for mitigating the P2P gap, followed by the first multi-modal generative rendering model, dubbed GeRM, to unify PBR and PRR. GeRM integrates physical attributes such as G-buffers with text prompts, and uses progressive incremental injection to generate controllable photorealistic images, allowing users to fluidly navigate the continuum between strict physical fidelity and perceptual photorealism. Technically, we model the transition between PBR and PRR images as a distribution transfer and aim to learn a distribution transfer vector field (DTV Field) to guide this process. To define the learning objective, we first leverage a multi-agent VLM framework to construct an expert-guided pairwise P2P transfer dataset, named P2P-50K, where each paired sample in the dataset corresponds to a transfer vector in the DTV Field. Subsequently, we propose a multi-condition ControlNet to learn the DTV Field, which synthesizes PBR images and progressively transitions them into PRR images, guided by G-buffers, text prompts, and cues for enhanced regions.
Submitted 10 April, 2026;
originally announced April 2026.
-
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation
Authors:
Haoyu Zhao,
Zihao Zhang,
Jiaxi Gu,
Haoran Chen,
Qingping Zheng,
Pin Tang,
Yeyin Jin,
Yuang Zhang,
Junqi Cheng,
Zenghui Lu,
Peng Shu,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Camera-controllable video generation aims to synthesize videos with flexible and physically plausible camera movements. However, existing methods either provide imprecise camera control from text prompts or rely on labor-intensive manual camera trajectory parameters, limiting their use in automated scenarios. To address these issues, we propose a novel Vision-Language-Camera model, termed CT-1 (Camera Transformer 1), a specialized model designed to transfer spatial reasoning knowledge to video generation by accurately estimating camera trajectories. Built upon vision-language modules and a Diffusion Transformer model, CT-1 employs a Wavelet-based Regularization Loss in the frequency domain to effectively learn complex camera trajectory distributions. These trajectories are integrated into a video diffusion model to enable spatially aware camera control that aligns with user intentions. To facilitate the training of CT-1, we design a dedicated data curation pipeline and construct CT-200K, a large-scale dataset containing over 47M frames. Experimental results demonstrate that our framework successfully bridges the gap between spatial reasoning and video synthesis, yielding faithful and high-quality camera-controllable videos and improving camera control accuracy by 25.7% over prior methods.
Submitted 10 April, 2026;
originally announced April 2026.
-
Degradation-Robust Fusion: An Efficient Degradation-Aware Diffusion Framework for Multimodal Image Fusion in Arbitrary Degradation Scenarios
Authors:
Yu Shi,
Yu Liu,
Zhong-Cheng Wu,
Juan Cheng,
Huafeng Li,
Xun Chen
Abstract:
Complex degradations like noise, blur, and low resolution are typical challenges in real-world image fusion tasks, limiting the performance and practicality of existing methods. End-to-end, neural-network-based approaches are generally simple to design and highly efficient in inference, but their black-box nature leads to limited interpretability. Diffusion-based methods alleviate this to some extent by providing powerful generative priors and a more structured inference process. However, they are trained to learn a single-domain target distribution, whereas fusion lacks natural fused data and relies on modeling complementary information from multiple sources, making diffusion hard to apply directly in practice. To address these challenges, this paper proposes an efficient degradation-aware diffusion framework for image fusion under arbitrary degradation scenarios. Specifically, instead of explicitly predicting noise as in conventional diffusion models, our method performs implicit denoising by directly regressing the fused image, enabling flexible adaptation to diverse fusion tasks under complex degradations with a limited number of steps. Moreover, we design a joint observation model correction mechanism that simultaneously imposes degradation and fusion constraints during sampling to ensure high reconstruction accuracy. Experiments on diverse fusion tasks and degradation configurations demonstrate the superiority of the proposed method under complex degradation scenarios.
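The difference between the conventional noise-prediction objective and directly regressing the fused image can be shown with a toy training step. This is a generic sketch under our own assumptions (a stand-in network and a single noise-schedule value), not the paper's architecture or loss.

```python
import torch
import torch.nn.functional as F

def diffusion_step_losses(model, fused_target, conditions, alpha_bar_t, t):
    """Compute both objectives on one noisy sample.

    fused_target: the (pseudo) fused image the network should recover
    conditions:   conditioning inputs, e.g. the degraded source modalities
    alpha_bar_t:  cumulative noise-schedule value at step t
    """
    noise = torch.randn_like(fused_target)
    noisy = alpha_bar_t.sqrt() * fused_target + (1 - alpha_bar_t).sqrt() * noise
    pred = model(noisy, conditions, t)

    loss_eps = F.mse_loss(pred, noise)          # conventional: predict the noise
    loss_x0 = F.mse_loss(pred, fused_target)    # used here: regress the fused image
    return loss_eps, loss_x0

# Toy usage with a trivial stand-in "model" that ignores its conditioning.
model = lambda x, cond, t: x
fused = torch.randn(1, 1, 8, 8)
cond = torch.randn(1, 2, 8, 8)
print(diffusion_step_losses(model, fused, cond, torch.tensor(0.5), t=10))
```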
Submitted 9 April, 2026;
originally announced April 2026.
-
Test of lepton flavour universality with $B^0\to K^{*0}\ell^+\ell^-$ decays at large dilepton invariant mass
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1113 additional authors not shown)
Abstract:
Muon-electron universality is tested in $B^0 \to K^{*0} \ \ell^+ \ell^-$ decays, in the dilepton-invariant-mass region above the $ψ(2S)$ resonance. The analysis uses beauty mesons produced in proton-proton collisions recorded by the LHCb detector at center-of-mass energies of 7, 8, and 13 $\text{TeV}$, corresponding to an integrated luminosity of 9 $\text{fb}^{-1}$. The ratio of branching fractions between the muon and electron channels, $R_{K^{*0}}$, is measured to be $1.08\,^{+0.14}_{-0.12}\text{(stat)} \ \pm 0.07\text{(syst)}$ for a dilepton-invariant-mass squared above 14.0 $\text{GeV}^{2}/\text{c}^{4}$, consistent with the Standard Model prediction. This result represents the most precise measurement of $R_{K^{*0}}$ in this region and the first such measurement performed at a hadron collider.
Submitted 9 April, 2026;
originally announced April 2026.
-
Bridging the Gap between Micro-scale Traffic Simulation and 4D Digital Cityscapes
Authors:
Longxiang Jiao,
Lukas Hofmann,
Yiru Yang,
Zhanyi Wu,
Jonas Egeler
Abstract:
While micro-scale traffic simulations provide essential data for urban planning, they are rarely coupled with the high-fidelity visualization or auralization necessary for effective stakeholder communication. In this work, we present a real-time 4D visualization framework that couples the SUMO traffic simulator with a photorealistic, geospatially accurate VR representation of Zurich in Unreal Engine 5. Our architecture implements a robust C++ data pipeline for synchronized vehicle visualization and features an Open Sound Control (OSC) interface to support external auralization engines. We validate the framework through a user study assessing the correlation between simulated traffic dynamics and human perception. Results demonstrate a high degree of perceptual alignment, where users correctly interpret safety risks from the 4D simulation. Furthermore, our findings indicate that the inclusion of spatialized audio alters the user's sense of safety, underscoring the importance of multimodality in traffic simulations.
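For readers unfamiliar with OSC, an external auralization engine could be driven by messages like those in the following Python sketch (using the python-osc package); the address pattern and argument layout are illustrative assumptions, not the framework's actual schema.

```python
# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical endpoint of an external auralization engine.
client = SimpleUDPClient("127.0.0.1", 9001)

# Illustrative per-vehicle update: id, position (x, y, z) in metres, speed in m/s.
client.send_message("/vehicle/update", ["veh_042", 12.5, -3.2, 0.0, 8.3])
```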
Submitted 9 April, 2026;
originally announced April 2026.
-
Search for the lepton-flavour violating decays $B^+ \to π^+ μ^\pm e^\mp$
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An,
L. Anderlini
, et al. (1105 additional authors not shown)
Abstract:
The first search for the lepton-flavour violating decays $B^+ \to π^+ μ^{\pm} e^{\mp}$ in proton-proton collisions is presented, using data collected by the LHCb experiment between 2011 and 2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. No significant signal is observed and an upper limit on the branching fraction is set at $\mathcal{B}(B^+ \to π^+ μ^{\pm} e^{\mp}) < 1.8 \times 10^{-9}$ at the $90\%$ confidence level, two orders of magnitude more restrictive than the current world average. This is the first constraint on lepton-flavour violating $b \to d$ quark transitions at the LHC and also sets the most stringent upper limits to date on $b \to d μ^{\pm} e^{\mp}$ transitions. Limits on left-handed and scalar scenarios beyond the Standard Model are also reported.
Submitted 9 April, 2026;
originally announced April 2026.
-
Towards Identification and Intervention of Safety-Critical Parameters in Large Language Models
Authors:
Weiwei Qi,
Zefeng Wu,
Tianhang Zheng,
Zikang Zhang,
Xiaojun Jia,
Zhan Qin,
Kui Ren
Abstract:
Ensuring Large Language Model (LLM) safety is crucial, yet the lack of a clear understanding about safety mechanisms hinders the development of precise and reliable methodologies for safety intervention across diverse tasks. To better understand and control LLM safety, we propose the Expected Safety Impact (ESI) framework for quantifying how different parameters affect LLM safety. Based on ESI, we reveal distinct safety-critical patterns across different LLM architectures: In dense LLMs, many safety-critical parameters are located in value matrices (V) and MLPs in middle layers, whereas in Mixture-of-Experts (MoE) models, they shift to the late-layer MLPs. Leveraging ESI, we further introduce two targeted intervention paradigms for safety enhancement and preservation, i.e., Safety Enhancement Tuning (SET) and Safety Preserving Adaptation (SPA). SET can align unsafe LLMs by updating only a few safety-critical parameters, effectively enhancing safety while preserving original performance. SPA safeguards well-aligned LLMs during capability-oriented intervention (e.g., instruction tuning) by preventing disruption of safety-critical weights, allowing the LLM to acquire new abilities and maintain safety capabilities. Extensive evaluations on different LLMs demonstrate that SET can reduce the attack success rates of unaligned LLMs by over 50% with only a 100-iteration update on 1% of model weights. SPA can limit the safety degradation of aligned LLMs within 1% after a 1,000-iteration instruction fine-tuning on different tasks. Our code is available at: https://github.com/ZJU-LLM-Safety/SafeWeights-ACL.
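As a hedged illustration of what updating only a small set of safety-critical parameters might look like in code (our own sketch: the ESI scoring itself is not reproduced, and the boolean masks are simply assumed to be given):

```python
import torch

def apply_masked_update(model, masks, lr=1e-5):
    """Take one gradient step only on entries flagged as safety-critical.

    masks: dict mapping parameter name -> boolean tensor of the same shape,
           True where the entry was identified as safety-critical; all other
           entries are left untouched.
    """
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is None or name not in masks:
                continue
            param -= lr * param.grad * masks[name]

# Toy usage on a tiny model with a random ~1% mask.
model = torch.nn.Linear(16, 4)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
masks = {n: (torch.rand_like(p) < 0.01) for n, p in model.named_parameters()}
apply_masked_update(model, masks)
```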
Submitted 9 April, 2026;
originally announced April 2026.
-
ADAG: Automatically Describing Attribution Graphs
Authors:
Aryaman Arora,
Zhengxuan Wu,
Jacob Steinhardt,
Sarah Schwettmann
Abstract:
In language model interpretability research, \textbf{circuit tracing} aims to identify which internal features causally contributed to a particular output and how they affected each other, with the goal of explaining the computations underlying some behaviour. However, all prior circuit tracing work has relied on ad-hoc human interpretation of the role that each feature in the circuit plays, via manual inspection of data artifacts such as the dataset examples the component activates on. We introduce \textbf{ADAG}, an end-to-end pipeline for describing these attribution graphs which is fully automated. To achieve this, we introduce \textit{attribution profiles} which quantify the functional role of a feature via its input and output gradient effects. We then introduce a novel clustering algorithm for grouping features, and an LLM explainer--simulator setup which generates and scores natural-language explanations of the functional role of these feature groups. We run our system on known human-analysed circuit-tracing tasks and recover interpretable circuits, and further show ADAG can find steerable clusters which are responsible for a harmful advice jailbreak in Llama 3.1 8B Instruct.
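To give a feel for what an attribution profile and the subsequent grouping might look like, here is a small NumPy sketch. It is our own construction for illustration only: profiles are just concatenated outgoing and incoming edge weights of an attribution graph, and a greedy cosine-similarity rule stands in for the paper's clustering algorithm.

```python
import numpy as np

def attribution_profiles(adj):
    """Profile per feature from an attribution graph.

    adj[i, j] is the attribution (edge weight) from feature i to feature j;
    the profile concatenates each feature's outgoing and incoming effects.
    """
    return np.concatenate([adj, adj.T], axis=1)

def group_by_cosine(profiles, threshold=0.8):
    """Greedy grouping: a feature joins the first group whose seed profile
    it matches with cosine similarity above the threshold."""
    normed = profiles / (np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-9)
    groups, seeds = [], []
    for i, v in enumerate(normed):
        for members, seed in zip(groups, seeds):
            if float(v @ seed) >= threshold:
                members.append(i)
                break
        else:
            groups.append([i])
            seeds.append(v)
    return groups

# Toy attribution graph over 5 features.
adj = np.random.default_rng(0).random((5, 5))
print(group_by_cosine(attribution_profiles(adj)))
```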
Submitted 8 April, 2026;
originally announced April 2026.
-
Improving Search Suggestions for Alphanumeric Queries
Authors:
Samarth Agrawal,
Jayanth Yetukuri,
Diptesh Kanojia,
Qunzhi Zhou,
Zhe Wu
Abstract:
Alphanumeric identifiers such as manufacturer part numbers (MPNs), SKUs, and model codes are ubiquitous in e-commerce catalogs and search. These identifiers are sparse, non-linguistic, and highly sensitive to tokenization and typographical variation, rendering conventional lexical and embedding-based retrieval methods ineffective. We propose a training-free, character-level retrieval framework that encodes each alphanumeric sequence as a fixed-length binary vector. This representation enables efficient similarity computation via Hamming distance and supports nearest-neighbor retrieval over large identifier corpora. An optional re-ranking stage using edit distance refines precision while preserving latency guarantees. The method offers a practical and interpretable alternative to learned dense retrieval models, making it suitable for production deployment in search suggestion generation systems. Significant gains in business metrics in an A/B test further demonstrate the utility of our approach.
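A minimal sketch of this retrieval scheme, with our own choice of hashed character trigrams as the binary encoding (the paper's exact encoding and index structure may differ):

```python
import hashlib

def encode(identifier, n=3, bits=256):
    """Hash character n-grams of an identifier into a fixed-length binary
    vector, represented compactly as a Python int bitmask."""
    s = f"^{identifier.upper()}$"
    vec = 0
    for i in range(len(s) - n + 1):
        h = int.from_bytes(hashlib.md5(s[i:i + n].encode()).digest(), "big")
        vec |= 1 << (h % bits)
    return vec

def hamming(a, b):
    return bin(a ^ b).count("1")

def edit_distance(a, b):
    """Plain Levenshtein distance for the optional re-ranking stage."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Index a toy catalog of part numbers and retrieve suggestions for a query.
catalog = ["WH16X26753", "WH16X26754", "DA97-07603B", "W10295370A"]
index = {mpn: encode(mpn) for mpn in catalog}

query = "WH16X2675"  # partial, typo-prone user input
q = encode(query)
shortlist = sorted(catalog, key=lambda m: hamming(q, index[m]))[:3]   # Hamming NN
print(sorted(shortlist, key=lambda m: edit_distance(query, m)))       # re-rank
```

In a production setting the bitmasks would typically live in a packed binary index so that Hamming distances can be computed with hardware popcounts, keeping the latency guarantees the abstract mentions.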
Submitted 1 April, 2026;
originally announced April 2026.
-
Learning to Search: A Decision-Based Agent for Knowledge-Based Visual Question Answering
Authors:
Zhuohong Chen,
Zhenxian Wu,
Yunyao Yu,
Hangrui Xu,
Zirui Liao,
Zhifang Liu,
Xiangwen Deng,
Pen Jiao,
Haoqian Wang
Abstract:
Knowledge-based visual question answering (KB-VQA) requires vision-language models to understand images and use external knowledge, especially for rare entities and long-tail facts. Most existing retrieval-augmented generation (RAG) methods adopt a fixed pipeline that sequentially retrieves information, filters it, and then produces an answer. Such a design makes it difficult to adapt to diverse question types. Moreover, it separates retrieval from reasoning, making it hard for the model to decide when to search, how to refine queries, or when to stop. As a result, the retrieved evidence is often poorly aligned with the question. To address these limitations, we reformulate KB-VQA as a search-agent problem and model the solving process as a multi-step decision-making procedure. At each step, the agent selects one of four actions (Answer, Image Retrieval, Text Retrieval, or Caption) based on its current information state. We further design an automated pipeline to collect multi-step trajectories that record the agent's reasoning process, tool usage, and intermediate decisions. These trajectories are then used as supervision for fine-tuning. Experiments on InfoSeek and E-VQA demonstrate that our method achieves state-of-the-art performance, consistently outperforming prior baselines and confirming the effectiveness of our framework.
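The multi-step decision loop can be pictured with a short, purely illustrative sketch; the four action names come from the abstract, while the policy and tool functions below are hypothetical placeholders.

```python
def run_agent(question, image, policy, tools, max_steps=6):
    """Iteratively choose one of four actions until the agent answers.

    policy(state) -> (action, argument); tools maps the three non-terminal
    actions to callables returning new evidence appended to the state.
    """
    state = {"question": question, "image": image, "evidence": []}
    for _ in range(max_steps):
        action, arg = policy(state)
        if action == "answer":
            return arg                              # terminal action
        state["evidence"].append(tools[action](arg, state))
    return "unanswered"                             # step budget exhausted

# Hypothetical usage with trivial stand-ins for the policy and the tools.
tools = {
    "image_retrieval": lambda q, s: f"images similar to {s['image']}",
    "text_retrieval":  lambda q, s: f"passages about {q}",
    "caption":         lambda q, s: "a caption of the input image",
}
script = iter([("caption", None), ("text_retrieval", "landmark"), ("answer", "Eiffel Tower")])
policy = lambda state: next(script)
print(run_agent("Which landmark is shown?", "photo.jpg", policy, tools))
```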
Submitted 9 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.