-
Trajectory Planning for a Multi-UAV Rigid-Payload Cascaded Transportation System Based on Enhanced Tube-RRT*
Authors:
Jianqiao Yu,
Jia Li,
Tianhua Gao
Abstract:
This paper presents a two-stage trajectory planning framework for a multi-UAV rigid-payload cascaded transportation system, aiming to address planning challenges in densely cluttered environments. In Stage I, an Enhanced Tube-RRT* algorithm is developed by integrating active hybrid sampling and an adaptive expansion strategy, enabling rapid generation of a safe and feasible virtual tube in environments with dense obstacles. Moreover, a trajectory smoothness cost is explicitly incorporated into the edge cost to reduce excessive turns and thereby mitigate cable-induced oscillations. Simulation results demonstrate that the proposed Enhanced Tube-RRT* achieves a higher success rate and effective sampling rate than mixed-sampling Tube-RRT* (STube-RRT*) and adaptive-extension Tube-RRT* (AETube-RRT*), while producing a shorter optimal path with a smaller cumulative turning angle. In Stage II, a convex quadratic program is formulated by considering payload translational and rotational dynamics, cable tension constraints, and collision-safety constraints, yielding a smooth, collision-free desired payload trajectory. Finally, a centralized geometric control scheme is applied to the cascaded system to validate the effectiveness and feasibility of the proposed planning framework, offering a practical solution for payload attitude maneuvering in densely cluttered environments.
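The abstract does not spell out the smoothness-augmented edge cost; one plausible form, combining segment length with the turning angle between consecutive path segments, can be sketched as follows (the function, the weights `w_len` and `w_turn`, and the 2-D setting are illustrative assumptions, not the paper's definition):

```python
import math

def edge_cost(p_prev, p_from, p_to, w_len=1.0, w_turn=0.5):
    """Edge cost = segment length plus a weighted turning-angle penalty
    between consecutive segments (p_prev->p_from) and (p_from->p_to)."""
    ax, ay = p_from[0] - p_prev[0], p_from[1] - p_prev[1]
    bx, by = p_to[0] - p_from[0], p_to[1] - p_from[1]
    length = math.hypot(bx, by)
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return w_len * length
    # Turning angle from the dot product of normalized segment directions.
    cos_t = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
    return w_len * length + w_turn * math.acos(cos_t)

# A straight continuation pays only its length; a 90-degree turn pays extra.
straight = edge_cost((0, 0), (1, 0), (2, 0))
turn = edge_cost((0, 0), (1, 0), (1, 1))
```

Penalizing the turning angle in the edge cost is what discourages excessive turns in the planned tube and, per the abstract, mitigates cable-induced oscillations.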
Submitted 16 April, 2026;
originally announced April 2026.
-
Study of the $B^0 \to Λ_c^+ \barΛ_c^- K_S^0$ decay
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1111 additional authors not shown)
Abstract:
The decay $B^0 \to Λ_c^+ \barΛ_c^- K_S^0$ is studied at LHCb for the first time using proton-proton collision data recorded by the LHCb experiment at a center-of-mass energy of $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. The branching ratio relative to the decay $B^+ \to Λ_c^+ \barΛ_c^- K^+$ is measured to be
$$ \frac{{\cal B}(B^0 \to Λ_c^+ \barΛ_c^- K_S^0)}{{\cal B}(B^+ \to Λ_c^+ \barΛ_c^- K^+)} = 0.53 \pm 0.05 \pm 0.05, $$ where the first uncertainty is statistical and the second is systematic. Evidence is found for contributions from two resonant states, $Ξ_c(2923)^+$ and $Ξ_c(2939)^+$, in the $Λ_c^+ K_S^0$ system. The two states show a significance of $3.9σ$ relative to the nonresonant hypothesis. These two $Ξ_c^+$ states are consistent with being the isospin partners of the states observed in the $Λ_c^+ K^-$ system.
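As a quick check on the quoted precision, the statistical and systematic uncertainties of the ratio can be combined in quadrature (a standard convention; treating them as uncorrelated is an assumption here, not something the abstract states):

```python
import math

ratio = 0.53
stat, syst = 0.05, 0.05
# Quadrature sum of uncorrelated uncertainties.
total = math.sqrt(stat**2 + syst**2)   # combined uncertainty, about 0.071
rel = total / ratio                    # relative uncertainty, about 13%
```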
Submitted 16 April, 2026;
originally announced April 2026.
-
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Authors:
Team HY-World,
Chenjie Cao,
Xuhui Zuo,
Zhenwei Wang,
Yisu Zhang,
Junta Wu,
Zhenyang Liu,
Yuning Gong,
Yang Liu,
Bo Yuan,
Chao Zhang,
Coopers Li,
Dongyuan Guo,
Fan Yang,
Haiyu Zhang,
Hang Cao,
Jianchen Zhu,
Jiaxin Lin,
Jie Xiao,
Jihong Zhang,
Junlin Yu,
Lei Wang,
Lifu Wang,
Lilin Wang,
Linus
, et al. (20 additional authors not shown)
Abstract:
We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the model performs world generation, synthesizing high-fidelity, navigable 3D Gaussian Splatting (3DGS) scenes. This is achieved through a four-stage method: a) Panorama Generation with HY-Pano 2.0, b) Trajectory Planning with WorldNav, c) World Expansion with WorldStereo 2.0, and d) World Composition with WorldMirror 2.0. Specifically, we introduce key innovations to enhance panorama fidelity, enable 3D scene understanding and planning, and upgrade WorldStereo, our keyframe-based view generation model with consistent memory. We also upgrade WorldMirror, a feed-forward model for universal 3D prediction, by refining model architecture and learning strategy, enabling world reconstruction from multi-view images or videos. Also, we introduce WorldLens, a high-performance 3DGS rendering platform featuring a flexible engine-agnostic architecture, automatic IBL lighting, efficient collision detection, and training-rendering co-design, enabling interactive exploration of 3D worlds with character support. Extensive experiments demonstrate that HY-World 2.0 achieves state-of-the-art performance on several benchmarks among open-source approaches, delivering results comparable to the closed-source model Marble. We release all model weights, code, and technical details to facilitate reproducibility and support further research on 3D world models.
Submitted 15 April, 2026;
originally announced April 2026.
-
Measurement of the $W$-boson production cross-sections in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the $W$-boson production cross-section is performed using the $W \to μν$ decay channel, based on a sample of proton-proton collision data collected by the LHCb experiment at $\sqrt{s}$ = 13 TeV and corresponding to an integrated luminosity of 5.1 fb$^{-1}$. The cross-section is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. The integrated production cross-sections of $W$ bosons are measured to be $$ \begin{array}{lcl} σ_{W^+ \to μ^+ν} &=& 1754.2 \pm 1.5 \pm 11.9 \pm 35.1\text{ pb} \\ σ_{W^- \to μ^-\barν} &=& 1178.1 \pm 1.3 \pm 9.7 \pm 23.6\text{ pb} \end{array} $$ where the uncertainties are statistical, systematic, and due to the luminosity determination, respectively. Results are in good agreement with theoretical predictions at next-to-next-to-leading order in perturbative quantum chromodynamics. This measurement is significantly more precise than previous results in this kinematic regime.
Submitted 14 April, 2026;
originally announced April 2026.
-
Hypergraph-State Collaborative Reasoning for Multi-Object Tracking
Authors:
Zikai Song,
Junqing Yu,
Yi-Ping Phoebe Chen,
Wei Yang,
Xinchao Wang
Abstract:
Motion reasoning serves as the cornerstone of multi-object tracking (MOT), as it enables consistent association of targets across frames. However, existing motion estimation approaches face two major limitations: (1) instability caused by noisy or probabilistic predictions, and (2) vulnerability under occlusion, where trajectories often fragment once visual cues disappear. To overcome these issues, we propose a collaborative reasoning framework that enhances motion estimation through joint inference among multiple correlated objects. By allowing objects with similar motion states to mutually constrain and refine each other, our framework stabilizes noisy trajectories and infers plausible motion continuity even when a target is occluded. To realize this concept, we design HyperSSM, an architecture that integrates Hypergraph computation and a State Space Model (SSM) for unified spatial-temporal reasoning. The Hypergraph module captures spatial motion correlations through dynamic hyperedges, while the SSM enforces temporal smoothness via structured state transitions. This synergistic design enables simultaneous optimization of spatial consensus and temporal coherence, resulting in robust and stable motion estimation. Extensive experiments on four mainstream and diverse benchmarks (MOT17, MOT20, DanceTrack, and SportsMOT), covering various motion patterns and scene complexities, demonstrate that our approach achieves state-of-the-art performance across a wide range of tracking scenarios.
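The abstract does not detail HyperSSM's internals; its temporal module builds on the standard linear state-space recurrence, which can be sketched generically (the matrices and scan below are illustrative, not the paper's architecture):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a linear state-space recurrence x_t = A x_{t-1} + B u_t,
    y_t = C x_t over an input sequence u of shape (T, d_in)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B @ u_t      # structured state transition
        ys.append(C @ x)         # linear read-out
    return np.stack(ys)

# Toy run: 1-D input stream, 2-D hidden state.
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
y = ssm_scan(A, B, C, np.ones((5, 1)))
```

The contraction of the state through `A` is what enforces temporal smoothness: each output depends on a decaying summary of all past inputs rather than on the latest noisy observation alone.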
Submitted 14 April, 2026;
originally announced April 2026.
-
Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
Authors:
Rui Wang,
Yi Zheng,
Dongxin Wang,
Haiping Huang,
Yuanzhi Yao,
Yuxiang Zhou,
Jialin Yu,
Philip Torr
Abstract:
Existing topic modeling methods, from LDA to recent neural and LLM-based approaches, focus mainly on statistical coherence and often produce redundant or off-target topics that miss the user's underlying intent. We introduce Human-centric Topic Modeling (\emph{Human-TM}), a novel task formulation that integrates a human-provided goal directly into the topic modeling process to produce interpretable, diverse, and goal-oriented topics. To tackle this challenge, we propose the \textbf{G}oal-prompted \textbf{C}ontrastive \textbf{T}opic \textbf{M}odel with \textbf{O}ptimal \textbf{T}ransport (GCTM-OT), which first uses LLM-based prompting to extract goal candidates from documents, then incorporates these into semantic-aware contrastive learning via optimal transport for topic discovery. Experimental results on three public subreddit datasets show that GCTM-OT outperforms state-of-the-art baselines in topic coherence and diversity while significantly improving alignment with human-provided goals, paving the way for more human-centric topic discovery systems.
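The abstract does not say which optimal-transport solver GCTM-OT uses; entropic regularization with Sinkhorn iterations is the common choice in such settings, sketched here purely as an assumption:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropic-regularized optimal transport between histograms a and b
    under a cost matrix, via Sinkhorn's alternating scaling iterations."""
    K = np.exp(-cost / eps)          # Gibbs kernel of the cost
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # scale columns to match marginal b
        u = a / (K @ v)              # scale rows to match marginal a
    return u[:, None] * K * v[None, :]   # transport plan

# Toy example: align a document-topic histogram with a goal-topic histogram.
cost = np.array([[0.0, 1.0], [1.0, 0.0]])
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
P = sinkhorn(cost, a, b)
```

The resulting plan `P` is a soft matching whose marginals reproduce the two input distributions, which is what lets a contrastive objective align topics with goal candidates.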
Submitted 14 April, 2026;
originally announced April 2026.
-
Precision measurement of the muon charge asymmetry from $W$-boson decays in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1112 additional authors not shown)
Abstract:
A precision measurement of the muon charge asymmetry from $W$-boson decays in proton-proton collisions at $\sqrt{s}$ = 13 TeV is presented. The analysis utilizes data corresponding to an integrated luminosity of 5.1 fb$^{-1}$, recorded by the LHCb detector during 2016, 2017 and 2018. The asymmetry is measured for muons with transverse momentum between 25 and 55 GeV and pseudorapidity between 2.0 and 4.5. This result represents the most precise determination of the muon charge asymmetry in the forward region to date, exhibiting excellent agreement with next-to-next-to-leading-order predictions in perturbative quantum chromodynamics.
Submitted 14 April, 2026;
originally announced April 2026.
-
Projection of purification performance for the RELICS experiment
Authors:
Jiachen Yu,
Kaihang Li,
Jingfan Gu,
Chang Cai,
Guocai Chen,
Jiangyu Chen,
Huayu Dai,
Rundong Fang,
Hongrui Gao,
Fei Gao,
Xiaoran Guo,
Jiheng Guo,
Chengjie Jia,
Gaojun Jin,
Fali Ju,
Yanzhou Hao,
Xu Han,
Yang Lei,
Meng Li,
Minhua Li,
Shengchao Li,
Siyin Li,
Tao Li,
Qing Lin,
Jiajun Liu
, et al. (25 additional authors not shown)
Abstract:
The RELICS (REactor neutrino LIquid xenon Coherent elastic Scattering) experiment employs a dual-phase liquid xenon time projection chamber to search for Coherent Elastic Neutrino-Nucleus Scattering (CE$ν$NS) induced by reactor neutrinos. To detect these sub-keV nuclear recoils and minimize signal attenuation, it is critical to maintain a sufficiently low impurity concentration in the detector. This work presents a comprehensive purity evolution model developed to describe impurity migration inside the detector. Utilizing measured material outgassing rates as input parameters, the model incorporates non-uniform transport mechanisms of the impurities, including circulation, vaporization, and condensation. The model is validated using data from a dedicated prototype detector. Based on this validated model, projections for the purification performance of the upcoming RELICS-10 and RELICS-50 detectors are provided.
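A zero-dimensional caricature of such a purity model can illustrate the basic balance the paper refines: a single well-mixed volume with constant outgassing and purification flow (the full model tracks non-uniform transport, circulation, vaporization, and condensation, so the closed form below is an illustrative assumption only):

```python
import numpy as np

def impurity_evolution(n0, r_out, flow, mass, t):
    """Well-mixed impurity balance dn/dt = r_out/mass - (flow/mass) * n,
    with n the impurity concentration, r_out the total outgassing rate,
    flow the purification mass flow, and mass the xenon inventory.
    Returns the closed-form solution at times t."""
    n_eq = r_out / flow          # equilibrium concentration set by outgassing
    tau = mass / flow            # purification time constant
    return n_eq + (n0 - n_eq) * np.exp(-t / tau)

# Toy run with no outgassing: pure exponential clean-up over ten time constants.
t = np.linspace(0.0, 10.0, 6)
n = impurity_evolution(n0=1.0, r_out=0.0, flow=1.0, mass=1.0, t=t)
```

Even this caricature shows the two levers a projection must quantify: the equilibrium purity is set by the outgassing-to-flow ratio, while the clean-up speed is set by the inventory-to-flow ratio.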
Submitted 14 April, 2026;
originally announced April 2026.
-
Observation of the Exotic State $π_{1}(1600)$ in $ψ(2S)\rightarrowγχ_{c1},χ_{c1}\rightarrowπ^{+}π^{-}η'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $ψ(2S)\rightarrowγχ_{c1}, χ_{c1}\rightarrowπ^+π^-η^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $ψ(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $π_{1}(1600)$, is observed for the first time in the charmonium decay of $χ_{c1}\rightarrowπ_{1}^{\pm}(1600)π^{\mp}$, $π_{1}^{\pm}(1600)\rightarrowπ^{\pm}η^{\prime}$ with a statistical significance over $21σ$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[χ_{c1}\rightarrowπ_{1}(1600)^{\pm}π^{\mp} \right] \times \mathcal{B}\left[π_{1}(1600)^{\pm}\rightarrowπ^{\pm}η^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst})~ \right) \times 10^{-4}$.
Submitted 14 April, 2026; v1 submitted 14 April, 2026;
originally announced April 2026.
-
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Authors:
NVIDIA,
:,
Aakshita Chandiramani,
Aaron Blakeman,
Abdullahi Olaoye,
Abhibha Gupta,
Abhilash Somasamudramath,
Abhinav Khattar,
Adeola Adesoba,
Adi Renduchintala,
Adil Asif,
Aditya Agrawal,
Aditya Vavre,
Ahmad Kiswani,
Aishwarya Padmakumar,
Ajay Hotchandani,
Akanksha Shukla,
Akhiad Bercovich,
Aleksander Ficek,
Aleksandr Shaposhnikov,
Alex Gronskiy,
Alex Kondratenko,
Alex Neefus,
Alex Steiner,
Alex Yang
, et al. (522 additional authors not shown)
Abstract:
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens followed by post-training using supervised fine tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves comparable accuracy on common benchmarks, while also achieving up to 2.2x and 7.5x higher inference throughput compared to GPT-OSS-120B and Qwen3.5-122B, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.
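LatentMoE's specifics are not given in the abstract; the generic top-k Mixture-of-Experts routing that such architectures build on can be sketched as follows (all names, shapes, and the toy experts are illustrative assumptions, not NVIDIA's implementation):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Top-k MoE layer: a linear gate scores all experts for a token,
    only the k highest-scoring experts run, and their outputs are mixed
    with softmax weights renormalized over the selected experts."""
    logits = x @ gate_w                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                              # renormalized mixing weights
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                    # a single token embedding
gate_w = rng.standard_normal((4, 3))          # router for 3 experts
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0)]  # toy experts
y = moe_forward(x, gate_w, experts, k=2)
```

Only k of the experts execute per token, which is how a 120B-parameter model can keep its active parameter count near 12B.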
Submitted 14 April, 2026;
originally announced April 2026.
-
AlphaEval: Evaluating Agents in Production
Authors:
Pengrui Lu,
Bingyu Xu,
Wenjun Zhang,
Shengjia Hua,
Xuanjian Gao,
Ranxiang Ge,
Lyumanshan Ye,
Linxuan Wu,
Yiran Li,
Junfei Fish Yu,
Yibo Zhang,
Ruixin Li,
Manxiang Li,
Xiao Han,
Xiaocong Zhou,
Guangyao Chi,
Zisheng Chen,
Kaishen Chen,
Kun Wang,
Qihua Xu,
Fengyue Meng,
Yuchen Ni,
Jiajun Li,
Jinxiu Liu,
Danfeng Zhang
, et al. (2 additional authors not shown)
Abstract:
The rapid deployment of AI agents in commercial settings has outpaced the development of evaluation methodologies that reflect production realities. Existing benchmarks measure agent capabilities through retrospectively curated tasks with well-specified requirements and deterministic metrics -- conditions that diverge fundamentally from production environments where requirements contain implicit constraints, inputs are heterogeneous multi-modal documents with information fragmented across sources, tasks demand undeclared domain expertise, outputs are long-horizon professional deliverables, and success is judged by domain experts whose standards evolve over time. We present AlphaEval, a production-grounded benchmark of 94 tasks sourced from seven companies deploying AI agents in their core business, spanning six O*NET (Occupational Information Network) domains. Unlike model-centric benchmarks, AlphaEval evaluates complete agent products -- Claude Code, Codex, etc. -- as commercial systems, capturing performance variations invisible to model-level evaluation. Our evaluation framework covers multiple paradigms (LLM-as-a-Judge, reference-driven metrics, formal verification, rubric-based assessment, automated UI testing, etc.), with individual domains composing multiple paradigms. Beyond the benchmark itself, we contribute a requirement-to-benchmark construction framework -- a systematic methodology that transforms authentic production requirements into executable evaluation tasks in minimal time. This framework standardizes the entire pipeline from requirement to evaluation, providing a reproducible, modular process that any organization can adopt to construct production-grounded benchmarks for their own domains.
Submitted 13 April, 2026;
originally announced April 2026.
-
ComSim: Building Scalable Real-World Robot Data Generation via Compositional Simulation
Authors:
Yiran Qin,
Jiahua Ma,
Li Kang,
Wenzhan Li,
Yihang Jiao,
Xin Wen,
Xiufeng Song,
Heng Zhou,
Jiwen Yu,
Zhenfei Yin,
Xihui Liu,
Philip Torr,
Yilun Du,
Ruimao Zhang
Abstract:
Recent advancements in foundational models, such as large language models and world models, have greatly enhanced the capabilities of robotics, enabling robots to autonomously perform complex tasks. However, acquiring large-scale, high-quality training data for robotics remains a challenge, as it often requires substantial manual effort and is limited in its coverage of diverse real-world environments. To address this, we propose a novel hybrid approach called Compositional Simulation, which combines classical simulation and neural simulation to generate accurate action-video pairs while maintaining real-world consistency. Our approach utilizes a closed-loop real-sim-real data augmentation pipeline, leveraging a small amount of real-world data to generate diverse, large-scale training datasets that cover a broader spectrum of real-world scenarios. We train a neural simulator to transform classical simulation videos into real-world representations, improving the accuracy of policy models trained in real-world environments. Through extensive experiments, we demonstrate that our method significantly reduces the sim2real domain gap, resulting in higher success rates in real-world policy model training. Our approach offers a scalable solution for generating robust training data and bridging the gap between simulated and real-world robotics.
Submitted 13 April, 2026;
originally announced April 2026.
-
Strategy evolution on networks under payoff uncertainty and risk preference
Authors:
Jiapeng Yu,
Anzhi Sheng,
Long Wang
Abstract:
Cooperation is a key driver of human social progress. Studies of the evolution of cooperation typically assume a deterministic outcome for social interactions. But in the real world, interaction outcomes are often subject to stochastic perturbations arising from open environments. Individuals may show different attitudes towards such uncertainty: some are risk-seeking, while others tend to be risk-averse. Here we investigate how risk preference towards uncertain payoffs affects the evolution of cooperation on social networks, where uncertainty originates from random punishment of defectors initiated by cooperators. We provide an analytical treatment of how the distribution of risk preference among individuals alters the threshold required for cooperation. We find that, at the population level, risk-averse behavior promotes or even rescues cooperation. At the node level, variation in risk preference has a significant impact when it occurs on nodes with high degree centrality. Among nodes of equal degree centrality, those with lower betweenness centrality exhibit a stronger effect on strategy evolution. Our analysis reveals how risk preference, together with spatial structure, jointly shapes and potentially reverses the evolutionary dynamics of cooperation.
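One common way to formalize risk preference over a stochastic payoff (an assumption for illustration, not necessarily the paper's model) is a power utility, with an exponent below one for risk aversion and above one for risk seeking; the expected utility can be estimated by Monte Carlo:

```python
import numpy as np

def expected_utility(payoffs, gamma):
    """Evaluate stochastic payoffs through a power utility u(x) = x**gamma:
    gamma < 1 models risk aversion (concave u), gamma > 1 risk seeking."""
    return np.mean(payoffs ** gamma)

# Uncertain payoff uniform on [0, 2], so the expected payoff itself is 1.
rng = np.random.default_rng(1)
payoffs = rng.uniform(0.0, 2.0, size=100_000)
averse = expected_utility(payoffs, gamma=0.5)   # below 1 by Jensen's inequality
seeking = expected_utility(payoffs, gamma=2.0)  # above 1 by Jensen's inequality
```

By Jensen's inequality, the same uncertain payoff is worth less than its mean to a risk-averse individual and more to a risk-seeking one, which is the asymmetry that can shift cooperation thresholds.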
Submitted 13 April, 2026;
originally announced April 2026.
-
Measurement of inclusive production of charmonium states in $b$-hadron decays via their decay into $φφ$
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An
, et al. (1173 additional authors not shown)
Abstract:
The inclusive production of the $η_c(1S)$, $η_c(2S)$ and $χ_{c}$ charmonium states in $b$-hadron decays is studied with LHCb Run~2 data, corresponding to an integrated luminosity of $5.9~\text{fb}^{-1}$, using charmonium decays to $φφ$ pairs. The production branching fractions of the $χ_{c}(1P)$ states in $b$-hadron decays are measured, using $b \to η_c(1S) (\to φφ) X$ as a normalisation channel, with $X$ indicating any additional particles. The results are \begin{align*}
&{\cal{B}} (b \to χ_{c0} X) = (1.34 \pm 0.13 \pm 0.06 \pm 0.37) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c1} X) = (1.58 \pm 0.12 \pm 0.09 \pm 0.44) \times 10^{-3}, \\
&{\cal{B}} (b \to χ_{c2} X) = (0.55 \pm 0.08 \pm 0.05 \pm 0.15) \times 10^{-3}, \end{align*} where the first uncertainty is statistical, the second systematic and the last is due to the limited knowledge of externally measured branching fractions. The production branching fraction of $η_c(2S)$ times the branching fraction of its decay into $φφ$ is measured as ${\cal{B}} (b \to η_c(2S) X) \times {\cal{B}} (η_c(2S) \to φφ) = (4.0 \pm 0.6 \pm 0.6 \pm 1.1) \times 10^{-7}$. Furthermore, the mass of the $η_c(1S)$ state is measured to be $M_{η_c(1S)} = 2984.1 \pm 0.5 \pm 0.5$ MeV, the most precise value to date.
Submitted 13 April, 2026;
originally announced April 2026.
-
Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
Authors:
Jialing Wang,
Yue Zhao,
Yuhao Zhang,
Jing Yu,
Shaosai Li,
Zhanchen Dai,
Benyou Wang,
Haizhou Li
Abstract:
Recent advances in Speech Large Language Models (Speech-LLMs) have made significant progress, greatly enhancing multimodal interaction capabilities. However, their application in low-resource and dialect-diverse environments still faces challenges. The severe scarcity of Tibetan data, coupled with the phonetic differences among its major dialects (Ü-Tsang, Amdo, and Kham), is a prime example of this challenge. This paper proposes Ti-Audio, the first multi-dialectal end-to-end Speech-LLM for Tibetan. To efficiently align speech and text, we introduce a Dynamic Q-Former Adapter that extracts essential acoustic features from variable-length speech, ensuring stable cross-modal alignment even with limited data. At the data level, we leverage mutual assistance among related dialects to alleviate data scarcity and employ a temperature-based sampling strategy to maximize this synergy. Experimental results demonstrate that Ti-Audio achieves state-of-the-art performance on Tibetan benchmarks for automatic speech recognition and speech translation. Our work validates the effectiveness of cross-dialectal cooperation and provides a scalable paradigm for the development of Speech-LLMs in low-resource scenarios.
Submitted 13 April, 2026;
originally announced April 2026.
-
LLMs Should Incorporate Explicit Mechanisms for Human Empathy
Authors:
Xiaoxing You,
Qiang Huang,
Jun Yu
Abstract:
This paper argues that Large Language Models (LLMs) should incorporate explicit mechanisms for human empathy. As LLMs become increasingly deployed in high-stakes human-centered settings, their success depends not only on correctness or fluency but on faithful preservation of human perspectives. Yet, current LLMs systematically fail at this requirement: even when well-aligned and policy-compliant, they often attenuate affect, misrepresent contextual salience, and rigidify relational stance in ways that distort meaning. We formalize empathy as an observable behavioral property: the capacity to model and respond to human perspectives while preserving intention, affect, and context. Under this framing, we identify four recurring mechanisms of empathic failure in contemporary LLMs--sentiment attenuation, empathic granularity mismatch, conflict avoidance, and linguistic distancing--arising as structural consequences of prevailing training and alignment practices. We further organize these failures along three dimensions: cognitive, cultural, and relational empathy, to explain their manifestation across tasks. Empirical analyses show that strong benchmark performance can mask systematic empathic distortions, motivating empathy-aware objectives, benchmarks, and training signals as first-class components of LLM development.
Submitted 12 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of $D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$ in $D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively, both achieving three times better precision than the current PDG values. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer novel insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 15 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Probing lattice fluctuations using solid-state high-harmonic spectroscopy
Authors:
Lance Hatch,
Navdeep Rana,
Shoushou He,
Jessica Yu,
Boyang Zhao,
Yu Zhang,
Haidan Wen,
Xavier Roy,
Lun Yue,
Mette Gaarde,
Hanzhe Liu
Abstract:
Solid-state high-harmonic spectroscopy allows the study of strongly driven ultrafast electron dynamics. Microscopically, high harmonics are generated by strong-laser-field acceleration of electron-hole pairs through the lattice. At finite temperatures, atomic-scale structural fluctuations are ubiquitous and are expected to influence the electron-hole trajectories. Yet, the effect of thermal lattice fluctuations on solid-state high-harmonic generation (HHG) has not been quantified. Here, we demonstrate a profound sensitivity of HHG to thermal lattice fluctuations, by characterizing the temperature dependence of HHG in Re6Se8Cl2, a superatomic semiconductor. As the sample temperature is decreased, the high-harmonic yield exhibits a slow increase, followed by an abrupt increase below 50 K, consistent with the temperature at which lattice vibrations are strongly suppressed. Our calculations show that thermal lattice fluctuations both weaken the harmonic response from individual distorted configurations and induce phase dispersion across the ensemble, leading to a pronounced suppression of the coherently emitted harmonics. We show that this effect can be interpreted in terms of an effective electronic dephasing time that varies with temperature. Our results are relevant to dephasing in broad strong-field phenomena, including lightwave electronics and Floquet engineering. The wide tunability of superatomic crystals further enables materials-controlled strong-field physics.
Submitted 11 April, 2026;
originally announced April 2026.
-
RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling
Authors:
Luca Jiang-Tao Yu,
Chenshu Wu
Abstract:
Wireless sensing, traditionally relying on signal processing (SP) techniques, has recently shifted toward data-driven deep learning (DL) to achieve performance breakthroughs. However, existing deep wireless sensing models are typically end-to-end and task-specific, lacking reusability and interpretability. We propose RF-LEGO, a modular co-design framework that transforms interpretable SP algorithms into trainable, physics-grounded DL modules through deep unrolling. By replacing hand-tuned parameters with learnable ones while preserving core processing structures and mathematical operators, RF-LEGO ensures modularity, cascadability, and structure-aligned interpretability. Specifically, we introduce three deep-unrolled modules for critical RF sensing tasks: frequency transform, spatial angle estimation, and signal detection. Extensive experiments using real-world data for Wi-Fi, millimeter-wave, UWB, and 6G sensing demonstrate that RF-LEGO significantly outperforms existing SP and DL baselines, both standalone and when integrated into multiple downstream tasks. RF-LEGO pioneers a novel SP-DL co-design paradigm for wireless sensing via deep unrolling, shedding light on efficient and interpretable deep wireless sensing solutions. Our code is available at https://github.com/aiot-lab/RF-LEGO.
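Deep unrolling, the mechanism behind RF-LEGO's modules, maps each iteration of a classical SP algorithm onto a network layer whose hand-tuned parameters become learnable while the mathematical operators stay fixed. A minimal NumPy sketch using unrolled ISTA as a stand-in (RF-LEGO's actual modules target frequency transform, angle estimation, and detection; in a DL framework the per-layer thresholds below would be trained by backprop rather than set by hand):

```python
import numpy as np

def soft_threshold(x, theta):
    # Classical SP operator; theta is normally hand-tuned.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_ista(y, A, thetas, step=0.1):
    """K ISTA iterations 'unrolled' into K layers.

    Each loop body is one layer: it keeps the mathematical operators
    (gradient step + soft threshold) but owns its own threshold
    theta_k, the parameter a DL framework would learn.
    """
    x = np.zeros(A.shape[1])
    for theta in thetas:                  # one iteration = one layer
        grad = A.T @ (A @ x - y)          # data-fidelity gradient
        x = soft_threshold(x - step * grad, theta)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4)) / np.sqrt(8)
x_true = np.array([1.0, 0.0, -0.5, 0.0])
y = A @ x_true
x_hat = unrolled_ista(y, A, thetas=[0.05] * 20)
```

Because the processing structure is preserved, each unrolled layer stays interpretable and can be cascaded with other modules, which is the modularity RF-LEGO builds on.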
Submitted 11 April, 2026;
originally announced April 2026.
-
Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems
Authors:
Zongwei Wang,
Min Gao,
Hongzhi Yin,
Junliang Yu,
Tong Chen,
Shazia Sadiq,
Tianrui Li
Abstract:
Large language model-empowered agentic recommender systems (ARS) reformulate recommendation as a multi-turn interaction between a recommender agent and a user agent, enabling iterative preference elicitation and refinement beyond conventional one-shot prediction. However, existing ARS are mainly optimized in a Reflexion-style paradigm, where past interaction trajectories are stored as textual memory and retrieved as prompt context for later reasoning. Although this design allows agents to recall prior feedback and observations, the accumulated experience remains external to model parameters, leaving agents reliant on generic reasoning rather than progressively acquiring recommendation-specific decision-making ability through learning. Reinforcement learning (RL) therefore provides a natural way to internalize such interaction experience into parameters. Yet existing RL methods for ARS still suffer from two key limitations. First, they fail to capture the interactive nature of ARS, in which the recommender agent and the user agent continuously influence each other and can naturally generate endogenous supervision through interaction feedback. Second, they reduce a rich multi-turn interaction process to final outcomes, overlooking the dense supervision embedded throughout the trajectory. To this end, we propose CoARS, a self-distilled reinforcement learning framework for co-evolving agentic recommender systems. CoARS introduces two complementary learning schemes: interaction reward, which derives coupled task-level supervision for the recommender agent and the user agent from the same interaction trajectory, and self-distilled credit assignment, which converts historical trajectories into token-level credit signals under teacher-student conditioning. Experiments on multiple datasets show that CoARS outperforms representative ARS baselines in recommendation performance and user alignment.
Submitted 11 April, 2026;
originally announced April 2026.
-
Horrila: Cost-Based Placement of Semantic Operators in Hybrid Query Plans
Authors:
Qiuyang Mang,
Yufan Xiang,
Hangrui Zhou,
Runyuan He,
Jiaxiang Yu,
Hanchen Li,
Aditya Parameswaran,
Alvin Cheung
Abstract:
Recent database systems have introduced semantic operators that leverage large language models (LLMs) to filter, join, and project over structured data using natural language predicates. In practice, these operators are combined with traditional relational operators, e.g., equi-joins, producing hybrid query plans whose execution cost depends on both expensive LLM calls and conventional database processing. A key optimization question is where to place each semantic operator relative to the relational operators in the plan: placing them earlier reduces the data that subsequent operators process, but requires more LLM calls; placing them later reduces LLM calls through deduplication, but forces relational operators to process larger intermediate data. Existing systems either ignore this placement question or apply simple heuristics without considering the full cost trade-off. We present Horrila, a plan-level optimizer for hybrid semantic-relational queries. Horrila reduces hybrid query planning to semantic filter placement via two equivalence-preserving rewrites. We prove that deferring all semantic filters to the latest possible position minimizes LLM invocations under function caching, but show that this can cause relational processing costs to dominate on complex multi-table queries. To balance LLM cost against relational cost, Horrila uses a dynamic-programming-based cost model that finds the placement minimizing their weighted sum. On 44 semantic SQL queries across five schemas and two benchmarks, Horrila achieves up to 1.5$\times$ speedup and 4.29$\times$ cost reduction while maintaining high output quality: an average F1 of 0.85 against the unoptimized baseline and 0.84 against human-annotated ground truth on SemBench. Overall, Horrila achieves a significant cost reduction while preserving the highest accuracy among six publicly available systems.
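The cost trade-off Horrila navigates can be illustrated by exhaustively scoring positions for one semantic filter in a chain of relational operators (a toy cost model; the per-row costs, the fixed 0.5 semantic-filter selectivity, and the function name are illustrative assumptions, not Horrila's actual dynamic program):

```python
def best_filter_position(rows_in, selectivities, llm_cost_per_row,
                         rel_cost_per_row, w_llm=1.0, w_rel=1.0):
    """Place one semantic filter in a chain of relational operators.

    Earlier placement => more LLM calls but smaller intermediate data;
    later placement => fewer LLM calls but more relational work.
    Returns (position, cost) minimizing the weighted sum.
    """
    best = None
    for pos in range(len(selectivities) + 1):
        rows, rel = rows_in, 0.0
        for s in selectivities[:pos]:      # operators before the filter
            rel += rows * rel_cost_per_row
            rows *= s
        llm = rows * llm_cost_per_row      # one LLM call per surviving row
        rows *= 0.5                        # assumed semantic selectivity
        for s in selectivities[pos:]:      # operators after the filter
            rel += rows * rel_cost_per_row
            rows *= s
        cost = w_llm * llm + w_rel * rel
        if best is None or cost < best[1]:
            best = (pos, cost)
    return best

pos, cost = best_filter_position(1000, [0.1, 0.5],
                                 llm_cost_per_row=100.0,
                                 rel_cost_per_row=1.0)
```

With cheap, selective relational operators and expensive LLM calls, the enumeration defers the semantic filter to the end, mirroring the paper's result that the latest placement minimizes LLM invocations; Horrila's cost model decides when relational costs tip the balance the other way.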
Submitted 10 April, 2026;
originally announced April 2026.
-
MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion
Authors:
Yunfei Feng,
Xi Zhao,
Cheng Zhang,
Dahu Feng,
Daolin Cheng,
Jianqi Yu,
Yubin Xia,
Erhu Feng
Abstract:
Mobile agents can autonomously complete user-assigned tasks through GUI interactions. However, existing mainstream evaluation benchmarks, such as AndroidWorld, operate by connecting to a system-level Android emulator and provide evaluation signals based on the state of system resources. In real-world mobile-agent scenarios, however, many third-party applications do not expose system-level APIs to determine whether a task has succeeded, leading to a mismatch between benchmarks and real-world usage and making it difficult to evaluate model performance accurately. To address these issues, we propose MobiFlow, an evaluation framework built on tasks drawn from arbitrary third-party applications. Using an efficient graph-construction algorithm based on multi-trajectory fusion, MobiFlow can effectively compress the state space, support dynamic interaction, and better align with real-world third-party application scenarios. MobiFlow covers 20 widely used third-party applications and comprises 240 diverse real-world tasks, with enriched evaluation metrics. Compared with AndroidWorld, MobiFlow's evaluation results show higher alignment with human assessments and can guide the training of future GUI-based models under real workloads.
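The multi-trajectory fusion idea can be sketched as trie-like sharing of common state prefixes across task trajectories (a simplification; MobiFlow's actual graph-construction algorithm, and the action names below, are assumptions):

```python
def fuse_trajectories(trajectories):
    """Fuse action trajectories into one graph by sharing common
    prefixes, compressing the state space: node 0 is the start state
    and each distinct (state, action) pair adds exactly one node."""
    graph = {}            # node id -> {action: successor node id}
    next_id = 1
    for traj in trajectories:
        node = 0
        for action in traj:
            edges = graph.setdefault(node, {})
            if action not in edges:
                edges[action] = next_id
                next_id += 1
            node = edges[action]
    return graph

# Two hypothetical trajectories for the same task share a prefix.
graph = fuse_trajectories([["open", "search", "tap_result"],
                           ["open", "search", "scroll"]])
```

Here the six recorded actions collapse into four edges, illustrating how fusing trajectories shrinks the state space an evaluator must track.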
Submitted 28 February, 2026;
originally announced April 2026.
-
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
Authors:
Rui Xu,
Dafei Qin,
Kaichun Qiao,
Qiujie Dong,
Huaijin Pi,
Qixuan Zhang,
Longwen Zhang,
Lan Xu,
Jingyi Yu,
Wenping Wang,
Taku Komura
Abstract:
Recent advancements in autoregressive transformers have demonstrated remarkable potential for generating artist-quality meshes. However, the token ordering strategies employed by existing methods typically fail to meet professional artist standards, where coordinate-based sorting yields inefficiently long sequences, and patch-based heuristics disrupt the continuous edge flow and structural regularity essential for high-quality modeling. To address these limitations, we propose Strips as Tokens (SATO), a novel framework with a token ordering strategy inspired by triangle strips. By constructing the sequence as a connected chain of faces that explicitly encodes UV boundaries, our method naturally preserves the organized edge flow and semantic layout characteristic of artist-created meshes. A key advantage of this formulation is its unified representation, enabling the same token sequence to be decoded into either a triangle or quadrilateral mesh. This flexibility facilitates joint training on both data types: large-scale triangle data provides fundamental structural priors, while high-quality quad data enhances the geometric regularity of the outputs. Extensive experiments demonstrate that SATO consistently outperforms prior methods in terms of geometric quality, structural coherence, and UV segmentation.
Submitted 10 April, 2026;
originally announced April 2026.
-
"Take Me Home, Wi-Fi Drone": A Drone-based Wireless System for Wilderness Search and Rescue
Authors:
Weiying Hou,
Luca Jiang-Tao Yu,
Chenshu Wu
Abstract:
Wilderness Search and Rescue (WiSAR) represents a longstanding and critical societal challenge, demanding innovative and automatic technological solutions. In this paper, we introduce Wi2SAR, a novel autonomous drone-based wireless system for long-range, through-occlusion WiSAR operations, without relying on existing infrastructure. Our basic insight is to leverage the automatic reconnection behavior of modern Wi-Fi devices to known networks. By mimicking these networks via on-drone Wi-Fi, Wi2SAR uniquely facilitates the discovery and localization of victims through their accompanying mobile devices. Translating this simple idea into a practical system poses substantial technical challenges. Wi2SAR overcomes these challenges via three distinct innovations: (1) a rapid and energy-efficient device discovery mechanism to discover and identify the target victim, (2) a novel RSS-only, long-range direction finding approach using a 3D-printed Luneburg Lens, amplifying the directional signal strength differences and significantly extending the operational range, and (3) an adaptive drone navigation scheme that guides the drone toward the target efficiently. We implement an end-to-end prototype and evaluate Wi2SAR across various mobile devices and real-world wilderness scenarios. Experimental results demonstrate Wi2SAR's high performance, efficiency, and practicality, highlighting its potential to advance autonomous WiSAR solutions. Wi2SAR is open-sourced at https://aiot-lab.github.io/Wi2SAR to facilitate further research and real-world deployment.
Submitted 10 April, 2026;
originally announced April 2026.
-
Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations
Authors:
Pengze Li,
Jiaquan Zhang,
Yunbo Long,
Xinping Liu,
Zhou wenjie,
Encheng Su,
Zihang Zeng,
Jiaqi Liu,
Jiyao Liu,
Junchi Yu,
Lihao Liu,
Philip Torr,
Shixiang Tang,
Aoran Wang,
Xi Chen
Abstract:
Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored capability for AI-assisted scientific reasoning. We study visual-to-symbolic analytical solution inference (ViSA) for two-dimensional linear steady-state fields: given field visualizations (and first-order derivatives) plus minimal auxiliary metadata, the model must output a single executable SymPy expression with fully instantiated numeric constants. We introduce ViSA-R2 and align it with a self-verifying, solution-centric chain-of-thought pipeline that follows a physicist-like pathway: structural pattern recognition → solution-family (ansatz) hypothesis → parameter derivation → consistency verification. We also release ViSA-Bench, a VLM-ready synthetic benchmark covering 30 linear steady-state scenarios with verifiable analytical/symbolic annotations, and evaluate predictions by numerical accuracy, expression-structure similarity, and character-level accuracy. Using an 8B open-weight Qwen3-VL backbone, ViSA-R2 outperforms strong open-source baselines and the evaluated closed-source frontier VLMs under a standardized protocol.
Submitted 9 April, 2026;
originally announced April 2026.
-
InstrAct: Towards Action-Centric Understanding in Instructional Videos
Authors:
Zhuoyi Yang,
Jiapeng Yu,
Reuben Tan,
Boyang Li,
Huijuan Xu
Abstract:
Understanding instructional videos requires recognizing fine-grained actions and modeling their temporal relations, which remains challenging for current Video Foundation Models (VFMs). This difficulty stems from noisy web supervision and a pervasive "static bias", where models rely on objects rather than motion cues. To address this, we propose InstrAction, a pretraining framework for instructional videos' action-centric representations. We first introduce a data-driven strategy, which filters noisy captions and generates action-centric hard negatives to disentangle actions from objects during contrastive learning. At the visual feature level, an Action Perceiver extracts motion-relevant tokens from redundant video encodings. Beyond contrastive learning, we introduce two auxiliary objectives: Dynamic Time Warping alignment (DTW-Align) for modeling sequential temporal structure, and Masked Action Modeling (MAM) for strengthening cross-modal grounding. Finally, we introduce the InstrAct Bench to evaluate action-centric understanding, where our method consistently outperforms state-of-the-art VFMs on semantic reasoning, procedural logic, and fine-grained retrieval tasks.
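The DTW-Align objective builds on classic dynamic time warping; a minimal sketch of the underlying alignment cost between two 1-D feature sequences (the paper's differentiable variant and its feature space are not detailed in the abstract):

```python
import numpy as np

def dtw_cost(a, b):
    """Classic dynamic-time-warping alignment cost via DP.

    D[i, j] is the minimal cumulative cost of aligning a[:i] with
    b[:j]; each step matches, inserts, or deletes one frame.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A repeated frame aligns at zero cost, unlike rigid frame-by-frame
# comparison, which is why DTW suits sequential temporal structure.
cost = dtw_cost([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```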
Submitted 9 April, 2026;
originally announced April 2026.
-
Test of lepton flavour universality with $B^0\to K^{*0}\ell^+\ell^-$ decays at large dilepton invariant mass
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1113 additional authors not shown)
Abstract:
Muon-electron universality is tested in $B^0 \to K^{*0} \ \ell^+ \ell^-$ decays, in the dilepton-invariant-mass region above the $ψ(2S)$ resonance. The analysis uses beauty mesons produced in proton-proton collisions recorded by the LHCb detector at center-of-mass energies of 7, 8, and 13 $\text{TeV}$, corresponding to an integrated luminosity of 9 $\text{fb}^{-1}$. The ratio of branching fractions between the muon and electron channels, $R_{K^{*0}}$, is measured to be $1.08\,^{+0.14}_{-0.12}\text{(stat)} \ \pm 0.07\text{(syst)}$ for a dilepton-invariant-mass squared above 14.0 $\text{GeV}^{2}/\text{c}^{4}$, consistent with the Standard Model prediction. This result represents the most precise measurement of $R_{K^{*0}}$ in this region and the first such measurement performed at a hadron collider.
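For reference, the lepton-flavour-universality ratio reported above is conventionally defined as the ratio of branching fractions integrated over the stated dilepton-mass-squared region (standard notation, not quoted from the paper):

$$R_{K^{*0}} \equiv \frac{\mathcal{B}(B^0 \to K^{*0}\mu^+\mu^-)}{\mathcal{B}(B^0 \to K^{*0}e^+e^-)}, \qquad q^2 > 14.0\ \mathrm{GeV}^2/c^4,$$

so the Standard Model expectation is $R_{K^{*0}} \simeq 1$ up to small QED corrections, consistent with the measured $1.08\,^{+0.14}_{-0.12} \pm 0.07$.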
Submitted 9 April, 2026;
originally announced April 2026.
-
Search for the lepton-flavour violating decays $B^+ \to π^+ μ^\pm e^\mp$
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
S. Amato,
J. L. Amey,
Y. Amhis,
L. An,
L. Anderlini
, et al. (1105 additional authors not shown)
Abstract:
The first search for the lepton-flavour violating decays $B^+ \to π^+ μ^{\pm} e^{\mp}$ in proton-proton collisions is presented, using data collected by the LHCb experiment between 2011 and 2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. No significant signal is observed and an upper limit on the branching fraction is set at $\mathcal{B}(B^+ \to π^+ μ^{\pm} e^{\mp}) < 1.8 \times 10^{-9}$ at the $90\%$ confidence level, two orders of magnitude more restrictive than the current world average. This is the first constraint on lepton-flavour violating $b \to d$ quark transitions at the LHC and also sets the most stringent upper limits to date on $b \to d μ^{\pm} e^{\mp}$ transitions. Limits on left-handed and scalar scenarios beyond the Standard Model are also reported.
Submitted 9 April, 2026;
originally announced April 2026.
-
Constraining Ultralight Scalar Dark Matter in the Galactic Center with the S2 Orbit
Authors:
Jiang-Chuan Yu,
Yan Cao,
Lijing Shao
Abstract:
The dense environment of our Galactic Center (GC) offers a unique laboratory for probing ultralight dark matter (ULDM). We explore the prospect of detecting a scalar ULDM field through its effects on the orbital dynamics of S-stars around the supermassive black hole in the GC, Sgr A$^*$. We consider both linear and quadratic couplings between the real scalar field $φ$ and Standard Model particles, and analyze two representative ULDM structures: the scalar gravitational atom and the spherical soliton. We find that quadratic coupling induces a non-oscillatory perturbation, leading to a long-term secular orbital evolution. We use the observed periastron precession rate of S2 star to put stringent constraints on the total ULDM mass in the GC and the quadratic coupling constant. For the gravitational atom $|211\rangle$ state, we constrain the mass ratio of ULDM to Sgr A$^*$ to $β\lesssim 10^{-3}$ at $m \sim 10^{-18}$ eV, and for the spherical soliton which extends to $\sim 0.2\,$pc, the mass ratio is limited to $β\lesssim 1$ at $m \sim 3\times10^{-20}$ eV. Notably, the resulting limits on the quadratic coupling constant surpass current bounds in the mass range $10^{-20} \,\text{eV} \lesssim m \lesssim 10^{-18}$ eV.
Submitted 9 April, 2026;
originally announced April 2026.
-
VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis
Authors:
Jian Yu,
Fei Shen,
Cong Wang,
Yi Xin,
Si Shen,
Xiaoyu Du,
Jinhui Tang
Abstract:
Diffusion models have driven remarkable advancements in fashion image generation, yet prior works usually treat garment generation and virtual dressing as separate problems, limiting their flexibility in real-world fashion workflows. Moreover, fashion image synthesis under multi-source heterogeneous conditions remains challenging, as existing methods typically rely on simple feature concatenation or static layer-wise injection, which often causes attribute entanglement and semantic interference. To address these issues, we propose VersaVogue, a unified framework for multi-condition controllable fashion synthesis that jointly supports garment generation and virtual dressing, corresponding to the design and showcase stages of the fashion lifecycle. Specifically, we introduce a trait-routing attention (TA) module that leverages a mixture-of-experts mechanism to dynamically route condition features to the most compatible experts and generative layers, enabling disentangled injection of visual attributes such as texture, shape, and color. To further improve realism and controllability, we develop an automated multi-perspective preference optimization (MPO) pipeline that constructs preference data without human annotation or task-specific reward models. By combining evaluators of content fidelity, textual alignment, and perceptual quality, MPO identifies reliable preference pairs, which are then used to optimize the model via direct preference optimization (DPO). Extensive experiments on both garment generation and virtual dressing benchmarks demonstrate that VersaVogue consistently outperforms existing methods in visual fidelity, semantic consistency, and fine-grained controllability.
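The mixture-of-experts routing inside the trait-routing attention module can be sketched as top-k gating over condition tokens (a generic MoE gate in NumPy; the expert roles, tensor shapes, and top-k value are illustrative assumptions, not VersaVogue's implementation):

```python
import numpy as np

def route_to_experts(features, gate_w, top_k=2):
    """MoE routing sketch: a linear gate scores each condition token
    against every expert, keeps the top-k experts per token, and
    renormalizes their softmax weights so each row sums to 1."""
    logits = features @ gate_w                      # (tokens, experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]    # top-k expert ids
    weights = np.zeros_like(logits)
    for t in range(logits.shape[0]):
        sel = logits[t, top[t]]
        e = np.exp(sel - sel.max())                 # stable softmax
        weights[t, top[t]] = e / e.sum()
    return weights, top

rng = np.random.default_rng(1)
feats = rng.standard_normal((4, 8))   # 4 condition tokens, dim 8
gate = rng.standard_normal((8, 3))    # 3 experts (e.g. texture/shape/color)
w, top = route_to_experts(feats, gate, top_k=2)
```

Routing each token only to its most compatible experts is what lets attributes such as texture, shape, and color be injected in a disentangled way rather than through a single shared pathway.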
Submitted 8 April, 2026;
originally announced April 2026.
-
Agent-Driven Corpus Linguistics: A Framework for Autonomous Linguistic Discovery
Authors:
Jia Yu,
Weiwei Yu,
Pengfei Xiao,
Fukun Xing
Abstract:
Corpus linguistics has traditionally relied on human researchers to formulate hypotheses, construct queries, and interpret results - a process demanding specialized technical skills and considerable time. We propose Agent-Driven Corpus Linguistics, an approach in which a large language model (LLM), connected to a corpus query engine via a structured tool-use interface, takes over the investigative cycle: generating hypotheses, querying the corpus, interpreting results, and refining analysis across multiple rounds. The human researcher sets direction and evaluates final output. Unlike unconstrained LLM generation, every finding is anchored in verifiable corpus evidence. We treat this not as a replacement for the corpus-based/corpus-driven distinction but as a complementary dimension: it concerns who conducts the inquiry, not the epistemological relationship between theory and data. We demonstrate the framework by linking an LLM agent to a CQP-indexed Gutenberg corpus (5 million tokens) via the Model Context Protocol (MCP). Given only "investigate English intensifiers," the agent identified a diachronic relay chain (so+ADJ > very > really), three pathways of semantic change (delexicalization, polarity fixation, metaphorical constraint), and register-sensitive distributions. A controlled baseline experiment shows that corpus grounding contributes quantification and falsifiability that the model cannot produce from training data alone. To test external validity, the agent replicated two published studies on the CLMET corpus (40 million tokens) - Claridge (2025) and De Smet (2013) - with close quantitative agreement. Agent-driven corpus research can thus produce empirically grounded findings at machine speed, lowering the technical barrier for a broader range of researchers.
Submitted 8 April, 2026;
originally announced April 2026.
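The investigative cycle described above (hypothesize, query, interpret, refine) can be sketched as a loop. This is a toy illustration with a stubbed in-memory corpus standing in for the MCP-connected CQP engine; every name and the collocation heuristic are invented, not the paper's tooling:

```python
# Minimal sketch of the hypothesize -> query -> interpret -> refine cycle.
# The corpus backend and the LLM are stubbed; all names here are invented.

from collections import Counter

CORPUS = "she was so very happy and it was really very good so good".split()

def query_collocates(target: str, window: int = 1) -> Counter:
    """Stub 'corpus tool': count tokens adjacent to `target`."""
    hits = Counter()
    for i, tok in enumerate(CORPUS):
        if tok == target:
            for j in (i - window, i + window):
                if 0 <= j < len(CORPUS):
                    hits[CORPUS[j]] += 1
    return hits

def investigate(seed_hypotheses, rounds=2):
    """Each round: run every open hypothesis as a corpus query, record the
    evidence, and refine by promoting frequent collocates to new hypotheses."""
    findings = {}
    open_hyps = list(seed_hypotheses)
    for _ in range(rounds):
        next_hyps = []
        for word in open_hyps:
            evidence = query_collocates(word)
            findings[word] = evidence  # every finding is anchored in corpus counts
            next_hyps += [w for w, n in evidence.items() if n >= 2]
        open_hyps = next_hyps
    return findings

results = investigate(["very"])
```

The point of the sketch is the control flow: the agent, not the human, decides which follow-up queries to issue, while each recorded finding remains a verifiable corpus count.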
-
Fast-dVLM: Efficient Block-Diffusion VLM via Direct Conversion from Autoregressive VLM
Authors:
Chengyue Wu,
Shiyi Lan,
Yonggan Fu,
Sensen Gao,
Jin Wang,
Jincheng Yu,
Jose M. Alvarez,
Pavlo Molchanov,
Ping Luo,
Song Han,
Ligeng Zhu,
Enze Xie
Abstract:
Vision-language models (VLMs) predominantly rely on autoregressive decoding, which generates tokens one at a time and fundamentally limits inference throughput. This limitation is especially acute in physical AI scenarios such as robotics and autonomous driving, where VLMs are deployed on edge devices at batch size one, making AR decoding memory-bandwidth-bound and leaving hardware parallelism underutilized. While block-wise discrete diffusion has shown promise for parallel text generation, extending it to VLMs remains challenging due to the need to jointly handle continuous visual representations and discrete text tokens while preserving pretrained multimodal capabilities. We present Fast-dVLM, a block-diffusion-based VLM that enables KV-cache-compatible parallel decoding and speculative block decoding for inference acceleration. We systematically compare two AR-to-diffusion conversion strategies: a two-stage approach that first adapts the LLM backbone with text-only diffusion fine-tuning before multimodal training, and a direct approach that converts the full AR VLM in one stage. Under comparable training budgets, direct conversion proves substantially more efficient by leveraging the already multimodally aligned VLM; we therefore adopt it as our recommended recipe. We introduce a suite of multimodal diffusion adaptations: block size annealing, causal context attention, auto-truncation masking, and vision efficient concatenation, which collectively enable effective block diffusion in the VLM setting. Extensive experiments across 11 multimodal benchmarks show Fast-dVLM matches its autoregressive counterpart in generation quality. With SGLang integration and FP8 quantization, Fast-dVLM achieves over 6x end-to-end inference speedup over the AR baseline.
Submitted 10 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.
-
TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
Authors:
Xiangyu Wang,
Jin Wu,
Haoran Shi,
Wei Xia,
Jiarui Yu,
Chanjin Zheng
Abstract:
Recently, multi-Large Language Model (LLM) frameworks have been proposed to solve contextualized tasks. However, these frameworks do not explicitly emulate human team role division, which may lead to a single perspective, thereby weakening performance on multi-step contextualized tasks. To address this issue, we propose TeamLLM, a human-like Team-Oriented Multi-LLM Collaboration Framework. TeamLLM adopts four team roles with a distinct division of labor and employs a three-phase multi-LLM collaboration for multi-step contextualized tasks. To evaluate the effectiveness of TeamLLM on multi-step contextualized tasks, we propose Contextually-Grounded and Procedurally-Structured tasks (CGPST) and construct the CGPST benchmark. This benchmark has four core features: contextual grounding, procedural structure, process-oriented evaluation, and multi-dimensional assessment. We evaluate ten popular LLMs on CGPST at the overall, step, and dimension levels. Results show that TeamLLM substantially improves performance on CGPST. We release the benchmark with scenarios, full-process responses, and human scores from ten LLMs. The code and data are available at https://anonymous.4open.science/r/TeamLLM-anonymous-C50E/.
Submitted 8 April, 2026;
originally announced April 2026.
-
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
Authors:
Yunyao Zhang,
Zuocheng Ying,
Xinglang Zhang,
Junqing Yu,
Peng Fang,
Xu Chen,
Wei Yang,
Zikai Song
Abstract:
LLM-based social network simulation introduces a new computational approach for modeling event evolution in complex online environments. However, existing methods typically simulate social processes under a fixed event trajectory, treating the event as static once initialized and overlooking intervention dynamics. They thus fail to capture the intrinsic evolution of real social network events, where source-side interventions and collective interactions continuously reshape event trajectories, sometimes leading to secondary popularity explosions and collective attitude shifts. To address this limitation, we introduce an intervention-aware simulation framework, IntervenSim, that models event evolution and intervention in a closed loop. We model event developments and source-side interventions using source agents, and collective crowd reactions using crowd agents, capturing their continuous co-evolution through an intervention-aware mechanism that couples source-side intervention, group interaction, and feedback-driven adjustment of subsequent interventions. Experiments on diverse real-world events show that IntervenSim improves MAPE by 41.6% and DTW by 66.9% over prior frameworks, while reducing computational cost with fewer yet more capable agents. These improvements indicate that IntervenSim not only simulates regular event trajectories more faithfully, but also better captures opinion dynamics under intervention in complex cases.
Submitted 15 April, 2026; v1 submitted 7 April, 2026;
originally announced April 2026.
-
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
Authors:
Dong She,
Xianrong Yao,
Liqun Chen,
Jinghe Yu,
Yang Gao,
Zhanpeng Jin
Abstract:
Vision-Language Models (VLMs) have demonstrated strong capabilities in perception, yet holistic Affective Image Content Analysis (AICA), which integrates perception, reasoning, and generation into a unified framework, remains underexplored. To address this gap, we introduce AICA-Bench, a comprehensive benchmark with three core tasks: Emotion Understanding (EU), Emotion Reasoning (ER), and Emotion-Guided Content Generation (EGCG). We evaluate 23 VLMs and identify two major limitations: weak intensity calibration and shallow open-ended descriptions. To address these issues, we propose Grounded Affective Tree (GAT) Prompting, a training-free framework that combines visual scaffolding with hierarchical reasoning. Experiments show that GAT reduces intensity errors and improves descriptive depth, providing a strong baseline for future research on affective multimodal understanding and generation.
Submitted 7 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
Submitted 7 April, 2026;
originally announced April 2026.
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
Coupling Macro Dynamics and Micro States for Long-Horizon Social Simulation
Authors:
Yunyao Zhang,
Yihao Ai,
Zuocheng Ying,
Qirui Mi,
Junqing Yu,
Wei Yang,
Zikai Song
Abstract:
Social network simulation aims to model collective opinion dynamics in large populations, but existing LLM-based simulators mainly focus on aggregate dynamics while largely ignoring individual internal states. This limits their ability to capture opinion reversals driven by gradual individual shifts and makes them unreliable in long-horizon simulations. We propose MF-MDP, a social simulation framework that tightly couples macro-level collective dynamics with micro-level individual states. MF-MDP explicitly models per-agent latent opinion states with a state transition mechanism, combining individual Markov Decision Processes at the micro level with a mean-field collective framework at the macro level. This allows individual behaviors to change internal states gradually rather than trigger instant reactions, enabling the simulator to distinguish agents that are close to switching from those that are far from switching, capture opinion reversals, and maintain accuracy over long horizons. Across real-world events, MF-MDP supports stable simulation of long-horizon social processes with up to 40,000 interactions, compared with about 300 in the baseline MF-LLM, while reducing long-horizon KL divergence by 75.3% (1.2490 to 0.3089) and reversal KL by 66.9% (1.6425 to 0.5434), significantly mitigating the drift observed in MF-LLM. Code is available at github.com/AI4SS/MF-MDP.
Submitted 8 April, 2026; v1 submitted 7 April, 2026;
originally announced April 2026.
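The micro-macro coupling can be illustrated with a toy mean-field simulation. The three-state opinion chain, transition rule, and constants below are invented for illustration and are not MF-MDP's actual dynamics:

```python
import numpy as np

# Toy illustration of coupling per-agent latent states with a mean field.
# Three opinion states with gradual one-step transitions; all dynamics invented.
rng = np.random.default_rng(0)
N = 1000
STATES = 3  # 0 = oppose, 1 = neutral, 2 = support
states = rng.integers(0, STATES, size=N)

def mean_field(states):
    """Macro state: population share of each opinion."""
    return np.bincount(states, minlength=STATES) / len(states)

def step(states, beta=0.3):
    """Micro update: each agent drifts one step toward the current majority
    opinion, with probability proportional to that opinion's share."""
    mf = mean_field(states)
    target = int(np.argmax(mf))
    new = states.copy()
    for i, s in enumerate(states):
        if s != target and rng.random() < beta * mf[target]:
            new[i] = s + np.sign(target - s)  # gradual shift, not an instant jump
    return new

for _ in range(50):
    states = step(states)
```

Because each agent changes state one step at a time, the simulation can distinguish agents close to switching (neutral) from those far from switching (fully opposed), which is the property the abstract argues matters for capturing reversals.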
-
Hierarchical Contrastive Learning for Multimodal Data
Authors:
Huichao Li,
Junhan Yu,
Doudou Zhou
Abstract:
Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.
Submitted 7 April, 2026;
originally announced April 2026.
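A structure-aware contrastive objective of the kind described, which aligns only modality pairs that genuinely share a latent factor, might be sketched as follows. The share mask and plain InfoNCE form are illustrative assumptions, not HCL's exact loss:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Plain InfoNCE between one anchor/positive pair and a set of negatives."""
    sims = np.array([anchor @ positive] + [anchor @ n for n in negatives]) / tau
    sims -= sims.max()  # numerical stability before exponentiation
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())

def structure_aware_loss(embeds, share_mask, negatives, tau=0.1):
    """embeds[m]: unit-norm embedding of modality m for one sample.
    share_mask[a][b] = True iff modalities a and b share some latent factor.
    Only sharing pairs contribute an alignment term, so unrelated modalities
    are never pulled together."""
    total, pairs = 0.0, 0
    M = len(embeds)
    for a in range(M):
        for b in range(a + 1, M):
            if share_mask[a][b]:
                total += info_nce(embeds[a], embeds[b], negatives, tau)
                pairs += 1
    return total / max(pairs, 1)

# Toy usage: three modalities where only modalities 0 and 1 share a factor.
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
embeds = [unit(rng.normal(size=8)) for _ in range(3)]
negatives = [unit(rng.normal(size=8)) for _ in range(4)]
mask = [[False, True, False], [True, False, False], [False, False, False]]
loss = structure_aware_loss(embeds, mask, negatives)
```

Masking the pair loop is the minimal way to avoid the over-alignment the abstract warns about: a binary shared/private split would instead force all pairs into the alignment term.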
-
Quantum Algorithms for Heterogeneous PDEs: The Neutron Diffusion Eigenvalue Problem
Authors:
Andrew M. Childs,
Lincoln Johnston,
Brian Kiedrowski,
Mahathi Vempati,
Jeffery Yu
Abstract:
We develop a hybrid classical-quantum algorithm to solve a type of linear reaction-diffusion equation, the neutron diffusion (generalized) k-eigenvalue problem that establishes nuclear criticality. The algorithm handles an equation with piecewise constant coefficients, describing a problem in a heterogeneous medium. We apply uniform finite elements and show that the quantum algorithm provides significant polynomial end-to-end speedup over its classical counterparts. This speedup leverages recent advances in quantum linear systems -- fast inversion and quantum preconditioning -- and uses Hamiltonian simulation as a subroutine. Our results suggest that quantum algorithms may provide speedups for heterogeneous PDEs, though the extent of this advantage over the fastest classical algorithm depends on the effectiveness of other classical approaches such as nonuniform or adaptive meshing for a given problem instance.
Submitted 6 April, 2026;
originally announced April 2026.
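For context, the classical workhorse for the k-eigenvalue problem is power (source) iteration on the generalized eigenproblem A φ = (1/k) F φ. The sketch below uses a uniform 1D finite-difference discretization with made-up one-group constants; it is not the paper's finite-element or quantum algorithm:

```python
import numpy as np

# 1D one-group neutron diffusion: -D phi'' + Sa phi = (1/k) nuSf phi,
# zero-flux boundaries, uniform grid. All material constants are illustrative.
n, L = 200, 10.0
h = L / (n + 1)
D, Sa, nuSf = 1.0, 0.05, 0.06

# Loss operator A (diffusion + absorption) and fission operator F.
main = 2 * D / h**2 + Sa
off = -D / h**2
A = (np.diag(np.full(n, main))
     + np.diag(np.full(n - 1, off), 1)
     + np.diag(np.full(n - 1, off), -1))
F = np.diag(np.full(n, nuSf))

def k_eigen(A, F, tol=1e-10, max_iter=10_000):
    """Power (source) iteration for the dominant eigenpair of A^{-1} F."""
    phi = np.ones(A.shape[0])
    k = 1.0
    for _ in range(max_iter):
        src = F @ phi / k                      # scaled fission source
        phi_new = np.linalg.solve(A, src)      # one "transport sweep"
        k_new = k * phi_new.sum() / phi.sum()  # ratio update for the eigenvalue
        if abs(k_new - k) < tol:
            return k_new, phi_new / np.linalg.norm(phi_new)
        k, phi = k_new, phi_new
    return k, phi / np.linalg.norm(phi)

k, phi = k_eigen(A, F)
```

The converged flux is strictly positive, consistent with the physical fundamental mode; the linear solve inside the loop is exactly the step the quantum linear-systems subroutines target.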
-
MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation
Authors:
Sijun Dai,
Qiang Huang,
Xiaoxing You,
Jun Yu
Abstract:
Retrieval-Augmented Generation (RAG) mitigates hallucinations in Multimodal Large Language Models (MLLMs), yet existing systems struggle with complex cross-modal reasoning. Flat vector retrieval often ignores structural dependencies, while current graph-based methods rely on costly ``translation-to-text'' pipelines that discard fine-grained visual information. To address these limitations, we propose \textbf{MG$^2$-RAG}, a lightweight \textbf{M}ulti-\textbf{G}ranularity \textbf{G}raph \textbf{RAG} framework that jointly improves graph construction, modality fusion, and cross-modal retrieval. MG$^2$-RAG constructs a hierarchical multimodal knowledge graph by combining lightweight textual parsing with entity-driven visual grounding, enabling textual entities and visual regions to be fused into unified multimodal nodes that preserve atomic evidence. Building on this representation, we introduce a multi-granularity graph retrieval mechanism that aggregates dense similarities and propagates relevance across the graph to support structured multi-hop reasoning. Extensive experiments across four representative multimodal tasks (i.e., retrieval, knowledge-based VQA, reasoning, and classification) demonstrate that MG$^2$-RAG consistently achieves state-of-the-art performance while reducing graph construction overhead with an average 43.3$\times$ speedup and 23.9$\times$ cost reduction compared with advanced graph-based frameworks.
Submitted 4 April, 2026;
originally announced April 2026.
-
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Authors:
Bin Wang,
Tianyao He,
Linke Ouyang,
Fan Wu,
Zhiyuan Zhao,
Tao Chu,
Yuan Qu,
Zhenjiang Jin,
Weijun Zeng,
Ziyang Miao,
Bangrui Xu,
Junbo Niu,
Mengzhang Cai,
Jiantao Qiu,
Qintong Zhang,
Dongsheng Ma,
Yuefeng Sun,
Hejun Dong,
Wenzheng Zhang,
Jutao Xiao,
Jiayong Shi,
Pengyu Liao,
Xiaomeng Zhao,
Huaping Zhong,
Liqun Wei
, et al. (18 additional authors not shown)
Abstract:
Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than from architectural differences. Building on this finding, we present MinerU2.5-Pro, which advances the state of the art purely through data engineering and training strategy design while retaining the 1.2B-parameter architecture of MinerU2.5 unchanged. At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while mitigating distribution shift; Cross-Model Consistency Verification leverages output consensus among heterogeneous models to assess sample difficulty and generate reliable annotations; the Judge-and-Refine pipeline improves annotation quality for hard samples through render-then-verify iterative correction. A three-stage progressive training strategy--large-scale pre-training, hard sample fine-tuning, and GRPO alignment--sequentially exploits these data at different quality tiers. On the evaluation front, we rectify element-matching biases in OmniDocBench v1.5 and introduce a Hard subset, establishing the more discriminative OmniDocBench v1.6 protocol. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods, including those based on models with over 200x more parameters.
Submitted 9 April, 2026; v1 submitted 6 April, 2026;
originally announced April 2026.
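Cross-model consistency can be approximated by mean pairwise output similarity: samples on which heterogeneous models agree are treated as easy, the rest as hard. The sketch below uses a string-similarity stand-in for the paper's verification pipeline; the metric and threshold are illustrative:

```python
# Sketch of consensus-based difficulty scoring. Samples on which heterogeneous
# models disagree are flagged as hard. SequenceMatcher and the 0.9 threshold
# are illustrative stand-ins, not the paper's actual verification pipeline.

from difflib import SequenceMatcher
from itertools import combinations

def consensus(outputs):
    """Mean pairwise similarity across model outputs for one sample."""
    pairs = list(combinations(outputs, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

def split_by_difficulty(samples, threshold=0.9):
    """samples: {sample_id: [output_model_1, output_model_2, ...]}."""
    easy, hard = [], []
    for sample_id, outputs in samples.items():
        (easy if consensus(outputs) >= threshold else hard).append(sample_id)
    return easy, hard

samples = {
    "doc1": ["Table 1: Results", "Table 1: Results", "Table 1: Results"],
    "doc2": ["x^2 + y", "x2 + y", "x^{2}+y"],  # formula parses disagree -> hard
}
easy, hard = split_by_difficulty(samples)
```

High-consensus outputs can double as cheap pseudo-annotations, while low-consensus samples are the ones routed to a more expensive judge-and-refine pass.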
-
Decoding Student Dialogue: A Multi-Dimensional Comparison and Bias Analysis of Large Language Models as Annotation Tools
Authors:
Jie Cao,
Zhanxin Hao,
Jifan Yu
Abstract:
Educational dialogue is critical for decoding student learning processes, yet manual annotation remains time-consuming. This study evaluates the efficacy of GPT-5.2 and Gemini-3 using three prompting strategies (few-shot, single-agent, and multi-agent reflection) across diverse subjects, educational levels, and four coding dimensions. Results indicate that while multi-agent prompting achieved the highest accuracy, the results did not reach statistical significance. Accuracy proved highly context-dependent, with significantly higher performance in K-12 datasets compared to university-level data, alongside disciplinary variations within the same educational level. Performance peaked in the affective dimension but remained lowest in the cognitive dimension. Furthermore, analysis revealed four bias patterns: (1) Gemini-3 exhibited a consistent optimistic bias in the affective dimension across all subjects; (2) the cognitive dimension displayed domain-specific directional bias, characterized by systematic underestimation in Mathematics versus overestimation in Psychology; (3) both models are more prone to overestimation than underestimation within the meta-cognitive dimension; and (4) behavioral categories such as question, negotiation, and statements were frequently misclassified. These results underscore the need for context-sensitive deployment and targeted mitigation of directional biases in automated annotation.
Submitted 5 April, 2026;
originally announced April 2026.
-
Benchmarking Multi-turn Medical Diagnosis: Hold, Lure, and Self-Correction
Authors:
Jinrui Fang,
Runhan Chen,
Xu Yang,
Jian Yu,
Jiawei Xu,
Ashwin Vinod,
Wenqi Shi,
Tianlong Chen,
Heng Ji,
ChengXiang Zhai,
Ying Ding,
Yuji Zhang
Abstract:
Large language models (LLMs) achieve high accuracy in medical diagnosis when all clinical information is provided in a single turn, yet how they behave under multi-turn evidence accumulation closer to real clinical reasoning remains unexplored. We introduce MINT (Medical Incremental N-Turn Benchmark), a high-fidelity, multi-turn medical diagnosis benchmark comprising 1,035 cases with clinically labeled evidence shards, controlled turn granularity, and information-preserving decomposition. Through systematic evaluation of 11 LLMs on MINT, we uncover three persistent behavioral patterns that significantly impact diagnostic decisions: (1) intent to answer: models rush to answer before sufficient evidence has been observed, with over 55% of answers committed within the first two turns; (2) self-correction: incorrect-to-correct answer revisions occur at up to 10.6 times the rate of correct-to-incorrect flips, revealing a latent capacity for self-correction that premature commitment forecloses; and (3) strong lures: clinically salient information such as laboratory results triggers premature answering even when models are explicitly instructed to wait. We translate these findings into clinically actionable guidance: deferring the diagnostic question to later turns reduces premature answering and improves accuracy at the first point of commitment by up to 62.6%, while reserving salient clinical evidence for later turns prevents a catastrophic accuracy drop of up to 23.3% caused by premature commitment. Our work provides both a controlled evaluation framework and concrete recommendations for improving the reliability of LLMs in multi-turn medical diagnosis.
Submitted 5 April, 2026;
originally announced April 2026.
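The actionable guidance above amounts to how the conversation is assembled: hold salient evidence until later turns and pose the diagnostic question last. A minimal sketch of such turn ordering (the shard structure and field names are invented):

```python
# Arrange evidence shards into turns so that (a) clinically salient shards
# (e.g. lab results) come late and (b) the diagnostic question is only posed
# in the final turn. Shard structure and salience flags are illustrative.

def build_turns(shards, question):
    """shards: list of dicts {"text": str, "salient": bool}."""
    ordered = sorted(shards, key=lambda s: s["salient"])  # salient shards last
    turns = [{"role": "user", "content": s["text"]} for s in ordered]
    # Defer the diagnostic question to the final turn to discourage
    # premature commitment on partial evidence.
    turns.append({"role": "user", "content": question})
    return turns

shards = [
    {"text": "Labs: troponin elevated.", "salient": True},
    {"text": "History: chest pain for 2 hours.", "salient": False},
    {"text": "Exam: diaphoretic, BP 150/90.", "salient": False},
]
turns = build_turns(shards, "What is the most likely diagnosis?")
```

Because Python's sort is stable, non-salient shards keep their original clinical ordering; only the strong lures are pushed toward the end of the conversation.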
-
NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results
Authors:
Shuhong Liu,
Chenyu Bao,
Ziteng Cui,
Xuangeng Chu,
Bin Ren,
Lin Gu,
Xiang Chen,
Mingrui Li,
Long Ma,
Marcos V. Conde,
Radu Timofte,
Yun Liu,
Ryo Umagami,
Tomohiro Hashimoto,
Zijian Hu,
Yuan Gan,
Tianhan Xu,
Yusuke Kurose,
Tatsuya Harada,
Junwei Yuan,
Gengjia Chang,
Xining Ge,
Mache You,
Qida Cao,
Zeliang Li
, et al. (81 additional authors not shown)
Abstract:
This paper presents a comprehensive review of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge, detailing the proposed methods and results. The challenge seeks to identify reconstruction pipelines that are robust under real-world adverse conditions, specifically extreme low-light and smoke-degraded environments, as captured by our RealX3D benchmark. A total of 279 participants registered for the competition, of whom 33 teams submitted valid results. We thoroughly evaluate the submitted approaches against state-of-the-art baselines, revealing significant progress in 3D reconstruction under adverse conditions. Our analysis highlights shared design principles among top-performing methods and provides insights into effective strategies for handling 3D scene degradation.
Submitted 5 April, 2026;
originally announced April 2026.
-
InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories
Authors:
Yuanshao Zhu,
Yuxuan Liang,
Xiangyu Zhao,
Liang Han,
Xinwei Fang,
Xuetao Wei,
James Jianqiao Yu
Abstract:
The generation of realistic and controllable GPS trajectories is a fundamental task for applications in urban planning, mobility simulation, and privacy-preserving data sharing. However, existing methods face a two-fold challenge: they lack the deep semantic understanding to interpret complex user travel intent and struggle to handle complex constraints while maintaining the realistic diversity inherent in human behavior. To resolve this, we introduce InsTraj, a novel framework that instructs diffusion models to generate high-fidelity trajectories directly from natural language descriptions. Specifically, InsTraj first utilizes a powerful large language model to decipher unstructured travel intentions formed in natural language, thereby creating rich semantic blueprints and bridging the representation gap between intentions and trajectories. Subsequently, we propose a multimodal trajectory diffusion transformer that can integrate semantic guidance to generate high-fidelity and instruction-faithful trajectories that adhere to fine-grained user intent. Comprehensive experiments on real-world datasets demonstrate that InsTraj significantly outperforms state-of-the-art methods in generating trajectories that are realistic, diverse, and semantically faithful to the input instructions.
Submitted 5 April, 2026;
originally announced April 2026.
-
SoK: Blockchain Agent-to-Agent Payments
Authors:
Yuanzhe Zhang,
Yuexin Xiang,
Yuchen Lei,
Qin Wang,
Tian Qiu,
Yujing Sun,
Spiridon Zarkov,
Tsz Hon Yuen,
Andreas Deppeler,
Jiangshan Yu,
Kwok-Yan Lam
Abstract:
Agentic AI rivals human capabilities across a wide range of domains. Looking ahead, it is foreseeable that AI agents will autonomously handle complex workflows and interactions. Early prototypes of this paradigm are emerging, e.g., OpenClaw and Moltbook, signaling a shift toward Agent-to-Agent (A2A) ecosystems. However, despite these promising blueprints, critical trust and security challenges remain, particularly in scenarios involving financial transactions. Ensuring secure and reliable payment mechanisms between unknown and untrusted agents is crucial to completing a fully functional and trustworthy A2A ecosystem. Although blockchain-based infrastructures provide a natural foundation for this setting, via programmable settlement, transparent accounting, and open interoperability, trust and security challenges have not yet been fully addressed. Hence, for the first time, we systematize blockchain-based A2A payments, e.g., X402, with a four-stage lifecycle: discovery, authorization, execution, and accounting. We categorize representative designs at each stage and identify key challenges, including weak intent binding, misuse under valid authorization, payment-service decoupling, and limited accountability. We highlight future directions for strengthening cross-stage consistency, enabling behavior-aware control, and supporting compositional payment workflows across agents and systems.
Submitted 4 April, 2026;
originally announced April 2026.
-
GenSmoke-GS: A Multi-Stage Method for Novel View Synthesis from Smoke-Degraded Images Using a Generative Model
Authors:
Qida Cao,
Xinyuan Hu,
Changyue Shi,
Jiajun Ding,
Zhou Yu,
Jun Yu
Abstract:
This paper describes our method for Track 2 of the NTIRE 2026 3D Restoration and Reconstruction (3DRR) Challenge on smoke-degraded images. In this task, smoke reduces image visibility and weakens the cross-view consistency required by scene optimization and rendering. We address this problem with a multi-stage pipeline consisting of image restoration, dehazing, MLLM-based enhancement, 3DGS-MCMC optimization, and averaging over repeated runs. The main purpose of the pipeline is to improve visibility before rendering while limiting scene-content changes across input views. Experimental results on the challenge benchmark show improved quantitative performance and better visual quality than the provided baselines. The code is available at https://github.com/plbbl/GenSmoke-GS. Our method ranked first among 14 participants in Track 2 of the NTIRE 3DRR Challenge, as reported on the official competition website: https://www.codabench.org/competitions/13993/#/results-tab.
Submitted 6 April, 2026; v1 submitted 3 April, 2026;
originally announced April 2026.
-
Search for the decays $B_{(s)}^0\to J/ψγ$ at LHCb
Authors:
LHCb collaboration,
R. Aaij,
M. Abdelfatah,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1114 additional authors not shown)
Abstract:
A search for the rare decays $B_{(s)}^0\to J/ψγ$ is performed with proton-proton collision data collected by the LHCb experiment, corresponding to integrated luminosities of $3~\rm{fb}^{-1}$ at centre-of-mass energies of 7 and 8 TeV, and $6~\rm{fb}^{-1}$ at 13 TeV. Assuming no contribution from $B^0\to J/ψγ$ decay, an upper limit is set on the branching fraction $\mathcal{B}(B_{s}^0\to J/ψγ)<2.9\times10^{-6}$ at the 90% confidence level. If instead no contribution from $B_{s}^0\to J/ψγ$ decay is assumed, the limit is $\mathcal{B}(B^0\to J/ψγ)<2.5\times10^{-6}$ at the 90% confidence level. These results supersede the previous LHCb results, with the limit for $B_{s}^0\to J/ψγ$ improved by a factor of 2.5.
Submitted 3 April, 2026;
originally announced April 2026.