-
No More Guessing: a Verifiable Gradient Inversion Attack in Federated Learning
Authors:
Francesco Diana,
Chuan Xu,
André Nusser,
Giovanni Neglia
Abstract:
Gradient inversion attacks threaten client privacy in federated learning by reconstructing training samples from clients' shared gradients. Gradients aggregate contributions from multiple records and existing attacks may fail to disentangle them, yielding incorrect reconstructions with no intrinsic way to certify success. In vision and language, attackers may fall back on human inspection to judge reconstruction plausibility, but this is far less feasible for numerical tabular records, fueling the impression that tabular data is less vulnerable.
We challenge this perception by proposing a verifiable gradient inversion attack (VGIA) that provides an explicit certificate of correctness for reconstructed samples. Our method adopts a geometric view of ReLU leakage: the activation boundary of a fully connected layer defines a hyperplane in input space. VGIA introduces an algebraic, subspace-based verification test that detects when a hyperplane-delimited region contains exactly one record. Once isolation is certified, VGIA recovers the corresponding feature vector analytically and reconstructs the target via a lightweight optimization step.
Experiments on tabular benchmarks with large batch sizes demonstrate exact record and target recovery in regimes where existing state-of-the-art attacks either fail or cannot assess reconstruction fidelity. Compared to prior geometric approaches, VGIA allocates hyperplane queries more effectively, yielding faster reconstructions with fewer attack rounds.
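The ReLU leakage underlying this family of attacks can be made concrete. The toy example below is my own illustration of the standard gradient-leakage identity, not the paper's VGIA algorithm: when exactly one sample in a batch activates a hidden neuron of a fully connected ReLU layer, the batch-summed gradients reveal that sample analytically, since $\partial L/\partial W_i = (\partial L/\partial b_i)\,x^\top$ for the lone active sample.

```python
# Illustrative sketch of ReLU gradient leakage (not the paper's VGIA code).
import numpy as np

rng = np.random.default_rng(0)
B, D, H = 8, 5, 64                      # batch size, input features, hidden units
X = rng.normal(size=(B, D))             # private training records
W = rng.normal(size=(H, D))
b = rng.normal(size=H) - 3.0            # negative bias keeps activations sparse

pre = X @ W.T + b                       # pre-activations, shape (B, H)
act = pre > 0                           # ReLU activation pattern
g_out = rng.normal(size=(B, H))         # upstream gradients (arbitrary here)
g_pre = g_out * act                     # gradient flowing through the ReLU

# Aggregated (batch-summed) gradients, as observed by the server.
grad_b = g_pre.sum(axis=0)              # shape (H,)
grad_W = g_pre.T @ X                    # shape (H, D)

# A neuron whose hyperplane w_i . x + b_i = 0 is crossed by exactly one
# sample isolates that sample; its record is recovered analytically.
counts = act.sum(axis=0)
iso = np.flatnonzero(counts == 1)       # isolating neurons (assumed non-empty here)
i = iso[0]
target = int(np.argmax(act[:, i]))      # which sample activated neuron i
x_rec = grad_W[i] / grad_b[i]           # exact recovery of that record
```

The sketch simply assumes an isolating neuron exists and that the attacker knows which neuron it is; VGIA's contribution is certifying that isolation condition algebraically instead of hoping it holds.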
Submitted 16 April, 2026;
originally announced April 2026.
-
Rapid-response 1.3 mm Observations of GRB 260127A with the Submillimeter Array
Authors:
Garrett K. Keating,
Tanmoy Laskar,
Anna Y. Q. Ho,
Peter K. Blanchard,
Kate D. Alexander,
Edo Berger,
Mark Gurwell,
Tarraneh Eftekhari,
Chloe T. Xu,
Joshua Bennett Lovell,
Ramprasad Rao,
Peter K. G. Williams
Abstract:
We present the results from rapid-response 1.3 mm observations of GRB 260127A using the Submillimeter Array (SMA). The SMA arrived on source 12.6 minutes after the initial detection by the Neil Gehrels Swift Observatory, representing the earliest millimeter/submillimeter observations of a GRB to date. From these observations, we find a source with flux density $6.9\pm1.7$ mJy, consistent with the X-ray afterglow position but slightly offset from the optical afterglow position (2.7'' offset, with the SMA detection having a 90% confidence radial position uncertainty of 0.9''). Subsequent observations 1.9 days later reveal no emission, with a $3\sigma$ upper limit of 0.70 mJy. If the SMA detection is associated with GRB 260127A, we infer that its 1.3 mm light curve declined at least as fast as $t^{-0.5}$, suggesting that the peak brightness of the event at this wavelength was reached in under a day. We discuss how these findings may be consistent with both forward shock and reverse shock afterglow scenarios, as well as the implications for future millimeter/submillimeter observations of GRBs on these timescales.
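The decline bound is a two-epoch power-law comparison. A back-of-envelope sketch with the quoted flux density and upper limit is below; the paper's bound of at least $t^{-0.5}$ comes from its fuller analysis, so the value computed here is only illustrative of the arithmetic.

```python
# Back-of-envelope two-epoch temporal index F(t) ∝ t^alpha, using the
# stated detection (6.9 mJy at 12.6 min) and 3-sigma limit (0.70 mJy at 1.9 d).
import math

f1, t1 = 6.9, 12.6 / 1440.0   # mJy, days after trigger
f2, t2 = 0.70, 1.9            # upper limit in mJy, days after trigger

# Because f2 is an upper limit, the true decline from the detection epoch
# must be at least this steep; if the light curve peaked later than the
# detection, the post-peak decline is steeper still.
alpha = math.log(f2 / f1) / math.log(t2 / t1)
```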
Submitted 15 April, 2026;
originally announced April 2026.
-
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
Authors:
Tianshuo Yang,
Guanyu Chen,
Yutian Chen,
Zhixuan Liang,
Yitian Liu,
Zanxin Chen,
Chunpu Xu,
Haotian Liang,
Jiangmiao Pang,
Yao Mu,
Ping Luo
Abstract:
While end-to-end Vision-Language-Action (VLA) models offer a promising paradigm for robotic manipulation, fine-tuning them on narrow control data often compromises the profound reasoning capabilities inherited from their base Vision-Language Models (VLMs). To resolve this fundamental trade-off, we propose HiVLA, a visual-grounded-centric hierarchical framework that explicitly decouples high-level semantic planning from low-level motor control. At the high level, a VLM planner performs task decomposition and visual grounding to generate structured plans, each comprising a subtask instruction and a precise target bounding box. To translate these plans into physical actions, the low level introduces a flow-matching Diffusion Transformer (DiT) action expert equipped with a novel cascaded cross-attention mechanism. This design sequentially fuses global context, high-resolution object-centric crops, and skill semantics, enabling the DiT to focus purely on robust execution. Our decoupled architecture preserves the VLM's zero-shot reasoning while allowing independent improvement of both components. Extensive experiments in simulation and the real world demonstrate that HiVLA significantly outperforms state-of-the-art end-to-end baselines, particularly excelling in long-horizon skill composition and the fine-grained manipulation of small objects in cluttered scenes.
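The cascaded cross-attention can be sketched at the tensor level. Shapes and variable names below are illustrative assumptions, not HiVLA's actual architecture: action tokens attend to one conditioning source per stage, in sequence.

```python
# Illustrative cascade: action tokens attend to global context, then an
# object-centric crop, then a skill embedding (one source per stage).
import numpy as np

def cross_attn(q, kv, d):
    """Single-head cross-attention with a residual connection."""
    scores = q @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # row-wise softmax
    return q + w @ kv

rng = np.random.default_rng(0)
d = 32
actions = rng.normal(size=(16, d))     # action/noise tokens for the DiT
global_ctx = rng.normal(size=(196, d)) # full-image patch features
crop_ctx = rng.normal(size=(49, d))    # high-res target-box crop features
skill_ctx = rng.normal(size=(1, d))    # subtask-instruction embedding

h = actions
for ctx in (global_ctx, crop_ctx, skill_ctx):     # the cascade
    h = cross_attn(h, ctx, d)
```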
Submitted 15 April, 2026;
originally announced April 2026.
-
Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning
Authors:
Zekai Lin,
Chao Xue,
Di Liang,
Xingsheng Han,
Peiyang Liu,
Xianjie Wu,
Lei Jiang,
Yu Lu,
Haibo Shi,
Shuang Liang,
Minlong Peng
Abstract:
Supervised Fine-Tuning (SFT) of large language models often suffers from task interference and catastrophic forgetting. Recent approaches alleviate this issue by isolating task-critical parameters during training. However, these methods represent a static solution to a dynamic problem, assuming that parameter importance remains fixed once identified. In this work, we empirically demonstrate that parameter importance exhibits temporal drift over the course of training. To address this, we propose Evolving Parameter Isolation (EPI), a fine-tuning framework that adapts isolation decisions based on online estimates of parameter importance. Instead of freezing a fixed subset of parameters, EPI periodically updates isolation masks using gradient-based signals, enabling the model to protect emerging task-critical parameters while releasing outdated ones to recover plasticity. Experiments on diverse multi-task benchmarks demonstrate that EPI consistently reduces interference and forgetting compared to static isolation and standard fine-tuning, while improving overall generalization. Our analysis highlights the necessity of synchronizing isolation mechanisms with the evolving dynamics of learning diverse abilities.
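The evolving-isolation loop can be sketched as follows. The importance score (an exponential moving average of |grad × param|), refresh period, and top-k rule are my illustrative choices, not EPI's exact recipe:

```python
# Sketch of periodically refreshed parameter isolation (assumed details).
import numpy as np

rng = np.random.default_rng(0)
P = 1000                                  # number of parameters (flattened)
theta = rng.normal(size=P)
importance = np.zeros(P)                  # online EMA of |grad * param|
mask = np.zeros(P, dtype=bool)            # True = protected (frozen)
beta, keep_frac, refresh_every, lr = 0.9, 0.1, 10, 0.01

for step in range(1, 101):
    grad = rng.normal(size=P)             # stand-in for a real SFT gradient
    importance = beta * importance + (1 - beta) * np.abs(grad * theta)
    if step % refresh_every == 0:         # evolve the isolation mask:
        k = int(keep_frac * P)            # protect the current top-k params,
        thresh = np.partition(importance, -k)[-k]
        mask = importance >= thresh       # releasing previously protected ones
    theta = theta - lr * grad * (~mask)   # protected params receive no update
```

The key difference from static isolation is the periodic `mask` refresh: parameters whose importance decays are released and regain plasticity.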
Submitted 15 April, 2026;
originally announced April 2026.
-
Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias
Authors:
Zhiyuan Xu,
Jiuming Liu,
Yuxin Chen,
Masayoshi Tomizuka,
Chenfeng Xu,
Chensheng Peng
Abstract:
We present SparseGen, a novel framework for efficient image-to-3D generation, which exhibits low input-view bias while being significantly faster. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
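The set-latent expansion can be sketched at the shape level. The dimensions and the 14-number Gaussian parameterization below are assumptions for illustration, not SparseGen's specification:

```python
# Shape-level sketch: N anchor queries, each expanded into K local Gaussians.
import numpy as np

rng = np.random.default_rng(0)
N, d, K = 256, 64, 8                       # anchors, feature dim, Gaussians/anchor
queries = rng.normal(size=(N, d))          # anchor queries (post-transform)
anchors_xyz = rng.uniform(-1, 1, size=(N, 3))

# One assumed Gaussian layout: 3 (mean offset) + 3 (log-scale)
# + 4 (rotation quaternion) + 3 (color) + 1 (opacity) = 14 numbers.
W_expand = rng.normal(size=(d, K * 14)) * 0.02   # stand-in expansion operator

params = (queries @ W_expand).reshape(N, K, 14)
means = anchors_xyz[:, None, :] + params[..., :3]   # offsets stay local to anchor
scales = np.exp(params[..., 3:6])                   # positive scales
opacity = 1 / (1 + np.exp(-params[..., 13:14]))     # squashed into (0, 1)

gaussians = means.reshape(-1, 3)           # N*K primitives in total
```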
Submitted 15 April, 2026;
originally announced April 2026.
-
Foresight Optimization for Strategic Reasoning in Large Language Models
Authors:
Jiashuo Wang,
Jiawen Duan,
Jian Wang,
Kaitao Song,
Chunpu Xu,
Johnny K. W. Ho,
Fenggang Yu,
Wenjie Li,
Johan F. Hoorn
Abstract:
Reasoning capabilities in large language models (LLMs) have advanced significantly. However, existing reasoning-based LLMs still struggle to make effective decisions in multi-agent environments, due to the absence of explicit foresight modeling. Strategic reasoning, the capability to anticipate a counterpart's behaviors and foresee its possible future actions, is fundamental to effective decision-making in such environments, yet existing reasoning enhancement methods for LLMs do not explicitly capture its foresight nature. In this work, we introduce Foresight Policy Optimization (FoPO) to enhance strategic reasoning in LLMs, which integrates opponent modeling principles into policy optimization, thereby enabling explicit consideration of both self-interest and counterpart influence. Specifically, we construct two curated datasets, namely Cooperative RSA and Competitive Taboo, equipped with well-designed rules and moderate difficulty to facilitate a systematic investigation of FoPO in a self-play framework. Our experiments demonstrate that FoPO significantly enhances strategic reasoning across LLMs of varying sizes and origins. Moreover, models trained with FoPO exhibit strong generalization to out-of-domain strategic scenarios, substantially outperforming standard LLM reasoning optimization baselines.
Submitted 16 April, 2026; v1 submitted 15 April, 2026;
originally announced April 2026.
-
Coherent Rydberg excitation of single atoms using a pulsed fiber amplifier
Authors:
Ying-Wen Zhang,
Yang Wang,
Chen-Long Xu,
Yi-Bo Wang,
Peng Xu
Abstract:
In recent years, the growing scale of programmable neutral-atom arrays has led to an increasing demand for higher-power Rydberg excitation light. Although pulsed amplifiers deliver higher peak power than continuous-wave lasers, their use for efficient coherent Rydberg excitation of single atoms in arrays has been limited by challenges such as pulse distortion, synchronization with excitation sequences, and spectral linewidth broadening. Here, we address these issues using a fiber-based master-oscillator power-amplifier system. We demonstrate efficient coherent Rydberg excitation of single atoms in a rubidium atom array, achieving performance comparable to continuous-wave methods. This study provides a promising technical pathway toward future large-scale quantum simulation and computation with Rydberg atom arrays.
Submitted 14 April, 2026;
originally announced April 2026.
-
Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
Authors:
Tong Zhang,
Jiangning Zhang,
Zhucun Xue,
Juntao Jiang,
Yicheng Xu,
Chengming Xu,
Teng Hu,
Xingyu Xie,
Xiaobin Hu,
Yabiao Wang,
Yong Liu,
Shuicheng Yan
Abstract:
Balancing convergence speed, generalization capability, and computational efficiency remains a core challenge in deep learning optimization. First-order gradient descent methods, epitomized by stochastic gradient descent (SGD) and Adam, serve as the cornerstone of modern training pipelines. However, large-scale model training, stringent differential privacy requirements, and distributed learning paradigms expose critical limitations in these conventional approaches regarding privacy protection and memory efficiency. To mitigate these bottlenecks, researchers explore second-order optimization techniques to surpass first-order performance ceilings, while zeroth-order methods reemerge to alleviate memory constraints inherent to large-scale training. Despite this proliferation of methodologies, the field lacks a cohesive framework that unifies underlying principles and delineates application scenarios for these disparate approaches. In this work, we retrospectively analyze the evolutionary trajectory of deep learning optimization algorithms and present a comprehensive empirical evaluation of mainstream optimizers across diverse model architectures and training scenarios. We distill key emerging trends and fundamental design trade-offs, pinpointing promising directions for future research. By synthesizing theoretical insights with extensive empirical evidence, we provide actionable guidance for designing next-generation highly efficient, robust, and trustworthy optimization methods. The code is available at https://github.com/APRIL-AIGC/Awesome-Optimizer.
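For reference, the two first-order workhorses the survey takes as its starting point, in their textbook forms, applied to a toy quadratic:

```python
# Textbook SGD with heavy-ball momentum and Adam with bias-corrected
# moments, minimizing f(x) = x^2 (gradient 2x).
import numpy as np

def sgd_momentum(theta, g, v, lr=0.1, mu=0.9):
    v = mu * v + g                      # accumulate velocity
    return theta - lr * v, v

def adam(theta, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g           # first moment (gradient mean)
    v = b2 * v + (1 - b2) * g**2        # second moment (uncentered variance)
    m_hat = m / (1 - b1**t)             # bias corrections for zero init
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

x_s, v_s = np.array([5.0]), np.zeros(1)
x_a, m_a, v_a = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    x_s, v_s = sgd_momentum(x_s, 2 * x_s, v_s)
    x_a, m_a, v_a = adam(x_a, 2 * x_a, m_a, v_a, t)
```

Second-order methods precondition the update with curvature information, and zeroth-order methods replace `g` with a finite-difference estimate; both trade compute or variance for the memory and privacy properties the abstract discusses.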
Submitted 14 April, 2026;
originally announced April 2026.
-
Observation of the Exotic State $\pi_{1}(1600)$ in $\psi(2S)\rightarrow\gamma\chi_{c1}, \chi_{c1}\rightarrow\pi^{+}\pi^{-}\eta'$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (728 additional authors not shown)
Abstract:
A partial wave analysis of the process $\psi(2S)\rightarrow\gamma\chi_{c1}$, $\chi_{c1}\rightarrow\pi^+\pi^-\eta^{\prime}$ is performed using $(2712.4\pm14.3)\times10^{6}$ $\psi(2S)$ events collected with the BESIII detector. An isovector state with exotic quantum numbers $J^{PC}=1^{-+}$, denoted as $\pi_{1}(1600)$, is observed for the first time in the charmonium decay $\chi_{c1}\rightarrow\pi_{1}^{\pm}(1600)\pi^{\mp}$, $\pi_{1}^{\pm}(1600)\rightarrow\pi^{\pm}\eta^{\prime}$, with a statistical significance over $21\sigma$. Its mass and width are determined to be $1828 \pm 8 ({\rm stat})^{+11}_{-33}({\rm syst})~\mathrm{MeV}/c^2$ and $638 \pm 26 ({\rm stat})^{+35}_{-86}({\rm syst})~\mathrm{MeV}$, respectively, using a relativistic Breit-Wigner function with a mass-dependent width. The corresponding product of branching fractions is determined to be $\mathcal{B}\left[\chi_{c1}\rightarrow\pi_{1}(1600)^{\pm}\pi^{\mp}\right] \times \mathcal{B}\left[\pi_{1}(1600)^{\pm}\rightarrow\pi^{\pm}\eta^{\prime}\right] = \left( 4.30 \pm 0.14 ({\rm stat})^{+1.04}_{-1.03}({\rm syst}) \right) \times 10^{-4}$.
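The fitted lineshape can be illustrated with a common textbook parameterization of a relativistic Breit-Wigner with mass-dependent width; this is not necessarily the exact BESIII convention, and Blatt-Weisskopf barrier factors are omitted for brevity.

```python
# Hedged illustration: relativistic Breit-Wigner with a mass-dependent
# width for a P-wave (L = 1) pi eta' resonance, using the quoted mass/width.
import numpy as np

M0, G0 = 1.828, 0.638             # fitted mass and width, GeV
m_pi, m_etap = 0.1396, 0.9578     # daughter masses, GeV

def breakup_q(m, ma, mb):
    """Daughter momentum in the resonance rest frame."""
    return np.sqrt((m**2 - (ma + mb)**2) * (m**2 - (ma - mb)**2)) / (2 * m)

def gamma(m, L=1):
    """One common mass-dependent width (barrier factors omitted)."""
    q, q0 = breakup_q(m, m_pi, m_etap), breakup_q(M0, m_pi, m_etap)
    return G0 * (q / q0) ** (2 * L + 1) * (M0 / m)

def bw_intensity(m):
    amp = 1.0 / (M0**2 - m**2 - 1j * M0 * gamma(m))
    return np.abs(amp) ** 2

m = np.linspace(1.2, 2.4, 601)    # scan above the pi eta' threshold
intensity = bw_intensity(m)
peak_mass = m[np.argmax(intensity)]
```

Note that with a width this large the intensity maximum sits below the fitted pole mass under this parameterization, which is why the fitted Breit-Wigner parameters and the visible peak position need not coincide.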
Submitted 14 April, 2026; v1 submitted 14 April, 2026;
originally announced April 2026.
-
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
Authors:
Clara Xue,
Zizheng Yan,
Zhenning Shi,
Yuhang Yu,
Jingyu Zhuang,
Qi Zhang,
Jinwei Chen,
Qingnan Fan
Abstract:
Live Photo captures both a high-quality key photo and a short video clip to preserve the precious dynamics around the captured moment. While users may choose alternative frames as the key photo to capture better expressions or timing, these frames often exhibit noticeable quality degradation, as the photo capture ISP pipeline delivers significantly higher image quality than the video pipeline. This quality gap highlights the need for dedicated restoration techniques to enhance the reselected key photo. To this end, we propose LiveMoments, a reference-guided image restoration framework tailored for the reselected key photo in Live Photos. Our method employs a two-branch neural network: a reference branch that extracts structural and textural information from the original high-quality key photo, and a main branch that restores the reselected frame using the guidance provided by the reference branch. Furthermore, we introduce a unified Motion Alignment module that incorporates motion guidance for spatial alignment at both the latent and image levels. Experiments on real and synthetic Live Photos demonstrate that LiveMoments significantly improves perceptual quality and fidelity over existing solutions, especially in scenes with fast motion or complex structures. Our code is available at https://github.com/OpenVeraTeam/LiveMoments.
Submitted 14 April, 2026;
originally announced April 2026.
-
LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models
Authors:
Haocheng Xi,
Harman Singh,
Yuezhou Hu,
Coleman Hooper,
Rishabh Tiwari,
Aditya Tomar,
Minjae Lee,
Wonjun Kang,
Michael Mahoney,
Chenfeng Xu,
Kurt Keutzer,
Amir Gholami
Abstract:
Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they still remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs due to a KV Inflation problem, where different queries select different prefix positions, making the union of accessed KV pages large. To address this, we observe that between consecutive denoising steps, only a small fraction of active tokens exhibit significant hidden-state changes, while the majority of stable tokens remain nearly constant. Based on this insight, we propose LOSA (Locality-aware Sparse Attention), which reuses cached prefix-attention results for stable tokens and applies sparse attention only to active tokens. This substantially shrinks the number of KV indices that must be loaded, yielding both higher speedup and higher accuracy. Across multiple block-wise DLMs and benchmarks, LOSA preserves near-dense accuracy while significantly improving efficiency, achieving up to +9 points in average accuracy at aggressive sparsity levels while maintaining 1.54x lower attention density. It also achieves up to 4.14x attention speedup on RTX A6000 GPUs, demonstrating the effectiveness of the proposed method.
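The reuse-versus-recompute split can be sketched as follows. The change threshold, shapes, and single-head attention are illustrative, not LoSA's exact criterion:

```python
# Sketch: detect "active" tokens by hidden-state change between denoising
# steps, recompute attention only for them, and reuse cached outputs.
import numpy as np

rng = np.random.default_rng(0)
T, d = 512, 64                             # sequence length, head dim
K = rng.normal(size=(T, d))                # cached prefix keys
V = rng.normal(size=(T, d))                # cached prefix values
cached_out = rng.normal(size=(T, d))       # attention outputs from last step

h_prev = rng.normal(size=(T, d))           # hidden states at step t-1
h_curr = h_prev.copy()                     # at step t most tokens are stable...
changed = rng.choice(T, size=32, replace=False)
h_curr[changed] += rng.normal(size=(32, d))  # ...and only a few are active

delta = np.linalg.norm(h_curr - h_prev, axis=-1)
active = np.flatnonzero(delta > 1e-6)      # locality: recompute only these

out = cached_out.copy()                    # stable tokens reuse cached results
q = h_curr[active]
s = q @ K.T / np.sqrt(d)
w = np.exp(s - s.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
out[active] = w @ V                        # fresh attention for active rows only

frac_recomputed = active.size / T          # 32/512 in this toy setup
```

Because only active rows issue new queries, the union of KV pages touched per step shrinks, which is the KV-inflation problem the abstract describes.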
Submitted 13 April, 2026;
originally announced April 2026.
-
V-Nutri: Dish-Level Nutrition Estimation from Egocentric Cooking Videos
Authors:
Chengkun Yue,
Chuanzhi Xu,
Jiangpeng He
Abstract:
Nutrition estimation of meals from visual data is an important problem for dietary monitoring and computational health, but existing approaches largely rely on single images of the completed dish. This setting is fundamentally limited because many nutritionally relevant ingredients and transformations, such as oils, sauces, and mixed components, become visually ambiguous after cooking, making accurate calorie and macronutrient estimation difficult. In this paper, we investigate whether cooking-process information from egocentric cooking videos can contribute to dish-level nutrition estimation. First, we manually annotated the HD-EPIC dataset and established the first benchmark for video-based nutrition estimation. Most importantly, we propose V-Nutri, a staged framework that combines Nutrition5K-pretrained visual backbones with a lightweight fusion module that aggregates features from the final dish frame and cooking-process keyframes extracted from the egocentric videos. V-Nutri also includes a cooking-keyframe selection module, a VideoMamba-based event-detection model that targets ingredient-addition moments. Experiments on the HD-EPIC dataset show that process cues can provide complementary nutritional evidence, improving nutrition estimation under controlled conditions. Our results further indicate that the benefit of process keyframes depends strongly on backbone representation capacity and event detection quality. Our code and annotated dataset are available at https://github.com/K624-YCK/V-Nutri.
Submitted 13 April, 2026;
originally announced April 2026.
-
Compressible turbulent boundary layers over two-dimensional square-rib roughness
Authors:
Youtian Su,
Wei-Xi Huang,
Chunxiao Xu
Abstract:
Direct numerical simulations are performed to investigate the combined effects of surface roughness and wall heat transfer on spatially developing compressible turbulent boundary layers at $Ma=2.5$. The roughness consists of transverse square bars with $\lambda_x/k=8$ and $k^+ \approx 35$, under adiabatic and wall-cooling ($T_w/T_r = 0.5$) conditions. Dynamically, the conventional zero-moment method fails to yield a consistent zero-plane displacement for the present cavity-type roughness. Instead, a fitting-based optimization procedure is proposed to determine the kinematic virtual origin, which successfully restores the logarithmic behavior. Based on this displacement, the Griffin--Fu--Moin (GFM) transformation outperforms the classical van Driest transformation in recovering outer-layer similarity for the velocity defect. Thermodynamically, the physical disparity between momentum form drag and the absence of a corresponding heat transfer mechanism disrupts the classical Reynolds analogy. The effective turbulent Prandtl number ($Pr_e$) deviates severely from unity within the roughness sublayer, leading to the breakdown of the classical Generalized Reynolds Analogy (GRA). To address this, a modified rough-wall GRA (rGRA) is formulated by introducing equivalent slip-plane or reference-point boundary conditions, which accurately reconstructs the temperature-velocity relationship by bypassing the near-wall thermal heterogeneity. Finally, the refined strong Reynolds analogy (RSRA) is shown to maintain predictive accuracy for fluctuation intensities in the outer layer despite near-wall modulation by roughness and cooling.
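The classical van Driest transformation used as the baseline here is the density-weighted velocity integral $u_{vd} = \int_0^{u} \sqrt{\rho/\rho_w}\,\mathrm{d}u'$. A sketch on a synthetic profile (not DNS data) shows how it is evaluated on discrete wall-normal data:

```python
# Van Driest transformation by trapezoidal quadrature on a synthetic
# compressible boundary-layer profile (illustrative shapes only).
import numpy as np

y = np.linspace(0.0, 1.0, 200)        # wall-normal coordinate (normalized)
u = np.tanh(3.0 * y)                  # streamwise velocity (normalized)
rho = 1.0 + 0.8 * np.exp(-5.0 * y)    # density: heaviest at a cooled wall
rho_w = rho[0]                        # wall density

integrand = np.sqrt(rho / rho_w)      # <= 1 away from the wall here
du = np.diff(u)
# Cumulative trapezoid of the integrand with respect to u:
u_vd = np.concatenate([[0.0],
                       np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * du)])
```

The GFM transformation the abstract favors blends viscous and quasi-equilibrium scalings and is more involved; this sketch covers only the classical baseline.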
Submitted 13 April, 2026;
originally announced April 2026.
-
Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling
Authors:
Dugang Liu,
Zulong Chen,
Chuanfei Xu,
Jiaxuan He,
Yunlu Ma,
Jia Xu
Abstract:
Office automation (OA) systems play a crucial role in enterprise operations and management, with access control flow approval (ACFA) being a key component that manages the accessibility of various resources. However, traditional ACFA requires approval from the person in charge at each step, which consumes a significant amount of manpower and time. Making this approval process intelligent is therefore an urgent need for enterprises. In this paper, we propose a novel relational modeling-driven intelligent approval (RMIA) framework to automate ACFA. Specifically, our RMIA consists of two core modules: (1) The binary relation modeling module characterizes the coupling relation between applicants and approvers, providing reliable basic information for ACFA decision-making from a coarse-grained perspective. (2) The ternary relation modeling module takes specific resource information as its core, characterizing the complex relations among applicants, resources, and approvers, and thus provides fine-grained gain information for informed decision-making. Our RMIA then fuses these two kinds of information to form the final decision. Finally, extensive experiments are conducted on two product datasets and an online A/B test to verify the effectiveness of RMIA.
Submitted 13 April, 2026;
originally announced April 2026.
-
Introspective Diffusion Language Models
Authors:
Yifan Yu,
Yuqing Jian,
Junxiong Wang,
Zhongzhu Zhou,
Donglin Zhuang,
Xinyu Fang,
Sri Yanamandra,
Xiaoxia Wu,
Qingyang Wu,
Shuaiwen Leon Song,
Tri Dao,
Ben Athiwaratkun,
James Zou,
Fan Lai,
Chenfeng Xu
Abstract:
Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We attribute this gap to a failure of introspective consistency: AR models agree with their own generations, while DLMs often do not. We define the introspective acceptance rate, which measures whether a model accepts its previously generated tokens. This reveals why AR training has a structural advantage: causal masking and logit shifting implicitly enforce introspective consistency. Motivated by this observation, we introduce Introspective Diffusion Language Model (I-DLM), a paradigm that retains diffusion-style parallel decoding while inheriting the introspective consistency of AR training. I-DLM uses a novel introspective strided decoding (ISD) algorithm, which enables the model to verify previously generated tokens while advancing new ones in the same forward pass. From a systems standpoint, we build the I-DLM inference engine on AR-inherited optimizations and further customize it with a stationary-batch scheduler. To the best of our knowledge, I-DLM is the first DLM to match the quality of its same-scale AR counterpart while outperforming prior DLMs in both model quality and practical serving efficiency across 15 benchmarks. It reaches 69.6 on AIME-24 and 45.7 on LiveCodeBench-v6, exceeding LLaDA-2.1-mini (16B) by more than 26 and 15 points, respectively. Beyond quality, I-DLM is designed for the growing demand of large-concurrency serving, delivering about 3x higher throughput than prior state-of-the-art DLMs.
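The introspective acceptance rate can be made concrete as follows. This is my formalization of the stated definition (the fraction of previously generated tokens the model would re-predict on a verification pass), not I-DLM's exact metric:

```python
# Toy formalization: acceptance = previously emitted token is still the
# argmax when the model re-reads its own generation.
import numpy as np

rng = np.random.default_rng(0)
T, V = 100, 50                              # sequence length, vocab size
generated = rng.integers(0, V, size=T)      # tokens the model already emitted

# Synthetic verification-pass logits: boost the emitted token where the
# model "agrees" with itself (~80% of positions), suppress it elsewhere.
logits = rng.normal(size=(T, V))
agree = rng.random(T) < 0.8
logits[np.arange(T), generated] += np.where(agree, 10.0, -10.0)

accepted = logits.argmax(axis=-1) == generated
acceptance_rate = accepted.mean()           # high for AR models, lower for DLMs
```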
Submitted 13 April, 2026;
originally announced April 2026.
-
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction
Authors:
Shaqi Luo,
Yuanyuan Li,
Youhao Hu,
Chenhao Yu,
Chaoran Xu,
Jiachen Zhang,
Guocai Yao,
Tiejun Huang,
Ran He,
Zhongyuan Wang
Abstract:
UMI-style interfaces enable scalable robot learning, but existing systems remain largely visuomotor, relying primarily on RGB observations and trajectory while providing only limited access to physical interaction signals. This becomes a fundamental limitation in contact-rich manipulation, where success depends on contact dynamics such as tactile interaction, internal grasping force, and external interaction wrench that are difficult to infer from vision alone. We present OmniUMI, a unified framework for physically grounded robot learning via human-aligned multimodal interaction. OmniUMI synchronously captures RGB, depth, trajectory, tactile sensing, internal grasping force, and external interaction wrench within a compact handheld system, while maintaining collection--deployment consistency through a shared embodiment design. To support human-aligned demonstration, OmniUMI provides dual-force feedback through bilateral gripper feedback and natural perception of external interaction wrench in the handheld embodiment. Built on this interface, we extend diffusion policy with visual, tactile, and force-related observations, and deploy the learned policy through impedance-based execution for unified regulation of motion and contact behavior. Experiments demonstrate reliable sensing and strong downstream performance on force-sensitive pick-and-place, interactive surface erasing, and tactile-informed selective release. Overall, OmniUMI combines physically grounded multimodal data acquisition with human-aligned interaction, providing a scalable foundation for learning contact-rich manipulation.
Submitted 12 April, 2026;
originally announced April 2026.
-
Rein3D: Reinforced 3D Indoor Scene Generation with Panoramic Video Diffusion Models
Authors:
Dehui Wang,
Congsheng Xu,
Rong Wei,
Yue Shi,
Shoufa Chen,
Dingxiang Luo,
Tianshuo Yang,
Xiaokang Yang,
Wei Sui,
Yusen Qin,
Rui Tang,
Yao Mu
Abstract:
The growing demand for Embodied AI and VR applications has highlighted the need for synthesizing high-quality 3D indoor scenes from sparse inputs. However, existing approaches struggle to infer massive amounts of missing geometry in large unseen areas while maintaining global consistency, often producing locally plausible but globally inconsistent reconstructions. We present Rein3D, a framework that reconstructs full 360-degree indoor environments by coupling explicit 3D Gaussian Splatting (3DGS) with temporally coherent priors from video diffusion models. Our approach follows a "restore-and-refine" paradigm: we employ a radial exploration strategy to render imperfect panoramic videos along trajectories starting from the origin, effectively uncovering occluded regions from a coarse 3DGS initialization. These sequences are restored by a panoramic video-to-video diffusion model and further enhanced via video super-resolution to synthesize high-fidelity geometry and textures. Finally, these refined videos serve as pseudo-ground truths to update the global 3D Gaussian field. To support this task, we construct PanoV2V-15K, a dataset of over 15K paired clean and degraded panoramic videos for diffusion-based scene restoration. Experiments demonstrate that Rein3D produces photorealistic and globally consistent 3D scenes and significantly improves long-range camera exploration compared with existing baselines.
Submitted 14 April, 2026; v1 submitted 12 April, 2026;
originally announced April 2026.
-
Measurement of the branching fractions of $χ_{cJ} \to π^{+}π^{-}π^{0}π^{0}$ via $ψ(3686) \to γχ_{cJ}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (741 additional authors not shown)
Abstract:
Using $(2712.4\pm14.3)\times 10^6$ $ψ(3686)$ events collected with the BESIII detector operating at BEPCII, the branching fractions of $χ_{cJ}\toπ^+π^-π^0π^0$ ($J=0,~1,~2$) are measured via the radiative transition $ψ(3686)\toγχ_{cJ}$. The results are $\mathcal{B}(χ_{c0} \to π^{+}π^{-}π^{0}π^{0}) = (3.10 \pm 0.01 \pm 0.14) \times 10^{-2}$, $\mathcal{B}(χ_{c1} \to π^{+}π^{-}π^{0}π^{0}) = (1.16 \pm 0.01 \pm 0.05) \times 10^{-2}$, and $\mathcal{B}(χ_{c2} \to π^{+}π^{-}π^{0}π^{0}) = (1.92 \pm 0.01 \pm 0.08) \times 10^{-2}$, where the first uncertainties are statistical and the second systematic. The dominant intermediate states are found to be $χ_{cJ}\toρ^+ρ^-$. These results supersede the previous most precise measurements and provide significantly improved precision.
Submitted 12 April, 2026;
originally announced April 2026.
-
First Observation of $D^+ \to a_0(980)ρ$ and $D^+ \to a_0(980)^+ f_0(500)$ in $D^+ \to π^+π^+π^-η$ and $D^+ \to π^+π^0π^0η$ Decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
X. L. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko
, et al. (734 additional authors not shown)
Abstract:
We perform the first amplitude analysis of the singly Cabibbo-suppressed decays $D^+ \to π^+ π^{+(0)} π^{-(0)} η$, using $e^+e^-$ collision data collected with the BESIII detector at the center-of-mass energy of 3.773\,GeV, corresponding to an integrated luminosity of 20.3 $\rm{fb}^{-1}$. The absolute branching fractions of the $D^+ \to π^+ π^+ π^- η$ and $D^+ \to π^+ π^0 π^0 η$ decays are measured to be $(3.20\pm0.06_{\text{stat.}}\pm0.03_{\text{syst.}})\times 10^{-3}$ and $(2.43 \pm 0.11_{\text{stat.}} \pm 0.04_{\text{syst.}}) \times 10^{-3}$, respectively. The decay process $D^{+}\to a_0(980)^{+}f_0(500)$ is observed for the first time with an unexpectedly large branching fraction. Moreover, we observe the decays $D^+ \to a_0(980)^{+(0)} ρ(770)^{0(+)}$ and measure the ratio $r_{+/0} \equiv \frac{\mathcal{B}(D^+ \to a_0(980)^+ ρ(770)^0)}{\mathcal{B}(D^+ \to a_0(980)^0 ρ(770)^+)}$ for the first time to be $0.55\pm0.08_{\text{stat.}}\pm0.05_{\text{syst.}}$. These results offer new insight into the nature of the $a_0(980)$ and $f_0(500)$ states.
Submitted 15 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis
Authors:
Congying Xu,
Hengcheng Zhu,
Songqiang Chen,
Jiarong Wu,
Valerio Terragni,
Shing-Chi Cheung
Abstract:
Metamorphic testing (MT) is a widely recognized technique for alleviating the oracle problem in software testing. However, its adoption is hindered by the difficulty of constructing effective metamorphic relations (MRs), which often require domain-specific or hard-to-obtain knowledge. In this work, we propose a novel approach that leverages the functional coupling between methods, which is readily available in source code, to automatically construct MRs and generate metamorphic test cases (MTCs). Our technique, MR-Coupler, identifies functionally coupled method pairs, employs large language models to generate candidate MTCs, and validates them through test amplification and mutation analysis. In particular, we leverage three functional coupling features to avoid expensive enumeration of possible method pairs, and a novel validation mechanism to reduce false alarms. Our evaluation of MR-Coupler on 100 human-written MTCs and 50 real-world bugs shows that it generates valid MTCs for over 90% of tasks, improves valid MTC generation by 64.90%, and reduces false alarms by 36.56% compared to baselines. Furthermore, the MTCs generated by MR-Coupler detect 44% of the real bugs. Our results highlight the effectiveness of leveraging functional coupling for automated MR construction and the potential of MR-Coupler to facilitate the adoption of MT in practice. We also released the tool and experimental data to support future research.
Submitted 11 April, 2026;
originally announced April 2026.
-
Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models
Authors:
Chao Xue,
Yao Wang,
Mengqiao Liu,
Di Liang,
Xingsheng Han,
Peiyang Liu,
Xianjie Wu,
Chenyao Lu,
Lei Jiang,
Yu Lu,
Haibo Shi,
Shuang Liang,
Minlong Peng,
Flora D. Salim
Abstract:
Supervised Fine-Tuning (SFT) is the standard approach for adapting large language models (LLMs) to downstream tasks. However, we observe a persistent failure mode: even after convergence, models often fail to correctly reproduce a subset of their own supervised training data. We refer to this behavior as the Incomplete Learning Phenomenon (ILP). This paper presents the first systematic study of ILP in LLM fine-tuning. We formalize ILP as post-training failure to internalize supervised instances and demonstrate its prevalence across multiple model families, domains, and datasets. Through controlled analyses, we identify five recurrent sources of incomplete learning: (1) missing prerequisite knowledge in the pre-trained model, (2) conflicts between SFT supervision and pre-training knowledge, (3) internal inconsistencies within SFT data, (4) left-side forgetting during sequential fine-tuning, and (5) insufficient optimization for rare or complex patterns. We introduce a diagnostic-first framework that maps unlearned samples to these causes using observable training and inference signals, and study several targeted mitigation strategies as causal interventions. Experiments on Qwen, LLaMA, and OLMo2 show that incomplete learning is widespread and heterogeneous, and that improvements in aggregate metrics can mask persistent unlearned subsets. The findings highlight the need for fine-grained diagnosis of what supervised fine-tuning fails to learn, and why.
Submitted 16 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Reason Only When Needed: Efficient Generative Reward Modeling via Model-Internal Uncertainty
Authors:
Chao Xue,
Yao Wang,
Mengqiao Liu,
Di Liang,
Xingsheng Han,
Peiyang Liu,
Xianjie Wu,
Chenyao Lu,
Lei Jiang,
Yu Lu,
Haibo Shi,
Shuang Liang,
Minlong Peng,
Flora D. Salim
Abstract:
Recent advancements in the Generative Reward Model (GRM) have demonstrated its potential to enhance the reasoning abilities of LLMs through Chain-of-Thought (CoT) prompting. Despite these gains, existing implementations of GRM suffer from two critical limitations. First, CoT prompting is applied indiscriminately to all inputs regardless of their inherent complexity. This introduces unnecessary computational costs for tasks amenable to fast, direct inference. Second, existing approaches primarily rely on voting-based mechanisms to evaluate CoT outputs, which often lack granularity and precision in assessing reasoning quality. In this paper, we propose E-GRM, an efficient generative reward modeling framework grounded in model-internal uncertainty. E-GRM leverages the convergence behavior of parallel model generations to estimate uncertainty and selectively trigger CoT reasoning only when needed, without relying on handcrafted features or task-dependent signals. To improve reward fidelity, we introduce a lightweight discriminative scorer trained with a hybrid regression--ranking objective to provide fine-grained evaluation of reasoning paths. Experiments on multiple reasoning benchmarks show that E-GRM substantially reduces inference cost while consistently improving answer accuracy, demonstrating that model-internal uncertainty is an effective and general signal for efficient reasoning-aware reward modeling.
Submitted 16 April, 2026; v1 submitted 11 April, 2026;
originally announced April 2026.
-
Impact of Intelligent Technologies on IoV Security: Integrating Edge Computing and AI
Authors:
Awais Bilal,
Kashif Sharif,
Liehuang Zhu,
Chang Xu,
Fan Li,
Sadaf Bukhari,
Sujit Biswas
Abstract:
The rapid development and integration of intelligent technologies in the Internet of Vehicles (IoV) have revolutionized transportation systems by enhancing connectivity, automation, and safety. However, the complexity and connectivity of IoV networks also introduce security challenges, including data privacy concerns, cyber threats, and system vulnerabilities. This paper surveys the role of Edge Computing (EC), Machine Learning (ML), and Deep Learning (DL) in strengthening IoV security frameworks. It examines the synergy between these technologies, highlighting their individual capabilities and their collective impact on enhancing threat detection, response times, and adaptive security. Through real-world case studies and practical deployments, we demonstrate how EC, ML, and DL are currently improving security and operational efficiency in IoV systems. The paper also identifies key research gaps and future directions for further advancements in IoV security, including the need for scalable, privacy-preserving solutions and robust defense mechanisms against emerging cyber threats. By integrating EC, ML, and DL, this work lays the groundwork for developing adaptive, efficient, and resilient IoV security infrastructures capable of addressing evolving challenges in the transportation ecosystem.
Submitted 11 April, 2026;
originally announced April 2026.
-
Conflicts Make Large Reasoning Models Vulnerable to Attacks
Authors:
Honghao Liu,
Chengjin Xu,
Xuhui Jiang,
Cehao Yang,
Shengming Yin,
Zhengwu Ma,
Lionel Ni,
Jian Guo
Abstract:
Large Reasoning Models (LRMs) have achieved remarkable performance across diverse domains, yet their decision-making under conflicting objectives remains insufficiently understood. This work investigates how LRMs respond to harmful queries when confronted with two categories of conflicts: internal conflicts, which pit alignment values against each other, and dilemmas, which impose mutually contradictory choices, including sacrificial, duress, agent-centered, and social forms. Using over 1,300 prompts across five benchmarks, we evaluate three representative LRMs - Llama-3.1-Nemotron-8B, QwQ-32B, and DeepSeek R1 - and find that conflicts significantly increase attack success rates, even under single-round non-narrative queries without sophisticated auto-attack techniques. Through layerwise and neuron-level analyses, we reveal that safety-related and functional representations shift and overlap under conflict, interfering with safety-aligned behavior. This study highlights the need for deeper alignment strategies to ensure the robustness and trustworthiness of next-generation reasoning models. Our code is available at https://github.com/DataArcTech/ConflictHarm. Warning: This paper contains inappropriate, offensive and harmful content.
Submitted 10 April, 2026;
originally announced April 2026.
-
Decoding Ancient Oracle Bone Script via Generative Dictionary Retrieval
Authors:
Yin Wu,
Gangjian Zhang,
Jiayu Chen,
Chang Xu,
Yuyu Luo,
Nan Tang,
Hui Xiong
Abstract:
Understanding humanity's earliest writing systems is crucial for reconstructing civilization's origins, yet many ancient scripts remain undeciphered. Oracle Bone Script (OBS) from China's Shang dynasty exemplifies this challenge: only approximately 1,500 of roughly 4,600 characters have been decoded, and a substantial portion of these 3,000-year-old inscriptions remains only partially understood. Limited by extreme data scarcity, existing computational methods achieve under 3% accuracy on unseen characters -- the core palaeographic challenge. We overcome this by reframing decipherment from classification to dictionary-based retrieval. Using deep learning guided by character evolution principles, we generate a comprehensive synthetic dictionary of plausible OBS variants for modern Chinese characters. Scholars query unknown inscriptions to retrieve visually similar candidates with transparent evidence, replacing algorithmic black boxes with interpretable hypotheses. Our approach achieves 54.3% Top-10 and 86.6% Top-50 accuracy for unseen characters. This scalable, transparent framework accelerates decipherment of a pivotal undeciphered script and establishes a generalizable methodology for AI-assisted archaeological discovery.
Submitted 1 April, 2026;
originally announced April 2026.
-
Are Independently Estimated View Uncertainties Comparable? Unified Routing for Trusted Multi-View Classification
Authors:
Yilin Zhang,
Cai Xu,
Haishun Chen,
Ziyu Guan,
Wei Zhao
Abstract:
Trusted multi-view classification typically relies on a view-wise evidential fusion process: each view independently produces class evidence and uncertainty, and the final prediction is obtained by aggregating these independent opinions. While this design is modular and uncertainty-aware, it implicitly assumes that evidence from different views is numerically comparable. In practice, however, this assumption is fragile. Different views often differ in feature space, noise level, and semantic granularity, while independently trained branches are optimized only for prediction correctness, without any constraint enforcing cross-view consistency in evidence strength. As a result, the uncertainty used for fusion can be dominated by branch-specific scale bias rather than true sample-level reliability. To address this issue, we propose Trusted Multi-view learning with Unified Routing (TMUR), which decouples view-specific evidence extraction from fusion arbitration. TMUR uses view-private experts and one collaborative expert, and employs a unified router that observes the global multi-view context to generate sample-level expert weights. Soft load-balancing and diversity regularization further encourage balanced expert utilization and more discriminative expert specialization. We also provide theoretical analysis showing why independent evidential supervision does not identify a common cross-view evidence scale, and why unified global routing is preferable to branch-local arbitration when reliability is sample-dependent.
Submitted 10 April, 2026;
originally announced April 2026.
-
Ground State Decay of the Three-Proton Emitter $^{17}$Na Reveals Isospin Symmetry Breaking
Authors:
X. -D. Xu,
I. Mukha,
Z. C. Xu,
S. M. Wang,
K. Y. Zhang,
L. Acosta,
E. Casarejos,
D. Cortina-Gil,
J. M. Espino,
A. Fomichev,
H. Geissel,
J. Gómez-Camacho,
L. V. Grigorenko,
O. Kiselev,
A. A. Korsheninnikov,
N. Kurz,
Yu. A. Litvinov,
I. Martel,
C. Nociforo,
M. Pfützner,
C. Rodríguez-Tajes,
C. Scheidenberger,
M. Stanoiu,
K. Sümmerer,
H. Weick
, et al. (2 additional authors not shown)
Abstract:
The spectrum of the exotic three-proton (3p) emitter $^{17}$Na has been studied by detecting all in-flight decay products. Derived from the measured angular correlations $^{14}$O+p+p+p, a resonant peak has been discovered at the 3p-decay energy of 2.24($^{+0.17}_{-0.25}$) MeV, which likely corresponds to the $^{17}$Na ground state. This decay energy value is significantly smaller than the previous experimental upper limit. Our measured $^{14}$O-p correlations stemming from the ground state decay have been quantitatively described by a sequential 1p-2p emission from a $^{17}$Na resonance via the intermediate $^{16}$Ne ground state, which allowed us to derive an upper limit of 0.6 MeV on the $^{17}$Na ground-state width. A dramatic systematic decrease in the mirror energy differences of mirror nuclei pairs has been observed for almost all 3p emitters with known proton separation energy (such as $^{31}$K, $^{20}$Al, and $^{17}$Na), in sharp contrast to the behavior in less exotic nuclei. Such a lowering effect indicates a general trend in the evolution of nuclear structure for light to medium mass nuclei beyond the proton drip line, which is often associated with strong isospin symmetry breaking.
Submitted 9 April, 2026;
originally announced April 2026.
-
A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators
Authors:
Cong Li,
Chenhao Xue,
Yi Ren,
Xiping Dong,
Yu Cheng,
Yinbo Hu,
Fujun Bai,
Yixin Guo,
Xiping Jiang,
Qiang Wu,
Zhi Yang,
Zhe Cheng,
Yuan Xie,
Guangyu Sun
Abstract:
Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this emerging technology provides strong performance gains over existing hardware, current 3D-DRAM accelerators (3D-Accelerators) rely on closed-source evaluation tools, leaving no publicly available performance analysis methods. Moreover, existing designs are highly customized for specific scenarios, lacking a general and reusable full-stack modeling framework for 3D-Accelerators across diverse use cases.
To bridge this fundamental gap, we present ATLAS, the first silicon-proven Architectural Three-dimensional-DRAM-based LLM Accelerator Simulation framework. Built on commercially deployed multi-layer 3D-DRAM technology, ATLAS introduces unified abstractions for both 3D-Accelerator system architecture and programming primitives to support arbitrary LLM inference scenarios. Validation against real silicon shows that ATLAS achieves $\le$8.57\% simulation error and 97.26-99.96\% correlation with measured performance. Through design space exploration with ATLAS, we demonstrate its ability to guide architecture design and distill key takeaways for both the 3D-DRAM memory system and 3D-Accelerator microarchitecture across scenarios. ATLAS will be open-sourced upon publication, enabling further research on 3D-Accelerators.
Submitted 9 April, 2026;
originally announced April 2026.
-
Rethinking Data Mixing from the Perspective of Large Language Models
Authors:
Yuanjian Xu,
Tianze Sun,
Changwei Xu,
XinLong Zhao,
Jianing Hao,
Ran Chen,
Yang Liu,
Ruijie Xu,
Stephen Chen,
Guang Zhang
Abstract:
Data mixing strategy is essential for large language model (LLM) training. Empirical evidence shows that inappropriate strategies can significantly reduce generalization. Although recent methods have improved empirical performance, several fundamental questions remain open: what constitutes a domain, whether human and model perceptions of domains are aligned, and how domain weighting influences generalization. We address these questions by establishing formal connections between gradient dynamics and domain distributions, offering a theoretical framework that clarifies the role of domains in training dynamics. Building on this analysis, we introduce DoGraph, a reweighting framework that formulates data scheduling as a graph-constrained optimization problem. Extensive experiments on GPT-2 models of varying scales demonstrate that DoGraph consistently achieves competitive performance.
Submitted 9 April, 2026;
originally announced April 2026.
-
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
Authors:
Shiwan Zhao,
Zhihu Wang,
Xuyang Zhao,
Jiaming Zhou,
Caiyue Xu,
Chenfei Liu,
Liting Zhang,
Yuhang Jia,
Yanzhe Zhang,
Hualong Yu,
Zichen Xu,
Qicheng Li,
Yong Qin
Abstract:
Post-training has become central to turning pretrained large language models (LLMs) into aligned, capable, and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet these methods are often discussed in fragmented ways, organized by labels or objectives rather than by the behavioral bottlenecks they address. This survey argues that LLM post-training is best understood as structured intervention on model behavior. We organize the field first by trajectory provenance, which defines two primary regimes: off-policy learning on externally supplied trajectories and on-policy learning on learner-generated rollouts. We then interpret methods through two recurring roles -- effective support expansion, which makes useful behaviors more reachable, and policy reshaping, which improves behavior within already reachable regions -- together with a complementary systems-level role, behavioral consolidation, which preserves, transfers, and amortizes useful behavior across stages and model transitions. Under this view, SFT may serve either support expansion or policy reshaping; preference optimization is usually off-policy reshaping, though online variants move closer to learner-generated states. On-policy RL often improves behavior on learner-generated states, but stronger guidance can also make hard-to-reach reasoning paths reachable. Distillation is often better understood as consolidation rather than only compression, and hybrid pipelines emerge as coordinated multi-stage compositions. Overall, the framework helps diagnose post-training bottlenecks and reason about stage composition, suggesting that progress increasingly depends on coordinated systems design rather than any single dominant objective.
Submitted 16 April, 2026; v1 submitted 9 April, 2026;
originally announced April 2026.
-
Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
Authors:
Monishwaran Maheswaran,
Leon Lakhani,
Zhongzhu Zhou,
Shijia Yang,
Junxiong Wang,
Coleman Hooper,
Yuezhou Hu,
Rishabh Tiwari,
Jue Wang,
Harman Singh,
Qingyang Wu,
Yuqing Jian,
Ce Zhang,
Kurt Keutzer,
Tri Dao,
Xiaoxia Wu,
Ben Athiwaratkun,
James Zou,
Chenfeng Xu
Abstract:
We show that verifier-free evolution is bottlenecked by both diversity and efficiency: without external correction, repeated evolution accelerates collapse toward narrow modes, while the uniform use of a high-cost model wastes compute and quickly becomes economically impractical. We introduce Squeeze Evolve, a unified multi-model orchestration framework for verifier-free evolutionary inference. Our approach is guided by a simple principle: allocate model capability where it has the highest marginal utility. Stronger models are reserved for high-impact stages, while cheaper models handle the other stages at much lower costs. This principle addresses diversity and cost-efficiency jointly while remaining lightweight. Squeeze Evolve naturally supports open-source, closed-source, and mixed-model deployments. Across AIME 2025, HMMT 2025, LiveCodeBench V6, GPQA-Diamond, ARC-AGI-V2, and multimodal vision benchmarks, such as MMMU-Pro and BabyVision, Squeeze Evolve consistently improves the cost-capability frontier over single-model evolution and achieves new state-of-the-art results on several tasks. Empirically, Squeeze Evolve reduces API cost by up to $\sim$3$\times$ and increases fixed-budget serving throughput by up to $\sim$10$\times$. Moreover, on discovery tasks, Squeeze Evolve is the first verifier-free evolutionary method to match, and in some cases exceed, the performance of verifier-based evolutionary methods.
Submitted 10 April, 2026; v1 submitted 8 April, 2026;
originally announced April 2026.
-
GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design
Authors:
Chenchen Xu,
Min Zhou,
Tiezheng Ge,
Weiwei Xu
Abstract:
Layout plays a crucial role in graphic design and poster generation. Recently, the application of deep learning models for layout generation has gained significant attention. This paper focuses on using a GAN-based model conditioned on images to generate advertising poster graphic layouts, which requires a dataset of paired product images and layouts. To address this task, we introduce the Content-aware Graphic Layout Dataset (CGL-Dataset), consisting of 60,548 paired inpainted posters with annotations and 121,000 clean product images. The inpainting artifacts introduce a domain gap between the inpainted posters and clean images. To bridge this gap, we design two GAN-based models. The first model, CGL-GAN, applies Gaussian blur to the inpainted regions to generate layouts. The second model, PDA-GAN, incorporates unsupervised domain adaptation through a GAN with a pixel-level discriminator (PD) to generate image-aware layouts based on the visual texture of input images. The PD is connected to shallow-level feature maps and computes the GAN loss for each input-image pixel. Additionally, we propose three novel content-aware metrics to assess the model's ability to capture the intricate relationships between graphic elements and image content. Quantitative and qualitative evaluations demonstrate that PDA-GAN achieves state-of-the-art performance and generates high-quality image-aware layouts.
Submitted 8 April, 2026;
originally announced April 2026.
-
Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles
Authors:
Yicheng Guo,
Jiaqi Liu,
Chengkai Xu,
Peng Hang,
Jian Sun
Abstract:
Autonomous vehicles in interactive traffic environments are often limited by the scarcity of safety-critical tail events in static datasets, which biases learned policies toward average-case behaviors and reduces robustness. Existing evaluation methods attempt to address this through adversarial stress testing, but are predominantly open-loop and post-hoc, making it difficult to incorporate discovered failures back into the training process. We introduce Evaluation as Evolution ($E^2$), a closed-loop framework that transforms adversarial generation from a static validation step into an adaptive evolutionary curriculum. Specifically, $E^2$ formulates adversarial scenario synthesis as transport-regularized sparse control over a learned reverse-time SDE prior. To make this high-dimensional generation tractable, we utilize topology-driven support selection to identify critical interacting agents, and introduce Topological Anchoring to stabilize the process. This approach enables the targeted discovery of failure cases while strictly constraining deviations from realistic data distributions. Empirically, $E^2$ improves collision failure discovery by 9.01% on the nuScenes dataset and up to 21.43% on the nuPlan dataset over the strongest baselines, while maintaining low invalidity and high realism. It further yields substantial robustness gains when the resulting boundary cases are recycled for closed-loop policy fine-tuning.
Submitted 7 April, 2026;
originally announced April 2026.
-
Microscopic evidence of spin-driven multiferroicity and topological spin textures in monolayer NiI2
Authors:
Haitao Wang,
Tianxing Jiang,
Weiyi Pan,
Xu Wang,
Hongyu Wang,
Junchao Tian,
Lianchuang Li,
Dongming Zhao,
Qingle Zhang,
Chenxi Wang,
Ying Yang,
Hongjun Xiang,
Changsong Xu,
Donglai Feng,
Tong Zhang
Abstract:
In type II multiferroics, noncollinear spin textures are expected to induce electric polarization directly, leading to strong magnetoelectric coupling. Realizing such spin-driven multiferroicity in two-dimensional systems, and elucidating the interplay between local spins and electric polarization, are of both fundamental and technological importance. Here, using vectorial spin-polarized scanning tunneling microscopy, we investigated the spin-driven multiferroicity in monolayer NiI2 at the atomic scale. We identify a canted spin-spiral state with a fully determined spin rotation plane, accompanied by a 2Q charge modulation. At spin-spiral domain walls, we discover topological spin textures composed of meron/antimeron pairs. These textures are associated with distinct charge patterns and notable band shifts, indicating local bound charges induced by variations of the ferroelectricity at the domain walls. Our observations are well captured by a realistic spin model incorporating Kitaev interactions and a generalized spin-current model of type II multiferroicity. The findings provide microscopic evidence of spin-driven multiferroicity in an extreme 2D system and establish a platform for low-dissipation, electric-field control of topological spin textures.
Submitted 8 April, 2026;
originally announced April 2026.
-
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
Authors:
Yuheng Shi,
Xiaohuan Pei,
Linfeng Wen,
Minjing Dong,
Chang Xu
Abstract:
MLLMs require high-resolution visual inputs for fine-grained tasks like document understanding and dense scene perception. However, current global resolution scaling paradigms indiscriminately flood the quadratic self-attention mechanism with visually redundant tokens, severely bottlenecking inference throughput while ignoring spatial sparsity and query intent. To overcome this, we propose Q-Zoom, a query-aware adaptive high-resolution perception framework that operates in an efficient coarse-to-fine manner. First, a lightweight Dynamic Gating Network safely bypasses high-resolution processing when coarse global features suffice. Second, for queries demanding fine-grained perception, a Self-Distilled Region Proposal Network (SD-RPN) precisely localizes the task-relevant Region-of-Interest (RoI) directly from intermediate feature spaces. To optimize these modules efficiently, the gating network uses a consistency-aware generation strategy to derive deterministic routing labels, while the SD-RPN employs a fully self-supervised distillation paradigm. A continuous spatio-temporal alignment scheme and targeted fine-tuning then seamlessly fuse the dense local RoI with the coarse global layout. Extensive experiments demonstrate that Q-Zoom establishes a dominant Pareto frontier. Using Qwen2.5-VL-7B as a primary testbed, Q-Zoom accelerates inference by 2.52 times on Document & OCR benchmarks and 4.39 times in High-Resolution scenarios while matching the baseline's peak accuracy. Furthermore, when configured for maximum perceptual fidelity, Q-Zoom surpasses the baseline's peak performance by 1.1% and 8.1% on these respective benchmarks. These robust improvements transfer seamlessly to Qwen3-VL, LLaVA, and emerging RL-based thinking-with-image models. Project page is available at https://yuhengsss.github.io/Q-Zoom/.
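The coarse-to-fine control flow the abstract describes (gate first, zoom only when needed) can be sketched as follows; the function names, the confidence threshold, and the box format are illustrative assumptions for exposition, not Q-Zoom's actual API:

```python
# Illustrative sketch of a query-aware coarse-to-fine gating policy in the
# spirit of Q-Zoom. All names and the threshold value are assumptions; the
# real system uses a learned gating network and an SD-RPN over features.

def route_query(coarse_confidence, threshold=0.8):
    """Decide whether coarse global features suffice for this query,
    or whether a high-resolution region-of-interest pass is needed."""
    return "coarse_only" if coarse_confidence >= threshold else "roi_zoom"

def crop_roi(image, box):
    """Crop a task-relevant region from a 2D pixel grid.
    box = (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]
```

The key design point is that the expensive high-resolution path is entered only on the `"roi_zoom"` branch, so easy queries never pay the quadratic attention cost.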
Submitted 8 April, 2026;
originally announced April 2026.
-
Free Surface Enhancement of Droplet Rupture by Cavitation Bubble Collapse
Authors:
Chenghao Xu,
Zhengyu Yang,
Jie Feng
Abstract:
The interaction between cavitation bubbles and surrounding droplets plays a central role in applications such as surface cleaning, ultrasonic emulsification, and therapeutic delivery. These processes depend on bubble-driven microjets that drive the deformation and breakup of the droplets, which are significantly influenced by geometric confinements. Here, we investigate the hydrodynamic interaction between cavitation bubbles and oil droplets within a thin water layer under the coupled confinement of a free surface and a rigid wall. We reveal two distinct regimes of droplet response to cavitation bubble collapse: the rupture regime, where oil droplets fragment into smaller droplets, and the no-rupture regime, where the droplet remains intact. By deriving a non-dimensional Kelvin impulse to represent the momentum of the bubble-induced jet, we establish, for the first time, a scaling law that relates the criterion for droplet rupture to a characteristic Weber number and the bubble-to-droplet size ratio. This framework delineates the rupture boundary and even extends to predict the rupture of particle-laden droplets driven by cavitation bubbles. Our findings reveal the hydrodynamic principles underlying cavitation bubble-driven droplet rupture and provide predictive criteria for controlling performance in engineering and biomedical systems involving cavitation bubble dynamics.
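For orientation, the Weber number in the scaling law compares inertial to capillary stresses; the abstract does not specify the velocity and length scales the authors adopt, so in its generic form

```latex
\mathrm{We} = \frac{\rho\, U^{2} L}{\sigma},
```

where $\rho$ is the liquid density, $U$ a characteristic jet velocity, $L$ a characteristic length such as the droplet radius, and $\sigma$ the interfacial tension.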
Submitted 7 April, 2026;
originally announced April 2026.
-
Precise measurement of the CKM angle $γ$ with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ is performed by applying a novel, unbinned, model-independent approach to datasets of electron-positron collisions collected by the BESIII experiment and proton-proton collisions by the LHCb experiment, corresponding to integrated luminosities of 8 fb$^{-1}$ and 9 fb$^{-1}$, respectively. The $C\!P$-violating phase $γ$ is determined from ${B^{\pm}\rightarrow D(\rightarrow K_{\rm S}^{0} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays in LHCb data, where $h^{(\prime)}$ is either a pion or kaon, while the corresponding strong-phase parameters are measured using doubly tagged ${D\rightarrow K_{\rm S/L}^0 h^{\prime+} h^{\prime-}}$ decays in the quantum-correlated $D\overline{D}$ system present in BESIII data. A joint fit to both datasets, which allows for a simultaneous determination of the associated $C\!P$-violating observables and strong-phase parameters, yields ${γ= (71.3\pm 5.0)^{\circ}}$. The result is the most precise to date and consistent with previous measurements and world averages.
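For reference, $γ$ is the standard unitarity-triangle angle of the CKM matrix,

```latex
\gamma \equiv \arg\!\left(-\,\frac{V_{ud}\,V_{ub}^{*}}{V_{cd}\,V_{cb}^{*}}\right),
```

accessible through the interference between $b \to u$ and $b \to c$ tree-level amplitudes in $B^{\pm} \to D h^{\pm}$ decays.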
Submitted 7 April, 2026;
originally announced April 2026.
-
Measurement of the CKM angle $γ$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
Authors:
The BESIII and LHCb Collaborations:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
C. S. Akondi,
R. Aliberti,
A. Amoroso,
Q. An,
Y. H. An,
Y. Bai,
O. Bakina,
H. R. Bao,
X. L. Bao,
M. Barbagiovanni,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. B. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco
, et al. (1936 additional authors not shown)
Abstract:
A measurement of the CKM angle $γ$ and related strong-phase parameters is performed using a novel, model-independent approach in ${B^{\pm}\rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-}) h^{\pm}}$ decays, where $h^{(\prime)} \equiv π, K$. The analysis uses a joint data sample of electron-positron collisions collected by the BESIII experiment at the Beijing Electron-Positron Collider II during 2010--2011 and 2021--2022, corresponding to an integrated luminosity of 8 fb$^{-1}$, and proton-proton collisions collected by the LHCb experiment at the Large Hadron Collider during 2011--2018, corresponding to an integrated luminosity of 9 fb$^{-1}$. The two datasets are analyzed simultaneously by applying per-event weights based on the amplitude variation over the $D$-decay phase space to enhance the sensitivity to $C\!P$-violating observables. The CKM angle $γ$ is determined to be $γ= (71.3\pm 5.0)^{\circ}$, which constitutes the most precise single measurement to date.
Submitted 7 April, 2026;
originally announced April 2026.
-
Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening
Authors:
Chenyu Xue,
Yiran Liu,
Mian Zhou,
Jionglong Su,
Zhixiang Lu
Abstract:
Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariably leads to severe overfitting. To address these challenges, we propose a novel Semantic-Topological Graph Reasoning (STGR) framework for language-guided pulmonary screening. Our approach elegantly synergizes the reasoning capabilities of large language models (LLaMA-3-V) with the zero-shot delineation of vision foundation models (MedSAM). Specifically, we introduce a Text-to-Vision Intent Distillation (TVID) module to extract precise diagnostic guidance. To resolve anatomical ambiguity, we formulate mask selection as a dynamic graph reasoning problem, where candidate lesions are modeled as nodes and edges capture spatial and semantic affinities. To ensure deployment feasibility, we introduce a Selective Asymmetric Fine-Tuning (SAFT) strategy that updates less than 1% of the parameters. Rigorous 5-fold cross-validation on the LIDC-IDRI and LNDb datasets demonstrates that our framework establishes a new state-of-the-art. Notably, it achieves an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, outperforming leading LLM-based tools like LISA by over 5%. Crucially, our SAFT strategy acts as a powerful regularizer, yielding exceptional cross-fold stability (0.6% DSC variance) and paving the way for robust, context-aware clinical deployment.
Submitted 7 April, 2026;
originally announced April 2026.
-
Quasi-stationary Slice Detection-Based Robust Respiration Rate Estimation under Large-scale Random Body Movement
Authors:
Chendong Xu,
Shuai Yao,
Haoying Bao,
Chiyuan Ma,
Qisong Wu
Abstract:
Radar-based non-contact respiration rate (RR) measurement has become increasingly popular due to its convenience, non-intrusiveness, and low cost. However, it is still quite challenging to accurately estimate vital signs in complex measurement scenarios with large-scale random body movements (RBM), particularly for RR estimation, because of strong low-frequency interferences. To cope with the RBM challenge in RR estimation, we propose a novel two-stage RR estimation scheme that detects the portions of the signal, called quasi-stationary slices, which exhibit a quasi-stationary pattern. At the detection stage, an enhanced deep neural network framework incorporating dynamic snake convolution is exploited to detect the quasi-stationary slices in the micro-Doppler spectra. At the estimation stage, we mitigate RBM interferences and achieve accurate RR estimation by using only the portion of the ridges consistent with the locations of the detected quasi-stationary slices. Extensive experimental results demonstrate that our proposed scheme can accurately detect quasi-stationary slices in scenarios with large-scale RBM, thereby reducing the error of the subsequent RR estimation.
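The second stage (keep only ridge estimates that fall inside detected quasi-stationary intervals, then aggregate) can be sketched as below; the function name, inputs, and the median aggregation are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch (not the paper's code): restrict per-frame ridge
# frequencies to the detected quasi-stationary time slices, then take the
# median of the survivors as the respiration-rate estimate.

def estimate_rr(ridge_hz, times, slices):
    """ridge_hz, times: per-frame ridge frequency (Hz) and timestamps;
    slices: list of (t_start, t_end) quasi-stationary intervals.
    Returns breaths per minute, or None if no frame survives."""
    kept = [f for f, t in zip(ridge_hz, times)
            if any(a <= t <= b for a, b in slices)]
    if not kept:
        return None
    kept.sort()
    n, mid = len(kept), len(kept) // 2
    f = kept[mid] if n % 2 else 0.5 * (kept[mid - 1] + kept[mid])
    return 60.0 * f  # Hz -> breaths per minute
```

The median makes the estimate robust to the occasional RBM-corrupted frame that slips past the slice detector.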
Submitted 6 April, 2026;
originally announced April 2026.
-
MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation
Authors:
Zhixiang Lu,
Chong Zhang,
Chenyu Xue,
Angelos Stefanidis,
Chong Li,
Jionglong Su,
Zhengyong Jiang
Abstract:
Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap with high-resource directions, leaving millions of speakers of languages such as Lao, Burmese, and Tagalog with persistently low-quality translation systems despite recent advances in large multilingual models. We introduce \textbf{M}ultilingual \textbf{E}xpert-\textbf{R}eward \textbf{I}nformed \textbf{T}uning (\textbf{MERIT}), a unified translation framework that transforms the traditional English-centric ALT benchmark into a Chinese-centric evaluation suite for five Southeast Asian low-resource languages (LRLs). Our framework combines language-specific token prefixing (LTP) with supervised fine-tuning (SFT) and a novel group relative policy optimization (GRPO) guided by the semantic alignment reward (SAR). These results confirm that, in LRL{\textrightarrow}Chinese translation, targeted data curation and reward-guided optimization dramatically outperform mere model scaling.
Submitted 6 April, 2026;
originally announced April 2026.
-
LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection
Authors:
Cheng Xu,
Changhong Jin,
Yingjie Niu,
Nan Yan,
Yuke Mei,
Shuhao Guan,
Liming Chen,
M-Tahar Kechadi
Abstract:
The rapid development of Large Language Models (LLMs) has transformed fake news detection and fact-checking tasks from simple classification to complex reasoning. However, evaluation frameworks have not kept pace. Current benchmarks are static, making them vulnerable to benchmark data contamination (BDC) and ineffective at assessing reasoning under temporal uncertainty. To address this, we introduce LiveFact, a continuously updated benchmark that simulates the real-world "fog of war" in misinformation detection. LiveFact uses dynamic, temporal evidence sets to evaluate models on their ability to reason with evolving, incomplete information rather than on memorized knowledge. We propose a dual-mode evaluation: Classification Mode for final verification and Inference Mode for evidence-based reasoning, along with a component to monitor BDC explicitly. Tests with 22 LLMs show that open-source Mixture-of-Experts models, such as Qwen3-235B-A22B, now match or outperform proprietary state-of-the-art systems. More importantly, our analysis finds a significant "reasoning gap": capable models exhibit epistemic humility by recognizing unverifiable claims in early data slices, an aspect traditional static benchmarks overlook. LiveFact sets a sustainable standard for evaluating robust, temporally aware AI verification.
Submitted 6 April, 2026;
originally announced April 2026.
-
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
Authors:
Bin Wang,
Tianyao He,
Linke Ouyang,
Fan Wu,
Zhiyuan Zhao,
Tao Chu,
Yuan Qu,
Zhenjiang Jin,
Weijun Zeng,
Ziyang Miao,
Bangrui Xu,
Junbo Niu,
Mengzhang Cai,
Jiantao Qiu,
Qintong Zhang,
Dongsheng Ma,
Yuefeng Sun,
Hejun Dong,
Wenzheng Zhang,
Jutao Xiao,
Jiayong Shi,
Pengyu Liao,
Xiaomeng Zhao,
Huaping Zhong,
Liqun Wei
, et al. (18 additional authors not shown)
Abstract:
Current document parsing methods advance primarily through model architecture innovation, while systematic engineering of training data remains underexplored. Yet state-of-the-art models spanning diverse architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than from architectural differences. Building on this finding, we present MinerU2.5-Pro, which advances the state of the art purely through data engineering and training strategy design while retaining the 1.2B-parameter architecture of MinerU2.5 unchanged. At its core is a Data Engine co-designed around coverage, informativeness, and annotation accuracy: Diversity-and-Difficulty-Aware Sampling expands training data from under 10M to 65.5M samples while mitigating distribution shift; Cross-Model Consistency Verification leverages output consensus among heterogeneous models to assess sample difficulty and generate reliable annotations; the Judge-and-Refine pipeline improves annotation quality for hard samples through render-then-verify iterative correction. A three-stage progressive training strategy--large-scale pre-training, hard sample fine-tuning, and GRPO alignment--sequentially exploits these data at different quality tiers. On the evaluation front, we rectify element-matching biases in OmniDocBench v1.5 and introduce a Hard subset, establishing the more discriminative OmniDocBench v1.6 protocol. Without any architectural modification, MinerU2.5-Pro achieves 95.69 on OmniDocBench v1.6, improving over the same-architecture baseline by 2.71 points and surpassing all existing methods, including those based on models with over 200x more parameters.
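The cross-model consistency idea can be sketched as follows; the function name, agreement threshold, and routing labels are illustrative assumptions, not MinerU2.5-Pro's actual code:

```python
from collections import Counter

# Hypothetical sketch of cross-model consistency verification: a sample is
# treated as easy when heterogeneous parsers agree (the consensus output
# doubles as a pseudo-annotation); disagreement flags a hard sample for a
# judge-and-refine style pipeline. Names and threshold are illustrative.

def consistency_check(outputs, agree_ratio=0.75):
    """outputs: parses of one document sample from several models.
    Returns (label, majority_output)."""
    majority, count = Counter(outputs).most_common(1)[0]
    if count / len(outputs) >= agree_ratio:
        return "easy", majority
    return "hard", None
```

The same agreement score can also serve as a difficulty signal when sampling training data by difficulty tier.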
Submitted 9 April, 2026; v1 submitted 6 April, 2026;
originally announced April 2026.
-
Tighter entropic uncertainty relations in the presence of quantum memories for complete sets of mutually unbiased bases
Authors:
Qing-Hua Zhang,
Cong Xu,
Jing-Feng Wu,
Shao-Ming Fei
Abstract:
Entropic uncertainty relations provide an information-theoretic framework for quantifying the fundamental indeterminacy inherent in quantum mechanics. We propose more stringent quantum-memory-assisted entropic uncertainty relations for complete sets of mutually unbiased bases in multipartite scenarios. We present lower and upper bounds on the quantum uncertainties based on the complementarity of the observables, the purity of the measured state, the (conditional) von Neumann entropies, the Holevo quantities, and mutual information. The results are illustrated by several representative cases, showing that our bounds are tighter than previously existing bounds.
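Bounds of this kind tighten the canonical quantum-memory-assisted relation of Berta et al. for a single pair of observables $X$ and $Z$ measured on system $A$ with quantum memory $B$:

```latex
S(X \mid B) + S(Z \mid B) \;\geq\; \log_{2}\frac{1}{c} + S(A \mid B),
\qquad
c = \max_{x,z}\,\bigl|\langle x \mid z \rangle\bigr|^{2}.
```

For two mutually unbiased bases in dimension $d$ one has $c = 1/d$, and complete sets of $d+1$ MUBs admit correspondingly stronger bounds of the kind pursued here.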
Submitted 5 April, 2026;
originally announced April 2026.
-
Direct Photocurrent Detection of Optical Vortex Based on the Orbital Photo Galvanic Effect: Progress, Challenge and Perspective
Authors:
Jinluo Cheng,
Dehong Yang,
Weiming Wang,
Chang Xu,
Zipu Fan,
Dong Sun
Abstract:
A photodetector that can directly distinguish the orbital angular momentum (OAM) of light is highly desirable for integrated on-chip OAM detection and focal-plane array devices. The recent development of OAM detectors based on the intrinsic orbital photogalvanic effect (OPGE) of materials provides a new route to direct OAM detection that is on-chip scalable with high resolution and speed. In this paper, we summarize the current progress in direct photodetection of OAM via the OPGE. We begin with a short review of the basic operation scheme of the OAM detector and provide a comprehensive symmetry analysis to identify favorable material characteristics, incorporating considerations of device schemes, performance requirements, and specific application circumstances. We then review the current experimental progress and technical challenges, outline possible solutions to these challenges, and offer a perspective on future opportunities for this OAM detection route.
Submitted 4 April, 2026;
originally announced April 2026.
-
FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
Authors:
Zeyu Wang,
Xiaogang Li,
Peiyao Xiao,
Qinhao Kong,
Ben Wang,
Chengliang Xu,
Zichao Chen,
Bing Zhao,
Hu Wei
Abstract:
Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information extraction rather than the global structural logic inherent in formal scientific notations. In this work, we introduce FeynmanBench, the first benchmark centered on Feynman diagram tasks. It is designed to evaluate AI's capacity for multistep diagrammatic reasoning, which requires satisfying conservation laws and symmetry constraints, identifying graph topology, converting between diagrammatic and algebraic representations, and constructing scattering amplitudes under specific conventions and gauges. To support large-scale and reproducible evaluation, we developed an automated pipeline producing diverse Feynman diagrams along with verifiable topological annotations and amplitude results. Our database spans the electromagnetic, weak, and strong interactions of the Standard Model, encompasses over 100 distinct types and includes more than 2000 tasks. Experiments on state-of-the-art MLLMs reveal systematic failure modes, including unstable enforcement of physical constraints and violations of global topological conditions, highlighting the need for physics-grounded benchmarks for visual reasoning over scientific notation. FeynmanBench provides a logically rigorous test of whether AI can effectively engage in scientific discovery, particularly within theoretical physics.
Submitted 4 April, 2026;
originally announced April 2026.
-
Region-Based Constellation Designs for Constructive Interference Precoding in MU-MIMO
Authors:
Yupeng Zheng,
Chunmei Xu,
Jinfei Wang,
Yi Ma,
Rahim Tafazolli
Abstract:
The performance of constructive interference precoding (CIP) for multi-user multi-antenna (MU-MIMO) systems is governed by the structure of the constructive interference (CI) regions, yet this is overlooked in conventional constellation design. This work proposes the region-based constellation (RBC) model to lay the foundation for CIP constellation design. An RBC directly defines the mapping between messages and their feasible regions, instead of deriving them from an existing constellation. To provide insight for RBC design, we study the limitations of quadrature-amplitude-modulation (QAM)-based CIP. Analytical results show that the restrictive CI regions of QAM symbols are systematically misaligned with the objective-minimising sign pattern, resulting in a significant gap to the theoretical performance limit. From the perspective of improving sign alignment, two novel RBC schemes with non-convex feasible regions are proposed, namely mirrored-ends QAM (ME-QAM) and real-extended ME-QAM. A low-complexity algorithm is also developed for the resulting mixed-integer quadratic program, achieving a complexity comparable to QAM-based CIP. Simulation results with constellation sizes $\{16,64\}$ demonstrate up to $4$~dB signal-to-noise-ratio gain of the proposed schemes over QAM-based CIP. The proposed RBC model is also applicable to other systems with non-bijective modulation, representing a promising direction for future research.
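For background, the "restrictive CI regions" of QAM symbols that the paper relaxes are usually formulated, for QPSK, as a derotated decision sector with a safety margin; a minimal sketch of that textbook test (names and the margin are illustrative, and this is the classic convex region, not the paper's non-convex RBC regions):

```python
# Background sketch: the classic constructive-interference region test for
# a QPSK symbol. Derotate the noiseless received point z by the intended
# symbol's phase; CI holds when the point sits at least a margin gamma
# inside the symbol's 90-degree decision sector (sector half-angle 45
# degrees, so tan = 1). Names are illustrative.

def in_qpsk_ci_region(z: complex, s: complex, gamma: float) -> bool:
    zr = z * s.conjugate() / abs(s)  # rotate the sector onto the real axis
    return zr.real >= gamma and abs(zr.imag) <= zr.real - gamma
```

The proposed ME-QAM schemes can be read as replacing exactly this kind of region with non-convex feasible sets that align better with the objective-minimising sign pattern.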
Submitted 4 April, 2026;
originally announced April 2026.
-
Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
Authors:
Wuqi Su,
Huilun Song,
Chen Zhao,
Chi Xu
Abstract:
Monocular depth estimation from a single RGB image remains a fundamental challenge in computer vision due to inherent scale ambiguity and the absence of explicit geometric cues. Existing approaches typically rely on increasingly complex network architectures to regress depth maps, which escalates training costs and computational overhead without fully exploiting inter-pixel spatial dependencies. We propose a multilevel perceptual conditional random field (CRF) model built upon the Swin Transformer backbone that addresses these limitations through three synergistic innovations: (1) an adaptive hybrid pyramid feature fusion (HPF) strategy that captures both short-range and long-range dependencies by combining multi-scale spatial pyramid pooling with biaxial feature aggregation, enabling effective integration of global and local contextual information; (2) a hierarchical awareness adapter (HA) that enriches cross-level feature interactions within the encoder through lightweight broadcast modules with learnable dimensional scaling, reducing computational complexity while enhancing representational capacity; and (3) a fully-connected CRF decoder with dynamic scaling attention that models fine-grained pixel-level spatial relationships, incorporating a bias learning unit to prevent extreme-value collapse and ensure stable training. Extensive experiments on NYU Depth v2, KITTI, and MatterPort3D datasets demonstrate that our method achieves state-of-the-art performance, reducing Abs Rel to 0.088 ($-$7.4\%) and RMSE to 0.316 ($-$5.4\%) on NYU Depth v2, while attaining near-perfect threshold accuracy ($\delta < 1.25^3$ accuracy $\approx 99.8\%$) on KITTI with only 194M parameters and 21ms inference time.
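The metrics quoted above (Abs Rel, RMSE, and the $\delta < 1.25^3$ threshold accuracy) are the standard monocular-depth evaluation measures; a minimal sketch of how they are computed over valid pixels:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular-depth metrics over valid (gt > 0) pixels.

    Returns (abs_rel, rmse, delta3), where delta3 is the fraction of
    pixels with max(pred/gt, gt/pred) < 1.25**3.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)        # Abs Rel
    rmse = np.sqrt(np.mean((pred - gt) ** 2))        # RMSE
    ratio = np.maximum(pred / gt, gt / pred)
    delta3 = np.mean(ratio < 1.25 ** 3)              # threshold accuracy
    return abs_rel, rmse, delta3
```

For example, `depth_metrics([2.0, 4.2], [2.0, 4.0])` gives Abs Rel of 0.025, RMSE of about 0.141, and a threshold accuracy of 1.0, since both ratios fall well under $1.25^3 \approx 1.95$.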
Submitted 3 April, 2026;
originally announced April 2026.
-
Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
Authors:
Zhangyun Tan,
Zeliang Zhang,
Susan Liang,
Yolo Yunlong Tang,
Lisha Chen,
Chenliang Xu
Abstract:
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing. Training-based unlearning methods share a structural flaw: fine-tuning on a narrow forget set degrades general capabilities before unlearning begins, making it impossible to attribute subsequent performance drops to the unlearning procedure itself. Training-free approaches sidestep this by suppressing concepts through prompts or system instructions, but no rigorous benchmark exists for evaluating them on visual tasks.
We introduce VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs. It covers four forgetting levels, seven source datasets, and eleven concept axes, and pairs a three-level probe taxonomy with five evaluation conditions to separate genuine forgetting from instruction compliance. Across eight evaluation settings and 13 VLM configurations, realistic unlearning prompts leave forget accuracy near the no-instruction baseline; meaningful reductions appear only under oracle conditions that disclose the target concept to the model. Object and scene concepts are the most resistant to suppression, and stronger instruction-tuned models remain capable despite explicit forget instructions. These results expose a clear gap between prompt-level suppression and true visual concept erasure.
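The benchmark's central comparison can be sketched as a simple aggregation: genuine forgetting should show a large accuracy drop on forget-concept probes relative to the no-instruction baseline. The condition names below are illustrative placeholders, not VLM-UnBench's actual API:

```python
# Hypothetical scoring sketch. Condition names and numbers are made up
# for illustration; only the comparison logic reflects the abstract.

def forget_gap(acc_by_condition: dict) -> dict:
    """Accuracy drop of each condition vs. the no-instruction baseline."""
    base = acc_by_condition["no_instruction"]
    return {cond: base - acc
            for cond, acc in acc_by_condition.items()
            if cond != "no_instruction"}

gaps = forget_gap({
    "no_instruction": 0.91,    # baseline forget-probe accuracy
    "realistic_prompt": 0.89,  # near baseline -> suppression failed
    "oracle_prompt": 0.42,     # target concept disclosed -> large drop
})
# realistic prompt barely moves (~0.02); oracle condition drops ~0.49
```

The finding in the abstract corresponds to the first pattern: realistic prompts produce near-zero gaps, so prompt-level suppression does not constitute erasure.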
Submitted 3 April, 2026;
originally announced April 2026.
-
JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency
Authors:
Aichen Cai,
Anmeng Zhang,
Anyu Li,
Bo Zhang,
Bohua Cai,
Chang Li,
Changjian Jiang,
Changkai Lu,
Chao Xue,
Chaocai Liang,
Cheng Zhang,
Dongkai Liu,
Fei Wang,
Guoqiang Huang,
Haijian Ke,
Han Lin,
Hao Wang,
Ji Miao,
Jiacheng Zhang,
Jialong Shi,
Jifeng Zhu,
Jingjing Qian,
Junhui Luo,
Junwu Xiong,
Lam So
, et al. (44 additional authors not shown)
Abstract:
We introduce JoyAI-LLM Flash, an efficient Mixture-of-Experts (MoE) language model designed to redefine the trade-off between strong performance and token efficiency in the sub-50B parameter regime. JoyAI-LLM Flash is pretrained on a massive corpus of 20 trillion tokens and further optimized through a rigorous post-training pipeline, including supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and large-scale reinforcement learning (RL) across diverse environments. To improve token efficiency, JoyAI-LLM Flash strategically balances \emph{thinking} and \emph{non-thinking} cognitive modes and introduces FiberPO, a novel RL algorithm inspired by fibration theory that decomposes trust-region maintenance into global and local components, providing unified multi-scale stability control for LLM policy optimization. To enhance architectural sparsity, the model comprises 48B total parameters while activating only 2.7B parameters per forward pass, achieving a substantially higher sparsity ratio than contemporary industry-leading models of comparable scale. To further improve inference throughput, we adopt a joint training-inference co-design that incorporates dense Multi-Token Prediction (MTP) and Quantization-Aware Training (QAT). We release the checkpoints for both JoyAI-LLM-48B-A3B Base and its post-trained variants on Hugging Face to support the open-source community.
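The sparsity claim is easy to verify from the figures quoted in the abstract (48B total, 2.7B active per forward pass), a back-of-the-envelope check rather than anything from the paper itself:

```python
# Activation-ratio check for the parameter counts quoted above.
total_params = 48e9    # 48B total parameters (from the abstract)
active_params = 2.7e9  # 2.7B activated per forward pass

activation_ratio = active_params / total_params
print(activation_ratio)  # roughly 0.056, i.e. ~5.6% of weights per token
```

In other words, each token touches roughly one parameter in eighteen, which is the "substantially higher sparsity ratio" the abstract contrasts with comparably sized MoE models.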
Submitted 8 April, 2026; v1 submitted 3 April, 2026;
originally announced April 2026.