-
Decomposing Communication Gain and Delay Cost Under Cross-Timestep Delays in Cooperative Multi-Agent Reinforcement Learning
Authors:
Zihong Gao,
Hongjian Liang,
Lei Hao,
Liangjun Ke
Abstract:
Communication is essential for coordination in \emph{cooperative} multi-agent reinforcement learning under partial observability, yet \emph{cross-timestep} delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed.
We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into \emph{communication gain} and \emph{delay cost}, yielding the Communication Gain and Delay Cost (CGDC) metric.
We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages.
Guided by CGDC, we propose \textbf{CDCMA}, an actor--critic framework that requests messages only when predicted CGDC is positive, predicts future observations to reduce misalignment at consumption, and fuses delayed messages via CGDC-guided attention.
Experiments on no-teammate-vision variants of Cooperative Navigation and Predator-Prey, and on SMAC maps across multiple delay levels, show consistent improvements in performance, robustness, and generalization, with ablations validating each component.
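The request-gating rule the abstract describes can be sketched as follows. This is only an illustration: the names `predicted_gain` and `unit_cost`, the discount factor, and the linear per-step delay cost are assumptions, not the paper's exact CGDC definition.

```python
def cgdc(predicted_gain, delay, discount=0.95, unit_cost=0.1):
    """Illustrative CGDC score: the message's predicted communication
    gain, discounted by its transit delay, minus a per-timestep
    staleness cost. All functional forms here are assumed."""
    delayed_gain = predicted_gain * discount ** delay
    delay_cost = unit_cost * delay
    return delayed_gain - delay_cost

def should_request(predicted_gain, delay, **kwargs):
    """An agent requests a message only when predicted CGDC is positive."""
    return cgdc(predicted_gain, delay, **kwargs) > 0.0
```

Under this sketch, a high-value message survives a short delay, while a low-value message facing a long delay is suppressed, matching the paper's request-only-when-positive rule.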
Submitted 4 April, 2026;
originally announced April 2026.
-
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Authors:
Boqiang Zhang,
Lei Ke,
Ruihan Yang,
Qi Gao,
Tianyuan Qu,
Rossell Chen,
Dong Yu,
Leoweiliang
Abstract:
Vision Language Model (VLM) development has largely relied on scaling model size, which hinders deployment on compute-constrained mobile and edge devices such as smartphones and robots. In this work, we explore the performance limits of compact (e.g., 2B and 8B) VLMs. We challenge the prevailing practice that state-of-the-art VLMs must rely on vision encoders initialized via massive contrastive pretraining (e.g., CLIP/SigLIP). We identify an objective mismatch: contrastive learning, optimized for discrimination, enforces coarse and category-level invariances that suppress fine-grained visual cues needed for dense captioning and complex VLM reasoning. To address this issue, we present Penguin-VL, whose vision encoder is initialized from a text-only LLM. Our experiments reveal that Penguin-Encoder serves as a superior alternative to traditional contrastive pretraining, unlocking a higher degree of visual fidelity and data efficiency for multimodal understanding. Across various image and video benchmarks, Penguin-VL achieves performance comparable to leading VLMs (e.g., Qwen3-VL) in mathematical reasoning and surpasses them in tasks such as document understanding, visual knowledge, and multi-perspective video understanding. Notably, these gains are achieved with a lightweight architecture, demonstrating that improved visual representation rather than model scaling is the primary driver of performance. Our ablations show that Penguin-Encoder consistently outperforms contrastive-pretrained encoders, preserving fine-grained spatial and temporal cues that are critical for dense perception and complex reasoning. This makes it a strong drop-in alternative for compute-efficient VLMs and enables high performance in resource-constrained settings. Code: https://github.com/tencent-ailab/Penguin-VL
Submitted 14 March, 2026; v1 submitted 6 March, 2026;
originally announced March 2026.
-
JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments
Authors:
Zhan Liu,
Changli Tang,
Yuxin Wang,
Zhiyuan Zhu,
Youjun Chen,
Yiwen Shao,
Tianzi Wang,
Lei Ke,
Zengrui Jin,
Chao Zhang
Abstract:
Current audio-visual large language models (AV-LLMs) are predominantly restricted to 2D perception, relying on RGB video and monaural audio. This design choice introduces a fundamental dimensionality mismatch that precludes reliable source localization and spatial reasoning in complex 3D environments. We address this limitation by presenting JAEGER, a framework that extends AV-LLMs to 3D space, to enable joint spatial grounding and reasoning through the integration of RGB-D observations and multi-channel first-order ambisonics. A core contribution of our work is the neural intensity vector (Neural IV), a learned spatial audio representation that encodes robust directional cues to enhance direction-of-arrival estimation, even in adverse acoustic scenarios with overlapping sources. To facilitate large-scale training and systematic evaluation, we propose SpatialSceneQA, a benchmark of 61k instruction-tuning samples curated from simulated physical environments. Extensive experiments demonstrate that our approach consistently surpasses 2D-centric baselines across diverse spatial perception and reasoning tasks, underscoring the necessity of explicit 3D modelling for advancing AI in physical environments. Our source code, pre-trained model checkpoints and datasets will be released upon acceptance.
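For context, the classical (non-neural) intensity vector that Neural IV refines can be computed directly from first-order ambisonics (B-format) signals: the active intensity is the time average of the pressure channel W multiplied by each velocity channel X, Y, Z, and its direction gives a direction-of-arrival estimate. The snippet below is a textbook sketch, not the paper's implementation.

```python
import math

def intensity_vector_doa(w, x, y, z):
    """Classical FOA intensity-vector DOA estimate from B-format
    sample sequences. Returns (azimuth, elevation) in radians."""
    n = len(w)
    ix = sum(wi * xi for wi, xi in zip(w, x)) / n
    iy = sum(wi * yi for wi, yi in zip(w, y)) / n
    iz = sum(wi * zi for wi, zi in zip(w, z)) / n
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation
```

The paper's Neural IV replaces this fixed averaging with a learned representation, which is what lets it stay robust when overlapping sources corrupt the raw intensity cue.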
Submitted 19 February, 2026;
originally announced February 2026.
-
DCTracks: An Open Dataset for Machine Learning-Based Drift Chamber Track Reconstruction
Authors:
Qian Liyan,
Zhang Yao,
Yuan Ye,
Zhang Zhaoke,
Fang Jin,
Jiang Shimiao,
Zhang Jin,
Li Ke,
Liu Beijiang,
Xu Chenglin,
Zhang Yifan,
Jia Xiaoqian,
Qin Xiaoshuai,
Huang Xingtao
Abstract:
We introduce a Monte Carlo (MC) dataset of single- and two-track drift chamber events to advance Machine Learning (ML)-based track reconstruction. To enable standardized and comparable evaluation, we define track-reconstruction-specific metrics and report results for traditional track reconstruction algorithms and a Graph Neural Network (GNN) method, facilitating rigorous, reproducible validation for future research.
Submitted 16 February, 2026;
originally announced February 2026.
-
InjectRBP: Steering Large Language Model Reasoning Behavior via Pattern Injection
Authors:
Xiuping Wu,
Zhao Yu,
Yuxin Cheng,
Ngai Wong,
Liangjun Ke,
Tapas Mishra,
Konstantinos V. Katsikopoulos
Abstract:
Reasoning can significantly enhance the performance of Large Language Models. While recent studies have exploited behavior-related prompt adjustments to enhance reasoning, these designs remain largely intuitive and lack a systematic analysis of the underlying behavioral patterns. Motivated by this, we investigate how models' reasoning behaviors shape reasoning from the perspective of behavioral patterns. We observe that models exhibit adaptive distributions of reasoning behaviors when responding to specific types of questions, and that structurally injecting these patterns can substantially influence the quality of the models' reasoning processes and outcomes. Building on these findings, we propose two optimization methods that require no parameter updates: InjectCorrect and InjectRLOpt. InjectCorrect guides the model by imitating behavioral patterns derived from its own past correct answers. InjectRLOpt learns a value function from historical behavior-pattern data and, via our proposed Reliability-Aware Softmax Policy, generates a behavioral injectant during inference to steer the reasoning process. Our experiments demonstrate that both methods can improve model performance across various reasoning tasks without requiring any modifications to model parameters, achieving gains of up to 5.34% and 8.67%, respectively.
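One plausible rendering of a reliability-aware softmax is sketched below: candidate behavior-pattern values are tempered and multiplicatively weighted by a reliability score in [0, 1] before normalization. The functional form, parameter names, and the log-reliability weighting are all assumptions; the paper's exact policy may differ.

```python
import math

def reliability_aware_softmax(values, reliabilities, temp=1.0):
    """Illustrative reliability-aware softmax: value estimates are
    divided by a temperature and shifted by log-reliability, so
    low-reliability candidates are down-weighted (assumed form)."""
    logits = [v / temp + math.log(max(r, 1e-8))
              for v, r in zip(values, reliabilities)]
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal values, a candidate whose reliability is halved receives exactly half the probability mass of the other, which is the behavior one would want from a multiplicative reliability weight.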
Submitted 12 February, 2026;
originally announced February 2026.
-
Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling
Authors:
Gongye Liu,
Bo Yang,
Yida Zhi,
Zhizhou Zhong,
Lei Ke,
Didan Deng,
Han Gao,
Yongxiang Huang,
Kaihao Zhang,
Hongbo Fu,
Wenhan Luo
Abstract:
Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. However, their computation and memory cost can be substantial, and optimizing a latent diffusion generator through a pixel-space reward introduces a domain mismatch that complicates alignment. In this paper, we propose DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. Our method introduces a noise-calibrated Thurstone likelihood with diffusion-noise-dependent uncertainty. DiNa-LRM leverages a pretrained latent diffusion backbone with a timestep-conditioned reward head, and supports inference-time noise ensembling, providing a diffusion-native mechanism for test-time scaling and robust rewarding. Across image alignment benchmarks, DiNa-LRM substantially outperforms existing diffusion-based reward baselines and achieves performance competitive with state-of-the-art VLMs at a fraction of the computational cost. In preference optimization, we demonstrate that DiNa-LRM improves preference optimization dynamics, enabling faster and more resource-efficient model alignment.
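A noise-calibrated Thurstone preference likelihood can be sketched as follows: the probability that the preferred sample wins is a standard-normal CDF of the reward margin scaled by a comparison noise `sigma_t` that grows with the diffusion timestep. The specific noise schedule and any calibration details are the paper's own; this form is only an illustration.

```python
import math

def std_normal_cdf(x):
    """Phi(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def thurstone_nll(r_win, r_lose, sigma_t):
    """Illustrative noise-calibrated Thurstone negative log-likelihood
    for one preference pair. Larger sigma_t (noisier diffusion states)
    flattens the win probability, softening the training signal."""
    p_win = std_normal_cdf((r_win - r_lose) / sigma_t)
    return -math.log(max(p_win, 1e-12))
```

Note the intended behavior: the same reward margin incurs a larger loss (weaker confidence) at noisier timesteps, which is what makes the likelihood "noise-calibrated".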
Submitted 11 February, 2026;
originally announced February 2026.
-
Anisotropy, frustration and saddle point in the twisted Kagome antiferromagnet ErPdPb
Authors:
Resham Babu Regmi,
Sk Jamaluddin,
Y. Lee,
Hari Bhandari,
Po-Hao Chang,
Peter E. Siegfried,
Abhijeet Nayak,
Mohamed El Gazzah,
Bence G. Márkus,
Anna Nyáry,
Zachary T. Messegee,
Miya P. Zhao,
Xiaoyan Tan,
László Forró,
Liqin Ke,
Igor I. Mazin,
Nirmal J. Ghimire
Abstract:
The kagome lattice, with its inherent geometric frustration, provides a rich platform for exploring intriguing magnetic phenomena and topological electronic structures. In reduced-symmetry structures, such as twisted kagome systems involving rare earth elements, additional anisotropy can arise, enabling intriguing properties including spin-ice states, magnetocaloric effects, noncollinear magnetic ordering, and anomalous Hall effect. Here, we report the synthesis of single crystals of ErPdPb, which features a twisted kagome lattice net of Er atoms within the hexagonal ZrNiAl-type structure, and we investigate its magnetic, electronic, and thermal properties. The material exhibits antiferromagnetic ordering below 2.2 K, consistently observed in magnetic, transport, and heat capacity measurements. Magnetization measurements reveal 1/3 metamagnetic steps along the c-axis below the Néel temperature, suggesting an Ising-spin-like state on the twisted kagome lattice. A pronounced anisotropy between in-plane and out-of-plane resistivity is observed throughout the temperature range of 1.8-300 K, and the compound exhibits a significant frustration index of 13.6 (12.7) along the c-axis (ab-plane). Heat capacity measurements show a broad hump at 2.2 K, with an additional increase below 0.5 K. The anisotropic magnetic properties are further explored through density functional theory (DFT) calculations, which suggest strong easy-axis anisotropy, consistent with experimental magnetic measurements and crystal-field model expectations, as well as quasi-one-dimensional bands and a spin-split saddle point at the zone center.
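The frustration index quoted above is conventionally the ratio of the Curie-Weiss temperature magnitude to the ordering temperature, f = |θ_CW| / T_N. The θ_CW value in the example below is a hypothetical number chosen for illustration, not one taken from the paper.

```python
def frustration_index(theta_cw, t_order):
    """Standard magnetic frustration index f = |theta_CW| / T_order.
    A value well above ~5-10 signals strong geometric frustration."""
    return abs(theta_cw) / t_order
```

With T_N = 2.2 K as reported, an index of about 13.6 would correspond to |θ_CW| on the order of 30 K (illustrative back-of-envelope value only).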
Submitted 9 February, 2026;
originally announced February 2026.
-
A New Mode of Teaching Chinese as a Foreign Language from the Perspective of Smart System Studied by Using Rongzhixue
Authors:
Xiaohui Zou,
Lijun Ke,
Shunpeng Zou
Abstract:
The purpose of this study is to introduce a new model of teaching Chinese as a foreign language from the perspective of integrating wisdom. Its characteristics are as follows: it centers on the butterfly model of interpretation before translation and highlights a new method of bilingual thinking training. On the one hand, it applies the new theory of Chinese characters, the theory of the relationship between language and speech, and forward-looking research results in language science; on the other hand, it applies the new model of teaching Chinese as a foreign language, AI-empowered teaching and learning, and forward-looking research results in educational science. Together these fully reflect the characteristics of the new model of teaching Chinese as a foreign language from the perspective of integrating wisdom. Its beneficial effect is to challenge not only the old views of language and education, especially the old view of teaching Chinese as a foreign language, but also the old view of human-computer interaction. Its significance lies in clearly putting forward, from the perspective of integrating wisdom, a series of cross-border Rongzhixue fields spanning language, knowledge, education, and teaching, as well as new methods and new topics for bilingual thinking training. Especially in the face of the challenge that ChatGPT poses to human learning ability and even creativity, existing concepts of language, knowledge, education, and teaching have fallen far behind. The old concepts of Chinese language education and of teaching Chinese as a foreign language all face a series of disruptive innovation challenges. How should they adapt and change? This study makes a series of innovative attempts, in the hope of benefiting academic colleagues, teachers, and students.
Submitted 28 January, 2026;
originally announced February 2026.
-
Reconstruction of atmospheric neutrinos in DUNE's horizontal-drift far-detector module
Authors:
DUNE Collaboration,
S. Abbaslu,
F. Abd Alrahman,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos
, et al. (1325 additional authors not shown)
Abstract:
This paper reports on the capabilities for reconstructing and identifying atmospheric neutrino interactions in one of the Deep Underground Neutrino Experiment's (DUNE) far detector modules, a liquid argon time projection chamber (LArTPC) with horizontal drift (FD-HD) of ionization electrons. The reconstruction is based upon the workflow developed for DUNE's long-baseline oscillation analysis, with retraining of some machine-learning models where necessary and the addition of features relevant only to atmospheric neutrinos, such as neutrino direction reconstruction. Where relevant, the impact of detecting the charged particles of the hadronic system is emphasized, and comparisons are carried out between the case when lepton-only information is considered in the reconstruction (as is the case for many neutrino oscillation experiments) and the case when all particles identified in the LArTPC are included. Three neutrino direction reconstruction methods have been developed and studied for the atmospheric analyses: using lepton-only information, using all reconstructed particles, and using only correlations from reconstructed hits. The results indicate that incorporating more than just lepton information significantly improves the resolution of both neutrino direction and energy reconstruction. The angle reconstruction algorithms developed in this work show no strong dependence of reconstruction efficiencies or neutrino flavor identification on particle direction. This comprehensive review of the reconstruction of atmospheric neutrinos in DUNE's FD-HD LArTPC is the first step towards developing a first neutrino oscillation sensitivity analysis, which will ready DUNE for its first measurements.
Submitted 9 January, 2026;
originally announced January 2026.
-
Stable and Efficient Single-Rollout RL for Multimodal Reasoning
Authors:
Rui Liu,
Dian Yu,
Lei Ke,
Haolin Liu,
Yujun Zhou,
Zhenwen Liang,
Haitao Mi,
Pratap Tokekar,
Dong Yu
Abstract:
Reinforcement Learning with Verifiable Rewards (RLVR) has become a key paradigm to improve the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, prevalent group-based algorithms such as GRPO require multi-rollout sampling for each prompt. While more efficient single-rollout variants have recently been explored in text-only settings, we find that they suffer from severe instability in multimodal contexts, often leading to training collapse. To address this training efficiency-stability trade-off, we introduce $\textbf{MSSR}$ (Multimodal Stabilized Single-Rollout), a group-free RLVR framework that achieves both stable optimization and effective multimodal reasoning performance. MSSR achieves this via an entropy-based advantage-shaping mechanism that adaptively regularizes advantage magnitudes, preventing collapse and maintaining training stability. While such mechanisms have been used in group-based RLVR, we show that in the multimodal single-rollout setting they are not merely beneficial but essential for stability. In in-distribution evaluations, MSSR demonstrates superior training compute efficiency, achieving similar validation accuracy to the group-based baseline with half the training steps. When trained for the same number of steps, MSSR's performance surpasses the group-based baseline and shows consistent generalization improvements across five diverse reasoning-intensive benchmarks. Together, these results demonstrate that MSSR enables stable, compute-efficient, and effective RLVR for complex multimodal reasoning tasks.
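An entropy-based advantage-shaping mechanism of the kind described can be sketched as follows: the single rollout's raw advantage is scaled by normalized policy entropy and clipped, so that updates shrink in the low-entropy regime where single-rollout training tends to collapse. The scaling form, the clip bound, and the parameter names are assumptions, not MSSR's exact mechanism.

```python
def shaped_advantage(reward, baseline, entropy, max_entropy, clip=2.0):
    """Illustrative single-rollout advantage shaping: scale the raw
    advantage by normalized policy entropy (in [0, 1]) and clip its
    magnitude to keep updates bounded (assumed functional form)."""
    raw_adv = reward - baseline
    scale = entropy / max_entropy        # low entropy -> smaller update
    return max(-clip, min(clip, raw_adv * scale))
```

The two effects to note: a confident (near-zero-entropy) policy receives an almost-zero update regardless of reward, and even a large raw advantage cannot exceed the clip bound, both of which act against training collapse.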
Submitted 20 December, 2025;
originally announced December 2025.
-
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
Authors:
Tianyuan Qu,
Lei Ke,
Xiaohang Zhan,
Longxiang Tang,
Yuqi Liu,
Bohao Peng,
Bei Yu,
Dong Yu,
Jiaya Jia
Abstract:
Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complexity settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity. Our project page: https://replan-iv-edit.github.io
Submitted 18 December, 2025;
originally announced December 2025.
-
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
Authors:
Yuxin Wang,
Lei Ke,
Boqiang Zhang,
Tianyuan Qu,
Hanxun Yu,
Zhenpeng Huang,
Meng Yu,
Dan Xu,
Dong Yu
Abstract:
While current multimodal models can answer questions based on 2D images, they lack intrinsic 3D object perception, limiting their ability to comprehend spatial relationships and depth cues in 3D scenes. In this work, we propose N3D-VLM, a novel unified framework that seamlessly integrates native 3D object perception with 3D-aware visual reasoning, enabling both precise 3D grounding and interpretable spatial understanding. Unlike conventional end-to-end models that directly predict answers from RGB/RGB-D inputs, our approach equips the model with native 3D object perception capabilities, enabling it to directly localize objects in 3D space based on textual descriptions. Building upon accurate 3D object localization, the model further performs explicit reasoning in 3D, achieving more interpretable and structured spatial understanding. To support robust training for these capabilities, we develop a scalable data construction pipeline that leverages depth estimation to lift large-scale 2D annotations into 3D space, significantly increasing the diversity and coverage of 3D object grounding data and yielding a dataset over six times larger than the largest existing single-image 3D detection dataset. Moreover, the pipeline generates spatial question-answering datasets that target chain-of-thought (CoT) reasoning in 3D, facilitating joint training for both 3D object localization and 3D spatial reasoning. Experimental results demonstrate that our unified framework not only achieves state-of-the-art performance on 3D grounding tasks, but also consistently surpasses existing methods in 3D spatial reasoning among vision-language models.
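At its core, the depth-based lifting step of such a pipeline is standard pinhole back-projection: a pixel with an estimated metric depth is mapped to camera-frame 3D coordinates via the camera intrinsics. The intrinsics in the example are placeholder values, and the paper's pipeline surely adds annotation-specific handling beyond this.

```python
def lift_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera-frame
    3D coordinates using the pinhole model: X = (u - cx) * Z / fx,
    Y = (v - cy) * Z / fy, Z = depth."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

Applying this to the corners of a 2D box, with per-pixel depth, is what turns a 2D annotation into a 3D one.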
Submitted 18 December, 2025;
originally announced December 2025.
-
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing
Authors:
Yixin Wan,
Lei Ke,
Wenhao Yu,
Kai-Wei Chang,
Dong Yu
Abstract:
We introduce MotionEdit, a novel dataset for motion-centric image editing: the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibility. Unlike existing image editing datasets that focus on static appearance changes or contain only sparse, low-quality motion edits, MotionEdit provides high-fidelity image pairs depicting realistic motion transformations extracted and verified from continuous videos. This new task is not only scientifically challenging but also practically significant, powering downstream applications such as frame-controlled video synthesis and animation.
To evaluate model performance on the novel task, we introduce MotionEdit-Bench, a benchmark that challenges models on motion-centric edits and measures model performance with generative, discriminative, and preference-based metrics. Benchmark results reveal that motion editing remains highly challenging for existing state-of-the-art diffusion-based editing models. To address this gap, we propose MotionNFT (Motion-guided Negative-aware Fine Tuning), a post-training framework that computes motion alignment rewards based on how well the motion flow between input and model-edited images matches the ground-truth motion, guiding models toward accurate motion transformations. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit show that MotionNFT consistently improves editing quality and motion fidelity of both base models on the motion editing task without sacrificing general editing ability, demonstrating its effectiveness. Our code is at https://github.com/elainew728/motion-edit/.
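A motion-alignment reward of the kind described could be realized as a similarity between motion-flow fields; the sketch below uses mean per-pixel cosine similarity between predicted and ground-truth flow vectors. This reward form is an assumption for illustration, not MotionNFT's exact definition.

```python
import math

def motion_alignment_reward(pred_flow, gt_flow, eps=1e-8):
    """Illustrative motion-alignment reward: mean cosine similarity
    between corresponding 2D flow vectors of the model-edited pair
    and the ground-truth pair (assumed reward form)."""
    total = 0.0
    for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow):
        dot = pu * gu + pv * gv
        norm = math.hypot(pu, pv) * math.hypot(gu, gv)
        total += dot / (norm + eps)      # in [-1, 1] per vector
    return total / len(pred_flow)
```

A reward of +1 means the edited image moved the subject exactly along the ground-truth motion, while negative values penalize motion in the wrong direction, which is the signal a negative-aware fine-tuning scheme needs.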
Submitted 13 December, 2025; v1 submitted 10 December, 2025;
originally announced December 2025.
-
Controlling Skyrmion Lattices via Strain: Elongation, Tilting, and Collapse Mechanisms
Authors:
Haijun Zhao,
Tae-Hoon Kim,
Lin Zhou,
Liqin Ke
Abstract:
This study establishes a comprehensive framework for the three-dimensional strain control of magnetic skyrmion strings. We integrate analytical modeling, micromagnetic simulations, and \textit{in situ} Lorentz transmission electron microscopy experiments to demonstrate that externally applied strain is a powerful stimulus for manipulating three-dimensional magnetic skyrmion strings. Analytical models predict that strain induces both elongation and bidirectional tilting of skyrmion strings in bulk systems, a finding corroborated by numerical simulations. These simulations further reveal that strain drives the system from fragmented multi-domain states toward unified single-domain configurations and facilitates skyrmion string rupture via bobber formation at critical strain levels. The collapse of the skyrmion lattice exhibits a temperature-dependent character, shifting from first-order to second-order behavior near the critical temperature $T_c$. Reducing sample thickness significantly increases the critical strain required for annihilation due to the suppression of tilting. Experimental validation on a $\text{Co}_8\text{Zn}_{8.5}\text{Mn}_{3.5}$ sample confirms strain-induced elongation and subsequent collapse into a conical phase via anti-cluster formation, directly implicating strain-modulated Dzyaloshinskii-Moriya interaction (DMI), rather than magnetocrystalline anisotropy, as the primary mechanism in this system. These findings provide a mechanistic understanding of strain-mediated control in three-dimensional magnetic systems, demonstrating its feasibility for energy-efficient spintronic applications.
Submitted 10 December, 2025;
originally announced December 2025.
-
Topological Defect Mediated Helical Phase Reorientation by Uniaxial Stress
Authors:
Tae-Hoon Kim,
Haijun Zhao,
Brandt A. Jensen,
Liqin Ke,
Lin Zhou
Abstract:
Strain engineering enables precise, energy-efficient control of nanoscale magnetism. However, unlike well-studied strain-dislocation interactions in mechanical deformation, the spatial evolution of strain-induced spin rearrangement remains poorly understood. Using \emph{in situ} Lorentz transmission electron microscopy, we manipulate and observe helical domain reorientation under quantitatively applied uniaxial tensile stress. Our findings reveal striking similarity to plastic deformation in metals, where the critical stress for propagation vector (\emph{\textbf{Q}}) reorientation depends on its angle with the stress direction. Magnetic defects mediate reorientation via "break-and-reconnect" or "dislocation gliding-annihilation" processes. Simulations confirm that strain-induced anisotropic Dzyaloshinskii-Moriya interaction may play a key role. These insights advance strain-driven magnetism and offer a promising route for energy-efficient magnetic nanophase control in next-generation information technology.
Submitted 6 December, 2025;
originally announced December 2025.
-
Phase-Factor-Controlled Surface Spirals in the Magnetic Conical Phase: The Role of In-Plane Directionality
Authors:
Haijun Zhao,
Tae-Hoon Kim,
Lin Zhou,
Liqin Ke
Abstract:
In chiral magnets, the magnetic textures surrounding domain walls exhibit a rich variety of structures, offering insights into fundamental physics and potential applications in spintronic devices. Conical spirals and related structures possess intrinsic in-plane directionalities governed by phase factors $φ_0$, which are often obscured in long spirals due to cylindrical symmetry but become prominent in short spirals or thin films. Using micromagnetic simulations, we systematically studied magnetic textures at ferromagnetic-conical interfaces (FCI), including 1D and 2D FCIs with various shapes. Surface spirals (SS) emerge adjacent to these FCIs, closely linked to the cone's in-plane reorientation. In 1D FCIs, reorientation controls the presence, shape, and topological charge of the SS, with a discontinuity point observed where spirals with opposite charges form on opposite sides. In 2D FCIs, eyebrow-like SS are evident. The reorientation angle between top and bottom SS is controlled by the film thickness, similar to stacked spirals reported previously. We further demonstrate that SSs form at the facets of skyrmion clusters within the conical phase, as confirmed by both simulations and Lorentz transmission electron microscopy observations in Co$_8$Zn$_{10}$Mn$_2$ thin films. The experiments specifically reveal two distinct formation pathways: thermally activated co-growth and field-driven transformation from residual helices. These findings establish $φ_0$ as a fundamental control parameter for magnetic states, enabling promising spintronic functionalities such as multi-state memory through SS polymorphism and energy-efficient neuromorphic computing via controlled topological transitions.
Submitted 6 December, 2025;
originally announced December 2025.
-
FlowSteer: Guiding Few-Step Image Synthesis with Authentic Trajectories
Authors:
Lei Ke,
Hubery Yin,
Gongye Liu,
Zhengyao Lv,
Jingcai Guo,
Chen Li,
Wenhan Luo,
Yujiu Yang,
Jing Lyu
Abstract:
With the success of flow matching in visual generation, sampling efficiency remains a critical bottleneck for its practical application. Among methods for accelerating flow models, ReFlow has been somewhat overlooked despite its theoretical consistency with flow matching, primarily because of its suboptimal performance in practical scenarios compared to consistency distillation and score distillation. In this work, we investigate this issue within the ReFlow framework and propose FlowSteer, a method that unlocks the potential of ReFlow-based distillation by guiding the student along the teacher's authentic generation trajectories. We first identify that Piecewise ReFlow's performance is hampered by a critical distribution mismatch during training and propose Online Trajectory Alignment (OTA) to resolve it. Then, we introduce an adversarial distillation objective applied directly on the ODE trajectory, improving the student's adherence to the teacher's generation trajectory. Furthermore, we find and fix a previously undiscovered flaw in the widely used FlowMatchEulerDiscreteScheduler that substantially degrades few-step inference quality. Our experimental results on SD3 demonstrate the method's efficacy.
Submitted 24 November, 2025;
originally announced November 2025.
-
$π^{*}_{0.6}$: a VLA That Learns From Experience
Authors:
Physical Intelligence,
Ali Amin,
Raichelle Aniceto,
Ashwin Balakrishna,
Kevin Black,
Ken Conley,
Grace Connors,
James Darpinian,
Karan Dhabalia,
Jared DiCarlo,
Danny Driess,
Michael Equi,
Adnan Esmail,
Yunhao Fang,
Chelsea Finn,
Catherine Glossop,
Thomas Godden,
Ivan Goryachev,
Lachy Groom,
Hunter Hancock,
Karol Hausman,
Gashon Hussein,
Brian Ichter,
Szymon Jakubczak,
Rowan Jen
, et al. (31 additional authors not shown)
Abstract:
We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $π^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $π^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
Submitted 18 November, 2025; v1 submitted 18 November, 2025;
originally announced November 2025.
-
Measurement of Exclusive $π^+$--argon Interactions Using ProtoDUNE-SP
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1304 additional authors not shown)
Abstract:
We present the measurement of $π^{+}$--argon inelastic cross sections using the ProtoDUNE Single-Phase liquid argon time projection chamber in the incident $π^+$ kinetic energy range of 500 -- 800 MeV in multiple exclusive channels (absorption, charge exchange, and the remaining inelastic interactions). The results of this analysis are important inputs to simulations of liquid argon neutrino experiments such as the Deep Underground Neutrino Experiment and the Short Baseline Neutrino program at Fermi National Accelerator Laboratory. They will be employed to improve the modeling of final state interactions within neutrino event generators used by these experiments, as well as the modeling of $π^{+}$--argon secondary interactions within the liquid argon. This is the first measurement of $π^+$--argon absorption at this kinetic energy range as well as the first ever measurement of $π^{+}$--argon charge exchange.
Submitted 17 November, 2025;
originally announced November 2025.
-
First Measurement of $π^+$-Ar and $p$-Ar Total Inelastic Cross Sections in the Sub-GeV Energy Regime with ProtoDUNE-SP Data
Authors:
DUNE Collaboration,
S. Abbaslu,
F. Abd Alrahman,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
L. Aliaga Soplin,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1327 additional authors not shown)
Abstract:
The ProtoDUNE-SP detector, a kiloton-scale prototype for the Deep Underground Neutrino Experiment (DUNE), is the largest liquid argon time projection chamber built to date. Operated at CERN from 2018 to 2020, it collected both cosmic-ray data and a beam consisting of positively-charged particles with discrete momentum settings across a range of 0.3 GeV/$c$ to 7 GeV/$c$. In this letter, we report the total inelastic cross section measurements for $π^+$-Ar and $p$-Ar interactions using selected $π^+$ and proton samples from the 1 GeV/$c$ beam data. These results provide the first measurement of the total inelastic cross sections for $π^+$-Ar in the 500-900 MeV kinetic energy range and for $p$-Ar below 450 MeV, both of which are directly relevant to the DUNE energy range. The measured cross sections are consistent with predictions and provide a dataset that was previously unavailable for argon targets. These measurements are essential for constraining neutrino-argon interaction models, which are crucial for the precision physics goals of the upcoming DUNE experiment.
Submitted 14 November, 2025;
originally announced November 2025.
-
Spatio-Temporal Cluster-Triggered Encoding for Spiking Neural Networks
Authors:
Lingyun Ke,
Minchi Hu
Abstract:
Encoding static images into spike trains is a crucial step for enabling Spiking Neural Networks (SNNs) to process visual information efficiently. However, existing schemes such as rate coding, Poisson encoding, and time-to-first-spike (TTFS) often ignore spatial relationships and yield temporally inconsistent spike patterns. In this article, a novel cluster-based encoding approach is proposed that leverages local density computation to preserve semantic structure in both spatial and temporal domains. The method introduces a 2D spatial cluster trigger that identifies foreground regions through connected component analysis and local density estimation, and then extends it to a 3D spatio-temporal (ST3D) framework that jointly considers temporal neighborhoods, producing spike trains with improved temporal consistency. Experiments on the N-MNIST dataset demonstrate that our ST3D encoder achieves 98.17% classification accuracy with a simple single-layer SNN, outperforming standard TTFS encoding (97.58%) and matching the performance of more complex deep architectures while using significantly fewer spikes (~3800 vs ~5000 per sample). These results demonstrate that the approach provides an interpretable and efficient encoding strategy for neuromorphic computing applications.
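The 2D cluster-trigger idea can be sketched roughly as follows. This is a toy simplification, not the authors' code: the image is thresholded into foreground, each pixel's local density over a 3x3 neighbourhood is estimated, and denser pixels receive earlier TTFS-style spike times; the paper's per-cluster connected-component step is omitted here in favour of a global density normalisation.

```python
import numpy as np

def cluster_trigger_encode(img, threshold=0.2, t_max=20):
    """Toy 2D cluster-trigger encoder: dense foreground pixels spike first."""
    fg = (img > threshold).astype(float)
    # Local density: mean occupancy of the 3x3 neighbourhood (zero-padded).
    p = np.pad(fg, 1)
    h, w = fg.shape
    density = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    spike_times = np.full(fg.shape, -1, dtype=int)  # -1 means "no spike"
    mask = fg.astype(bool)
    if mask.any():
        d = density[mask]
        # Earlier spike times for denser pixels, scaled into [0, t_max).
        spike_times[mask] = np.round((1.0 - d / d.max()) * (t_max - 1)).astype(int)
    return spike_times
```

A 3D spatio-temporal variant would apply the same density logic over a stack of event frames, so that a pixel's spike time also reflects activity in neighbouring timesteps.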
Submitted 11 November, 2025;
originally announced November 2025.
-
Test-driven Reinforcement Learning in Continuous Control
Authors:
Zhao Yu,
Xiuping Wu,
Liangjun Ke
Abstract:
Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However, since the reward function serves the dual purpose of defining the optimal goal and guiding learning, it is challenging to design the reward function manually, which often results in a suboptimal task representation. To tackle the reward design challenge in RL, inspired by the satisficing theory, we propose a Test-driven Reinforcement Learning (TdRL) framework. In the TdRL framework, multiple test functions are used to represent the task objective rather than a single reward function. Test functions can be categorized as pass-fail tests and indicative tests, each dedicated to defining the optimal objective and guiding the learning process, respectively, thereby making defining tasks easier. Building upon such a task definition, we first prove that if a trajectory return function assigns higher returns to trajectories closer to the optimal trajectory set, maximum entropy policy optimization based on this return function will yield a policy that is closer to the optimal policy set. Then, we introduce a lexicographic heuristic approach to compare the relative distance relationship between trajectories and the optimal trajectory set for learning the trajectory return function. Furthermore, we develop an algorithm implementation of TdRL. Experimental results on the DeepMind Control Suite benchmark demonstrate that TdRL matches or outperforms handcrafted reward methods in policy training, with greater design simplicity and inherent support for multi-objective optimization. We argue that TdRL offers a novel perspective for representing task objectives, which could be helpful in addressing the reward design challenges in RL applications.
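The lexicographic heuristic can be illustrated with a minimal sketch (a hypothetical rendering, not the authors' implementation): trajectories are ordered first by how many pass-fail tests they satisfy, with indicative-test scores used only to break ties; a trajectory return function can then be trained to respect this ordering.

```python
def lexicographic_key(traj, pass_fail, indicative):
    """Rank key: (number of pass-fail tests satisfied, indicative scores).
    Pass-fail tests define the goal; indicative tests only break ties."""
    return (sum(bool(t(traj)) for t in pass_fail),
            tuple(t(traj) for t in indicative))

def rank_trajectories(trajs, pass_fail, indicative):
    """Best-first ordering of trajectories under the lexicographic heuristic."""
    return sorted(trajs,
                  key=lambda tr: lexicographic_key(tr, pass_fail, indicative),
                  reverse=True)

# Toy task: reach a state below 1.0 (pass-fail); prefer low effort (indicative).
pass_fail = [lambda tr: min(tr) < 1.0]
indicative = [lambda tr: -sum(abs(x) for x in tr)]
```

In this toy setup any trajectory that reaches the goal outranks every trajectory that does not, regardless of effort, which is the satisficing behaviour the framework is built around.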
Submitted 9 December, 2025; v1 submitted 11 November, 2025;
originally announced November 2025.
-
Hidden symmetry-breaking in a kagome Ising ferromagnet
Authors:
Tianxiong Han,
Tyler J. Slade,
Liqin Ke,
Qing-Ping Ding,
Minseong Lee,
Ryan D. McKenzie,
Bing Li,
Durba R. Jaishi,
Yongbin Lee,
Daniel M. Pajerowski,
Qiang Zhang,
Tao Hong,
Paul C. Canfield,
Yuji Furukawa,
Komalavalli Thirunavukkuarasu,
Aashish Sapkota,
Rebecca Flint,
Robert J. McQueeney
Abstract:
Kagome metals can host unconventional electronic phenomena that emerge from their frustrated lattice geometry and associated band topology. Correlated electronic orders, such as charge-density waves and superconductivity, are observed to intertwine with subtle time-reversal symmetry breaking whose microscopic origin is not currently understood. Here, we provide evidence for such time-reversal symmetry breaking in the kagome metal TbV$_6$Sn$_6$ arising from staggered magnetic moments within the kagome layers. TbV$_6$Sn$_6$ consists of metallic V kagome layers separated by Tb triangular layers that host Ising ferromagnetic order. Deep in the ferromagnetic state, the Tb Ising doublet ground state should display a single, dispersionless spin-flip excitation. Instead, inelastic neutron scattering reveals two sharp excitations associated with inequivalent Tb sites, demonstrating that a symmetry-broken phase coexists with Ising ferromagnetism. No additional structural or magnetic phase transitions are detected, and first-principles calculations rule out lattice distortions as the origin of the splitting. We attribute this effect to time-reversal symmetry breaking encoded by small V moments that couple to the Tb sublattice and leave a measurable spectral fingerprint. Our results establish rare-earth local moment spectroscopy as a sensitive probe of subtle broken symmetries and highlight an unexpected interplay between kagome magnetism and rare-earth local moment magnetism.
Submitted 10 November, 2025;
originally announced November 2025.
-
A Computer Vision Based Proxy for Political Polarization in Religious Countries: A Turkiye Case Study
Authors:
Liangze Ke
Abstract:
This paper examines a novel proxy for political polarization, initially proposed by Caliskan et al., which estimates intergroup distances using computer vision. Analyzing 1,400+ YouTube videos with advanced object detection, their study quantifies demographic and religious divides in Turkiye, a deeply polarized nation. Our findings reveal strong correlations between intergroup distances and electoral polarization, measured via entropy-based voting metrics weighted by religiosity and political inclination. Two key insights emerge: (1) Greater distances between religious and nonreligious individuals (NRP vs RP) heighten electoral entropy, underscoring sociocultural fragmentation. (2) Intragroup diversity among nonreligious individuals (NRP vs NRP) stabilizes polarization, aligning with Axelrod's cultural dissemination model. This research advances computational social science and economics by showing that physical distancing serves as a scalable proxy for polarization, complementing traditional economic indicators.
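An entropy-based voting metric of this kind can be sketched as follows. Note this is an illustrative guess at the general form: the weighting by religiosity or political inclination is assumed here, and the paper's exact formula may differ.

```python
import math

def vote_entropy(shares, weight=1.0):
    """Shannon entropy of district vote shares, optionally scaled by an
    exogenous weight (e.g. a religiosity index). Higher values indicate a
    more fragmented, i.e. more polarized, electorate under this proxy."""
    total = sum(shares)
    probs = [s / total for s in shares if s > 0]
    return weight * -sum(p * math.log(p) for p in probs)
```

A single-party district scores zero, while an evenly split district attains the maximum log(k) for k parties, which is then correlated against the vision-derived intergroup distances.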
Submitted 4 November, 2025;
originally announced November 2025.
-
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
Authors:
Yash Jangir,
Yidi Zhang,
Pang-Chi Lo,
Kashu Yamazaki,
Chenyu Zhang,
Kuan-Hsun Tu,
Tsung-Wei Ke,
Lei Ke,
Yonatan Bisk,
Katerina Fragkiadaki
Abstract:
The pursuit of robot generalists, agents capable of performing diverse tasks across diverse environments, demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it is labor-intensive, slow, unsafe at scale, and difficult to reproduce. As policies expand in scope and complexity, these barriers only intensify, since defining "success" in robotics often hinges on nuanced human judgments of execution quality. We introduce RobotArena Infinity, a new benchmarking framework that overcomes these challenges by shifting vision-language-action (VLA) evaluation into large-scale simulated environments augmented with online human feedback. Leveraging advances in vision-language models, 2D-to-3D generative modeling, and differentiable rendering, our approach automatically converts video demonstrations from widely used robot datasets into simulated counterparts. Within these digital twins, we assess VLA policies using both automated vision-language-model-guided scoring and scalable human preference judgments collected from crowdworkers, transforming human involvement from tedious scene setup, resetting, and safety supervision into lightweight preference comparisons. To measure robustness, we systematically perturb simulated environments along multiple axes, including textures and object placements, stress-testing policy generalization under controlled variation. The result is a continuously evolving, reproducible, and scalable benchmark for real-world-trained robot manipulation policies, addressing a critical missing capability in today's robotics landscape.
Submitted 19 March, 2026; v1 submitted 27 October, 2025;
originally announced October 2025.
-
Identification of low-energy kaons in the ProtoDUNE-SP detector
Authors:
DUNE Collaboration,
S. Abbaslu,
F. Abd Alrahman,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos
, et al. (1325 additional authors not shown)
Abstract:
The Deep Underground Neutrino Experiment (DUNE) is a next-generation neutrino experiment with a rich physics program that includes searches for the hypothetical phenomenon of proton decay. Utilizing liquid-argon time-projection chamber technology, DUNE is expected to achieve world-leading sensitivity in the proton decay channels that involve charged kaons in their final states. The first DUNE demonstrator, ProtoDUNE Single-Phase, was a 0.77 kt detector that operated from 2018 to 2020 at the CERN Neutrino Platform, exposed to a mixed hadron and electron test-beam with momenta ranging from 0.3 to 7 GeV/c. We present a selection of low-energy kaons among the secondary particles produced in hadronic reactions, using data from the 6 and 7 GeV/c beam runs. The selection efficiency is 1% and the sample purity 92%. The initial energies of the selected kaon candidates encompass the expected energy range of kaons originating from proton decay events in DUNE (below $\sim$200 MeV). In addition, we demonstrate the capability of this detector technology to discriminate between kaons and other particles such as protons and muons, and provide a comprehensive description of their energy loss in liquid argon, which shows good agreement with the simulation. These results pave the way for future proton decay searches at DUNE.
Submitted 9 October, 2025;
originally announced October 2025.
-
Pilot selection in the era of Virtual reality: algorithms for accurate and interpretable machine learning models
Authors:
Luoma Ke,
Guangpeng Zhang,
Jibo He,
Yajing Li,
Yan Li,
Xufeng Liu,
Peng Fang
Abstract:
With the rapid growth of the aviation industry, there is a need for a large number of flight crew, and how to select the right pilots in a cost-efficient manner has become an important research question. In the current study, 23 pilots were recruited from China Eastern Airlines and 23 novices from the Tsinghua University community. A novel approach incorporating machine learning and virtual reality technology was applied to distinguish features between these participants with different flight skills. Results indicate that SVM with the MIC feature selection method consistently achieved the highest prediction performance on all metrics, with an accuracy of 0.93, an AUC of 0.96, and an F1 of 0.93, outperforming four other classifier algorithms and two other feature selection methods. From the perspective of feature selection, the MIC method can select features with a nonlinear relationship to the sample labels rather than simply filtering them out. Our implementation of the SVM + MIC algorithm outperforms all existing pilot selection algorithms and is perhaps the first based on eye tracking and flight dynamics data. This study's VR simulation platforms and algorithms can be used for pilot selection and training.
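A pipeline of this shape can be approximated in scikit-learn. This is a sketch under assumptions: scikit-learn ships no MIC (maximal information coefficient) estimator, so `mutual_info_classif` stands in as another nonlinear dependence measure, and the feature count `k_features` is illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_pilot_classifier(k_features=10):
    """Information-based feature selection followed by an RBF-kernel SVM,
    mirroring the SVM + MIC design (mutual information as a MIC stand-in)."""
    return make_pipeline(
        SelectKBest(mutual_info_classif, k=k_features),  # keep nonlinearly informative features
        StandardScaler(),
        SVC(kernel="rbf"),
    )
```

On the paper's data the inputs would be eye-tracking and flight-dynamics features with pilot/novice labels, and accuracy, AUC, and F1 would be compared across classifiers and selection methods under cross-validation.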
Submitted 2 October, 2025;
originally announced October 2025.
-
Towards mono-energetic virtual $ν$ beam cross-section measurements: A feasibility study of $ν$-Ar interaction analysis with DUNE-PRISM
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1302 additional authors not shown)
Abstract:
Neutrino-nucleus cross-section measurements are critical for future neutrino oscillation analyses. However, the models used to describe them require further refinement, and a deeper understanding of the underlying physics is essential for future neutrino oscillation experiments to realize their ambitious physics goals. Current neutrino cross-section measurements reveal clear deficiencies in neutrino interaction modeling, but almost all are reported averaged over broad neutrino fluxes, rendering their interpretation challenging. Using the DUNE-PRISM concept (Deep Underground Neutrino Experiment Precision Reaction Independent Spectrum Measurement) -- a movable near detector that samples multiple off-axis positions -- neutrino interaction measurements can be used to construct narrow virtual fluxes (less than 100 MeV wide). These fluxes can be used to extract charged-current neutrino-nucleus cross sections as functions of outgoing lepton kinematics within specific neutrino energy ranges. Based on a dedicated simulation with realistic event statistics and flux-related systematic uncertainties, but assuming an almost-perfect detector, we run a feasibility study demonstrating how DUNE-PRISM data can be used to measure muon neutrino charged-current integrated and differential cross sections over narrow fluxes. We find that this approach enables a model-independent reconstruction of powerful observables, including energy transfer, typically accessible only in electron scattering measurements, but that large exposures may be required for differential cross-section measurements with few-percent statistical uncertainties.
Submitted 9 September, 2025;
originally announced September 2025.
-
Operation of a Modular 3D-Pixelated Liquid Argon Time-Projection Chamber in a Neutrino Beam
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1299 additional authors not shown)
Abstract:
The 2x2 Demonstrator, a prototype for the Deep Underground Neutrino Experiment (DUNE) liquid argon (LAr) Near Detector, was exposed to the Neutrinos from the Main Injector (NuMI) neutrino beam at Fermi National Accelerator Laboratory (Fermilab). This detector prototypes a new modular design for a liquid argon time-projection chamber (LArTPC), composed of a two-by-two array of four modules, each further segmented into two optically-isolated LArTPCs. The 2x2 Demonstrator features a number of pioneering technologies, including a low-profile resistive field shell to establish drift fields, native 3D pixelated ionization imaging, and a high-coverage dielectric light readout system. The 2.4 tonne active mass detector is flanked upstream and downstream by supplemental solid-scintillator tracking planes, repurposed from the MINERvA experiment, which track ionizing particles exiting the argon volume. The antineutrino beam data collected by the detector over a 4.5 day period in 2024 include over 30,000 neutrino interactions in the LAr active volume, the first neutrino interactions reported by a DUNE detector prototype. During its physics-quality run, the 2x2 Demonstrator operated at a nominal drift field of 500 V/cm and maintained good LAr purity, with a stable electron lifetime of approximately 1.25 ms. This paper describes the detector and supporting systems, summarizes the installation and commissioning, and presents the initial validation of collected NuMI beam and off-beam self-triggers. In addition, it highlights observed interactions in the detector volume, including candidate muon anti-neutrino events.
Submitted 6 September, 2025;
originally announced September 2025.
-
Multi-View 3D Point Tracking
Authors:
Frano Rajič,
Haofei Xu,
Marko Mihajlovic,
Siyuan Li,
Irem Demir,
Emircan Gündoğdu,
Lei Ke,
Sergey Prokudin,
Marc Pollefeys,
Siyu Tang
Abstract:
We introduce the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Unlike existing monocular trackers, which struggle with depth ambiguities and occlusion, or prior multi-camera methods that require over 20 cameras and tedious per-sequence optimization, our feed-forward model directly predicts 3D correspondences using a practical number of cameras (e.g., four), enabling robust and accurate online tracking. Given known camera poses and either sensor-based or estimated multi-view depth, our tracker fuses multi-view features into a unified point cloud and applies k-nearest-neighbors correlation alongside a transformer-based update to reliably estimate long-range 3D correspondences, even under occlusion. We train on 5K synthetic multi-view Kubric sequences and evaluate on two real-world benchmarks: Panoptic Studio and DexYCB, achieving median trajectory errors of 3.1 cm and 2.0 cm, respectively. Our method generalizes well to diverse camera setups of 1-8 views with varying vantage points and video lengths of 24-150 frames. By releasing our tracker alongside training and evaluation datasets, we aim to set a new standard for multi-view 3D tracking research and provide a practical tool for real-world applications. Project page available at https://ethz-vlg.github.io/mvtracker.
Submitted 28 August, 2025;
originally announced August 2025.
-
Accurate calculation of light rare-earth magnetic anisotropy with density functional theory
Authors:
Liqin Ke,
R. Flint,
Y. Lee
Abstract:
Density functional theory (DFT) has long struggled to treat light rare-earth magnetism. We show that this difficulty arises from an overestimate of the $4f$ charge asphericity, and thus the magnetic anisotropy energy, due to the inadequacy of single Slater-determinant representations. We propose an effective solution by combining constrained DFT+U with crystal field theory and a systematic many-body correction to the charge asphericity. We confirm the validity of this combination on TbV$_6$Sn$_6$ and TbCo$_5$, and then show how the many-body correction adjusts the calculated magnetic anisotropy energy of SmCo$_5$ to match experiment. Our method is an efficient DFT-based approach to address light-rare-earth magnetism.
Submitted 26 August, 2025;
originally announced August 2025.
-
Generative Video Matting
Authors:
Yongtao Ge,
Kangyang Xie,
Guangkai Xu,
Mingyu Liu,
Li Ke,
Longtao Huang,
Hui Xue,
Hao Chen,
Chunhua Shen
Abstract:
Video matting has traditionally been limited by the lack of high-quality ground-truth data. Most existing video matting datasets provide only human-annotated imperfect alpha and foreground annotations, which must be composited to background images or videos during the training stage. Thus, the generalization capability of previous methods in real-world scenarios is typically poor. In this work, we propose to solve the problem from two perspectives. First, we emphasize the importance of large-scale pre-training by pursuing diverse synthetic and pseudo-labeled segmentation datasets. We also develop a scalable synthetic data generation pipeline that can render diverse human bodies and fine-grained hairs, yielding around 200 video clips with a 3-second duration for fine-tuning. Second, we introduce a novel video matting approach that can effectively leverage the rich priors from pre-trained video diffusion models. This architecture offers two key advantages. First, strong priors play a critical role in bridging the domain gap between synthetic and real-world scenes. Second, unlike most existing methods that process video matting frame-by-frame and use an independent decoder to aggregate temporal information, our model is inherently designed for video, ensuring strong temporal consistency. We provide a comprehensive quantitative evaluation across three benchmark datasets, demonstrating our approach's superior performance, and present comprehensive qualitative results in diverse real-world scenes, illustrating the strong generalization capability of our method. The code is available at https://github.com/aim-uofa/GVM.
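The compositing step referenced above follows the standard matting equation: a training frame is synthesized from an alpha matte, a foreground, and a background as I = αF + (1 − α)B. A generic sketch (not the paper's pipeline):

```python
import numpy as np

def composite(alpha, fg, bg):
    """Standard alpha compositing: alpha in [0,1], shapes (H,W,1) and (H,W,3)."""
    return alpha * fg + (1.0 - alpha) * bg

alpha = np.full((2, 2, 1), 0.25)  # uniform 25% foreground opacity
fg = np.ones((2, 2, 3))           # white foreground
bg = np.zeros((2, 2, 3))          # black background
out = composite(alpha, fg, bg)
assert np.allclose(out, 0.25)
```

Imperfect alpha annotations propagate directly through this equation, which is why composited training data generalizes poorly to real footage.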
Submitted 11 August, 2025;
originally announced August 2025.
-
Spatial and Temporal Evaluations of the Liquid Argon Purity in ProtoDUNE-SP
Authors:
DUNE Collaboration,
S. Abbaslu,
A. Abed Abud,
R. Acciarri,
L. P. Accorsi,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
C. Adriano,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade,
C. Andreopoulos,
M. Andreotti
, et al. (1301 additional authors not shown)
Abstract:
Liquid argon time projection chambers (LArTPCs) rely on highly pure argon to ensure that ionization electrons produced by charged particles reach readout arrays. ProtoDUNE Single-Phase (ProtoDUNE-SP) was an approximately 700-ton liquid argon detector intended to prototype the Deep Underground Neutrino Experiment (DUNE) Far Detector Horizontal Drift module. It contains two drift volumes bisected by the cathode plane assembly, which is biased to create an almost uniform electric field in both volumes. The DUNE Far Detector modules must have robust cryogenic systems capable of filtering argon and supplying the TPC with clean liquid. This paper compares the argon purity measured by the purity monitors with that measured using muons in the TPC from October 2018 to November 2018. A new method is introduced to measure the liquid argon purity in the TPC using muons crossing both drift volumes of ProtoDUNE-SP. For extended periods on the timescale of weeks, the drift electron lifetime was measured to be above 30 ms using both systems. A particular focus is placed on the measured purity of argon as a function of position in the detector.
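The drift-electron lifetime quoted above is conventionally extracted from charge attenuation: ionization charge falls off as Q(t) = Q0·exp(−t/τ) with drift time t, so a linear fit to log Q versus t recovers τ. An illustrative sketch with synthetic, noise-free values (not the collaboration's analysis code):

```python
import numpy as np

tau_true = 0.030                 # assume a 30 ms electron lifetime
t = np.linspace(0.0, 0.002, 50)  # drift times up to ~2 ms (illustrative)
q = 1.0 * np.exp(-t / tau_true)  # attenuated charge, Q0 = 1

# log(Q) = log(Q0) - t/tau is linear in t, so the fitted slope gives tau.
slope, intercept = np.polyfit(t, np.log(q), 1)
tau_fit = -1.0 / slope
assert abs(tau_fit - tau_true) < 1e-6
```

Real measurements add noise and position-dependent corrections, which is exactly where the purity-monitor and crossing-muon methods can differ.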
Submitted 27 August, 2025; v1 submitted 11 July, 2025;
originally announced July 2025.
-
BiVM: Accurate Binarized Neural Network for Efficient Video Matting
Authors:
Haotong Qin,
Xianglong Liu,
Xudong Ma,
Lei Ke,
Yulun Zhang,
Jie Luo,
Michele Magno
Abstract:
Deep neural networks for real-time video matting suffer significant computational limitations on edge devices, hindering their adoption in widespread applications such as online conferences and short-form video production. Binarization emerges as one of the most common compression approaches with compact 1-bit parameters and efficient bitwise operations. However, accuracy and efficiency limitations exist in the binarized video matting network due to its degenerated encoder and redundant decoder. Following a theoretical analysis based on the information bottleneck principle, the limitations are mainly caused by the degradation of prediction-relevant information in the intermediate features and the redundant computation in prediction-irrelevant areas. We present BiVM, an accurate and resource-efficient Binarized neural network for Video Matting. First, we present a series of binarized computation structures with elastic shortcuts and evolvable topologies, enabling the constructed encoder backbone to extract high-quality representation from input videos for accurate prediction. Second, we sparsify the intermediate features of the binarized decoder by masking homogeneous parts, allowing the decoder to focus on representation with diverse details while alleviating the computation burden for efficient inference. Furthermore, we construct a localized binarization-aware mimicking framework with an information-guided strategy, prompting matting-related representation in full-precision counterparts to be accurately and fully utilized. Comprehensive experiments show that the proposed BiVM surpasses alternative binarized video matting networks, including state-of-the-art (SOTA) binarization methods, by a substantial margin. Moreover, our BiVM achieves significant savings of 14.3x and 21.6x in computation and storage costs, respectively. We also evaluate BiVM on ARM CPU hardware.
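The 1-bit parameterization mentioned above is typically realized by replacing weights with their signs times a magnitude-preserving scale (the XNOR-Net-style scheme). A generic sketch, not BiVM's actual kernels:

```python
import numpy as np

def binarize(w):
    """1-bit weight binarization: W ~ alpha * sign(W), alpha = mean(|W|)."""
    alpha = np.abs(w).mean()  # per-tensor scaling factor preserving magnitude
    return alpha * np.sign(w)

w = np.array([[0.5, -1.5], [2.0, -0.2]])
wb = binarize(w)
# alpha = (0.5 + 1.5 + 2.0 + 0.2) / 4 = 1.05
assert np.allclose(wb, 1.05 * np.sign(w))
```

Because the binarized weights take only two values, multiply-accumulates reduce to XNOR and popcount operations on hardware, which is the source of the compute and storage savings.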
Submitted 6 July, 2025;
originally announced July 2025.
-
ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning
Authors:
Yunchu Zhang,
Shubham Mittal,
Zhengyu Zhang,
Liyiming Ke,
Siddhartha Srinivasa,
Abhishek Gupta
Abstract:
Visuomotor policies often suffer from perceptual challenges, where visual differences between training and evaluation environments degrade policy performance. Policies relying on state estimations, like 6D pose, require task-specific tracking and are difficult to scale, while raw sensor-based policies may lack robustness to small visual disturbances. In this work, we leverage 2D keypoints--spatially consistent features in the image frame--as a flexible state representation for robust policy learning and apply it to both sim-to-real transfer and real-world imitation learning. However, the choice of which keypoints to use can vary across objects and tasks. We propose a novel method, ATK, to automatically select keypoints in a task-driven manner so that the chosen keypoints are predictive of optimal behavior for the given task. Our proposal optimizes for a minimal set of keypoints that focus on task-relevant parts while preserving policy performance and robustness. We distill expert data (either from an expert policy in simulation or a human expert) into a policy that operates on RGB images while tracking the selected keypoints. By leveraging pre-trained visual modules, our system effectively encodes states and transfers policies to the real-world evaluation scenario despite wide scene variations and perceptual challenges such as transparent objects, fine-grained tasks, and deformable object manipulation. We validate ATK on various robotic tasks, demonstrating that these minimal keypoint representations significantly improve robustness to visual disturbances and environmental variations. See all experiments and more details at https://yunchuzhang.github.io/ATK/.
Submitted 4 October, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
Authors:
Wen-Hsuan Chu,
Lei Ke,
Jianmeng Liu,
Mingxiao Huo,
Pavel Tokmakov,
Katerina Fragkiadaki
Abstract:
We tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered scenes. To address this, GenMOJO decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. This object-wise decomposition allows leveraging object-centric diffusion models to infer unobserved regions in novel viewpoints. It performs joint Gaussian splatting to render the full scene, capturing cross-object occlusions, and enabling occlusion-aware supervision. To bridge the gap between object-centric priors and the global frame-centric coordinate system of videos, GenMOJO uses differentiable transformations that align generative and rendering constraints within a unified framework. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input. Quantitative evaluations and perceptual human studies confirm that GenMOJO generates more realistic novel views of scenes and produces more accurate point tracks compared to existing approaches.
Submitted 15 June, 2025;
originally announced June 2025.
-
Score-based Generative Modeling for Conditional Independence Testing
Authors:
Yixin Ren,
Chenghou Jin,
Yewei Xia,
Li Ke,
Longtao Huang,
Hui Xue,
Hao Zhang,
Jihong Guan,
Shuigeng Zhou
Abstract:
Determining conditional independence (CI) relationships between random variables is a fundamental yet challenging task in machine learning and statistics, especially in high-dimensional settings. Existing generative model-based CI testing methods, such as those utilizing generative adversarial networks (GANs), often struggle with undesirable modeling of conditional distributions and training instability, resulting in subpar performance. To address these issues, we propose a novel CI testing method via score-based generative modeling, which achieves precise Type I error control and strong testing power. Concretely, we first employ a sliced conditional score matching scheme to accurately estimate conditional score and use Langevin dynamics conditional sampling to generate null hypothesis samples, ensuring precise Type I error control. Then, we incorporate a goodness-of-fit stage into the method to verify generated samples and enhance interpretability in practice. We theoretically establish the error bound of conditional distributions modeled by score-based generative models and prove the validity of our CI tests. Extensive experiments on both synthetic and real-world datasets show that our method significantly outperforms existing state-of-the-art methods, providing a promising way to revitalize generative model-based CI testing.
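The Langevin-dynamics sampling step mentioned above can be sketched generically: given a score function s(x) ≈ ∇x log p(x), samples are produced by iterating x ← x + (ε/2)·s(x) + √ε·z with Gaussian noise z. The sketch below (not the paper's sliced conditional score-matching implementation) uses the known score of a standard Gaussian, s(x) = −x, for illustration:

```python
import numpy as np

def langevin_sample(score, x0, eps=0.1, n_steps=2000, rng=None):
    """Unadjusted Langevin dynamics driven by a score function s(x)."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(n_steps):
        # gradient step toward high density plus exploratory Gaussian noise
        x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)
    return x

# 5000 independent chains started far from the mode converge to N(0, 1).
samples = langevin_sample(lambda x: -x, np.full(5000, 5.0))
assert abs(samples.mean()) < 0.2 and abs(samples.std() - 1.0) < 0.2
```

In the CI-testing setting, the hand-written score is replaced by a learned conditional score, and the resulting samples serve as draws under the null hypothesis.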
Submitted 29 May, 2025;
originally announced May 2025.
-
Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object
Authors:
Yuxuan Lin,
Ruihang Chu,
Zhenyu Chen,
Xiao Tang,
Lei Ke,
Haoling Li,
Yingji Zhong,
Zhihao Li,
Shiyong Liu,
Xiaofei Wu,
Jianzhuang Liu,
Yujiu Yang
Abstract:
Generative 3D reconstruction shows strong potential under incomplete observations. While sparse-view and single-image reconstruction are well-researched, partial observation remains underexplored. In this context, dense views are accessible only from a specific angular range, with other perspectives remaining inaccessible. This task presents two main challenges: (i) Limited view range: observations confined to a narrow angular scope rule out traditional interpolation techniques that require evenly distributed perspectives. (ii) Inconsistent generation: views created for invisible regions often lack coherence with both visible regions and each other, compromising reconstruction consistency. To address these challenges, we propose Zero-P-to-3, a novel training-free approach that integrates local dense observations with multi-source priors for reconstruction. Our method introduces a fusion-based strategy to effectively align these priors in DDIM sampling, thereby generating multi-view consistent images to supervise invisible views. We further design an iterative refinement strategy, which uses the geometric structures of the object to enhance reconstruction quality. Extensive experiments on multiple datasets show the superiority of our method over state-of-the-art methods, especially in invisible regions.
Submitted 28 May, 2025;
originally announced May 2025.
-
Doping-induced Spin Reorientation in Kagome Magnet TmMn6Sn6
Authors:
Mohamed El Gazzah,
Po-Hao Chang,
Y. Lee,
Hari Bhandari,
Resham Regmi,
Xiuqan Zhou,
John F. Mitchell,
Liqin Ke,
Igor I. Mazin,
Nirmal J. Ghimire
Abstract:
The kagome-lattice compounds RMn6Sn6 (R is a rare earth element), where the Mn atoms form a kagome net in the basal plane, are currently attracting a great deal of attention as they have been shown to host complex magnetic textures and electronic topological states strongly sensitive to the choice of the R atom. Among the magnetic R atoms, TmMn6Sn6 orders with the easy-plane magnetization forming a complex magnetic spiral along the c-axis. Previous neutron studies, carried out on polycrystalline samples, found that Ga doping changes the magnetic anisotropy from easy-plane to easy-axis. Here we present magnetic and magnetotransport measurements on a single crystal and first-principles calculations in the doping series of TmMn6Sn6-xGax. We find that the magnetic properties are highly sensitive even to a small concentration of Ga. With minimal Ga substitution, the easy-plane anisotropy is maintained, which gradually changes to the easy-axis anisotropy with increasing Ga. We discuss these observations with respect to the effect of Ga doping on the magnetocrystalline anisotropy and the Tm crystal field.
Submitted 5 May, 2025;
originally announced May 2025.
-
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Authors:
Physical Intelligence,
Kevin Black,
Noah Brown,
James Darpinian,
Karan Dhabalia,
Danny Driess,
Adnan Esmail,
Michael Equi,
Chelsea Finn,
Niccolo Fusai,
Manuel Y. Galliker,
Dibya Ghosh,
Lachy Groom,
Karol Hausman,
Brian Ichter,
Szymon Jakubczak,
Tim Jones,
Liyiming Ke,
Devin LeBlanc,
Sergey Levine,
Adrian Li-Bell,
Mohith Mothukuri,
Suraj Nair,
Karl Pertsch,
Allen Z. Ren
, et al. (11 additional authors not shown)
Abstract:
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $π_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.
Submitted 22 April, 2025;
originally announced April 2025.
-
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
Authors:
Bowei Zhang,
Lei Ke,
Adam W. Harley,
Katerina Fragkiadaki
Abstract:
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular RGB and RGB-D videos. TAPIP3D represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth and camera motion information to lift 2D video features into a 3D world space where camera movement is effectively canceled out. Within this stabilized 3D representation, TAPIP3D iteratively refines multi-frame motion estimates, enabling robust point tracking over long time horizons. To handle the irregular structure of 3D point distributions, we propose a 3D Neighborhood-to-Neighborhood (N2N) attention mechanism - a 3D-aware contextualization strategy that builds informative, spatially coherent feature neighborhoods to support precise trajectory estimation. Our 3D-centric formulation significantly improves performance over existing 3D point tracking methods and even surpasses state-of-the-art 2D pixel trackers in accuracy when reliable depth is available. The model supports inference in both camera-centric (unstabilized) and world-centric (stabilized) coordinates, with experiments showing that compensating for camera motion leads to substantial gains in tracking robustness. By replacing the conventional 2D square correlation windows used in prior 2D and 3D trackers with a spatially grounded 3D attention mechanism, TAPIP3D achieves strong and consistent results across multiple 3D point tracking benchmarks. Project Page: https://tapip3d.github.io
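The "lift 2D video features into a 3D world space" step above amounts to unprojecting pixels with depth and intrinsics, then applying the camera-to-world pose so that camera motion cancels out. A minimal sketch with illustrative matrices (not TAPIP3D's actual code):

```python
import numpy as np

def unproject(u, v, depth, K, cam_to_world):
    """Pixel (u, v) with metric depth -> 3D point in world coordinates."""
    x_cam = np.linalg.inv(K) @ np.array([u, v, 1.0]) * depth  # camera-frame point
    x_h = cam_to_world @ np.append(x_cam, 1.0)                # apply 4x4 pose
    return x_h[:3]

K = np.array([[500.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 1.0]        # camera placed 1 m along world z

# The principal-point pixel at depth 2 m lands at world z = 2 + 1 = 3.
p = unproject(320, 240, 2.0, K, pose)
assert np.allclose(p, [0.0, 0.0, 3.0])
```

Any per-point feature attached to the pixel travels with the unprojected point, yielding the camera-stabilized feature cloud on which the N2N attention operates.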
Submitted 14 November, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
European Contributions to Fermilab Accelerator Upgrades and Facilities for the DUNE Experiment
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The Proton Improvement Plan (PIP-II) to the FNAL accelerator chain and the Long-Baseline Neutrino Facility (LBNF) will provide the world's most intense neutrino beam to the Deep Underground Neutrino Experiment (DUNE) enabling a wide-ranging physics program. This document outlines the significant contributions made by European national laboratories and institutes towards realizing the first phase of the project with a 1.2 MW neutrino beam. Construction of this first phase is well underway. For DUNE Phase II, this will be closely followed by an upgrade of the beam power to > 2 MW, for which the European groups again have a key role and which will require the continued support of the European community for machine aspects of neutrino physics. Beyond the neutrino beam aspects, LBNF is also responsible for providing unique infrastructure to install and operate the DUNE neutrino detectors at FNAL and at the Sanford Underground Research Facility (SURF). The cryostats for the first two Liquid Argon Time Projection Chamber detector modules at SURF, a contribution of CERN to LBNF, are central to the success of the ongoing execution of DUNE Phase I. Likewise, successful and timely procurement of cryostats for two additional detector modules at SURF will be critical to the success of DUNE Phase II and the overall physics program. The DUNE Collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This paper is being submitted to the 'Accelerator technologies' and 'Projects and Large Experiments' streams. Additional inputs related to the DUNE science program, DUNE detector technologies and R&D, and DUNE software and computing, are also being submitted to other streams.
Submitted 31 March, 2025;
originally announced March 2025.
-
DUNE Software and Computing Research and Development
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The ambitious physics program of Phase I and Phase II of DUNE is dependent upon deployment and utilization of significant computing resources, and successful research and development of software (both infrastructure and algorithmic) in order to achieve these scientific goals. This submission discusses the computing resources projections, infrastructure support, and software development needed for DUNE during the coming decades as an input to the European Strategy for Particle Physics Update for 2026. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Computing' stream focuses on DUNE software and computing. Additional inputs related to the DUNE science program, DUNE detector technologies and R&D, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment, are also being submitted to other streams.
Submitted 31 March, 2025;
originally announced March 2025.
-
The DUNE Phase II Detectors
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy for the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the previous European Strategy for Particle Physics. The construction of DUNE Phase I is well underway. DUNE Phase II consists of a third and fourth far detector module, an upgraded near detector complex, and an enhanced > 2 MW beam. The fourth FD module is conceived as a 'Module of Opportunity', aimed at supporting the core DUNE science program while also expanding the physics opportunities with more advanced technologies. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Detector instrumentation' stream focuses on technologies and R&D for the DUNE Phase II detectors. Additional inputs related to the DUNE science program, DUNE software and computing, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment, are also being submitted to other streams.
Submitted 29 March, 2025;
originally announced March 2025.
-
The DUNE Science Program
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1322 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy for the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the previous European Strategy for Particle Physics. The construction of DUNE Phase I is well underway. DUNE Phase II consists of a third and fourth far detector module, an upgraded near detector complex, and an enhanced > 2 MW beam. The fourth FD module is conceived as a 'Module of Opportunity', aimed at supporting the core DUNE science program while also expanding the physics opportunities with more advanced technologies. The DUNE collaboration is submitting four main contributions to the 2026 Update of the European Strategy for Particle Physics process. This submission to the 'Neutrinos and cosmic messengers', 'BSM physics' and 'Dark matter and dark sector' streams focuses on the physics program of DUNE. Additional inputs related to DUNE detector technologies and R&D, DUNE software and computing, and European contributions to Fermilab accelerator upgrades and facilities for the DUNE experiment, are also being submitted to other streams.
Submitted 29 March, 2025;
originally announced March 2025.
-
Machine learning predictions from unpredictable chaos
Authors:
Jian Jiang,
Long Chen,
Lu Ke,
Bozheng Dou,
Yueying Zhu,
Yazhou Shi,
Huahai Qiu,
Bengong Zhang,
Tianshou Zhou,
Guo-Wei Wei
Abstract:
Chaos is omnipresent in nature, and its understanding provides enormous social and economic benefits. However, the unpredictability of chaotic systems is a textbook concept due to their sensitivity to initial conditions, aperiodic behavior, fractal dimensions, nonlinearity, and strange attractors. In this work, we introduce, for the first time, chaotic learning, a novel multiscale topological paradigm that enables accurate predictions from chaotic systems. We show that seemingly random and unpredictable chaotic dynamics counterintuitively offer unprecedented quantitative predictions. Specifically, we devise multiscale topological Laplacians to embed real-world data into a family of interactive chaotic dynamical systems, modulate their dynamical behaviors, and enable the accurate prediction of the input data. As a proof of concept, we consider 28 datasets from four categories of realistic problems: 10 brain waves, four benchmark protein datasets, 13 single-cell RNA sequencing datasets, and an image dataset, as well as two distinct chaotic dynamical systems, namely the Lorenz and Rössler attractors. We demonstrate chaotic learning predictions of the physical properties from chaos. Our new chaotic learning paradigm profoundly changes the textbook perception of chaos and bridges topology, chaos, and learning for the first time.
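The abstract's use of the Lorenz attractor as a test system can be made concrete. A minimal sketch follows, assuming the classic parameter values σ = 10, ρ = 28, β = 8/3 (the step size, horizon, and perturbation are illustrative choices, not the authors'); it integrates the system with forward Euler and exhibits the sensitivity to initial conditions the abstract mentions:

```python
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz system dx/dt = f(x)."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def trajectory(x0, n_steps=6000):
    """Integrate n_steps Euler steps from initial condition x0."""
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        states.append(lorenz_step(states[-1]))
    return np.array(states)

# Two trajectories whose initial conditions differ by 1e-6: the distance
# between them grows by many orders of magnitude over the integration,
# while each trajectory individually stays bounded on the attractor.
a = trajectory([1.0, 1.0, 1.0])
b = trajectory([1.0, 1.0, 1.0 + 1e-6])
print(np.linalg.norm(a[-1] - b[-1]))
```

This divergence is exactly why raw forecasting of a chaotic state is hopeless, which is the textbook picture the paper argues can still be turned into useful predictions.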
Submitted 19 March, 2025;
originally announced March 2025.
-
ProReflow: Progressive Reflow with Decomposed Velocity
Authors:
Lei Ke,
Haohang Xu,
Xuefei Ning,
Yu Li,
Jiajun Li,
Haoling Li,
Yuxuan Lin,
Dongsheng Jiang,
Yujiu Yang,
Linfeng Zhang
Abstract:
Diffusion models have achieved significant progress in both image and video generation while still suffering from huge computation costs. As an effective solution, flow matching aims to reflow the diffusion process of diffusion models into a straight line for few-step and even one-step generation. However, in this paper, we suggest that the original training pipeline of flow matching is not optimal and introduce two techniques to improve it. First, we introduce progressive reflow, which progressively reflows the diffusion models in local timesteps until the whole diffusion process is covered, reducing the difficulty of flow matching. Second, we introduce aligned v-prediction, which highlights the importance of direction matching in flow matching over magnitude matching. Experimental results on SDv1.5 and SDXL demonstrate the effectiveness of our method; for example, on SDv1.5 it achieves an FID of 10.70 on the MSCOCO2014 validation set with only 4 sampling steps, close to our teacher model (32 DDIM steps, FID = 10.05).
Submitted 4 March, 2025;
originally announced March 2025.
-
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
Authors:
Lucy Xiaoyang Shi,
Brian Ichter,
Michael Equi,
Liyiming Ke,
Karl Pertsch,
Quan Vuong,
James Tanner,
Anna Walling,
Haohuan Wang,
Niccolo Fusai,
Adrian Li-Bell,
Danny Driess,
Lachy Groom,
Sergey Levine,
Chelsea Finn
Abstract:
Generalist robots that can perform a range of different tasks in open-world settings must be able to not only reason about the steps needed to accomplish their goals, but also process complex instructions, prompts, and even feedback during task execution. Intricate instructions (e.g., "Could you make me a vegetarian sandwich?" or "I don't like that one") require not just the ability to physically perform the individual steps, but the ability to situate complex commands and feedback in the physical world. In this work, we describe a system that uses vision-language models in a hierarchical structure, first reasoning over complex prompts and user feedback to deduce the most appropriate next step to fulfill the task, and then performing that step with low-level actions. In contrast to direct instruction following methods that can fulfill simple commands ("pick up the cup"), our system can reason through complex prompts and incorporate situated feedback during task execution ("that's not trash"). We evaluate our system across three robotic platforms, including single-arm, dual-arm, and dual-arm mobile robots, demonstrating its ability to handle tasks such as cleaning messy tables, making sandwiches, and grocery shopping. Videos are available at https://www.pi.website/research/hirobot
Submitted 15 July, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration
Authors:
Shaola Ren,
Li Ke,
Longtao Huang,
Dehong Gao,
Hui Xue
Abstract:
Automatically extracting effective queries is challenging in information retrieval, especially in toxic content exploration, as such content is likely to be disguised. With the recent achievements of generative Large Language Models (LLMs), we can leverage their capabilities to extract effective queries for similar-content exploration directly. This study proposes QExplorer, an LLM-based Query Extraction approach for toxic content Exploration. QExplorer involves a two-stage training process, instruction Supervised Fine-Tuning (SFT) followed by preference alignment with Direct Preference Optimization (DPO), together with dataset construction driven by feedback from the search system. To verify the effectiveness of QExplorer, a series of offline and online experiments were conducted on our real-world system. The offline results demonstrate that our automatic query extraction outperforms several LLMs and humans. The online deployment shows a significant increase in the detection of toxic items.
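The DPO stage named in the abstract has a standard per-pair objective, sketched below for one (chosen, rejected) query pair; the function name, β value, and the numeric log-probabilities are illustrative, not taken from the paper:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    Inputs are sequence log-probabilities under the trained policy and the
    frozen reference (SFT) model; the loss is -log sigmoid(beta * margin)."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen query more strongly than the reference
# does, the margin is positive and the loss drops below log(2) (the value at
# zero margin); hypothetical log-probabilities:
print(dpo_loss(-5.0, -9.0, -6.0, -6.0))
```

In a pipeline like the one described, the preference pairs would come from search-system feedback (queries that did or did not surface toxic items), which is what makes the alignment stage complement the SFT stage.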
Submitted 6 February, 2025;
originally announced February 2025.
-
Neutrino Interaction Vertex Reconstruction in DUNE with Pandora Deep Learning
Authors:
DUNE Collaboration,
A. Abed Abud,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
F. Alemanno,
N. S. Alex,
K. Allison,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
A. Aman,
H. Amar,
P. Amedo,
J. Anderson,
C. Andreopoulos
, et al. (1313 additional authors not shown)
Abstract:
The Pandora Software Development Kit and algorithm libraries perform reconstruction of neutrino interactions in liquid argon time projection chamber detectors. Pandora is the primary event reconstruction software used at the Deep Underground Neutrino Experiment, which will operate four large-scale liquid argon time projection chambers at the far detector site in South Dakota, producing high-resolution images of charged particles emerging from neutrino interactions. While these high-resolution images provide excellent opportunities for physics, the complex topologies require sophisticated pattern recognition capabilities to interpret signals from the detectors as physically meaningful objects that form the inputs to physics analyses. A critical component is the identification of the neutrino interaction vertex. Subsequent reconstruction algorithms use this location to identify the individual primary particles and ensure they each result in a separate reconstructed particle. A new vertex-finding procedure described in this article integrates a U-ResNet neural network performing hit-level classification into the multi-algorithm approach used by Pandora to identify the neutrino interaction vertex. The machine learning solution is seamlessly integrated into a chain of pattern-recognition algorithms. The technique substantially outperforms the previous BDT-based solution, with a more than 20% increase in the efficiency of sub-1 cm vertex reconstruction across all neutrino flavours.
Submitted 26 June, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.