-
Gravitational Gertsenshtein-Zeldovich mechanism for the Association between GW190425 and FRB 20190425A
Authors:
Shao-Qin Wu,
Jing-Rui Zhang,
Rong-Gen Cai,
Bing Zhang,
Yun-Long Zhang
Abstract:
The temporal and spatial coincidence between the gravitational wave (GW) event GW190425 and the fast radio burst (FRB) event FRB 20190425A raises the intriguing possibility of a physical connection between the two. The widely discussed possibility invoking the collapse of a supramassive neutron star as the merger product suffers from an inconsistency between the model prediction and the measured inclination angle of the system. Here, we propose a novel physical mechanism to account for the association. We envisage a magnetar located about 2.5 light hours away from the binary neutron star merger site. The kilohertz GWs generated by the merger are converted into kilohertz electromagnetic (EM) radiation via the Gertsenshtein-Zeldovich (GZ) effect near the magnetar. Inverse Compton scattering of these kilohertz EM waves by relativistic particles then generates the observed gigahertz FRB emission. Our calculation reveals that, with appropriate parameter choices, the properties of FRB 20190425A can be reproduced.
Submitted 14 April, 2026;
originally announced April 2026.
-
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Authors:
NVIDIA,
Aakshita Chandiramani,
Aaron Blakeman,
Abdullahi Olaoye,
Abhibha Gupta,
Abhilash Somasamudramath,
Abhinav Khattar,
Adeola Adesoba,
Adi Renduchintala,
Adil Asif,
Aditya Agrawal,
Aditya Vavre,
Ahmad Kiswani,
Aishwarya Padmakumar,
Ajay Hotchandani,
Akanksha Shukla,
Akhiad Bercovich,
Aleksander Ficek,
Aleksandr Shaposhnikov,
Alex Gronskiy,
Alex Kondratenko,
Alex Neefus,
Alex Steiner,
Alex Yang
, et al. (522 additional authors not shown)
Abstract:
We describe the pre-training, post-training, and quantization of Nemotron 3 Super, a 120 billion (active 12 billion) parameter hybrid Mamba-Attention Mixture-of-Experts model. Nemotron 3 Super is the first model in the Nemotron 3 family to 1) be pre-trained in NVFP4, 2) leverage LatentMoE, a new Mixture-of-Experts architecture that optimizes for both accuracy per FLOP and accuracy per parameter, and 3) include MTP layers for inference acceleration through native speculative decoding. We pre-trained Nemotron 3 Super on 25 trillion tokens, followed by post-training with supervised fine-tuning (SFT) and reinforcement learning (RL). The final model supports up to 1M context length and achieves accuracy comparable to GPT-OSS-120B and Qwen3.5-122B on common benchmarks, while delivering up to 2.2x and 7.5x higher inference throughput, respectively. Nemotron 3 Super datasets, along with the base, post-trained, and quantized checkpoints, are open-sourced on HuggingFace.
Submitted 14 April, 2026;
originally announced April 2026.
-
Detecting Chiral Gravitational Wave Background with a Dipole Pulsar Timing Array
Authors:
Baoyu Xu,
Hanyu Jiang,
Rong-Gen Cai,
Misao Sasaki,
Yun-Long Zhang
Abstract:
The pulsar timing array (PTA) is a powerful technique for detecting nanohertz gravitational wave backgrounds (GWBs). However, conventional PTAs lack sensitivity to parity violation in the GWB. In this work, we propose a dipole pulsar timing array system (dPTA). By deriving the overlap reduction functions (ORFs) from the cross-correlation of timing signals, we find that this system exhibits sensitivity to chiral GWBs in the nanohertz regime. Furthermore, through numerical calculations of its sensitivity curves, we demonstrate that the dPTA extends the detectable frequency range of PTAs for GWBs from the nanohertz to the microhertz regime.
Submitted 9 April, 2026;
originally announced April 2026.
-
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
Authors:
Qiyao Ma,
Dechen Gao,
Rui Cai,
Boqi Zhao,
Hanchu Zhou,
Junshan Zhang,
Zhe Zhao
Abstract:
Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. While benchmarks for general response quality are prevalent, evaluating how well reward models account for individual user preferences remains an open challenge. To bridge this gap, we introduce Personalized RewardBench, a novel benchmark designed to rigorously assess reward models' capacity to model personalized preferences. We construct chosen and rejected response pairs based on strict adherence to (or violation of) user-specific rubrics, ensuring that preference distinctions are uniquely tailored to the individual. In particular, human evaluations confirm that the primary discriminative factor between pairs is strictly personal preference, with both responses maintaining high general quality (e.g., correctness, relevance and helpfulness). Extensive testing reveals that existing state-of-the-art reward models struggle significantly with personalization, peaking at an accuracy of just 75.94%. Crucially, because an effective reward model benchmark should predict a reward model's performance on downstream tasks, we conduct experiments demonstrating that our benchmark exhibits a significantly higher correlation with downstream performance in both Best-of-N (BoN) sampling and Proximal Policy Optimization (PPO) compared to existing baselines. These findings establish Personalized RewardBench as a robust and accurate proxy for evaluating reward models' performance in downstream applications.
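The benchmark's core measurement reduces to pairwise preference accuracy: a reward model passes on a pair when it scores the rubric-adherent response above the rubric-violating one. A minimal sketch of this protocol (our assumption of the setup; the toy scorer and data are invented, not the authors' code):

```python
# Sketch of the pairwise-accuracy protocol (assumed, not the authors' code):
# a reward model scores the chosen and rejected response for each
# user-specific pair; accuracy is the fraction of pairs where the chosen
# response scores higher.

def pairwise_accuracy(pairs, score):
    """pairs: list of (prompt, chosen, rejected); score: (prompt, response) -> float."""
    correct = sum(
        1 for prompt, chosen, rejected in pairs
        if score(prompt, chosen) > score(prompt, rejected)
    )
    return correct / len(pairs)

# Toy stand-in scorer: rewards responses matching a user's stated interest
# (a hypothetical per-user rubric signal).
def toy_score(prompt, response):
    return response.lower().count("hiking")

pairs = [
    ("Suggest a weekend plan.", "Go hiking at dawn.", "Watch TV."),
    ("Gift ideas?", "Hiking boots.", "A mug."),
]
print(pairwise_accuracy(pairs, toy_score))  # 1.0
```

A real reward model replaces `toy_score`; the 75.94% peak accuracy reported above is this quantity computed over the full benchmark.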
Submitted 8 April, 2026;
originally announced April 2026.
-
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Authors:
Qihan Ren,
Peng Wang,
Ruikun Cai,
Shuai Shao,
Dadi Guo,
Yuejin Xie,
Yafu Li,
Quanshi Zhang,
Xia Hu,
Jing Shao,
Dongrui Liu
Abstract:
A prevailing narrative in LLM post-training holds that supervised finetuning (SFT) memorizes while reinforcement learning (RL) generalizes. We revisit this claim for reasoning SFT with long chain-of-thought (CoT) supervision and find that cross-domain generalization is not absent but conditional, jointly shaped by optimization dynamics, training data, and base-model capability. Some reported failures are under-optimization artifacts: cross-domain performance first degrades before recovering and improving with extended training (a dip-and-recovery pattern), so short-training checkpoints can underestimate generalization. Data quality and structure both matter: low-quality solutions broadly hurt generalization, while verified long-CoT traces yield consistent cross-domain gains. Model capability is essential: stronger models internalize transferable procedural patterns (e.g., backtracking) even from a toy arithmetic game, while weaker ones imitate surface verbosity. This generalization is asymmetric, however: reasoning improves while safety degrades, reframing the question from whether reasoning SFT generalizes to under what conditions and at what cost.
Submitted 7 April, 2026;
originally announced April 2026.
-
Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs
Authors:
Wenhui Zhu,
Xuanzhao Dong,
Xiwen Chen,
Rui Cai,
Peijie Qiu,
Zhipeng Wang,
Oana Frunza,
Shao Tang,
Jindong Gu,
Yalin Wang
Abstract:
The rapid deployment of open-source frameworks has significantly advanced the development of modern multi-agent systems. However, expanded action spaces, including uncontrolled privilege exposure and hidden inter-system interactions, pose severe security challenges. Specifically, Indirect Prompt Injections (IPI), which conceal malicious instructions within third-party content, can trigger unauthorized actions such as data exfiltration during normal operations. While current security evaluations predominantly rely on isolated single-turn benchmarks, the systemic vulnerabilities of these agents within complex dynamic environments remain critically underexplored. To bridge this gap, we systematically evaluate six defense strategies against four sophisticated IPI attack vectors across nine LLM backbones. Crucially, we conduct our evaluation entirely within dynamic multi-step tool-calling environments to capture the true attack surface of modern autonomous agents. Moving beyond binary success rates, our multidimensional analysis reveals a pronounced fragility. Advanced injections successfully bypass nearly all baseline defenses, and some surface-level mitigations even produce counterproductive side effects. Furthermore, while agents execute malicious instructions almost instantaneously, their internal states exhibit abnormally high decision entropy. Motivated by this latent hesitation, we investigate Representation Engineering (RepE) as a robust detection strategy. By extracting hidden states at the tool-input position, we find that the RepE-based circuit breaker successfully identifies and intercepts unauthorized actions before the agent commits to them, achieving high detection accuracy across diverse LLM backbones. This study exposes the limitations of current IPI defenses and provides a highly practical paradigm for building resilient multi-agent architectures.
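The RepE-style circuit breaker can be sketched as a linear probe over hidden states at the tool-input position: estimate a "malicious" direction from labeled activations, then gate any tool call whose state projects past a threshold. This is a minimal illustration under our assumptions (synthetic activations, difference-of-means probe), not the authors' implementation:

```python
import numpy as np

# Minimal sketch: estimate a "malicious" direction as the difference of class
# means over hidden states at the tool-input position, then intercept tool
# calls whose projection exceeds a midpoint threshold.
rng = np.random.default_rng(0)
d = 16
benign = rng.normal(0.0, 1.0, size=(50, d))
injected = rng.normal(0.0, 1.0, size=(50, d)) + 2.5  # shifted cluster (synthetic)

direction = injected.mean(0) - benign.mean(0)
direction /= np.linalg.norm(direction)
threshold = 0.5 * (benign @ direction).mean() + 0.5 * (injected @ direction).mean()

def allow_tool_call(hidden_state):
    """Circuit breaker: block the action if the state projects past the threshold."""
    return float(hidden_state @ direction) < threshold

print(allow_tool_call(benign[0]), allow_tool_call(injected[0]))
```

In a real agent, `hidden_state` would come from the LLM backbone at the moment the tool arguments are emitted, and the probe would be fit on labeled benign/injected traces.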
Submitted 4 April, 2026;
originally announced April 2026.
-
Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation
Authors:
Daiwei Chen,
Zhoutong Fu,
Chengming Jiang,
Haichao Zhang,
Ran Zhou,
Tan Wang,
Chunnan Yao,
Guoyao Li,
Rui Cai,
Yihan Cao,
Ruijie Jiang,
Fedor Borisyuk,
Jianqiang Shen,
Jingwei Wu,
Ramya Korlakai Vinayak
Abstract:
Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies on supervised fine-tuning to learn their representations. We present a systematic analysis of this strategy: through spectral and geometric diagnostics, we show that mean initialization collapses all new tokens into a degenerate subspace, erasing inter-token distinctions that subsequent fine-tuning struggles to fully recover. These findings suggest that \emph{token initialization} is a key bottleneck when extending LMs with new vocabularies. Motivated by this diagnosis, we propose the \emph{Grounded Token Initialization Hypothesis}: linguistically grounding novel tokens in the pretrained embedding space before fine-tuning better enables the model to leverage its general-purpose knowledge for novel-token domains. We operationalize this hypothesis as GTI (Grounded Token Initialization), a lightweight grounding stage that, prior to fine-tuning, maps new tokens to distinct, semantically meaningful locations in the pretrained embedding space using only paired linguistic supervision. Despite its simplicity, GTI outperforms both mean initialization and existing auxiliary-task adaptation methods in the majority of evaluation settings across multiple generative recommendation benchmarks, including industry-scale and public datasets. Further analyses show that grounded embeddings produce richer inter-token structure that persists through fine-tuning, corroborating the hypothesis that initialization quality is a key bottleneck in vocabulary extension.
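The collapse described above is easy to see numerically: mean initialization places every new token at the same point, so all inter-token distances are exactly zero, while grounding each token on its own linguistic description yields distinct starting points. A toy sketch (the grounding rule and token ids are illustrative assumptions, not the GTI algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_emb = rng.normal(size=(1000, 64))  # pretrained embedding table (toy)

# Mean initialization: every new token starts at the same point, so all
# inter-token distances are exactly zero -- the degenerate subspace.
mean_init = np.tile(vocab_emb.mean(0), (3, 1))
print(np.linalg.norm(mean_init[0] - mean_init[1]))  # 0.0

# Grounded initialization: place each new token at the average embedding of
# existing tokens that describe it (stand-in for paired linguistic supervision).
descriptions = [[3, 17, 256], [9, 400], [5, 5, 777]]  # token ids per new token
grounded_init = np.stack([vocab_emb[ids].mean(0) for ids in descriptions])
print(np.linalg.norm(grounded_init[0] - grounded_init[1]))  # > 0
```

Fine-tuning then starts from embeddings that already encode inter-token structure, rather than having to break the symmetry of identical starting points.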
Submitted 2 April, 2026;
originally announced April 2026.
-
Non-minimally coupled quintessence with sign-switching interaction
Authors:
Jia-Qi Wang,
Rong-Gen Cai,
Zong-Kuan Guo,
Yun-He Li,
Shao-Jiang Wang,
Xin Zhang
Abstract:
We propose a new non-minimally coupled quintessence model to account for the late-time dark energy dynamics indicated by recent DESI measurements. Within this framework, the quintessence density begins to decrease only when it starts to dominate the universe, which naturally accounts for the late-time onset of dark energy weakening. The coupling also induces a sign change in the effective energy transfer between dark matter and dark energy during cosmic evolution. While the scalar field itself remains canonical and never crosses the phantom divide, the modified evolution of the dark matter density gives rise to an effective crossing behavior in the observationally inferred dark energy sector. Compared with both $\Lambda$CDM and $w_0w_a$CDM models, our model is favored more strongly by current cosmological data. This work may provide a promising avenue for understanding the observational late-time weakening of dark energy and the origin of its dynamics.
Submitted 9 April, 2026; v1 submitted 2 April, 2026;
originally announced April 2026.
-
Vacuum bubbles from cosmic ripples
Authors:
Zi-Yan Yuwen,
Rong-Gen Cai,
Shao-Jiang Wang
Abstract:
We investigate vacuum decays in the early Universe in the presence of curvature perturbations. For sufficiently large perturbations associated with over-densities, we find that the bounce solution develops an oscillating middle stage near the bubble wall. For small perturbations, we analytically show within the thin-wall approximation that an over- (under-) density would enhance (suppress) the vacuum decay rate with a smaller (larger) initial bubble radius. By numerically solving for the bounce solutions and evaluating the corresponding Euclidean action, we further confirm this behaviour in thick-wall cases. Our results indicate that over-densities can generically trigger vacuum decay at an earlier moment.
Submitted 1 April, 2026;
originally announced April 2026.
-
An Intertwined Short and Long GRB with 4-minute Separation
Authors:
Liang Li,
Yu Wang,
Bing Zhang,
Ye Li,
Shu-Rui Zhang,
Jochen Greiner,
Zhi-Ping Jin,
Jin-Jun Geng,
Hou-Jun Lv,
Asaf Peer,
Maria Dainotti,
Tong Liu,
Yi-Zhong Fan,
Yong-Feng Huang,
Zi-Gao Dai,
Melin Kole,
Wei-Hua Lei,
Ye-Fei Yuan,
Shuang-Nan Zhang,
Felix Ryde,
She-Sheng Xue,
Rong-Gen Cai
Abstract:
Gamma-ray bursts (GRBs), the most energetic transients in the Universe, are traditionally classified into long-duration ($T_{90}>2$ s) and short-duration ($T_{90}<2$ s) events, associated with the core collapse of massive stars (Type II) and the merger of compact binary systems (Type I), respectively. The two classes exhibit distinct observational properties that serve as key diagnostic criteria for classification. Here we report GRB 160425A, a peculiar event comprising two sub-bursts separated by four minutes: a short-duration burst ($G_1$) and a long-duration burst ($G_2$). Nearly all standard prompt-emission diagnostics, including pulse morphology, duration, hardness ratio, minimum variability timescale, spectral properties, and established empirical correlations, consistently categorize $G_1$ as a short-like (Type I, merger-origin) and $G_2$ as a long-like (Type II, collapsar-origin) GRB. The coexistence of merger and collapsar signatures in a single event challenges existing progenitor frameworks and calls for a re-evaluation of GRB classification schemes and progenitor scenarios.
Submitted 3 April, 2026; v1 submitted 30 March, 2026;
originally announced March 2026.
-
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
Authors:
Siqiao Xue,
Zhaoyang Zhu,
Wei Zhang,
Rongyao Cai,
Rui Wang,
Yixiang Mu,
Fan Zhou,
Jianguo Li,
Peng Di,
Hang Yu
Abstract:
Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce \textsc{QuitoBench}, a regime-balanced benchmark for time series forecasting with coverage across eight trend$\times$seasonality$\times$forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels. The benchmark is built upon \textsc{Quito}, a billion-scale time series corpus of application traffic from Alipay spanning nine business domains. Benchmarking 10 models from deep learning, foundation models, and statistical baselines across 232,200 evaluation instances, we report four key findings: (i) a context-length crossover where deep learning models lead at short context ($L=96$) but foundation models dominate at long context ($L \ge 576$); (ii) forecastability is the dominant difficulty driver, producing a $3.64 \times$ MAE gap across regimes; (iii) deep learning models match or surpass foundation models at $59 \times$ fewer parameters; and (iv) scaling the amount of training data provides substantially greater benefit than scaling model size for both model families. These findings are validated by strong cross-benchmark and cross-metric consistency. Our open-source release enables reproducible, regime-aware evaluation for time series forecasting research.
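The eight TSF regimes arise from a binary flag for each of trend, seasonality, and forecastability (2 x 2 x 2). A toy labeler under our own heuristics (slope test, lag autocorrelation, spectral entropy; these are not the QuitoBench criteria) shows the idea:

```python
import numpy as np

# Illustrative regime labeling: binary flags for trend, seasonality, and
# forecastability give the 8 = 2 x 2 x 2 regimes. Thresholds are assumptions.
def regime(y, period=12):
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, 1)
    has_trend = abs(coeffs[0]) * len(y) > y.std()      # crude slope test
    yd = y - np.polyval(coeffs, t)                     # detrend first
    acf = np.corrcoef(yd[:-period], yd[period:])[0, 1] # lag-`period` autocorrelation
    p = np.abs(np.fft.rfft(yd - yd.mean())) ** 2       # power spectrum
    p = p / p.sum()
    entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
    return bool(has_trend), bool(acf > 0.3), bool(entropy < 0.7)

rng = np.random.default_rng(0)
t = np.arange(240)
signal = 0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=240)
noise = rng.normal(size=240)
print(regime(signal))  # (True, True, True)
print(regime(noise))   # (False, False, False)
```

Low spectral entropy serves here as a cheap forecastability proxy: a concentrated spectrum means most variance is predictable structure rather than noise.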
Submitted 26 March, 2026;
originally announced March 2026.
-
GRMLR: Knowledge-Enhanced Small-Data Learning for Deep-Sea Cold Seep Stage Inference
Authors:
Chenxu Zhou,
Zelin Liu,
Rui Cai,
Houlin Gong,
Yikang Yu,
Jia Zeng,
Yanru Pei,
Liang Zhang,
Weishu Zhao,
Xiaofeng Gao
Abstract:
Deep-sea cold seep stage assessment has traditionally relied on costly, high-risk manned submersible operations and visual surveys of macrofauna. Although microbial communities provide a promising and more cost-effective alternative, reliable inference remains challenging because the available deep-sea dataset is extremely small ($n = 13$) relative to the microbial feature dimension ($p = 26$), making purely data-driven models highly prone to overfitting. To address this, we propose a knowledge-enhanced classification framework that incorporates an ecological knowledge graph as a structural prior. By fusing macro-microbe coupling and microbial co-occurrence patterns, the framework internalizes established ecological logic into a Graph-Regularized Multinomial Logistic Regression (GRMLR) model, effectively constraining the feature space through a manifold penalty to ensure biologically consistent classification. Importantly, the framework removes the need for macrofauna observations at inference time: macro-microbe associations are used only to guide training, whereas prediction relies solely on microbial abundance profiles. Experimental results demonstrate that our approach significantly outperforms standard baselines, highlighting its potential as a robust and scalable framework for deep-sea ecological assessment.
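A graph-regularized multinomial logistic regression adds a Laplacian penalty $\lambda\,\mathrm{tr}(W^\top L W)$ to the cross-entropy loss, pulling the weights of graph-linked features toward each other. A self-contained sketch under our assumptions (toy data and graph; not the authors' code):

```python
import numpy as np

# Sketch of graph-regularized multinomial logistic regression: the manifold
# penalty tr(W^T L W) ties weights of features connected in the knowledge
# graph, where L is the graph Laplacian over features.
def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_grmlr(X, y, A, lam=0.5, lr=0.1, steps=500):
    n, p = X.shape
    k = y.max() + 1
    L = np.diag(A.sum(1)) - A                        # Laplacian of feature graph
    Y = np.eye(k)[y]                                 # one-hot labels
    W = np.zeros((p, k))
    for _ in range(steps):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / n + 2 * lam * L @ W   # cross-entropy + penalty grad
        W -= lr * grad
    return W

# Toy small-sample problem: 13 samples, 4 features, features 0 and 1 linked.
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
A = np.zeros((4, 4)); A[0, 1] = A[1, 0] = 1.0
W = fit_grmlr(X, y, A)
pred = softmax(X @ W).argmax(1)
print((pred == y).mean())  # high: the toy labels are linearly separable
```

Because the penalty only charges for differences between linked feature weights (here $\|W_0 - W_1\|^2$), it encodes the prior that co-occurring taxa should contribute similarly without shrinking the model overall.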
Submitted 25 March, 2026;
originally announced March 2026.
-
Thermodynamics of Kerr-Bertotti-Robinson black hole
Authors:
Li Hu,
Rong-Gen Cai,
Shao-Jiang Wang
Abstract:
We investigate the thermodynamic properties of the Kerr-Bertotti-Robinson black hole, an exact Petrov type D solution of Einstein-Maxwell theory describing a rotating black hole immersed in an external electromagnetic field. While the conserved angular momentum and electric charge can be computed straightforwardly, the conserved mass cannot be obtained through standard integrability methods due to the nontrivial asymptotically uniform external electromagnetic field. To overcome this difficulty, we adopt the Christodoulou-Ruffini mass relation as a thermodynamic definition of the conserved mass, and identify the associated generator, thereby fixing the ambiguity in defining this conserved mass and constructing the thermodynamic potentials. These thermodynamic quantities naturally satisfy the first law of black-hole thermodynamics as well as the Smarr formula.
Submitted 19 March, 2026;
originally announced March 2026.
-
Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation
Authors:
Jingtao Wang,
Yucong Wang,
Jun Ding,
Rui Cai,
Xun Wang
Abstract:
Large language models (LLMs) achieve remarkable performance, yet further gains often require costly training. This has motivated growing interest in post-training techniques, especially training-free approaches that improve models at inference time without updating weights. Most training-free methods treat the model as a black box and improve outputs via input/output-level interventions, such as prompt design and test-time scaling through repeated sampling, reranking/verification, or search. In contrast, they rarely offer a plug-and-play mechanism to intervene in a model's internal computation. We propose ARACH (Attention Reallocation via an Adaptive Context Hub), a training-free inference-time plug-in that augments LLMs with an adaptive context hub to aggregate context and reallocate attention. Extensive experiments across multiple language modeling tasks show consistent improvements with modest inference overhead and no parameter updates. Attention analyses further suggest that ARACH mitigates the attention sink phenomenon. These results indicate that engineering a model's internal computation offers a distinct inference-time strategy, fundamentally different from both prompt-based test-time methods and training-based post-training approaches.
Submitted 10 March, 2026;
originally announced March 2026.
-
Atomic-resolution imaging of gold species at organic liquid-solid interfaces
Authors:
Sam Sullivan-Allsop,
Nick Clark,
Wendong Wang,
Rongsheng Cai,
William Thornley,
David G. Hopkinson,
James G. McHugh,
Ben Davies,
Samuel Pattisson,
Nicholas F. Dummer,
Rui Zhang,
Matthew Lindley,
Gareth Tainton,
Jack Harrison,
Hugo De Latour,
Joseph Parker,
Joshua Swindell,
Eli G. Castanon,
Amy Carl,
David J. Lewis,
Natalia Martsinovich,
Christopher S. Allen,
Mohsen Danaie,
Andrew J. Logsdail,
Vladimir Falko
, et al. (4 additional authors not shown)
Abstract:
Understanding solid-liquid interfaces at the atomic-scale is key to improved performance of heterogeneous catalysts, electrodes and membranes. Here we combine unique specimen design, record atomic-resolution in situ electron microscopy, and artificial intelligence-enabled analysis to achieve a step change in quantitative understanding of interfacial atomic behaviour. We create the first graphene liquid cells with organic solvents and employ them to track over $10^6$ gold adatoms and clusters at a graphene surface immersed in acetone and cyclohexanone. We reveal dynamic correlated behaviour of gold adatom monomers, dimers, trimers and clusters, strongly influenced by each other, the solvent properties, and the atomic lattice of the substrate, in good agreement with theoretical calculations. We use the results to interpret differences in catalytic activity towards the industrially important acetylene hydrochlorination reaction. This new capability for exploration of atomic scale chemistry could enable rational design of future catalysts, membranes and electrodes with improved functionality.
Submitted 9 March, 2026;
originally announced March 2026.
-
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
Authors:
Bingfeng Chen,
Shaobin Shi,
Yongqi Luo,
Boyan Xu,
Ruichu Cai,
Zhifeng Hao
Abstract:
Generative language models have shown significant potential in single-turn Text-to-SQL. However, their performance does not extend equivalently to multi-turn Text-to-SQL. This is primarily due to generative language models' inadequacy in handling the complexities of context information and dynamic schema linking in multi-turn interactions. In this paper, we propose a framework named Track-SQL, which enhances generative language models with dual-extractive modules designed to track schema and contextual changes in multi-turn Text-to-SQL. Specifically, Track-SQL incorporates a \emph{Semantic-enhanced Schema Extractor} and a \emph{Schema-aware Context Extractor}. Experimental results demonstrate that Track-SQL achieves state-of-the-art performance on the SParC and CoSQL datasets. Furthermore, detailed ablation studies reveal that Track-SQL significantly improves execution accuracy in multi-turn interactions by 7.1\% and 9.55\% on these datasets, respectively. Our implementation will be open-sourced at https://github.com/DMIRLAB-Group/Track-SQL.
Submitted 6 March, 2026;
originally announced March 2026.
-
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space
Authors:
Peihao Wang,
Ruisi Cai,
Zhen Wang,
Hongyuan Mei,
Qiang Liu,
Pan Li,
Zhangyang Wang
Abstract:
Scaling inference-time compute for Large Language Models (LLMs) has unlocked unprecedented reasoning capabilities. However, existing inference-time scaling methods typically rely on inefficient and suboptimal discrete search algorithms or trial-and-error prompting to improve the online policy. In this paper, we propose $\nabla$-Reasoner, an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop to refine the policy on the fly. Our core component, Differentiable Textual Optimization (DTO), leverages gradient signals from both the LLM's likelihood and a reward model to refine textual representations. $\nabla$-Reasoner further incorporates rejection sampling and acceleration design to robustify and speed up decoding. Theoretically, we show that performing inference-time gradient descent in the sample space to maximize reward is dual to aligning an LLM policy via KL-regularized reinforcement learning. Empirically, $\nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark, while reducing the number of model calls by approximately 10-40% compared to strong baselines. Overall, our work introduces a paradigm shift from zeroth-order search to first-order optimization at test time, offering a cost-effective path to amplify LLM reasoning.
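The stated duality (reward maximization in sample space = KL-regularized alignment) can be illustrated on a single token position: ascending the gradient of expected reward plus log-prior plus entropy drives the distribution to $p \propto q\,e^{r}$. A toy sketch under our assumptions (3-token vocabulary, hand-picked prior and reward; the real DTO operates on an LLM and a learned reward model):

```python
import numpy as np

# Toy first-order refinement over token logits: the "LM" prior prefers token 0,
# the "reward" prefers token 2; entropy-regularized gradient ascent trades them
# off, converging to p proportional to prior * exp(reward).
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

log_prior = np.log(np.array([0.7, 0.2, 0.1]))   # toy LM token distribution
reward = np.array([0.0, 0.0, 5.0])              # toy reward per token

logits = np.zeros(3)
for _ in range(200):
    p = softmax(logits)
    # gradient of E_p[reward + log_prior - log p] with respect to p
    g = reward + log_prior - np.log(p) - 1.0
    grad = p * (g - p @ g)                      # softmax Jacobian applied to g
    logits += 0.5 * grad
print(softmax(logits).argmax())                 # 2 (reward outweighs the prior)
```

The fixed point satisfies $r_i + \log q_i - \log p_i = \text{const}$, i.e. $p_i \propto q_i e^{r_i}$, which is exactly the optimum of the KL-regularized objective mentioned in the abstract.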
Submitted 5 March, 2026;
originally announced March 2026.
-
AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
Authors:
Wenhui Zhu,
Xiwen Chen,
Zhipeng Wang,
Jingjing Wang,
Xuanzhao Dong,
Minzhou Huang,
Rui Cai,
Hejian Sang,
Hao Wang,
Peijie Qiu,
Yueyue Deng,
Prayag Tiwari,
Brendan Hogan Rappazzo,
Yalin Wang
Abstract:
Long-horizon LLM agents require memory systems that remain accurate under fixed context budgets. However, existing systems struggle with two persistent challenges in long-term dialogue: (i) \textbf{disconnected evidence}, where multi-hop answers require linking facts distributed across time, and (ii) \textbf{state updates}, where evolving information (e.g., schedule changes) creates conflicts with older static logs. We propose AriadneMem, a structured memory system that addresses these failure modes via a decoupled two-phase pipeline. In the \textbf{offline construction phase}, AriadneMem employs \emph{entropy-aware gating} to filter noise and low-information messages before LLM extraction and applies \emph{conflict-aware coarsening} to merge static duplicates while preserving state transitions as temporal edges. In the \textbf{online reasoning phase}, rather than relying on expensive iterative planning, AriadneMem executes \emph{algorithmic bridge discovery} to reconstruct missing logical paths between retrieved facts, followed by \emph{single-call topology-aware synthesis}. On LoCoMo experiments with GPT-4o, AriadneMem improves \textbf{Multi-Hop F1 by 15.2\%} and \textbf{Average F1 by 9.0\%} over strong baselines. Crucially, by offloading reasoning to the graph layer, AriadneMem reduces \textbf{total runtime by 77.8\%} using only \textbf{497} context tokens. The code is available at https://github.com/LLM-VLM-GSL/AriadneMem.
Submitted 5 February, 2026;
originally announced March 2026.
-
Hierarchical Action Learning for Weakly-Supervised Action Segmentation
Authors:
Junxian Huang,
Ruichu Cai,
Hao Zhu,
Juntao Fang,
Boyan Xu,
Weilin Chen,
Zijian Li,
Shenghua Gao
Abstract:
Humans perceive actions through key transitions that structure actions across multiple abstraction levels, whereas machines, relying on visual features, tend to over-segment. This highlights the difficulty of enabling hierarchical reasoning in video understanding. Interestingly, we observe that lower-level visual and high-level action latent variables evolve at different rates, with low-level visual variables changing rapidly, while high-level action variables evolve more slowly, making them easier to identify. Building on this insight, we propose the Hierarchical Action Learning (\textbf{HAL}) model for weakly-supervised action segmentation. Our approach introduces a hierarchical causal data generation process, where high-level latent action governs the dynamics of low-level visual features. To model these varying timescales effectively, we introduce deterministic processes to align these latent variables over time. The \textbf{HAL} model employs a hierarchical pyramid transformer to capture both visual features and latent variables, and a sparse transition constraint is applied to enforce the slower dynamics of high-level action variables. This mechanism enhances the identification of these latent variables over time. Under mild assumptions, we prove that these latent action variables are strictly identifiable. Experimental results on several benchmarks show that the \textbf{HAL} model significantly outperforms existing methods for weakly-supervised action segmentation, confirming its practical effectiveness in real-world applications.
Submitted 27 February, 2026;
originally announced February 2026.
-
Pulse-resolved Classification and Characteristics of Long-duration GRBs with \emph{Swift}-BAT Data. II. Main Burst versus Extended Emission
Authors:
Liang Li,
Xiao Wang,
Zhi-Li Cui,
Cheng-Long Xiao,
Wen Li,
Yu Wang,
Zi-Gao Dai,
Rong-Gen Cai
Abstract:
Long gamma-ray bursts (GRBs) frequently exhibit complex prompt emission structures with multiple temporally distinct episodes, such as a main emission (ME) phase followed by a weak extended emission (EE) tail. Whether these subcomponents arise from a common physical origin with similar classification properties, or instead represent fundamentally different emission mechanisms within a single event, remains an open question. Here, we present a systematic, pulse-resolved analysis of 22 \emph{Swift}/BAT long-duration GRBs, each exhibiting a well-separated, bright ME ($G_1$) followed by a fainter EE ($G_2$) after a background-consistent quiescent gap. For each component, we independently measure standard classification diagnostics, including duration ($T_{90}$), spectral hardness ratio (HR), minimum variability timescale (MVT), and spectral lag. We then compare these properties between the ME and EE within individual bursts. We find that the EE is systematically softer (lower HR in 19 of 22 events), smoother (longer MVT in 17 of 22 events), and more diverse in spectral lag than the ME. However, both components still occupy the long-GRB track in the traditional duration-hardness and duration-MVT planes, indicating a common Type~II (collapsar) origin. These results suggest that the EE in long GRBs represents a physically distinct regime of the central engine, characterized by a lower luminosity, longer emission timescales, and evolved spectral properties, rather than a simple continuation of the main burst. This picture is consistent with late-time fallback accretion onto a black hole or proto-magnetar spin-down.
Submitted 26 February, 2026;
originally announced February 2026.
-
Parallel Complex Diffusion for Scalable Time Series Generation
Authors:
Rongyao Cai,
Yuxi Wan,
Kexin Zhang,
Ming Jin,
Zhiqiang Ge,
Qingsong Wen,
Yong Liu
Abstract:
Modeling long-range dependencies in time series generation poses a fundamental trade-off between representational capacity and computational efficiency. Traditional temporal diffusion models suffer from local entanglement and the $\mathcal{O}(L^2)$ cost of attention mechanisms. We address these limitations by introducing PaCoDi (Parallel Complex Diffusion), a spectral-native architecture that decouples generative modeling in the frequency domain. PaCoDi fundamentally alters the problem topology: the Fourier Transform acts as a diagonalizing operator, converting locally coupled temporal signals into globally decorrelated spectral components. Theoretically, we prove the Quadrature Forward Diffusion and Conditional Reverse Factorization theorem, demonstrating that the complex diffusion process can be split into independent real and imaginary branches. We bridge the gap between this decoupled theory and data reality using a \textbf{Mean Field Theory (MFT) approximation} reinforced by an interactive correction mechanism. Furthermore, we generalize this discrete DDPM to continuous-time Frequency SDEs, rigorously deriving the Spectral Wiener Process to describe the differential spectral Brownian motion limit. Crucially, PaCoDi exploits the Hermitian Symmetry of real-valued signals to compress the sequence length by half, achieving a 50% reduction in attention FLOPs without information loss. We further derive a rigorous Heteroscedastic Loss to handle the non-isotropic noise distribution on the compressed manifold. Extensive experiments show that PaCoDi outperforms existing baselines in both generation quality and inference speed, offering a theoretically grounded and computationally efficient solution for time series modeling.
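The Hermitian-symmetry compression the abstract mentions is a general property of real-input FFTs and can be checked directly with NumPy; this snippet illustrates that property independently and is not code from PaCoDi:

```python
import numpy as np

# For a real-valued signal of length L, the spectrum satisfies
# X[L-k] = conj(X[k]), so np.fft.rfft keeps only the L//2 + 1
# non-redundant bins (roughly halving the spectral sequence length),
# and np.fft.irfft reconstructs the signal exactly: no information loss.
L = 256
x = np.random.default_rng(0).standard_normal(L)

X_full = np.fft.fft(x)       # L complex bins, redundant for real x
X_half = np.fft.rfft(x)      # L//2 + 1 bins: the compressed spectrum
x_rec = np.fft.irfft(X_half, n=L)

k = 5
sym_err = abs(X_full[L - k] - np.conj(X_full[k]))  # Hermitian symmetry
rec_err = np.max(np.abs(x - x_rec))                # exact round trip
```

Attention over the `rfft` output therefore operates on about half the tokens, which is the source of the FLOP reduction claimed above.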
Submitted 10 February, 2026;
originally announced February 2026.
-
Wearable AR for Restorative Breaks: How Interactive Narrative Experiences Support Relaxation for Young Adults
Authors:
Jindu Wang,
Runze Cai,
Shuchang Xu,
Tianrui Hu,
Huamin Qu,
Shengdong Zhao,
Ling-Ping Yuan
Abstract:
Young adults often take breaks from screen-intensive work by consuming digital content on mobile phones, which undermines rest through visual fatigue and inactivity. We introduce a design framework that embeds light break activities into media content on AR smart glasses, balancing engagement and recovery. The framework employs three strategies: (1) seamlessly guiding users by embedding activity cues aligned with media elements; (2) transitioning to audio-centric formats to reduce visual load while sustaining immersion; and (3) structuring sessions with "rise-peak-closure" pacing for smooth transitions. In a within-subjects study (N = 16) comparing passive viewing, reminder-based breaks, and non-narrative activities, InteractiveBreak, instantiated from our framework, seamlessly guided activities, sustained engagement, and enhanced break quality. These findings demonstrate wearable AR's potential to support restorative relaxation by transforming breaks into engaging and meaningful experiences.
Submitted 18 February, 2026;
originally announced February 2026.
-
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
Authors:
Rui Cai,
Jun Guo,
Xinze He,
Piaopiao Jin,
Jie Li,
Bingxuan Lin,
Futeng Liu,
Wei Liu,
Fei Ma,
Kun Ma,
Feng Qiu,
Heng Qu,
Yifei Su,
Qiao Sun,
Dong Wang,
Donghao Wang,
Yunhong Wang,
Rujie Wu,
Diyun Xiang,
Yu Yang,
Hangjun Ye,
Yuan Zhang,
Quanyun Zhou
Abstract:
In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast, smooth real-time execution. The key to our method lies in a carefully designed training recipe and deployment strategy. Xiaomi-Robotics-0 is first pre-trained on large-scale cross-embodiment robot trajectories and vision-language data, endowing it with broad and generalizable action-generation capabilities while avoiding catastrophic forgetting of the visual-semantic knowledge of the underlying pre-trained VLM. During post-training, we propose several techniques for training the VLA model for asynchronous execution to address the inference latency during real-robot rollouts. During deployment, we carefully align the timesteps of consecutive predicted action chunks to ensure continuous and seamless real-time rollouts. We evaluate Xiaomi-Robotics-0 extensively in simulation benchmarks and on two challenging real-robot tasks that require precise and dexterous bimanual manipulation. Results show that our method achieves state-of-the-art performance across all simulation benchmarks. Moreover, Xiaomi-Robotics-0 can roll out quickly and smoothly on real robots using a consumer-grade GPU, achieving high success rates and throughput on both real-robot tasks. To facilitate future research, code and model checkpoints are open-sourced at https://xiaomi-robotics-0.github.io
Submitted 25 March, 2026; v1 submitted 13 February, 2026;
originally announced February 2026.
-
CausalAgent: A Conversational Multi-Agent System for End-to-End Causal Inference
Authors:
Jiawei Zhu,
Wei Chen,
Ruichu Cai
Abstract:
Causal inference holds immense value in fields such as healthcare, economics, and social sciences. However, traditional causal analysis workflows impose significant technical barriers, requiring researchers to possess dual backgrounds in statistics and computer science, while manually selecting algorithms, handling data quality issues, and interpreting complex results. To address these challenges, we propose CausalAgent, a conversational multi-agent system for end-to-end causal inference. The system innovatively integrates Multi-Agent Systems (MAS), Retrieval-Augmented Generation (RAG), and the Model Context Protocol (MCP) to achieve automation from data cleaning and causal structure learning to bias correction and report generation through natural language interaction. Users need only upload a dataset and pose questions in natural language to receive a rigorous, interactive analysis report. As a novel user-centered human-AI collaboration paradigm, CausalAgent explicitly models the analysis workflow. By leveraging interactive visualizations, it significantly lowers the barrier to entry for causal analysis while ensuring the rigor and interpretability of the process.
Submitted 11 February, 2026;
originally announced February 2026.
-
Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner
Authors:
Qian-Wei Wang,
Guanghao Meng,
Ren Cai,
Yaguang Song,
Shu-Tao Xia
Abstract:
Large-scale vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization, but adapting them to downstream tasks typically requires costly labeled data. Existing unsupervised self-training methods rely on pseudo-labeling, yet often suffer from unreliable confidence filtering, confirmation bias, and underutilization of low-confidence samples. We propose Collaborative Fine-Tuning (CoFT), an unsupervised adaptation framework that leverages unlabeled data through a dual-model, cross-modal collaboration mechanism. CoFT introduces a dual-prompt learning strategy with positive and negative textual prompts to explicitly model pseudo-label cleanliness in a sample-dependent manner, removing the need for hand-crafted thresholds or noise assumptions. The negative prompt also regularizes lightweight visual adaptation modules, improving robustness under noisy supervision. CoFT employs a two-phase training scheme, transitioning from parameter-efficient fine-tuning on high-confidence samples to full fine-tuning guided by collaboratively filtered pseudo-labels. Building on CoFT, CoFT+ further enhances adaptation via iterative fine-tuning, momentum contrastive learning, and LLM-generated prompts. Extensive experiments demonstrate consistent gains over existing unsupervised methods and even few-shot supervised baselines.
Submitted 4 February, 2026;
originally announced February 2026.
-
Rethinking Zero-Shot Time Series Classification: From Task-specific Classifiers to In-Context Inference
Authors:
Juntao Fang,
Shifeng Xie,
Shengbin Nie,
Yuhui Ling,
Yuming Liu,
Zijian Li,
Keli Zhang,
Lujia Pan,
Themis Palpanas,
Ruichu Cai
Abstract:
The zero-shot evaluation of time series foundation models (TSFMs) for classification typically uses a frozen encoder followed by a task-specific classifier. However, this practice violates the training-free premise of zero-shot deployment and introduces evaluation bias due to classifier-dependent training choices. To address this issue, we propose TIC-FM, an in-context learning framework that treats the labeled training set as context and predicts labels for all test instances in a single forward pass, without parameter updates. TIC-FM pairs a time series encoder and a lightweight projection adapter with a split-masked latent memory Transformer. We further provide theoretical justification that in-context inference can subsume trained classifiers and can emulate gradient-based classifier training within a single forward pass. Experiments on 128 UCR datasets show strong accuracy, with consistent gains in the extreme low-label situation, highlighting training-free transfer.
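A minimal caricature of "labeled set as context, all test labels in one pass" can be written with similarity-based attention. Everything below (the identity "encoder", the synthetic data, the temperature) is an invented stand-in; the actual TIC-FM uses a time series encoder, a projection adapter, and a split-masked latent memory Transformer:

```python
import numpy as np

def in_context_predict(ctx_emb, ctx_labels, test_emb, n_classes, tau=0.5):
    """Predict every test label at once by attending over labeled context."""
    cn = ctx_emb / np.linalg.norm(ctx_emb, axis=1, keepdims=True)
    tn = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = tn @ cn.T                      # (n_test, n_ctx) cosine similarities
    w = np.exp(sims / tau)
    w /= w.sum(axis=1, keepdims=True)     # softmax attention over the context
    onehot = np.eye(n_classes)[ctx_labels]
    return (w @ onehot).argmax(axis=1)    # all test labels, one "forward pass"

# Two well-separated synthetic classes.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [-3.0, 0.0]])
ctx_labels = np.array([0] * 20 + [1] * 20)
ctx_emb = centers[ctx_labels] + rng.standard_normal((40, 2)) * 0.3
test_labels = np.array([0, 1, 0, 1])
test_emb = centers[test_labels] + rng.standard_normal((4, 2)) * 0.3
pred = in_context_predict(ctx_emb, ctx_labels, test_emb, n_classes=2)
```

No parameters are updated at test time, which is the property the abstract argues a trained task-specific classifier violates.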
Submitted 31 January, 2026;
originally announced February 2026.
-
Numerical simulations of primordial black hole formation via delayed first-order phase transitions
Authors:
Zhuan Ning,
Xiang-Xi Zeng,
Rong-Gen Cai,
Shao-Jiang Wang
Abstract:
We perform fully nonlinear, spherically symmetric numerical simulations of superhorizon false-vacuum-domain (FVD) collapse in a coupled gravity-scalar-fluid system to study primordial black hole (PBH) formation during delayed first-order phase transitions (FOPTs). Using adaptive mesh refinement to resolve the bubble wall, we identify three dynamical outcomes: type B (supercritical) PBHs with an interior baby universe and a bifurcating trapping horizon, type A (subcritical) PBHs with an apparent horizon formed by direct wall collapse, and dispersal with no PBH formation. To separate these three cases, we evaluate two commonly used PBH-formation criteria: the time scale ratio $t_\mathrm{H}/t_\mathrm{V}$ (horizon crossing time versus vacuum-energy domination time) and the local density contrast $δ(t_\mathrm{H})$ at horizon crossing. For the parameter space explored, we find that $t_\mathrm{H}/t_\mathrm{V}$ is a more robust predictor of outcome: type B PBHs form when $t_\mathrm{H}/t_\mathrm{V} \gtrsim 1$ (critical range $\sim 1.1 - 1.6$ in our survey), type A PBHs arise when $t_\mathrm{H}/t_\mathrm{V}$ is below this threshold but remains above a lower bound (typical range $\sim 0.35 - 0.7$), and no-PBH dispersal occurs when $t_\mathrm{H}/t_\mathrm{V}$ falls below this lower bound. When a clear thin-wall FVD boundary exists, $δ(t_\mathrm{H})$ can correspondingly distinguish different outcomes (roughly $δ_c(t_\mathrm{H}) \sim 1 - 1.7$ for type B and $δ_c(t_\mathrm{H}) \sim 0.35 - 0.5$ for type A), but is highly sensitive to wall structure and model details and thus less universal. These results offer new insights into the dynamics of FVD collapse, quantify practical PBH-formation thresholds, and pave the way for precise predictions of PBH abundance from delayed FOPTs.
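The timescale-ratio criterion can be summarised as a simple lookup. The cutoff values below are illustrative midpoints of the ranges quoted in the abstract, not exact thresholds from the paper:

```python
# Schematic outcome classifier for the ratio t_H / t_V (horizon-crossing
# time over vacuum-energy-domination time). The quoted survey ranges are
# ~1.1 - 1.6 for the type-B critical value and ~0.35 - 0.7 for the lower
# bound; the midpoints used here are purely for illustration.
T_B = 1.35
T_A = 0.55

def pbh_outcome(ratio):
    if ratio >= T_B:
        return "type B (supercritical: baby universe, bifurcating horizon)"
    if ratio >= T_A:
        return "type A (subcritical: direct wall collapse)"
    return "no PBH (dispersal)"
```

In the paper itself the thresholds are parameter-dependent, which is why the abstract quotes ranges rather than single critical values.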
Submitted 29 January, 2026;
originally announced January 2026.
-
Pulse-resolved Classification and Characteristics of Long-duration GRBs with \emph{Swift}-BAT Data. I. Precursors versus Main Bursts
Authors:
Liang Li,
Yu Wang,
Jin-Jun Geng,
Yong-Feng Huang,
Rong-Gen Cai
Abstract:
We present a systematic pulse-by-pulse analysis of 22 long-duration GRBs observed by \emph{Swift}, each exhibiting a well-separated precursor before the main burst. We compare duration, spectral hardness ratio, minimum variability timescale (MVT), and spectral lag between these components. Both precursors and main bursts have durations and hardness broadly consistent with Type II GRBs. However, precursors show longer MVTs (by factors of 3-10) and diverse lags with near-zero median values, while main bursts display variable MVTs and positive lags. These differences suggest precursors may originate from distinct dissipation conditions, possibly due to cocoon shock breakout or early magnetically dominated outflows. Despite temporal differences, both episodes are consistent with a single collapsar origin, providing no evidence for dual-progenitor events. Our findings support pulse-resolved classification and show that precursors offer critical insights into jet formation and pre-burst activity.
Submitted 29 January, 2026;
originally announced January 2026.
-
CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
Authors:
Donghee Lee,
Rui Cai,
Zhe Zhao
Abstract:
Large vision-language models (LVLMs) are typically trained using autoregressive language modeling objectives, which align visual representations with linguistic space. While effective for multimodal reasoning, this alignment can weaken vision-centric capabilities, causing LVLMs to underperform their base vision encoders on tasks such as image classification. To address this limitation, we propose Context-Aware Image Representation Prioritization via Ensemble (CARPE), a lightweight framework that integrates raw vision features with aligned LLM representations through vision-integration layers and a context-aware ensemble mechanism. This design enhances the model's ability to adaptively weight visual and textual modalities and enables the model to capture various aspects of image representations. Extensive experiments demonstrate that CARPE improves performance on both image classification and diverse vision-language benchmarks. Our results suggest that modality balancing plays a critical role in multimodal generalization by improving representation utilization within autoregressive LVLMs.
Submitted 26 March, 2026; v1 submitted 20 January, 2026;
originally announced January 2026.
-
Confined non-Hermitian skin effect in a semi-infinite Fock-state lattice
Authors:
Zhi Jiao Deng,
Xing Yao Mi,
Ruo Kun Cai,
Chun Wang Wu,
Ping Xing Chen
Abstract:
In this paper, we investigate the non-Hermitian skin effect in a semi-infinite Fock-state lattice, where the inherent coupling scales as $\sqrt{n}$. By analytically solving a non-uniform, non-reciprocal SSH model, we demonstrate that the intrinsic inhomogeneous coupling, in combination with nonreciprocity, fundamentally modifies the conventional skin effect. Instead of accumulating at the physical boundary, all eigenmodes become compressed and skewed within a finite spatial range determined by the inhomogeneous profile, a phenomenon we term the confined non-Hermitian skin effect. Consequently, the evolution of the probability distribution on the lattice starting from a single site is doubly confined: it is spatially bounded to a finite range by the inhomogeneous coupling, and further restricted to a one-sided trajectory at the edge of this range by the non-reciprocity. Moreover, a feasible experimental scheme based on a single trapped ion is also proposed. This work reveals how engineered coupling profiles in synthetic dimensions can reshape non-Hermitian properties and enable new protocols for quantum state manipulation.
Submitted 19 January, 2026;
originally announced January 2026.
-
Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
Authors:
Runyuan Cai,
Yu Lin,
Yiming Wang,
Chunlin Fu,
Xiaodong Zeng
Abstract:
Traditional speech systems typically rely on separate, task-specific models for text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC), resulting in fragmented pipelines that limit scalability, efficiency, and cross-task generalization. In this paper, we present General-Purpose Audio (GPA), a unified audio foundation model that integrates multiple core speech tasks within a single large language model (LLM) architecture. GPA operates on a shared discrete audio token space and supports instruction-driven task induction, enabling a single autoregressive model to flexibly perform TTS, ASR, and VC without architectural modifications. This unified design combines a fully autoregressive formulation over discrete speech tokens, joint multi-task training across speech domains, and a scalable inference pipeline that achieves high concurrency and throughput. The resulting model family supports efficient multi-scale deployment, including a lightweight 0.3B-parameter variant optimized for edge and resource-constrained environments. Together, these design choices demonstrate that a unified autoregressive architecture can achieve competitive performance across diverse speech tasks while remaining viable for low-latency, practical deployment.
Submitted 15 January, 2026;
originally announced January 2026.
-
What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
Authors:
Guimin Hu,
Meng Li,
Qiwei Peng,
Lijie Hu,
Boyan Xu,
Ruichu Cai
Abstract:
Most interpretability work focuses on layer- or neuron-level mechanisms in Transformers, leaving expert-level behavior in MoE LLMs underexplored. Motivated by functional specialization in the human brain, we analyze expert activation by distinguishing domain and driver experts. In this work, we study expert activation in MoE models across three public domains and address two key questions: (1) which experts are activated, and whether certain expert types exhibit consistent activation patterns; and (2) how tokens are associated with and trigger the activation of specific experts. To answer these questions, we introduce entropy-based and causal-effect metrics to assess whether an expert is strongly favored for a particular domain, and how strongly expert activation contributes causally to the model's output, thereby identifying domain and driver experts, respectively. Furthermore, we explore how individual tokens are associated with the activation of specific experts. Our analysis reveals that (1) among the activated experts, some show clear domain preferences, while others exert strong causal influence on model performance, underscoring their decisive roles; (2) tokens occurring earlier in a sentence are more likely to trigger the driver experts; and (3) adjusting the weights of domain and driver experts leads to significant performance gains across all three models and domains. These findings shed light on the internal mechanisms of MoE models and enhance their interpretability.
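An entropy-style domain-preference score of the kind described can be sketched as follows; the activation counts, number of domains, and domain names here are fabricated for illustration:

```python
import numpy as np

def domain_entropy(activation_counts):
    """Entropy of an expert's activation distribution across domains.

    Low entropy = the expert fires almost only for one domain (a strong
    domain preference); entropy near log(n_domains) = no preference.
    """
    p = activation_counts / activation_counts.sum()
    p = p[p > 0]                      # 0 * log(0) treated as 0
    return float(-np.sum(p * np.log(p)))

# Rows: experts; columns: three hypothetical domains (e.g. code/math/news).
counts = np.array([
    [980,  10,  10],   # expert 0: heavily favours domain 0
    [330, 340, 330],   # expert 1: roughly uniform, no preference
])
h0 = domain_entropy(counts[0])
h1 = domain_entropy(counts[1])
```

The paper's driver-expert metric is causal (intervening on expert outputs) rather than distributional, so it is not reproduced here.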
Submitted 20 January, 2026; v1 submitted 15 January, 2026;
originally announced January 2026.
-
Noise2Void for Denoising Atomic Resolution Scanning Transmission Electron Microscopy Images
Authors:
William Thornley,
Sam Sullivan-Allsop,
Rongsheng Cai,
Nick Clark,
Roman Gorbachev,
Sarah J. Haigh
Abstract:
The Noise2Void technique is demonstrated for successful denoising of atomic-resolution scanning transmission electron microscopy (STEM) images. The technique is applied to denoising atomic resolution images and videos of gold adatoms on a graphene surface within a graphene liquid cell, with the denoised experimental data qualitatively demonstrating improved visibility of both the Au adatoms and the graphene lattice. The denoising performance is quantified by comparison to similar simulated data and the approach is found to significantly outperform both total variation and simple Gaussian blurring. Compared to other denoising methods, the Noise2Void technique has the combined advantages that it requires no manual intervention during training or denoising, needs no prior knowledge of the sample, and is compatible with real-time data acquisition rates of at least 45 frames per second.
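The blind-spot masking that underlies the general Noise2Void method (sketched here independently, not this paper's specific pipeline) replaces randomly chosen pixels with random neighbours and evaluates the training loss only at those masked positions, so a network can never learn to copy the noise through an identity mapping:

```python
import numpy as np

def n2v_mask(img, n_mask, radius=2, rng=None):
    """Noise2Void-style masking: swap n_mask pixels for random neighbours."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape
    masked = img.copy()
    ys = rng.integers(radius, h - radius, n_mask)
    xs = rng.integers(radius, w - radius, n_mask)
    for y, x in zip(ys, xs):
        # Draw a neighbour offset, excluding the pixel itself.
        while True:
            dy, dx = rng.integers(-radius, radius + 1, 2)
            if dy or dx:
                break
        masked[y, x] = img[y + dy, x + dx]
    return masked, (ys, xs)

rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 1.0, (64, 64))
masked, (ys, xs) = n2v_mask(noisy, n_mask=32, rng=rng)
# Training loss would be evaluated only at (ys, xs), e.g.
# mean((net(masked) - noisy)[ys, xs] ** 2).
```

Because the target at each masked position is the original noisy pixel while the input hides it, the network is forced to predict from spatial context alone, which is what makes the method training-data-free.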
Submitted 12 January, 2026;
originally announced January 2026.
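The blind-spot masking trick behind Noise2Void can be sketched as follows (a minimal numpy illustration of the general scheme, not the authors' STEM pipeline; the function name and parameters are illustrative):

```python
import numpy as np

def n2v_mask(img, n_pix, radius=2, seed=None):
    """Blind-spot masking in the spirit of Noise2Void.

    Replaces n_pix randomly chosen pixels with a random neighbour within
    `radius`, and returns (masked image, boolean mask).  A network is then
    trained to predict the original values at the masked positions from
    their context, with the loss evaluated only where mask is True -- so
    training needs only single noisy images, no clean ground truth.
    (Real implementations also exclude the centre pixel itself when
    sampling the replacement neighbour.)
    """
    rng = np.random.default_rng(seed)
    masked = img.copy()
    mask = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    for y, x in zip(rng.integers(0, h, n_pix), rng.integers(0, w, n_pix)):
        dy, dx = rng.integers(-radius, radius + 1, 2)
        masked[y, x] = img[np.clip(y + dy, 0, h - 1), np.clip(x + dx, 0, w - 1)]
        mask[y, x] = True
    return masked, mask

img = np.arange(64.0).reshape(8, 8)  # stand-in for a noisy STEM frame
masked, mask = n2v_mask(img, n_pix=4, seed=0)
print(mask.sum())
```

All unmasked pixels are left untouched, which is what lets the loss be restricted to the blind-spot positions.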
-
ENTRA: Entropy-Based Redundancy Avoidance in Large Language Model Reasoning
Authors:
Ruichu Cai,
Haopeng Du,
Qingwen Lin,
Yutong Chen,
Zijian Li,
Boyan Xu
Abstract:
Large Reasoning Models (LRMs) often suffer from overthinking, generating unnecessarily long reasoning chains even for simple tasks. This leads to substantial computational overhead with limited performance gain, primarily due to redundant verification and repetitive generation. While prior work typically constrains output length or optimizes correctness, such coarse supervision fails to guide models toward concise yet accurate inference. In this paper, we propose ENTRA, an entropy-based training framework that suppresses redundant reasoning while preserving performance. ENTRA first estimates token-level importance using a lightweight Bidirectional Importance Estimation (BIE) method, which accounts for both prediction confidence and forward influence. It then computes a redundancy reward based on the entropy of low-importance tokens, normalized by its theoretical upper bound, and optimizes this reward via reinforcement learning. Experiments on mathematical reasoning benchmarks demonstrate that ENTRA reduces output length by 37% to 53% with no loss in accuracy, and in some cases gains. Our approach offers a principled and efficient solution to reduce overthinking in LRMs, and provides a generalizable path toward redundancy-aware reasoning optimization.
Submitted 11 January, 2026;
originally announced January 2026.
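The normalized-entropy quantity at the heart of the redundancy reward can be sketched as follows (an illustrative reading of the abstract: the names, the importance threshold, and how the score enters the RL objective are assumptions, not the paper's exact formulation):

```python
import numpy as np

def redundancy_score(token_entropies, importance, vocab_size, tau=0.5):
    """Mean predictive entropy of low-importance tokens, normalised by
    its theoretical upper bound log(vocab_size) so the score lies in
    [0, 1].  Tokens with importance below `tau` are treated as redundant
    candidates; a reward built from a quantity of this kind can then be
    optimized with reinforcement learning to discourage filler tokens.
    """
    h = np.asarray(token_entropies, dtype=float)
    imp = np.asarray(importance, dtype=float)
    low = imp < tau
    if not low.any():
        return 0.0  # no redundant tokens, nothing to penalize
    return float(h[low].mean() / np.log(vocab_size))

# Two of three tokens are low-importance; their mean entropy is 1.25 nats.
score = redundancy_score([1.0, 2.0, 0.5], [0.9, 0.1, 0.2], vocab_size=1000)
print(score)
```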
-
Cryogenic interface-state filling and tunneling mechanisms in strained Ge/SiGe heterostructures
Authors:
Jingrui Ma,
Yuan Kang,
Rui Wu,
Zheng Liu,
Zong-Hu Li,
Tian-Yue Hao,
Zhen-Zhen Kong,
Gui-Lei Wang,
Yong-Qiang Xu,
Ran-Ran Cai,
Bao-Chuan Wang,
Hai-Ou Li,
Gang Cao,
Guo-Ping Guo
Abstract:
Traps at the semiconductor-oxide interface are considered a major source of instability in strained Ge/SiGe quantum devices, yet quantitative study of their cryogenic behavior remains limited. In this work, we investigate interface-state trapping using Hall-bar field-effect transistors fabricated on strained Ge/SiGe heterostructures. Combining transport measurements with long-term stabilization and Schrödinger-Poisson modelling, we reconstruct the gradual filling process of interface states under cryogenic conditions. Using the calculated valence band profiles, we further evaluate the tunneling current density between the quantum well and the semiconductor-oxide interface. Our calculation demonstrates that the total tunneling current is consistent with a crossover from trap-assisted-tunneling-dominated transport to Fowler-Nordheim-tunneling-dominated transport under different gate bias regimes. These results refine the conventional Fowler-Nordheim-based picture of interface trapping in strained Ge/SiGe heterostructures and provide guidelines for improving Ge-based quantum device performance by improving barrier crystalline quality and reducing dislocation-related trap densities.
Submitted 11 January, 2026;
originally announced January 2026.
-
Efficient Sequential Recommendation for Long Term User Interest Via Personalization
Authors:
Qiang Zhang,
Hanchao Yu,
Ivan Ji,
Chen Yuan,
Yi Zhang,
Chihuang Liu,
Xiaolong Wang,
Christopher E. Lambert,
Ren Chen,
Chen Kovacs,
Xinzhu Bei,
Renqin Cai,
Rui Li,
Lizhu Zhang,
Xiangjun Fan,
Qunshu Zhang,
Benyu Zhang
Abstract:
Recent years have witnessed the success of sequential modeling, generative recommenders, and large language models for recommendation. Although the scaling law has been validated for sequential models, they are computationally inefficient in real-world applications such as recommendation, owing to the quadratic scaling of the transformer model with sequence length. To improve the efficiency of sequential models, we introduce a novel approach to sequential recommendation that leverages personalization techniques to enhance efficiency and performance. Our method compresses long user interaction histories into learnable tokens, which are then combined with recent interactions to generate recommendations. This approach significantly reduces computational costs while maintaining high recommendation accuracy. Our method can be applied to existing transformer-based recommendation models, e.g., HSTU and HLLM. Extensive experiments on multiple sequential models demonstrate its versatility and effectiveness. Source code is available at \href{https://github.com/facebookresearch/PerSRec}{https://github.com/facebookresearch/PerSRec}.
Submitted 6 January, 2026;
originally announced January 2026.
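The history-compression idea can be sketched as cross-attention pooling with a handful of learnable tokens (a hypothetical minimal version; the paper's actual compression module, shapes, and training procedure are not specified in the abstract):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_history(history, summary_tokens):
    """Cross-attention pooling: k learnable summary tokens attend over an
    L-token interaction history (k << L) and return k summary vectors.

    history:        (L, d) embeddings of the long user history
    summary_tokens: (k, d) learnable queries
    """
    attn = softmax(summary_tokens @ history.T / np.sqrt(history.shape[1]))
    return attn @ history

rng = np.random.default_rng(0)
L, m, k, d = 10_000, 64, 16, 32
history = rng.normal(size=(L, d))
summary_tokens = rng.normal(size=(k, d))

summary = compress_history(history, summary_tokens)
model_input = np.concatenate([summary, rng.normal(size=(m, d))])
# Self-attention over the model input now costs O((k + m)^2) per layer
# instead of O((L + m)^2) over the raw history plus recent items.
print(model_input.shape)
```

The compression itself is linear in L and runs once, so the quadratic transformer cost is paid only over the k + m compressed-plus-recent tokens.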
-
Acoustic gravitational waves from primordial curvature perturbations
Authors:
Zhuan Ning,
Zi-Yan Yuwen,
Xiang-Xi Zeng,
Rong-Gen Cai,
Shao-Jiang Wang
Abstract:
Standard perturbative calculations of scalar-induced gravitational waves (SIGWs) have neglected nonperturbative effects in the large-amplitude regime. We develop a hybrid numerical framework to capture nonperturbative effects on the stochastic gravitational wave (GW) background sourced by primordial curvature perturbations, focusing on the acoustic channel (fluid motions). Fully general-relativistic, spherically symmetric simulations are used to extract nonperturbative sound-shell profiles from isolated curvature peaks; these profiles are then embedded into three-dimensional lattice evolutions of relativistic hydrodynamics coupled to transverse-traceless metric perturbations to compute the acoustic GW spectra. The acoustic signal has a peak frequency determined by the comoving shell thickness, and its amplitude is extremely sensitive to the mean comoving separation of peaks, scaling approximately as $R_{*c}^{-7}$. We find a robust causal low-frequency tail $\propto k^{3}$, and the nonlinear hydrodynamic interactions can enhance the ultraviolet power. Comparing with SIGWs computed perturbatively from the same real-space configuration, we show that acoustic GWs can be amplified by an order of magnitude and display a peak shifted to a lower frequency in the large-amplitude regime. These results highlight the importance of nonperturbative effects for accurate predictions of stochastic GW signals induced by primordial curvature perturbations.
Submitted 24 December, 2025;
originally announced December 2025.
-
OmniMER: Auxiliary-Enhanced LLM Adaptation for Indonesian Multimodal Emotion Recognition
Authors:
Xueming Yan,
Boyan Xu,
Yaochu Jin,
Lixian Xiao,
Wenlong Ye,
Runyang Cai,
Zeqi Zheng,
Jingfa Liu,
Aimin Yang,
Yongduan Song
Abstract:
Indonesian, spoken by over 200 million people, remains underserved in multimodal emotion recognition research despite its dominant presence on Southeast Asian social media platforms. We introduce IndoMER, the first multimodal emotion recognition benchmark for Indonesian, comprising 1,944 video segments from 203 speakers with temporally aligned text, audio, and visual annotations across seven emotion categories. The dataset exhibits realistic challenges including cross-modal inconsistency and long-tailed class distributions shaped by Indonesian cultural communication norms. To address these challenges, we propose OmniMER, a multimodal adaptation framework built upon Qwen2.5-Omni that enhances emotion recognition through three auxiliary modality-specific perception tasks: emotion keyword extraction for text, facial expression analysis for video, and prosody analysis for audio. These auxiliary tasks help the model identify emotion-relevant cues in each modality before fusion, reducing reliance on spurious correlations in low-resource settings. Experiments on IndoMER show that OmniMER achieves 0.582 Macro-F1 on sentiment classification and 0.454 on emotion recognition, outperforming the base model by 7.6 and 22.1 absolute points respectively. Cross-lingual evaluation on the Chinese CH-SIMS dataset further demonstrates the generalizability of the proposed framework. The dataset and code are publicly available at https://github.com/yanxm01/INDOMER.
Submitted 10 February, 2026; v1 submitted 22 December, 2025;
originally announced December 2025.
-
The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
Authors:
Rongyao Cai,
Yuxi Wan,
Kexin Zhang,
Ming Jin,
Hao Wang,
Zhiqiang Ge,
Daoyi Dong,
Yong Liu,
Qingsong Wen
Abstract:
Optimizing time series models via point-wise loss functions (e.g., MSE), which rely on a heuristic point-wise i.i.d. assumption, disregards the causal temporal structure. Focusing on the core independence issue under covariance stationarity, this paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias incurred by the point-wise loss function. We derive the first closed-form quantification of the non-deterministic EOB across linear and non-linear systems, and prove that the EOB is an intrinsic data property, governed exclusively by sequence length and the defined Structural Signal-to-Noise Ratio. This theoretical discovery motivates our principled debiasing program, which eliminates the bias through sequence length reduction and structural orthogonalization. We present a concrete solution via the DFT or DWT, and propose a novel harmonized $\ell_p$ norm framework to rectify gradient optimization pathologies of high-variance sequences. Extensive experiments validate the generality of the EOB theory and the superior performance of the debiasing program, achieving 5.2% and 5.1% average improvements in MSE and MAE, respectively, on the iTransformer across 11 datasets.
Submitted 1 February, 2026; v1 submitted 21 December, 2025;
originally announced December 2025.
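The structural-orthogonalization step rests on the fact that an orthonormal DFT is an orthogonal change of basis: it leaves the point-wise MSE unchanged (Parseval's theorem) while, for a covariance-stationary series, approximately decorrelating the components that the point-wise i.i.d. assumption wrongly treats as independent. A minimal numerical check of the Parseval part (illustrative only, not the paper's training pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=256)
y_pred = y_true + 0.1 * rng.normal(size=256)

# Time-domain point-wise MSE ...
mse_time = np.mean((y_true - y_pred) ** 2)

# ... equals the MSE of the orthonormal-DFT coefficients (Parseval),
# so working in the frequency domain changes the correlation structure
# the loss is computed over, not the loss value itself.
err_freq = np.fft.fft(y_true - y_pred, norm="ortho")
mse_freq = np.mean(np.abs(err_freq) ** 2)

print(np.isclose(mse_time, mse_freq))
```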
-
Dark matter in ALFALFA galaxies: Investigating galaxy-halo connection
Authors:
Meng Yang,
Ling Zhu,
Niankun Yu,
Yu Lei,
Runsheng Cai,
Jie Wang,
Zheng Zheng
Abstract:
This paper aims to investigate the galaxy-halo connection using a large sample of individual galaxies with $\mathrm{H\,I}$ integrated spectra. We determine their dark matter content by applying a dynamical method based on $\mathrm{H\,I}$ line widths measured with the curve-of-growth technique, together with inclination corrections inferred from optical images. We build a sample of 2453 gas-rich predominantly late-type galaxies spanning a stellar mass range of $10^{8.7}M_\odot$ to $10^{11.4}M_\odot$ by matching them one-to-one with their counterparts from the ALFALFA survey and the TNG100 simulation, ensuring a direct match of stellar mass and $\mathrm{H\,I}$ radius. We generate mock images and mock $\mathrm{H\,I}$ integrated spectra for TNG100 galaxies, and apply the same dynamical method to both ALFALFA and TNG100 mock galaxies to infer their dark matter masses. Across all stellar mass bins, ALFALFA galaxies exhibit lower median dark matter masses than the mock TNG100 simulation results. In each bin, this offset is driven by a tail of galaxies with comparatively low dark matter content, which becomes more prominent toward higher stellar masses. In the highest mass bin ($M_* > 10^{11} M_\odot$), late-type ALFALFA galaxies show a median dark matter mass that is 23% lower than that of their counterparts in the TNG100 dark-matter-only simulation, with 32% of ALFALFA galaxies having $M_\mathrm{DM}(<R_\mathrm{HI})<10^{11.5} M_\odot$, compared to 17% in the mock TNG100 sample. These results suggest that a larger fraction of massive late-type galaxies reside in relatively less massive dark matter haloes than predicted by the TNG100 simulation.
Submitted 16 December, 2025;
originally announced December 2025.
-
BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding
Authors:
Jiayi Yuan,
Cameron Shinn,
Kai Xu,
Jingze Cui,
George Klimiashvili,
Guangxuan Xiao,
Perkz Zheng,
Bo Li,
Yuxin Zhou,
Zhouhai Ye,
Weijie You,
Tian Zheng,
Dominic Brown,
Pengbo Wang,
Markus Hoehnerbach,
Richard Cai,
Julien Demouth,
John D. Owens,
Xia Hu,
Song Han,
Timmy Liu,
Huizi Mao
Abstract:
The growing demand for long-context inference capabilities in Large Language Models (LLMs) has intensified the computational and memory bottlenecks inherent to the self-attention mechanism. To address this challenge, we introduce BLASST, a drop-in, dynamic sparse attention mechanism that accelerates inference by using only a fixed scalar threshold to skip attention blocks. Our method targets practical inference deployment by removing the barriers to adoption present in existing works. As such, BLASST eliminates training requirements, avoids expensive pre-computation passes, accelerates both prefill and decode across all major attention variants (MHA, GQA, MQA, and MLA), provides optimized support for modern hardware, and easily integrates into existing frameworks. This is achieved by reusing online softmax statistics to identify negligible attention scores, skipping softmax, value block loads, and the subsequent matrix multiplication. We demonstrate the BLASST algorithm by delivering optimized kernels with negligible latency overhead. Our automated threshold calibration procedure reveals a simple inverse relationship between optimal threshold and context length, meaning we require only a single threshold each for prefill and decode per model. Preserving benchmark accuracy, we demonstrate a 1.52x speedup for prefill at 71.9% sparsity and a 1.48x speedup for decode at 73.2% sparsity on modern GPUs.
Submitted 6 April, 2026; v1 submitted 12 December, 2025;
originally announced December 2025.
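The core skipping rule can be sketched as follows (a simplified single-query toy in numpy that assumes the running softmax maximum is known up front; real BLASST kernels fold the test into the online-softmax recurrence and operate on tiled GPU blocks):

```python
import numpy as np

def blocked_attention(q, K, V, block=16, threshold=0.0):
    """Single-query blocked attention that skips any key block whose
    largest unnormalised softmax weight falls below `threshold`,
    avoiding the value loads and the matmul for that block."""
    scores = (K @ q) / np.sqrt(q.size)
    m = scores.max()                 # global max (known upfront in this toy)
    out = np.zeros(V.shape[1])
    z = 0.0
    skipped = 0
    for s in range(0, len(K), block):
        blk = slice(s, s + block)
        w = np.exp(scores[blk] - m)  # block's softmax numerators
        if w.max() < threshold:      # whole block is negligible: skip it
            skipped += 1
            continue
        out += w @ V[blk]
        z += w.sum()
    return out / z, skipped

rng = np.random.default_rng(0)
d = 32
q = rng.normal(size=d)
K = 0.01 * rng.normal(size=(256, d))
K[:16] += 5 * q                      # only the first key block matters
V = rng.normal(size=(256, d))

out_sparse, skipped = blocked_attention(q, K, V, threshold=0.05)
out_dense, _ = blocked_attention(q, K, V, threshold=0.0)
print(skipped, np.allclose(out_sparse, out_dense, atol=1e-2))
```

Because the skipped weight mass is removed from both the numerator and the normalizer, the output stays close to dense attention whenever the threshold is small relative to the dominant weights.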
-
DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components
Authors:
Yupei Li,
Li Wang,
Yuxiang Wang,
Lei Wang,
Rizhao Cai,
Jie Shi,
Björn W. Schuller,
Zhizheng Wu
Abstract:
Audio deepfake detection has recently garnered public concern due to its implications for security and reliability. Traditional deep learning methods have been widely applied to this task but often lack generalisability when confronted with newly emerging spoofing techniques and additional tasks such as spoof attribution recognition rather than simple binary classification. In principle, Large Language Models (LLMs) are considered to possess the needed generalisation capabilities. However, previous research on Audio LLMs (ALLMs) indicates a generalisation bottleneck in audio deepfake detection performance, even when sufficient data is available. Consequently, this study investigates the model architecture and examines the effects of the primary components of ALLMs, namely the audio encoder and the text-based LLM. Our experiments demonstrate that the careful selection and combination of audio encoders and text-based LLMs are crucial for unlocking the deepfake detection potential of ALLMs. We further propose an ALLM structure capable of generalising deepfake detection abilities to out-of-domain spoofing tests and other deepfake tasks, such as spoof positioning and spoof attribution recognition. Our proposed model architecture achieves state-of-the-art (SOTA) performance across multiple datasets, including ASVSpoof2019, InTheWild, and Demopage, with accuracy reaching up to 95.76% on average, and exhibits competitive capabilities in other deepfake detection tasks such as attribution and localisation compared to SOTA audio understanding models. Data and code are provided in supplementary materials.
Submitted 15 December, 2025; v1 submitted 9 December, 2025;
originally announced December 2025.
-
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Authors:
Yixuan Li,
Yuhao Lu,
Yang Liu,
Liang Li,
R. Ruffini,
Di Li,
Rong-Gen Cai,
Xiaoyan Zhu,
Wenbin Lin,
Yu Wang
Abstract:
This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing in regimes with non-Gaussian, non-stationary noise and limited labeled samples. Gravitational wave observations provide a suitable test case: using only 90 LIGO events, fine-tuned LLMs achieve 97.4\% accuracy in identifying signals. Further experiments show that, in contrast to traditional networks that rely on large simulated datasets, additional simulated samples do not improve LLM performance, while scaling studies reveal predictable gains with increasing model size and dataset size. These results indicate that LLMs can extract discriminative structure directly from observational data and provide an efficient approach to gravitational wave identification. The same strategy may extend to other astronomical domains with similar noise properties, such as radio or pulsar observations.
Submitted 11 January, 2026; v1 submitted 3 December, 2025;
originally announced December 2025.
-
Emergent Extreme-View Geometry in 3D Foundation Models
Authors:
Yiwen Zhang,
Joseph Tung,
Ruojin Cai,
David Fouhey,
Hadar Averbuch-Elor
Abstract:
3D foundation models (3DFMs) have recently transformed 3D vision, enabling joint prediction of depths, poses, and point maps directly from images. Yet their ability to reason under extreme, non-overlapping views remains largely unexplored. In this work, we study their internal representations and find that 3DFMs exhibit an emergent understanding of extreme-view geometry, despite never being trained for such conditions. To further enhance these capabilities, we introduce a lightweight alignment scheme that refines their internal 3D representation by tuning only a small subset of backbone bias terms, leaving all decoder heads frozen. This targeted adaptation substantially improves relative pose estimation under extreme viewpoints without degrading per-image depth or point quality. Additionally, we contribute MegaUnScene, a new benchmark of Internet scenes unseen by existing 3DFMs, with dedicated test splits for both relative pose estimation and dense 3D reconstruction. All code and data will be released.
Submitted 1 December, 2025; v1 submitted 27 November, 2025;
originally announced November 2025.
-
Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation
Authors:
Zhifeng Hao,
Qibin Song,
Ruichu Cai,
Boyan Xu
Abstract:
Recent divide-and-conquer reasoning approaches, particularly those based on Chain-of-Thought (CoT), have substantially improved the Text-to-SQL capabilities of Large Language Models (LLMs). However, when applied to complex enterprise databases, such methods struggle to maintain coherent reasoning due to limited context capacity, unreliable schema linking, and weak grounding in database semantics. To overcome these issues, we introduce DSR-SQL, a \textbf{D}ual-\textbf{S}tate \textbf{R}easoning framework that models Text-to-SQL as an interaction between an adaptive context state and a progressive generation state. The first constructs a compact, semantically faithful environment by refining large schemas and selecting relevant structures, while the second formalizes SQL synthesis as feedback-guided state transitions, enabling the model to self-correct and align with user intent. Without any post-training or in-context examples, DSR-SQL achieves competitive performance, reaching 35.28\% execution accuracy on Spider 2.0-Snow and 68.32\% on BIRD development set. Our implementation will be open-sourced at: https://github.com/DMIRLAB-Group/DSR-SQL.
Submitted 26 November, 2025;
originally announced November 2025.
-
Periodic gravitational lensing by oscillating boson stars
Authors:
Xing-Yu Yang,
Tan Chen,
Rong-Gen Cai
Abstract:
We show that oscillating (real-scalar) boson stars can act as strictly periodic gravitational lenses and generically host an \emph{oscillating radial caustic}. Sources near this caustic cross it every half period, producing achromatic phase-locked photometric spikes synchronized with an astrometric wobble, providing a promising target for time-domain astronomy. Event-number estimation indicates a measurable discovery space with current astrometric and high-cadence photometric surveys. These predictions rely only on the dynamics of long-lived real-scalar condensates, therefore offering a clean test of self-gravitating quantum fields in curved spacetime. The framework extends naturally to self-interacting real scalars (including axion-like particles) and to ultralight vector bosons.
Submitted 24 November, 2025;
originally announced November 2025.
-
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Authors:
Ali Taghibakhshi,
Sharath Turuvekere Sreenivas,
Saurav Muralidharan,
Ruisi Cai,
Marcin Chochowski,
Ameya Sunil Mahabaleshwarkar,
Yoshi Suhara,
Oluwatobi Olabiyi,
Daniel Korzekwa,
Mostofa Patwary,
Mohammad Shoeybi,
Jan Kautz,
Bryan Catanzaro,
Ashwath Aithal,
Nima Tajbakhsh,
Pavlo Molchanov
Abstract:
Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this process still incurs hundreds of billions of tokens worth of training cost per compressed model. In this paper, we present Nemotron Elastic, a framework for building reasoning-oriented LLMs, including hybrid Mamba-Attention architectures, that embed multiple nested submodels within a single parent model, each optimized for different deployment configurations and budgets. Each of these submodels shares weights with the parent model and can be extracted zero-shot during deployment without additional training or fine-tuning. We enable this functionality through an end-to-end trained router, tightly coupled to a two-stage training curriculum designed specifically for reasoning models. We additionally introduce group-aware SSM elastification that preserves Mamba's structural constraints, heterogeneous MLP elastification, normalized MSE-based layer importance for improved depth selection, and knowledge distillation enabling simultaneous multi-budget optimization. We apply Nemotron Elastic to the Nemotron Nano V2 12B model, simultaneously producing a 9B and a 6B model using only 110B training tokens; this results in over 360x cost reduction compared to training model families from scratch, and around 7x compared to SoTA compression techniques. Each of the nested models performs on par or better than the SoTA in accuracy. Moreover, unlike other compression methods, the nested capability of our approach allows having a many-in-one reasoning model that has constant deployment memory against the number of models in the family.
Submitted 20 November, 2025;
originally announced November 2025.
-
MiMo-Embodied: X-Embodied Foundation Model Technical Report
Authors:
Xiaoshuai Hao,
Lei Zhou,
Zhijian Huang,
Zhiwen Hou,
Yingbo Tang,
Lingfeng Zhang,
Guang Li,
Zheng Lu,
Shuhuai Ren,
Xianhui Meng,
Yuchen Zhang,
Jing Wu,
Jinghui Lu,
Chenxu Dang,
Jiayi Guan,
Jianhua Wu,
Zhiyi Hou,
Hanbing Li,
Shumeng Xia,
Mingliang Zhou,
Yinan Zheng,
Zihao Yue,
Shuhao Gu,
Hao Tian,
Yuannan Shen
, et al. (19 additional authors not shown)
Abstract:
We open-source MiMo-Embodied, the first cross-embodied foundation model to successfully integrate and achieve state-of-the-art performance in both Autonomous Driving and Embodied AI. MiMo-Embodied sets new records across 17 embodied AI benchmarks in Task Planning, Affordance Prediction and Spatial Understanding, while also excelling in 12 autonomous driving benchmarks across Environmental Perception, Status Prediction, and Driving Planning. Across these tasks, MiMo-Embodied significantly outperforms existing open-source, closed-source, and specialized baselines. Our results indicate that through multi-stage learning, curated data construction, and CoT/RL fine-tuning, these two domains exhibit strong positive transfer and mutually reinforce one another. We provide a detailed analysis of our model design and training methodologies to facilitate further research. Code and models are available at https://github.com/XiaomiMiMo/MiMo-Embodied.
Submitted 20 November, 2025;
originally announced November 2025.
-
Constraining interacting dark energy models with black hole superradiance
Authors:
Zhen-Hong Lyu,
Rong-Gen Cai,
Shao-Jiang Wang,
Xiang-Xi Zeng
Abstract:
The recent preference for a dynamical dark energy (DE) from the Dark Energy Spectroscopic Instrument seems to call for interactions between DE and dark matter (DM), arising either from direct DE-DM interaction or from indirect interaction induced by modified gravity. Therefore, an independent probe of these kinds of DE-DM interactions would be observationally appealing. In this paper, we propose black hole superradiance as a novel astrophysical probe for field-theoretic interacting DE-DM models, providing complementary constraints independent of large-scale cosmological observations. The core principle is that the DE-DM interaction can alter the effective mass of the superradiant ultralight boson, thereby modifying its superradiant instability rate around spinning black holes. We explore this connection through two distinct scenarios: a model where the DE field mediates a dark fifth force within the DM sector, affecting the superradiance from DM particles; and a novel mechanism where the DE field itself becomes superradiant due to the effective-mass enhancement induced by dense DM spikes around supermassive black holes. By applying a statistical framework to black hole observations in both scenarios, we derive constraints on the fundamental DE-DM coupling strength. Although the current constraints are rather loose due to small sample sizes and imprecise measurements, our work provides new astrophysical constraints on these interacting DE-DM scenarios and establishes a new synergy between black hole physics and cosmology for probing the fundamental nature of the dark sector.
Submitted 7 April, 2026; v1 submitted 20 November, 2025;
originally announced November 2025.
-
Toward Dignity-Aware AI: Next-Generation Elderly Monitoring from Fall Detection to ADL
Authors:
Xun Shao,
Aoba Otani,
Yuto Hirasuka,
Runji Cai,
Seng W. Loke
Abstract:
This position paper envisions a next-generation elderly monitoring system that moves beyond fall detection toward the broader goal of Activities of Daily Living (ADL) recognition. Our ultimate aim is to design privacy-preserving, edge-deployed, and federated AI systems that can robustly detect and understand daily routines, supporting independence and dignity in aging societies. At present, ADL-specific datasets are still under collection. As a preliminary step, we demonstrate feasibility through experiments using the SISFall dataset and its GAN-augmented variants, treating fall detection as a proxy task. We report initial results on federated learning with non-IID conditions, and embedded deployment on Jetson Orin Nano devices. We then outline open challenges such as domain shift, data scarcity, and privacy risks, and propose directions toward full ADL monitoring in smart-room environments. This work highlights the transition from single-task detection to comprehensive daily activity recognition, providing both early evidence and a roadmap for sustainable and human-centered elderly care AI.
Submitted 12 February, 2026; v1 submitted 12 November, 2025;
originally announced November 2025.