-
Higher-order exceptional ring semimetal with real hinge states in phononic crystals
Authors:
Yejian Hu,
Zhenhang Pu,
Xiangru Chen,
Yuxiang Xi,
Jiuyang Lu,
Weiyin Deng,
Manzhu Ke,
Zhengyou Liu
Abstract:
Non-Hermitian topological phase, with the novel concepts such as exceptional points and skin effect, has opened up a new paradigm beyond Hermitian topological physics. Exceptional ring semimetal, featured by a stable ring of exceptional points in three dimensions, exhibits first-order topological properties, including topological surface states and surface-dependent skin effect. Nevertheless, despite extensive research on Hermitian higher-order insulators and semimetals, higher-order exceptional ring semimetal is just emerging. Here, we report the first realization of a higher-order Weyl exceptional ring semimetal in a three-dimensional lossy phononic crystal. The non-Hermitian higher-order topology reflects in the topological hinge states and hinge-dependent skin effect. Counterintuitively, the topological hinge states maintain purely real energy even under a high loss level, ensuring robust hinge-state propagation. Our findings evidence the non-Hermitian higher-order bulk-boundary correspondence of exceptional ring semimetal, and may pave the way to non-Hermitian functional acoustic devices.
Submitted 25 December, 2025;
originally announced December 2025.
-
Step-DeepResearch Technical Report
Authors:
Chen Hu,
Haikuo Du,
Heng Wang,
Lin Lin,
Mingrui Chen,
Peng Liu,
Ruihang Miao,
Tianchi Yue,
Wang You,
Wei Ji,
Wei Yuan,
Wenjin Deng,
Xiaojian Yuan,
Xiaoyun Zhang,
Xiangyu Liu,
Xikai Liu,
Yanming Xu,
Yicheng Cao,
Yifei Zhang,
Yongyao Wang,
Yubo Shu,
Yurong Zhang,
Yuxiang Zhang,
Zheng Gong,
Zhichao Chang
, et al. (42 additional authors not shown)
Abstract:
As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent. We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing, combined with a progressive training path from agentic mid-training to SFT and RL. Enhanced by a Checklist-style Judger, this approach significantly improves robustness. Furthermore, to bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios. Experimental results show that Step-DeepResearch (32B) scores 61.4% on Scale AI Research Rubrics. On ADR-Bench, it significantly outperforms comparable models and rivals SOTA closed-source models like OpenAI and Gemini DeepResearch. These findings prove that refined training enables medium-sized models to achieve expert-level capabilities at industry-leading cost-efficiency.
Submitted 29 December, 2025; v1 submitted 23 December, 2025;
originally announced December 2025.
-
JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis
Authors:
Fan Yu,
Tao Wang,
You Wu,
Lin Zhu,
Wei Deng,
Weisheng Han,
Wenchao Wang,
Lin Hu,
Xiangyu Liang,
Xiaodong He,
Yankun Huang,
Yu Gu,
Yuan Liu,
Yuxuan Wang,
Zhangyu Xiao,
Ziteng Wang,
Boya Dong,
Feng Dang,
Jinming Chen,
Jingdong Li,
Jun Wang,
Yechen Jin,
Yuan Zhang,
Zhengyan Sheng,
Xin Wang
Abstract:
Large speech generation models are evolving from single-speaker, short sentence synthesis to multi-speaker, long conversation generation. Current long-form speech generation models are predominantly constrained to dyadic, turn-based interactions. To address this, we introduce JoyVoice, a novel anthropomorphic foundation model designed for flexible, boundary-free synthesis of up to eight speakers. Unlike conventional cascaded systems, JoyVoice employs a unified E2E-Transformer-DiT architecture that utilizes autoregressive hidden representations directly for diffusion inputs, enabling holistic end-to-end optimization. We further propose an MM-Tokenizer operating at a low bitrate of 12.5 Hz, which integrates multitask semantic and MMSE losses to effectively model both semantic and acoustic information. Additionally, the model incorporates robust text front-end processing via large-scale data perturbation. Experiments show that JoyVoice achieves state-of-the-art results in multilingual generation (Chinese, English, Japanese, Korean) and zero-shot voice cloning. JoyVoice achieves top-tier results on both the Seed-TTS-Eval Benchmark and multi-speaker long-form conversational voice cloning tasks, demonstrating superior audio quality and generalization. It achieves significant improvements in prosodic continuity for long-form speech, rhythm richness in multi-speaker conversations, and paralinguistic naturalness, in addition to superior intelligibility. We encourage readers to listen to the demo at https://jea-speech.github.io/JoyVoice
Submitted 22 December, 2025;
originally announced December 2025.
-
Effective metric for binaries in framework of EOB theory to fifth PM order
Authors:
Jiliang Jing,
Weike Deng,
Sheng Long
Abstract:
To establish a self-consistent effective one-body (EOB) theory that describes the dynamical evolution of binary systems based on the post-Minkowskian (PM) approximation, where the Hamiltonian, radiation reaction force, and waveforms are derived from an effective metric, the primary objective is to obtain the effective metric. Given that third generation gravitational wave detectors require at least fifth-order PM accuracy, in this paper we constructed an effective metric in the EOB theory of binaries up to fifth PM order. The effective metric is of type D, allowing for the derivation of decoupled and variable-separable equations for the null tetrad component of the gravitational perturbed Weyl tensor. This presents a basis for us to establish a self-consistent EOB theory up to 5PM order.
Submitted 17 December, 2025;
originally announced December 2025.
-
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
Authors:
Wenlong Deng,
Yushu Li,
Boying Gong,
Yi Ren,
Christos Thrampoulidis,
Xiaoxiao Li
Abstract:
Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers. Group Relative Policy Optimization (GRPO), exemplified by the recent Search-R1, offers fast convergence and a value-free formulation that makes it appealing for this setting, yet consistently suffers from training collapse. We identify Lazy Likelihood Displacement (LLD), a systematic reduction or stagnation in the likelihood of both correct and incorrect responses, as the core mechanism driving this failure. LLD emerges early and triggers a self-reinforcing LLD Death Spiral, where declining likelihood leads to low-confidence responses, inflating gradients, and ultimately causing collapse. We empirically characterize this process across models on a Search-R1-style, search-integrated question answering task, revealing a consistent three-phase trajectory: early stagnation, steady decay, and accelerated collapse. To address this, we propose a lightweight likelihood-preserving regularization LLDS for GRPO that activates only when a trajectory's likelihood decreases, and regularizes only the tokens responsible. This fine-grained structure mitigates LLD with minimal interference to optimization. Across seven open-domain and multi-hop QA benchmarks, our method stabilizes training, prevents gradient explosion, and yields substantial performance improvements, including +37.8% gains on Qwen2.5-3B and +32.0% gains on Qwen2.5-7B. Our results establish LLD as a fundamental bottleneck in GRPO-based TIRL and provide a practical path toward stable, scalable training of tool-integrated LLM.
Submitted 3 December, 2025;
originally announced December 2025.
-
Seeing Twice: How Side-by-Side T2I Comparison Changes Auditing Strategies
Authors:
Matheus Kunzler Maldaner,
Wesley Hanwen Deng,
Jason I. Hong,
Kenneth Holstein,
Motahhare Eslami
Abstract:
While generative AI systems have gained popularity in diverse applications, their potential to produce harmful outputs limits their trustworthiness and utility. A small but growing line of research has explored tools and processes to better engage non-AI expert users in auditing generative AI systems. In this work, we present the design and evaluation of MIRAGE, a web-based tool exploring a "contrast-first" workflow that allows users to pick up to four different text-to-image (T2I) models, view their images side-by-side, and provide feedback on model performance on a single screen. In our user study with fifteen participants, we used four predefined models for consistency, with only a single model initially being shown. We found that most participants shifted from analyzing individual images to general model output patterns once the side-by-side step appeared with all four models; several participants coined persistent "model personalities" (e.g., cartoonish, saturated) that helped them form expectations about how each model would behave on future prompts. Bilingual participants also surfaced a language-fidelity gap, as English prompts produced more accurate images than Portuguese or Chinese, an issue often overlooked when dealing with a single model. These findings suggest that simple comparative interfaces can accelerate bias discovery and reshape how people think about generative models.
Submitted 26 November, 2025;
originally announced November 2025.
-
CLIMATEAGENT: Multi-Agent Orchestration for Complex Climate Data Science Workflows
Authors:
Hyeonjae Kim,
Chenyue Li,
Wen Deng,
Mengxi Jin,
Wen Huang,
Mengqian Lu,
Binhang Yuan
Abstract:
Climate science demands automated workflows to transform comprehensive questions into data-driven statements across massive, heterogeneous datasets. However, generic LLM agents and static scripting pipelines lack climate-specific context and flexibility, thus, perform poorly in practice. We present ClimateAgent, an autonomous multi-agent framework that orchestrates end-to-end climate data analytic workflows. ClimateAgent decomposes user questions into executable sub-tasks coordinated by an Orchestrate-Agent and a Plan-Agent; acquires data via specialized Data-Agents that dynamically introspect APIs to synthesize robust download scripts; and completes analysis and reporting with a Coding-Agent that generates Python code, visualizations, and a final report with a built-in self-correction loop. To enable systematic evaluation, we introduce Climate-Agent-Bench-85, a benchmark of 85 real-world tasks spanning atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. On Climate-Agent-Bench-85, ClimateAgent achieves 100% task completion and a report quality score of 8.32, outperforming GitHub-Copilot (6.27) and a GPT-5 baseline (3.26). These results demonstrate that our multi-agent orchestration with dynamic API awareness and self-correcting execution substantially advances reliable, end-to-end automation for climate science analytic tasks.
Submitted 25 November, 2025;
originally announced November 2025.
-
Introducing Discipline Score Based on League Overall Swinging Probability
Authors:
Wuhuan Deng,
Scott Nestler
Abstract:
Plate discipline is an important feature of a hitter's success. Hitters who are able to recognize good pitches to swing at and balls to take are generally recognized as disciplined hitters. Although there are some metrics that can provide insight into the patience of a hitter, most do not capture the ability of a batter to take balls. In this research, we introduce two new metrics, Discipline Score (DS) and Adjusted Discipline Score (ADS), which evaluate batters' discipline when the pitch is a ball compared with the predicted tendencies of all batters in the league.
Submitted 24 November, 2025;
originally announced November 2025.
-
A Win-Expectancy Framework for Contextualizing Runs Batted In: Introducing ARBI and CRBI
Authors:
Wuhuan Deng
Abstract:
Runs Batted In (RBI) records the number of runs a hitter directly drives in during their plate appearances and reflects a batter's ability to convert opportunities into scoring. Because producing runs determines game outcomes, RBI has long served as a central statistic in evaluating offensive performance. However, traditional RBI treats all batted-in runs equally and ignores the game context in which they occur, such as leverage, score state, and the actual impact of a run on a team's chance of winning. In this paper, we introduce two new context-aware metrics, Adjusted RBI (ARBI) and Contextual RBI (CRBI), that address the fundamental limitations of RBI by incorporating Win Expectancy (WE). ARBI rescales each RBI according to the change in WE before and after the scoring event, assigning more value to runs that meaningfully shift the likelihood of winning and less to runs scored in low-leverage situations. We then extend this framework to CRBI, which further differentiates RBIs with the same WE change by accounting for the terminal WE at the end of the event. This refinement captures the idea that an RBI increasing WE from, for example, 0.45 to 0.65 has a larger competitive impact than one increasing WE from 0.05 to 0.25, even though both represent an increase of 0.20 (20 percentage points). Together, ARBI and CRBI provide calibrated, context-sensitive measures of offensive contribution that more accurately reflect the true value of run production. These metrics modernize the interpretation of RBI and have broad applications in player evaluation, forecasting, contract evaluation, and decision-making in baseball analytics.
Submitted 24 November, 2025;
originally announced November 2025.
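The abstract above does not give the ARBI or CRBI formulas, so the following is only a rough sketch of the idea it describes. The names `RbiEvent`, `arbi_weight`, and `crbi_weight` are hypothetical, and both weighting rules are assumptions chosen for illustration: the ARBI-style weight is the plain change in WE, and the CRBI-style weight multiplies that change by the terminal WE.

```python
from dataclasses import dataclass


@dataclass
class RbiEvent:
    runs: int          # runs driven in on this play
    we_before: float   # win expectancy before the play (0..1)
    we_after: float    # win expectancy after the play (0..1)


def arbi_weight(event: RbiEvent) -> float:
    # Assumed ARBI-style value: runs scaled by the change in WE.
    return event.runs * (event.we_after - event.we_before)


def crbi_weight(event: RbiEvent) -> float:
    # Assumed CRBI-style value: the same WE change, further scaled by the
    # terminal WE so that equal changes ending at a higher WE count more.
    return arbi_weight(event) * event.we_after


# The two events from the abstract: same 0.20 change in WE, different end states.
close_game = RbiEvent(runs=1, we_before=0.45, we_after=0.65)
long_shot = RbiEvent(runs=1, we_before=0.05, we_after=0.25)
```

Under these assumed rules the two events receive the same ARBI-style weight, while the CRBI-style weight ranks the 0.45 to 0.65 event above the 0.05 to 0.25 event, matching the ordering the abstract describes.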
-
Quasinormal modes of scalar, electromagnetic, and gravitational perturbations in slowly rotating Kalb-Ramond black holes
Authors:
Weike Deng,
Wentao Liu,
Kui Xiao,
Jiliang Jing
Abstract:
We investigate quasinormal modes (QNMs) of scalar, electromagnetic, and axial gravitational perturbations in slowly rotating Kalb-Ramond (KR) black holes, where an antisymmetric tensor field induces spontaneous Lorentz symmetry breaking. Working consistently to first order in the dimensionless spin parameter, we derive the corresponding master equations and compute the QNM spectrum using both the continued-fraction and matrix methods, finding excellent agreement. Lorentz violation modifies the oscillation and damping rates in a unified manner across all perturbative sectors: the real part of the QNM frequency increases monotonically with the Lorentz-violating parameter $\ell$, while the imaginary part becomes more negative. Axial gravitational modes exhibit the strongest response, revealing an intrinsic theoretical bound $\ell< 0.5$, beyond which the spectrum approaches an extremal behavior. Our results highlight the potential of gravitational-wave spectroscopy to probe Lorentz-violating signatures in KR gravity.
Submitted 24 November, 2025;
originally announced November 2025.
-
Curvature Perturbations from Higgs Modulated Reheating
Authors:
Weiyi Deng,
Chengcheng Han,
Zhanhong Lei,
Jin Min Yang
Abstract:
In this work we investigate curvature perturbations and non-Gaussianity arising from Higgs modulated reheating in the early Universe. We employ three different methods -- the period-averaging (PA) method, the exact method, and the non-perturbative $δN$ formalism -- to compute the power spectrum and bispectrum of curvature perturbations. Our results show that the non-perturbative $δN$ method provides a reliable estimate across a wide range of reheating time and Higgs field values, including regimes where the Higgs field oscillates significantly after inflation. We find that a smaller Higgs self-coupling ($λ$) leads to a larger curvature perturbation, with the non-Gaussianity predominantly taking a local shape. This highlights the importance of considering non-perturbative effects in calculating the curvature perturbation during Higgs modulated reheating, especially for smaller values of $λ$. Our findings offer valuable insights into the dynamics of reheating and the generation of primordial perturbations in the early Universe.
Submitted 23 November, 2025;
originally announced November 2025.
-
Diffusion Signals Reveal Hidden Connections: A Physics-Inspired Framework for Link Prediction via Personalized PageRank Signals
Authors:
Huilin Wang,
Wenjun Zhang,
Weibing Deng
Abstract:
Link prediction in complex networks--identifying the missing or future connections--remains a cornerstone problem for understanding network evolution and function, yet existing methods struggle to balance computational efficiency with theoretical rigor across heterogeneous topologies. This work introduces a physically principled framework, Diffusion Distance with Personalized PageRank (D-PPR), which unifies static topology with dynamic information flow by modeling nodes as signal sources propagating through the network via Personalized PageRank (PPR) vectors. The method quantifies node-pair similarity through the graph Laplacian-governed diffusion distance between their topology-aware signal distributions, thereby bridging microscopic interactions with macroscopic network dynamics. Systematic benchmarking on synthetic (Barabási-Albert, LFR) and seven large-scale real-world networks spanning technology, biology, and social domains demonstrates that D-PPR achieves highly competitive performance, yielding favorable results when compared to representative local and global heuristics, particularly in sparse and modular networks. These findings establish a rigorous foundation for physics-inspired link prediction by revealing that incorporating dynamical processes into structural similarity metrics enables deeper insights into network connectivity patterns, offering both methodological advances and new theoretical perspectives on the interplay between topology and dynamics.
Submitted 14 November, 2025;
originally announced November 2025.
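As a rough illustration of the approach described in the abstract above, the sketch below computes Personalized PageRank vectors by power iteration and scores a node pair by the distance between their vectors. This is not the authors' D-PPR: the function names, the restart probability `alpha`, and the use of a plain Euclidean distance in place of the graph Laplacian-governed diffusion distance are all assumptions made for illustration.

```python
import math


def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for the PPR vector of `source`.

    adj: dict mapping each node to a list of neighbours (every node must
    have at least one neighbour). alpha is the restart probability.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, neighbours in adj.items():
            # Spread the non-restarting mass of u evenly over its neighbours.
            share = (1.0 - alpha) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        nxt[source] += alpha  # restart mass returns to the source
        p = nxt
    return p


def ppr_distance(adj, u, v, alpha=0.15):
    # Stand-in similarity: Euclidean distance between the two PPR vectors
    # (smaller distance suggests a more likely link). This is an assumption,
    # not the diffusion distance used in the paper.
    pu = personalized_pagerank(adj, u, alpha)
    pv = personalized_pagerank(adj, v, alpha)
    return math.sqrt(sum((pu[n] - pv[n]) ** 2 for n in adj))


# Two triangles joined by a single bridge edge (2-3).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
```

On this toy graph, nodes 0 and 1 sit in the same triangle and end up closer under this distance than nodes 0 and 5, which lie on opposite sides of the bridge.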
-
Error Analysis on a Novel Class of Exponential Integrators with Local Linear Extension Techniques for Highly Oscillatory ODEs
Authors:
Zhihao Qi,
Weibing Deng,
Fuhai Zhu
Abstract:
This paper studies a class of non-autonomous highly oscillatory ordinary differential equations (ODEs) featuring a linear component inversely proportional to a small parameter $\varepsilon$ with purely imaginary eigenvalues, alongside an $\varepsilon$-independent nonlinear component. When $0<\varepsilon\ll 1$, the rapidly oscillatory solution constrains the step size selection and numerical accuracy, resulting in significant computational challenges. Motivated by linearization through introducing auxiliary polynomial variables, a new class of explicit exponential integrators (EIs) has recently been developed. The methods do not require the linear part to be diagonal or with all eigenvalues to be integer multiples of a fixed value - a general assumption in multiscale methods - and attain arbitrarily high convergence order without any order conditions. The main contribution of this work is to establish a rigorous error analysis for the new class of methods. To do this, we first demonstrate the equivalence between the high-dimensional system and the original problem by employing algebraic techniques. Building upon these fundamental results, we prove that the numerical schemes have a uniform convergence order of $O(h^{k+1})$ for the solution when using at most $k$-degree auxiliary polynomial variables with time step sizes smaller than $\varepsilon$. For larger step sizes under the bounded oscillatory energy condition, the methods achieve a convergence order of $O(\varepsilon h^k)$ for the solution. These theoretical results are further applied to second-order oscillatory equations, yielding improved uniform accuracy with respect to $\varepsilon$. Finally, numerical experiments confirm the optimality of the derived error estimates.
Submitted 18 November, 2025;
originally announced November 2025.
-
Wide-Field X-ray Polarimetry for High Energy Astronomical Transients: First results of the pathfinder CXPD Cubesat Mission
Authors:
Hong-Bang Liu,
Zu-Ke Feng,
Huan-Bo Feng,
Di-Fan Yi,
Li-Rong Xie,
Yan-Jun Xie,
Zong-Wang Fan,
Jin Zhang,
Wen-Jin Xie,
Xue-Feng Huang,
Wei Deng,
Fei Xie,
Dong Wang,
Zi-Li Li,
Hui Wang,
Ran Chen,
Shi-Qiang Zhou,
Kai Chen,
Jin Li,
Qian Liu,
Shi Chen,
Rui-Ting Ma,
Bin-Long Wang,
Zhen-Yu Tang,
Hang-Zhou Li
, et al. (5 additional authors not shown)
Abstract:
The Low Energy Polarization Detector (LPD) is a key component of the next-generation large-scale Gamma-Ray Burst polarimeter, POLAR-2. It is designed for polarization observations of transient sources in the soft X-ray energy range with a wide field of view (FOV). To validate the key technologies required for wide-FOV X-ray polarization measurements, the Cosmic X-ray Polarization Detector (CXPD) CubeSat was developed as a prototype for the LPD. The CXPD is equipped with two Gas Microchannel Plate Pixel Detectors (GMPDs) that measure X-ray polarization via the photoelectric effect, where ejected photoelectrons produce ionization tracks in the gas which are imaged to reconstruct their emission directions. Laboratory calibrations of the modulation factor and energy spectra were successfully performed using linear polarized X-ray sources at 2.98 keV, 4.51 keV, 6.40 keV, and 8.05 keV. Since its launch in June 2023, the CXPD has successfully completed critical in-orbit technology verification. It has also performed polarization observations of two bright X-ray sources Sco X-1 and the transient Swift J1727.8-1613 yielding constraints on their polarization degrees and angles. Notably, this was the first time that an anti-coincidence detector had been implemented in an X-ray polarimeter, enabling in-orbit verification of the charged-particle background rejection algorithm. These results demonstrate the feasibility of wide-field soft X-ray polarization measurements and provide essential guidance for the development of the LPD for the POLAR-2 mission, thereby advancing the frontier of X-ray polarization astronomy.
Submitted 17 November, 2025;
originally announced November 2025.
-
Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations
Authors:
Eunkyu Park,
Wesley Hanwen Deng,
Vasudha Varadarajan,
Mingxi Yan,
Gunhee Kim,
Maarten Sap,
Motahhare Eslami
Abstract:
Explanations are often promoted as tools for transparency, but they can also foster confirmation bias; users may assume reasoning is correct whenever outputs appear acceptable. We study this double-edged role of Chain-of-Thought (CoT) explanations in multimodal moral scenarios by systematically perturbing reasoning chains and manipulating delivery tones. Specifically, we analyze reasoning errors in vision language models (VLMs) and how they impact user trust and the ability to detect errors. Our findings reveal two key effects: (1) users often equate trust with outcome agreement, sustaining reliance even when reasoning is flawed, and (2) the confident tone suppresses error detection while maintaining reliance, showing that delivery styles can override correctness. These results highlight how CoT explanations can simultaneously clarify and mislead, underscoring the need for NLP systems to provide explanations that encourage scrutiny and critical thinking rather than blind trust. All code will be released publicly.
Submitted 19 November, 2025; v1 submitted 14 November, 2025;
originally announced November 2025.
-
$CPT$-Symmetric Kähler-Dirac Fermions
Authors:
Latham Boyle,
Wei-Ning Deng
Abstract:
Kähler-Dirac (KD) spinors have generated excitement in the lattice gauge theory community, as a way to (i) deal with the ``fermion doubling" problems that plague ordinary (Dirac, Majorana, or Weyl) spinors when discretized on a lattice, and (ii) help explain the structure of the standard model. But if one naively quantizes this theory in Lorentzian signature, problems arise: half the KD fields have the ``wrong sign" Lagrangian, and give rise to negative norm states. Here we propose a new resolution/interpretation: the KD field actually lives on a two-sheeted spacetime, with the sheets related by $PT$ symmetry or, alternatively, by $i\leftrightarrow-i$. And, to avoid any unphysical interactions between the two sheets, the KD field obeys a reality condition (which we call the ``KD-Majorana condition"), which forces every particle on one sheet to be accompanied by a mirror (anti-)particle on the other sheet. We discuss how the standard model fits in this framework, how the fermion (kinetic and Yukawa) terms simplify, and how it may relate to the CPT-symmetric universe model.
Submitted 14 November, 2025;
originally announced November 2025.
-
A Stabilized Unfitted Space-time Finite Element Method for Parabolic Problems on Moving Domains
Authors:
Ruizhi Wang,
Weibing Deng
Abstract:
This paper presents a space-time finite element method (FEM) based on an unfitted mesh for solving parabolic problems on moving domains. Unlike other unfitted space-time finite element approaches that commonly employ the discontinuous Galerkin (DG) method for time-stepping, the proposed method employs a fully coupled space-time discretization. To stabilize the time-advection term, the streamline upwind Petrov-Galerkin (SUPG) scheme is applied in the temporal direction. A ghost penalty stabilization term is further incorporated to mitigate the small cut issue, thereby ensuring that the stiffness matrix is well conditioned. Moreover, an a priori error estimate is derived in a discrete energy norm, which achieves an optimal convergence rate with respect to the mesh size. In particular, a space-time Poincaré-Friedrichs inequality is established to support the condition number analysis. Several numerical examples are provided to validate the theoretical findings.
Submitted 13 November, 2025;
originally announced November 2025.
-
A Two-stage Adaptive Lifting PINN Framework for Solving Viscous Approximations to Hyperbolic Conservation Laws
Authors:
Yameng Zhu,
Weibing Deng,
Ran Bi
Abstract:
Training physics-informed neural networks (PINNs) for hyperbolic conservation laws near the inviscid limit presents considerable difficulties because strong-form residuals become ill-posed at shock discontinuities, while small viscosity regularization introduces narrow boundary layers that exacerbate spectral bias. To address these issues, this paper proposes a novel two-stage adaptive lifting PINN, a lifting-based framework designed to mitigate such challenges without requiring a priori knowledge of the interface geometry. The key idea is to augment the physical coordinates by introducing a learned auxiliary field generated through r-adaptive coordinate transformations. Theoretically, we first derive an a posteriori L2 error estimate to quantify how training difficulty depends on viscosity. Second, we provide a statistical interpretation revealing that embedded sampling induces variance reduction analogous to importance sampling. Finally, we perform an NTK and gradient flow analysis, demonstrating that input augmentation improves conditioning and accelerates residual decay. Supported by these insights, our numerical experiments show accelerated and more stable convergence as well as accurate reconstructions near discontinuities.
Submitted 6 November, 2025;
originally announced November 2025.
-
The images of Brans-Dicke-Kerr type naked singularities
Authors:
Fen Long,
Weike Deng,
Xin Qin,
Songbai Chen,
Jiliang Jing
Abstract:
We have studied the images of the Brans-Dicke-Kerr spacetime with a dimensionless Brans-Dicke parameter $ω$, which belongs to axisymmetric rotating solutions in the Brans-Dicke theory. Our results show that the Brans-Dicke-Kerr spacetime with the parameter $ω>-3/2$ represents naked singularities with distinct structures. For the case with $a \leq M$, the shadow in the Brans-Dicke-Kerr spacetime persists but gradually becomes flatter and smaller as $ω$ decreases. In particular, when $ω<1/2$, the shadow in the image exhibits a very special "jellyfish" shape and possesses a self-similar fractal structure. For the case with $a > M$, a distinct gray region consisting of two separate patches appears in the image observed by equatorial observers. This indicates that the Brans-Dicke-Kerr spacetime can be distinguished from the Kerr and Kerr-de Sitter cases based on its image. These effects of the Brans-Dicke parameter could help us to reveal the intrinsic structure of the Brans-Dicke-Kerr spacetimes and provide a foundation for testing Brans-Dicke theory through future high-precision observations.
Submitted 17 December, 2025; v1 submitted 3 November, 2025;
originally announced November 2025.
-
Sensor operating point calibration and monitoring of the ALICE Inner Tracking System during LHC Run 3
Authors:
D. Agguiaro,
G. Aglieri Rinella,
L. Aglietta,
M. Agnello,
F. Agnese,
B. Alessandro,
G. Alfarone,
J. Alme,
E. Anderssen,
D. Andreou,
M. Angeletti,
N. Apadula,
P. Atkinson,
C. Azzan,
R. Baccomi,
A. Badalà,
A. Balbino,
P. Barberis,
F. Barile,
L. Barioglio,
R. Barthel,
F. Baruffaldi,
N. K. Behera,
I. Belikov,
A. Benato
, et al. (262 additional authors not shown)
Abstract:
The new Inner Tracking System (ITS2) of the ALICE experiment began operation in 2021 with the start of LHC Run 3. Compared to its predecessor, ITS2 offers substantial improvements in pointing resolution, tracking efficiency at low transverse momenta, and readout-rate capabilities. The detector employs silicon Monolithic Active Pixel Sensors (MAPS) featuring a pixel size of 26.88$\times$29.24 $μ$m$^2$ and an intrinsic spatial resolution of approximately 5 $μ$m. With a remarkably low material budget of 0.36% of radiation length ($X_{0}$) per layer in the three innermost layers and a total sensitive area of about 10 m$^2$, the ITS2 constitutes the largest-scale application of MAPS technology in a high-energy physics experiment and the first of its kind operated at the LHC. For stable data taking, it is crucial to calibrate different parameters of the detector, such as in-pixel charge thresholds and the masking of noisy pixels. The calibration of 24120 monolithic sensors, comprising a total of 12.6$\times$10$^{9}$ pixels, represents a major operational challenge. This paper presents the methods developed for the calibration of the ITS2 and outlines the strategies for monitoring and dynamically adjusting the detector's key performance parameters over time.
Submitted 31 October, 2025;
originally announced October 2025.
-
Gravitational waveforms from periodic orbits around a charged black hole with scalar hair
Authors:
Weike Deng,
Sheng Long,
Qin Tan,
Jiliang Jing
Abstract:
We investigate geodesic motion and gravitational-wave signatures of charged black holes with scalar hair. Using the effective potential approach, we analyze marginally bound orbits and innermost stable circular orbits, showing how their positions and energy thresholds are modified by the scalar hair parameter $r_B$. These results demonstrate scalar hair's role in altering the boundary of stable motion. We further explore periodic orbits characterized by rational frequency ratios, labeled by the index $(z,w,v)$, and quantify how scalar hair affects their orbital energy and angular momentum. Based on these orbital properties, we compute gravitational waveforms from extreme mass-ratio inspirals where a stellar-mass compact object orbits a supermassive charged black hole with scalar hair. Using the numerical kludge method, we generate waveforms that exhibit clear zoom-whirl patterns with morphology visibly affected by $r_B$. Our results show that scalar hair leaves distinguishable imprints on waveforms, suggesting that future space-based detectors could probe deviations from classical black hole spacetimes through observations of extreme mass-ratio inspirals.
Submitted 28 October, 2025;
originally announced October 2025.
-
Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy
Authors:
Qing Zhao,
Weijian Deng,
Pengxu Wei,
ZiYi Dong,
Hannan Lu,
Xiangyang Ji,
Liang Lin
Abstract:
To improve detection robustness in adverse conditions (e.g., haze and low light), image restoration is commonly applied as a pre-processing step to enhance image quality for the detector. However, the functional mismatch between restoration and detection networks can introduce instability and hinder effective integration -- an issue that remains underexplored. We revisit this limitation through the lens of Lipschitz continuity, analyzing the functional differences between restoration and detection networks in both the input space and the parameter space. Our analysis shows that restoration networks perform smooth, continuous transformations, while object detectors operate with discontinuous decision boundaries, making them highly sensitive to minor perturbations. This mismatch introduces instability in traditional cascade frameworks, where even imperceptible noise from restoration is amplified during detection, disrupting gradient flow and hindering optimization. To address this, we propose Lipschitz-regularized object detection (LROD), a simple yet effective framework that integrates image restoration directly into the detector's feature learning, harmonizing the Lipschitz continuity of both tasks during training. We implement this framework as Lipschitz-regularized YOLO (LR-YOLO), extending seamlessly to existing YOLO detectors. Extensive experiments on haze and low-light benchmarks demonstrate that LR-YOLO consistently improves detection stability, optimization smoothness, and overall accuracy.
Submitted 28 October, 2025;
originally announced October 2025.
-
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
Authors:
Christos Thrampoulidis,
Sadegh Mahdavi,
Wenlong Deng
Abstract:
This note reconciles two seemingly distinct approaches to policy gradient optimization for the Pass@K objective in reinforcement learning with verifiable rewards: (1) direct REINFORCE-style methods, and (2) advantage-shaping techniques that directly modify GRPO. We show that these are two sides of the same coin. By reverse-engineering existing advantage-shaping algorithms, we reveal that they implicitly optimize surrogate rewards. We specifically interpret practical "hard-example up-weighting" modifications to GRPO as reward-level regularization. Conversely, starting from surrogate reward objectives, we provide a simple recipe for deriving both existing and new advantage-shaping methods. This perspective provides a lens for RLVR policy gradient optimization beyond our original motivation of Pass@K.
Submitted 14 November, 2025; v1 submitted 27 October, 2025;
originally announced October 2025.
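For readers unfamiliar with the Pass@K objective named in the abstract above, the following is a minimal sketch of the combinatorial Pass@K estimator that is widely used in code-generation evaluation. It is not taken from the note itself; the function name `pass_at_k` and the arguments `n` (samples drawn), `c` (samples that pass verification), and `k` are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    Assumes n samples were drawn for a problem and c of them passed the
    verifier. Uses the combinatorial form 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k subset
    # necessarily contains at least one correct sample.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 4 samples, 1 correct: chance that a random pair contains it.
    print(pass_at_k(n=4, c=1, k=2))
```

With `k = 1` the estimate reduces to the plain success rate `c / n`, which is why Pass@K with larger `k` rewards a policy for having any correct sample in a group rather than for its average accuracy.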
-
Lossy phononic metamaterials for valley manipulation
Authors:
Shunda Yin,
Qiuyan Zhou,
Yuxiang Xi,
Weiyin Deng,
Wei Chen,
Jiuyang Lu,
Manzhu Ke,
Zhengyou Liu
Abstract:
Non-Hermitian physics characterized by complex band spectra has established a new paradigm in condensed matter systems and metamaterials. Recently, non-Hermitian gain and nonreciprocity have been deliberately introduced to valley manipulation, leading to various phenomena beyond the Hermitian scenarios, such as the amplified topological whispering gallery modes as an acoustic laser. In contrast, pure loss is inevitable in practice and generally regarded as a detrimental factor. Here, we reveal that the coupling loss can manipulate valley degrees of freedom in a phononic metamaterial. Three distinct valley-related effects, including valley-resolved nonreciprocity that functions as a valley filter, valley-dependent skin effects where bulk states from different valleys localize at opposite boundaries, and valley-projected edge states with boundary-dependent lifetimes that lead to an anomalous beam splitting, are demonstrated through theoretical analysis and airborne sound experiments. Owing to the easy preparation of loss, our findings shed light on both non-Hermitian and valley physics and may facilitate innovative applications of valley-related devices.
Submitted 24 October, 2025;
originally announced October 2025.
-
Fake-in-Facext: Towards Fine-Grained Explainable DeepFake Analysis
Authors:
Lixiong Qin,
Yang Zhang,
Mei Wang,
Jiani Hu,
Weihong Deng,
Weiran Xu
Abstract:
The advancement of Multimodal Large Language Models (MLLMs) has bridged the gap between vision and language tasks, enabling the implementation of Explainable DeepFake Analysis (XDFA). However, current methods suffer from a lack of fine-grained awareness: the description of artifacts in data annotation is unreliable and coarse-grained, and the models fail to support the output of connections between textual forgery explanations and the visual evidence of artifacts, as well as the input of queries for arbitrary facial regions. As a result, their responses are not sufficiently grounded in Face Visual Context (Facext). To address this limitation, we propose the Fake-in-Facext (FiFa) framework, with contributions focusing on data annotation and model construction. We first define a Facial Image Concept Tree (FICT) to divide facial images into fine-grained regional concepts, thereby obtaining a more reliable data annotation pipeline, FiFa-Annotator, for forgery explanation. Based on this dedicated data annotation, we introduce a novel Artifact-Grounding Explanation (AGE) task, which generates textual forgery explanations interleaved with segmentation masks of manipulated artifacts. We propose a unified multi-task learning architecture, FiFa-MLLM, to simultaneously support abundant multimodal inputs and outputs for fine-grained Explainable DeepFake Analysis. With multiple auxiliary supervision tasks, FiFa-MLLM can outperform strong baselines on the AGE task and achieve SOTA performance on existing XDFA datasets. The code and data will be made open-source at https://github.com/lxq1000/Fake-in-Facext.
Submitted 23 October, 2025;
originally announced October 2025.
-
Quasinormal Modes of Massive Scalar Perturbations in Slow-Rotation Bumblebee Black Holes with Traceless Conformal Electrodynamics
Authors:
Yassine Sekhmani,
Wentao Liu,
Weike Deng,
Kuantay Boshkayev
Abstract:
We study electrically charged, slowly rotating black hole solutions in Einstein-Bumblebee gravity coupled to the traceless (conformal) ModMax nonlinear electrodynamics. By adopting a quadratic bumblebee potential that fixes the vacuum expectation value of the Lorentz-violating vector, we derive both the static configuration and its first-order rotating extension and demonstrate how the bumblebee parameter $\ell$ and the ModMax deformation $γ$ modify the horizon structure and the effective electric charge. We further investigate the dynamical properties of this spacetime by considering a massive scalar field perturbation. Using two independent numerical techniques, we compute the quasinormal mode (QNM) spectra and perform a comprehensive analysis of the influence of all relevant parameters, including the black hole spin, the Lorentz-violating coupling, the ModMax deformation, and the scalar field mass. Our results reveal coherent trends in the QNM frequencies, highlighting the interplay between Lorentz-symmetry breaking and nonlinear electrodynamics effects in black hole dynamics.
Submitted 18 October, 2025;
originally announced October 2025.
-
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
Authors:
Tian Xia,
Tianrun Gao,
Wenhao Deng,
Long Wei,
Xiaowei Qian,
Yixian Jiang,
Chenglei Yu,
Tailin Wu
Abstract:
Engineering construction automation aims to transform natural language specifications into physically viable structures, requiring complex integrated reasoning under strict physical constraints. While modern LLMs possess broad knowledge and strong reasoning capabilities that make them promising candidates for this domain, their construction competencies remain largely unevaluated. To address this gap, we introduce BuildArena, the first physics-aligned interactive benchmark designed for language-driven engineering construction. It contributes to the community in four aspects: (1) a highly customizable benchmarking framework for in-depth comparison and analysis of LLMs; (2) an extendable task design strategy spanning static and dynamic mechanics across multiple difficulty tiers; (3) a 3D Spatial Geometric Computation Library for supporting construction based on language instructions; (4) a baseline LLM agentic workflow that effectively evaluates diverse model capabilities. On eight frontier LLMs, BuildArena comprehensively evaluates their capabilities for language-driven and physics-grounded construction automation. The project page is at https://build-arena.github.io/.
Submitted 31 October, 2025; v1 submitted 18 October, 2025;
originally announced October 2025.
-
High Bandwidth and Ultra-low Dark Current Ge Photodetector Enabled by Frequency Domain Equalization
Authors:
Wenxin Deng,
Hengsong Yue,
Xiaoyan Liu,
Jianhong Liang,
Jianbin Fu,
Shilong Pan,
Tao Chu
Abstract:
High bandwidth and low dark current germanium (Ge) photodetectors are crucial in silicon photonic integrated circuits. The bandwidth of Ge photodetectors is restricted by carrier transit time and parasitic parameters. Moreover, thermal generation of carriers within the Ge P-N junction results in an inherent dark current, typically in the nA-μA range. Here, we propose an equalization photodetector (EqPD) that subtracts the frequency response of a low-bandwidth photodetector PDB from the frequency response of a high-bandwidth photodetector PDA. Because the response of PDB attenuates more severely than that of PDA at high frequencies, the differential response (the response of EqPD) can reach higher values at high frequencies than at low frequencies. The dark current of EqPD can also be significantly reduced, with PDB balancing the dark current of PDA. Experimental results show that the bandwidth of our proposed photodetector can be expanded to over 110 GHz while maintaining a dark current of 1 pA, and its Non-Return-to-Zero (NRZ) transmission speed can reach 100 Gbaud without digital signal processing. To the best of our knowledge, this represents the highest bandwidth and lowest dark current in a vertical Ge photodetector. The high-performance EqPD provides a promising solution for high-speed and ultra-low noise photodetection in next-generation optical communication.
Submitted 15 October, 2025;
originally announced October 2025.
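The equalization idea in the abstract above (subtracting a low-bandwidth response from a high-bandwidth one) can be illustrated with a toy model. The sketch below assumes each photodetector behaves as a single-pole low-pass filter; that model, the 3 dB bandwidths of 50 GHz and 10 GHz, and the subtraction weight `weight_b` are hypothetical choices for illustration and are not taken from the paper.

```python
def single_pole(f_ghz: float, f3db_ghz: float, responsivity: float = 1.0) -> complex:
    """Complex response of an idealized first-order low-pass photodetector."""
    return responsivity / (1 + 1j * f_ghz / f3db_ghz)


def eqpd_response(
    f_ghz: float,
    f3db_a: float = 50.0,   # hypothetical bandwidth of the fast detector (PDA)
    f3db_b: float = 10.0,   # hypothetical bandwidth of the slow detector (PDB)
    weight_b: float = 0.5,  # hypothetical fraction of PDB that is subtracted
) -> complex:
    """Differential response: PDA minus a weighted copy of PDB."""
    return single_pole(f_ghz, f3db_a) - weight_b * single_pole(f_ghz, f3db_b)


if __name__ == "__main__":
    for f in (0.0, 10.0, 30.0, 60.0):
        print(f"{f:5.1f} GHz  |PDA| = {abs(single_pole(f, 50.0)):.3f}"
              f"  |EqPD| = {abs(eqpd_response(f)):.3f}")
```

In this toy model the subtracted term dominates at low frequency and fades quickly at high frequency, so the magnitude of the difference rises above its DC value before eventually rolling off. The same subtraction would cancel any dark-current component common to both detectors, which is the balancing effect the abstract describes.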
-
Characterisation of the first wafer-scale prototype for the ALICE ITS3 upgrade: the monolithic stitched sensor (MOSS)
Authors:
Omar Abdelrahman,
Gianluca Aglieri Rinella,
Luca Aglietta,
Giacomo Alocco,
Matias Antonelli,
Roberto Baccomi,
Francesco Barile,
Pascal Becht,
Franco Benotto,
Stefania Maria Beolè,
Marcello Borri,
Daniela Bortoletto,
Naseem Bouchhar,
Giuseppe Eugenio Bruno,
Matthew Daniel Buckland,
Szymon Bugiel,
Paolo Camerini,
Francesca Carnesecchi,
Marielle Chartier,
Domenico Colella,
Angelo Colelli,
Giacomo Contin,
Giuseppe De Robertis,
Wenjing Deng,
Antonello Di Mauro
, et al. (114 additional authors not shown)
Abstract:
This paper presents the characterisation and testing of the first wafer-scale monolithic stitched sensor (MOSS) prototype developed for the ALICE ITS3 upgrade that is to be installed during the LHC Long Shutdown 3 (2026-2030). The MOSS chip design is driven by the truly cylindrical detector geometry, which requires each layer to be built out of two wafer-sized, bent silicon chips. The stitching technique is employed to fabricate sensors with dimensions of 1.4 $\times$ 25.9 cm, thinned to 50 $μ$m. The chip architecture, in-pixel front-end, laboratory and in-beam characterisation, susceptibility to single-event effects, and series testing are discussed. The testing campaign validates the design of a wafer-scale stitched sensor and confirms that the performance of the pixel matrix is within the ITS3 requirements. The MOSS chip demonstrates the feasibility of the ITS3 detector concept and provides insights for further optimisation and development.
Submitted 19 December, 2025; v1 submitted 13 October, 2025;
originally announced October 2025.
-
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
Authors:
Kevin Rojas,
Jiahe Lin,
Kashif Rasul,
Anderson Schneider,
Yuriy Nevmyvaka,
Molei Tao,
Wei Deng
Abstract:
Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning to DLMs remains an open challenge because of the intractable likelihood. Pioneering work such as diffu-GRPO estimated token-level likelihoods via one-step unmasking. While computationally efficient, this approach is severely biased. A more principled foundation lies in sequence-level likelihoods, where the evidence lower bound (ELBO) serves as a surrogate. Yet, despite this clean mathematical connection, ELBO-based methods have seen limited adoption due to the prohibitive cost of likelihood evaluation. In this work, we revisit ELBO estimation and disentangle its sources of variance. This decomposition motivates reducing variance through fast, deterministic integral approximations along a few pivotal dimensions. Building on this insight, we introduce Group Diffusion Policy Optimization (GDPO), a new RL algorithm tailored for DLMs. GDPO leverages simple yet effective Semi-deterministic Monte Carlo schemes to mitigate the variance explosion of ELBO estimators under vanilla double Monte Carlo sampling, yielding a provably lower-variance estimator under tight evaluation budgets. Empirically, GDPO achieves consistent gains over pretrained checkpoints and outperforms diffu-GRPO, one of the state-of-the-art baselines, on the majority of math, reasoning, and coding benchmarks.
Submitted 29 December, 2025; v1 submitted 9 October, 2025;
originally announced October 2025.
-
Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks
Authors:
Wentao Deng,
Jiahuan Pei,
Zhiwei Xu,
Zhaochun Ren,
Zhumin Chen,
Pengjie Ren
Abstract:
A multi-agent system (MAS) enhances its capacity to solve complex natural language processing (NLP) tasks through collaboration among multiple agents, where consensus-seeking serves as a fundamental mechanism. However, existing consensus-seeking approaches typically rely on voting mechanisms to judge consensus, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, these methods often involve agents updating their results through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Based on the theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework to facilitate stable consensus by selecting optimal collaborators and calibrating the consensus judgment with system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets demonstrate that the proposed BCCS framework outperforms the best existing results by 2.23% and 3.95% in accuracy on challenging tasks, respectively. Our code and data are available at https://github.com/dengwentao99/BCCS.
Submitted 7 October, 2025;
originally announced October 2025.
-
Vipera: Blending Visual and LLM-Driven Guidance for Systematic Auditing of Text-to-Image Generative AI
Authors:
Yanwei Huang,
Wesley Hanwen Deng,
Sijia Xiao,
Motahhare Eslami,
Jason I. Hong,
Arpit Narechania,
Adam Perer
Abstract:
Despite their increasing capabilities, text-to-image generative AI systems are known to produce biased, offensive, and otherwise problematic outputs. While recent advancements have supported testing and auditing of generative AI, existing auditing methods still face challenges in supporting effective exploration of the vast space of AI-generated outputs in a structured way. To address this gap, we conducted formative studies with five AI auditors and synthesized five design goals for supporting systematic AI audits. Based on these insights, we developed Vipera, an interactive auditing interface that employs multiple visual cues including a scene graph to facilitate image sensemaking and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, Vipera leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. Through a controlled experiment with 24 participants experienced in AI auditing, we demonstrate Vipera's effectiveness in helping auditors navigate large AI output spaces and organize their analyses while engaging with diverse criteria.
Submitted 7 October, 2025;
originally announced October 2025.
-
Rethinking Langevin Thompson Sampling from A Stochastic Approximation Perspective
Authors:
Weixin Wang,
Haoyang Zheng,
Guang Lin,
Wei Deng,
Pan Xu
Abstract:
Most existing approximate Thompson Sampling (TS) algorithms for multi-armed bandits use Stochastic Gradient Langevin Dynamics (SGLD) or its variants in each round to sample from the posterior, relaxing the need for conjugacy assumptions between priors and reward distributions in vanilla TS. However, they often require approximating a different posterior distribution in each round of the bandit problem. This requires tricky, round-specific tuning of hyperparameters such as dynamic learning rates, causing challenges in both theoretical analysis and practical implementation. To alleviate this non-stationarity, we introduce TS-SA, which incorporates stochastic approximation (SA) within the TS framework. In each round, TS-SA constructs a posterior approximation using only the most recent reward(s), performs a Langevin Monte Carlo (LMC) update, and applies an SA step to average noisy proposals over time. This can be interpreted as approximating a stationary posterior target throughout the entire algorithm, which further yields a fixed step-size, a unified convergence analysis framework, and improved posterior estimates through temporal averaging. We establish near-optimal regret bounds for TS-SA, with a simplified and more intuitive theoretical analysis enabled by interpreting the entire algorithm as a simulation of a stationary SGLD process. Our empirical results demonstrate that even a single-step Langevin update with certain warm-up outperforms existing methods substantially on bandit tasks.
Submitted 6 October, 2025;
originally announced October 2025.
-
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
Authors:
Wenhao Deng,
Long Wei,
Chenglei Yu,
Tailin Wu
Abstract:
Reinforcement learning with verifiable rewards (RLVR) has recently enhanced the reasoning capabilities of large language models (LLMs), particularly for mathematical problem solving. However, a fundamental limitation remains: as the sampling budget increases, the advantage of RLVR-trained models over their pretrained bases often diminishes or even vanishes, revealing a strong dependence on the base model's restricted search space. We attribute this phenomenon to the widespread use of the reverse Kullback-Leibler (KL) divergence regularizer, whose mode-seeking behavior keeps the policy trapped inside the base model's support region and hampers wider exploration. To address this issue, we propose RAPO (Rewards-Aware Policy Optimization), an algorithm to promote broader yet focused exploration. Our method (i) utilizes the forward KL penalty to replace the reverse KL penalty for out-of-distribution exploration, and (ii) reweights the reference policy to facilitate adaptive in-distribution exploration. We train Qwen2.5-3B and 7B models with RAPO on the 8K SimpleRL-Zero dataset, without supervised fine-tuning, and evaluate them on AIME2024 and AIME2025. Results show that RAPO consistently improves problem-solving performance. Notably, RAPO enables models to surpass the base model's performance ceiling and solves previously intractable problems, advancing the frontier of RLVR for challenging reasoning tasks.
Submitted 31 October, 2025; v1 submitted 4 October, 2025;
originally announced October 2025.
-
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
Authors:
Wenlong Deng,
Yi Ren,
Yushu Li,
Boying Gong,
Danica J. Sutherland,
Xiaoxiao Li,
Christos Thrampoulidis
Abstract:
Reinforcement learning with verifiable rewards has significantly advanced the reasoning capabilities of large language models, yet how to explicitly steer training toward exploration or exploitation remains an open problem. We introduce Token Hidden Reward (THR), a token-level metric that quantifies each token's influence on the likelihood of correct responses under Group Relative Policy Optimization (GRPO). We find that training dynamics are dominated by a small subset of tokens with high absolute THR values. Most interestingly, tokens with positive THR strengthen confidence in correct outputs, thus favoring exploitation, while tokens with negative THR preserve probability mass for alternative outputs, enabling exploration. This insight suggests a natural intervention: a THR-guided reweighting algorithm that modulates GRPO's learning signals to explicitly bias training toward exploitation or exploration. We validate the efficacy of this algorithm on diverse math reasoning benchmarks. By amplifying tokens with positive THR value and weakening negative ones, our algorithm improves greedy-decoding accuracy, favoring exploitation. The reverse strategy yields consistent gains in Pass@K accuracy, favoring exploration. We further demonstrate that our algorithm integrates seamlessly with other RL objectives such as GSPO and generalizes across architectures including Llama. These findings establish THR as a principled and fine-grained mechanism for dynamically controlling exploration and exploitation in RL-tuned LLMs, providing new tools for targeted fine-tuning in reasoning-intensive applications.
Submitted 11 November, 2025; v1 submitted 4 October, 2025;
originally announced October 2025.
-
Passive harmonic mode-locked laser on lithium niobate integrated photonics
Authors:
Yu Wang,
Guanyu Han,
Jan-Philipp Koester,
Hans Wenzel,
Wei Wang,
Wenjun Deng,
Ziyao Feng,
Meng Tian,
Andrea Alù,
Andrea Knigge,
Qiushi Guo
Abstract:
Mode-locked lasers (MLLs) are essential for a wide range of photonic applications, such as frequency metrology, biological imaging, and high-bandwidth coherent communications. The growing demand for compact and scalable photonic systems is driving the development of MLLs on various integrated photonics material platforms. Along these lines, developing MLLs on the emerging thin-film lithium niobate (TFLN) platform holds the promise to greatly broaden the application space of MLLs by harnessing TFLN's unique electro-optic (E-O) response and quadratic optical nonlinearity. Here, we demonstrate the first electrically pumped, self-starting passive MLL in lithium niobate integrated photonics based on its hybrid integration with a GaAs quantum-well gain medium and saturable absorber. Our demonstrated MLL generates 4.3-ps optical pulses centered around 1060 nm with on-chip peak power exceeding 44 mW. The pulse duration can be further compressed to 1.75 ps via linear dispersion compensation. Remarkably, passive mode-locking occurs exclusively at the second harmonic of the cavity free spectral range, exhibiting a high pulse repetition rate of $\sim$20 GHz. We elucidate the temporal dynamics underlying this self-starting passive harmonic mode-locking behavior using a traveling-wave model. Our work offers new insights into the realization of compact, high-repetition-rate MLLs in the TFLN platform, with promising applications for monolithic ultrafast microwave waveform sampling and analog-to-digital conversion.
Submitted 7 October, 2025; v1 submitted 3 October, 2025;
originally announced October 2025.
-
Confidence and Dispersity as Signals: Unsupervised Model Evaluation and Ranking
Authors:
Weijian Deng,
Weijie Tu,
Ibrahim Radwan,
Mohammad Abu Alsheikh,
Stephen Gould,
Liang Zheng
Abstract:
Assessing model generalization under distribution shift is essential for real-world deployment, particularly when labeled test data is unavailable. This paper presents a unified and practical framework for unsupervised model evaluation and ranking in two common deployment settings: (1) estimating the accuracy of a fixed model on multiple unlabeled test sets (dataset-centric evaluation), and (2) ranking a set of candidate models on a single unlabeled test set (model-centric evaluation). We demonstrate that two intrinsic properties of model predictions, namely confidence (which reflects prediction certainty) and dispersity (which captures the diversity of predicted classes), together provide strong and complementary signals for generalization. We systematically benchmark a set of confidence-based, dispersity-based, and hybrid metrics across a wide range of model architectures, datasets, and distribution shift types. Our results show that hybrid metrics consistently outperform single-aspect metrics on both dataset-centric and model-centric evaluation settings. In particular, the nuclear norm of the prediction matrix provides robust and accurate performance across tasks, including real-world datasets, and maintains reliability under moderate class imbalance. These findings offer a practical and generalizable basis for unsupervised model assessment in deployment scenarios.
Submitted 3 October, 2025;
originally announced October 2025.
-
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Authors:
Haoyang Zheng,
Xinyang Liu,
Cindy Xiangrui Kong,
Nan Jiang,
Zheyuan Hu,
Weijian Luo,
Wei Deng,
Guang Lin
Abstract:
Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained (masked) discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model achieves comparable or superior performance to its dLLM teacher and the GPT-2 baseline while enabling up to 64$\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler that significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexity from 62.2 (8 NFEs) to 18.4 (128 NFEs), which outperforms prior accelerated dLLMs and GPT-2 baseline. These gains come with a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release both code and models at github.com/haoyangzheng-ai/didi-instruct.
Submitted 1 October, 2025; v1 submitted 29 September, 2025;
originally announced September 2025.
-
"I Don't Think RAI Applies to My Model" -- Engaging Non-champions with Sticky Stories for Responsible AI Work
Authors:
Nadia Nahar,
Chenyang Yang,
Yanxin Chen,
Wesley Hanwen Deng,
Ken Holstein,
Motahhare Eslami,
Christian Kästner
Abstract:
Responsible AI (RAI) tools -- checklists, templates, and governance processes -- often engage RAI champions, individuals intrinsically motivated to advocate ethical practices, but fail to reach non-champions, who frequently dismiss them as bureaucratic tasks. To explore this gap, we shadowed meetings and interviewed data scientists at an organization, finding that practitioners perceived RAI as irrelevant to their work. Building on these insights and theoretical foundations, we derived design principles for engaging non-champions, and introduced sticky stories -- narratives of unexpected ML harms designed to be concrete, severe, surprising, diverse, and relevant, unlike widely circulated media to which practitioners are desensitized. Using a compound AI system, we generated and evaluated sticky stories through human and LLM assessments at scale, confirming they embodied the intended qualities. In a study with 29 practitioners, we found that, compared to regular stories, sticky stories significantly increased time spent on harm identification, broadened the range of harms recognized, and fostered deeper reflection.
Submitted 26 September, 2025;
originally announced September 2025.
-
Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
Authors:
Jinming Chen,
Lu Wang,
Zheshu Song,
Wei Deng
Abstract:
Driven by large-scale datasets and LLM-based architectures, automatic speech recognition (ASR) systems have achieved remarkable improvements in accuracy. However, challenges persist for domain-specific terminology and short utterances lacking semantic coherence, where recognition performance often degrades significantly. In this work, we present Index-MSR, an efficient multimodal speech recognition framework. At its core is a novel Multimodal Fusion Decoder (MFD), which effectively incorporates text-related information from videos (e.g., subtitles and presentation slides) into speech recognition. This cross-modal integration not only enhances overall ASR accuracy but also yields substantial reductions in substitution errors. Extensive evaluations on both an in-house subtitle dataset and a public AVSR dataset demonstrate that Index-MSR achieves state-of-the-art accuracy, with substitution errors reduced by 20,50%. These results demonstrate that our approach efficiently exploits text-related cues from video to improve speech recognition accuracy, showing strong potential in applications requiring strict audio-text synchronization, such as audio translation.
Submitted 25 September, 2025;
originally announced September 2025.
-
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Authors:
Hang Du,
Jiayang Zhang,
Guoshun Nan,
Wendi Deng,
Zhenyan Chen,
Chenyang Zhang,
Wang Xiao,
Shan Huang,
Yuqi Pan,
Tao Qi,
Sicong Leng
Abstract:
Multi-image Interleaved Reasoning aims to improve the ability of Multi-modal Large Language Models (MLLMs) to jointly comprehend and reason across multiple images and their associated textual contexts, introducing unique challenges beyond single-image or non-interleaved multi-image tasks. While current multi-image benchmarks overlook interleaved textual contexts and neglect distinct relationships between individual images and their associated texts, enabling models to reason over multi-image interleaved data may significantly enhance their comprehension of complex scenes and better capture cross-modal correlations. To bridge this gap, we introduce a novel benchmark MIR, requiring joint reasoning over multiple images accompanied by interleaved textual contexts to accurately associate image regions with corresponding texts and logically connect information across images. To enhance MLLMs' ability to comprehend multi-image interleaved data, we introduce reasoning steps for each instance within the benchmark and propose a stage-wise curriculum learning strategy. This strategy follows an "easy to hard" approach, progressively guiding models from simple to complex scenarios, thereby enhancing their ability to handle challenging tasks. Extensive experiments benchmarking multiple MLLMs demonstrate that our method significantly enhances models' reasoning performance on MIR and other established benchmarks. We believe that MIR will encourage further research into multi-image interleaved reasoning, facilitating advancements in MLLMs' capability to handle complex inter-modal tasks.
Submitted 15 October, 2025; v1 submitted 21 September, 2025;
originally announced September 2025.
-
Vorticity blow-up for the 3D incompressible Euler equations
Authors:
Wenjie Deng,
Song Jiang,
Minling Li,
Zhaonan Luo
Abstract:
In this paper, we study the finite-time blow-up for classical solutions of the 3D incompressible Euler equations with low-regularity initial vorticity. Applying the self-similar method and stability analysis of the self-similar system in critical Sobolev space, we prove that the vorticity of the axi-symmetric 3D Euler equations develops a finite-time singularity with certain scaling indices. Furthermore, we investigate the time integrability of the solutions. The proof is based on the new observations for the null structure of the transport term, and the parameter stability of the fundamental self-similar models.
Submitted 6 October, 2025; v1 submitted 16 September, 2025;
originally announced September 2025.
-
CMB Constraints on Quantized Spatial Curvature $Ω_K$ in globally CPT-symmetric universes
Authors:
Wei-Ning Deng,
Will Handley
Abstract:
The periodic solution of the Friedmann equation in conformal time implies that only cosmological perturbations exhibiting corresponding symmetries are physically permissible, leading to a discrete spectrum of allowed wave vectors. Furthermore, in a spatially closed universe, these wave vectors are independently constrained to be integers. Matching these two distinct quantization conditions provides a novel theoretical constraint on the possible values of spatial curvature. In this work, we numerically solve the cosmological perturbation equations, incorporating radiation anisotropy and higher-order Boltzmann terms, to calculate these discrete wave vectors with improved precision. Subsequently, we generate Cosmic Microwave Background (CMB) power spectra for different characteristic spacings of these quantized wave vectors. Finally, we apply the constraint to Planck 2018 observational data to determine the cosmological parameters. This analysis yields a discrete set of allowed values for the spatial curvature, $Ω_K$, including $[-0.076, -0.039, -0.024, -0.016, -0.012, \dots]$.
Submitted 12 September, 2025;
originally announced September 2025.
-
Effective one-body theory of spinless binary evolution dynamics
Authors:
Jiliang Jing,
Sheng Long,
Weike Deng,
Jieci Wang
Abstract:
The effective one-body (EOB) theory provides an innovative framework for analyzing the dynamics of binary systems, as articulated by Hamilton's equations. This paper investigates a self-consistent EOB theory specifically tailored for the dynamics of such systems. Our methodology begins by emphasizing how to effectively utilize the metrics derived from scattering angles in the analysis of binary black hole mergers. We then construct an effective Hamiltonian and formulate a decoupled, variable-separated Teukolsky-like equation for $ψ^B_4$. Furthermore, we present the formal solution to this equation, detailing the energy flux, radiation-reaction force (RRF), and waveforms for the "plus" and "cross" modes generated by spinless binaries. Finally, we carry out numerical calculations using the EOB theory and compare the results with numerical relativity (NR) data from the SXS collaboration. The results indicate that, up to the innermost stable circular orbit, the binding energy -- angular momentum relation differs from the NR results by less than 5‰, with a larger mass ratio yielding better agreement.
Submitted 8 September, 2025;
originally announced September 2025.
-
Reconstruction and Reenactment Separated Method for Realistic Gaussian Head
Authors:
Zhiling Ye,
Cong Zhou,
Xiubao Zhang,
Haifeng Shen,
Weihong Deng,
Quan Lu
Abstract:
In this paper, we explore a reconstruction and reenactment separated framework for 3D Gaussian heads, which requires only a single portrait image as input to generate a controllable avatar. Specifically, we developed a large-scale one-shot Gaussian head generator built upon WebSSL and employed a two-stage training approach that significantly enhances the capabilities of generalization and high-frequency texture reconstruction. During inference, an ultra-lightweight Gaussian avatar driven by control signals enables high frame-rate rendering, achieving 90 FPS at a resolution of 512x512. We further demonstrate that the proposed framework follows the scaling law, whereby increasing the parameter scale of the reconstruction module leads to improved performance. Moreover, thanks to the separation design, driving efficiency remains unaffected. Finally, extensive quantitative and qualitative experiments validate that our approach outperforms current state-of-the-art methods.
Submitted 17 September, 2025; v1 submitted 5 September, 2025;
originally announced September 2025.
-
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
Authors:
Wesley Hanwen Deng,
Sunnie S. Y. Kim,
Akshita Jha,
Ken Holstein,
Motahhare Eslami,
Lauren Wilcox,
Leon A Gatys
Abstract:
Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they are likely to uncover. While automated red-teaming approaches promise to complement human red-teaming by enabling larger-scale exploration of model behavior, current approaches do not consider the role of identity. As an initial step towards incorporating people's background and identities in automated red-teaming, we develop and evaluate a novel method, PersonaTeaming, that introduces personas in the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. In particular, we first introduce a methodology for mutating prompts based on either "red-teaming expert" personas or "regular AI user" personas. We then develop a dynamic persona-generating algorithm that automatically generates various persona types adaptive to different seed prompts. In addition, we develop a set of new metrics to explicitly measure the "mutation distance" to complement existing diversity measurements of adversarial prompts. Our experiments show promising improvements (up to 144.1%) in the attack success rates of adversarial prompts through persona mutation, while maintaining prompt diversity, compared to RainbowPlus, a state-of-the-art automated red-teaming method. We discuss the strengths and limitations of different persona types and mutation methods, shedding light on future opportunities to explore complementarities between automated and human red-teaming approaches.
Submitted 27 October, 2025; v1 submitted 3 September, 2025;
originally announced September 2025.
-
Optimizing In-Context Learning for Efficient Full Conformal Prediction
Authors:
Weicao Deng,
Sangwoo Park,
Min Li,
Osvaldo Simeone
Abstract:
Reliable uncertainty quantification is critical for trustworthy AI. Conformal Prediction (CP) provides prediction sets with distribution-free coverage guarantees, but its two main variants face complementary limitations. Split CP (SCP) suffers from data inefficiency due to dataset partitioning, while full CP (FCP) improves data efficiency at the cost of prohibitive retraining complexity. Recent approaches based on meta-learning or in-context learning (ICL) partially mitigate these drawbacks. However, they rely on training procedures not specifically tailored to CP, which may yield large prediction sets. We introduce an efficient FCP framework, termed enhanced ICL-based FCP (E-ICL+FCP), which employs a permutation-invariant Transformer-based ICL model trained with a CP-aware loss. By simulating the multiple retrained models required by FCP without actual retraining, E-ICL+FCP preserves coverage while markedly reducing both inefficiency and computational overhead. Experiments on synthetic and real tasks demonstrate that E-ICL+FCP attains superior efficiency-coverage trade-offs compared to existing SCP and FCP baselines.
Submitted 18 November, 2025; v1 submitted 1 September, 2025;
originally announced September 2025.
-
An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment
Authors:
Can Cui,
Zilong Fu,
Penghe Huang,
Yuanyuan Li,
Wu Deng,
Dongyan Li
Abstract:
Knowledge distillation (KD) is crucial for deploying deep learning models in resource-constrained edge environments, particularly within the consumer electronics sector, including smart home devices, wearable technology, and mobile terminals. These applications place higher demands on model compression and inference speed, necessitating the transfer of knowledge from Graph Neural Networks (GNNs) to more efficient Multi-Layer Perceptron (MLP) models. However, due to their fixed activation functions and fully connected architecture, MLPs face challenges in rapidly capturing the complex neighborhood dependencies learned by GNNs, thereby limiting their performance in edge environments. To address these limitations, this paper introduces an innovative knowledge distillation framework from GNNs to Kolmogorov-Arnold Networks (KANs), named Self-Attention Dynamic Sampling Distillation (SA-DSD). This study improves Fourier KAN (FR-KAN) and uses the improved FR-KAN+ in place of the MLP as the student model. Through the incorporation of learnable frequency bases and phase-shift mechanisms, along with algorithmic optimization, FR-KAN significantly improves its nonlinear fitting capability while effectively reducing computational complexity. Building on this, a margin-level sampling probability matrix, based on teacher-student prediction consistency, is constructed, and an adaptive weighted loss mechanism is designed to mitigate performance degradation in the student model due to the lack of explicit neighborhood aggregation. Extensive experiments conducted on six real-world datasets demonstrate that SA-DSD achieves performance improvements of 3.05%-3.62% over three GNN teacher models and 15.61% over the FR-KAN+ model. Moreover, when compared with key benchmark models, SA-DSD achieves a 16.96x reduction in parameter count and a 55.75% decrease in inference time.
Submitted 30 August, 2025;
originally announced September 2025.
-
Dimensional hierarchy of topological bound states in the continuum
Authors:
Shunda Yin,
Zhenyu Wang,
Liping Ye,
Hailong He,
Manzhu Ke,
Weiyin Deng,
Jiuyang Lu,
Zhengyou Liu
Abstract:
Bound states in the continuum (BICs), with the ability of trapping and manipulating waves within the radiation continuum, have gained significant attention for their potential applications in optics and acoustics. However, challenges arise in reducing wave leakage and noise from fabrication imperfections. The emergence of robust wave manipulations based on topological BICs (TBICs) offers promising solutions. Traditionally, TBICs of different dimensions are observed separately in distinct systems. Here, we report the experimental discovery of the coexistence of two-dimensional surface TBICs and one-dimensional hinge TBICs in a single three-dimensional phononic crystal system. Such an unprecedented dimensional hierarchy of TBICs is triggered by the mechanism of separability and protected by the valley Chern numbers. Notably, these TBICs inherit dispersive propagation characteristics from valley topology and can propagate robustly against defects without leakage. Our findings offer an efficient approach to multidimensional TBICs and can be applied in designing highly efficient acoustic devices for wave trapping and manipulation in multidimensional environments.
Submitted 29 August, 2025;
originally announced September 2025.
-
From Drone Imagery to Livability Mapping: AI-powered Environment Perception in Rural China
Authors:
Weihuan Deng,
Yaofu Huang,
Luan Chen,
Xun Li,
Yu Gu,
Yao Yao
Abstract:
The high cost of acquiring rural street view images has constrained comprehensive environmental perception in rural areas. Drone photographs, with their advantages of easy acquisition, broad coverage, and high spatial resolution, offer a viable approach for large-scale rural environmental perception. However, a systematic methodology for identifying key environmental elements from drone photographs and quantifying their impact on environmental perception remains lacking. To address this gap, a Vision-Language Contrastive Ranking Framework (VLCR) is designed for rural livability assessment in China. The framework employs chain-of-thought prompting strategies to guide multimodal large language models (MLLMs) in identifying visual features related to quality of life and ecological habitability from drone photographs. Subsequently, to address the instability in pairwise village comparison, a text description-constrained drone photograph comparison strategy is proposed. Finally, to overcome the efficiency bottleneck in nationwide pairwise village comparisons, an innovative ranking algorithm based on binary search interpolation is developed, which reduces the number of comparisons through automated selection of comparison targets. The proposed framework achieves superior performance with a Spearman Footrule distance of 0.74, outperforming mainstream commercial MLLMs by approximately 0.1. Moreover, the mechanism of concurrent comparison and ranking demonstrates a threefold enhancement in computational efficiency. The framework contributes both new data and new methodology to village livability assessment, providing strong support for large-scale village livability analysis.
Keywords: Drone photographs, Environmental perception, Rural livability assessment, Multimodal large language models, Chain-of-thought prompting.
Submitted 2 November, 2025; v1 submitted 29 August, 2025;
originally announced August 2025.
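The VLCR abstract above says that a ranking algorithm based on binary search reduces the number of pairwise village comparisons. The sketch below shows the general idea with plain binary insertion, used here as a stand-in because the abstract does not detail the authors' interpolation step. The comparator `better` is a hypothetical placeholder for an MLLM pairwise judgment, and the village names and scores are invented for illustration.

```python
def binary_insertion_rank(items, better):
    """Rank items best-first using O(n log n) pairwise comparisons.

    `better(a, b)` returns True when a should rank above b. In the
    setting of the abstract this would be a model's pairwise judgment;
    here it is any callable (an assumption for illustration).
    """
    ranked = []
    for item in items:
        lo, hi = 0, len(ranked)
        # Binary search for the insertion slot instead of comparing
        # the new item against every already-ranked item.
        while lo < hi:
            mid = (lo + hi) // 2
            if better(item, ranked[mid]):
                hi = mid
            else:
                lo = mid + 1
        ranked.insert(lo, item)
    return ranked

# Hypothetical livability scores standing in for model judgments.
scores = {"A": 0.3, "B": 0.9, "C": 0.5, "D": 0.1, "E": 0.7}
calls = []

def better(a, b):
    calls.append((a, b))  # record each comparison that is made
    return scores[a] > scores[b]

ranking = binary_insertion_rank(list(scores), better)
```

With five items, exhaustive pairwise comparison needs 10 judgments, while the binary insertion above needs fewer. The gap widens as the number of villages grows, which is where a nationwide ranking would benefit.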