-
Accessing the performance of CC2 for excited state dynamics: a benchmark study with pyrazine
Authors:
Rui-Hao Bi,
Chongxiao Zhao,
Ruixin Sun,
Wenjie Dou
Abstract:
In this work, we assess the performance of RI-CC2 for ultrafast internal conversion using pyrazine as a benchmark system. We implement analytical gradients and nonadiabatic coupling vectors for RI-CC2 in the Q-Chem package and employ them in two complementary approaches: a reduced-dimensionality vibronic coupling (VC) model and full-dimensional ab initio on-the-fly trajectory surface hopping simulations. To accelerate the on-the-fly dynamics, we employ a diabatic artificial neural network model trained on RI-CC2 data. Both the VC model and the full-dimensional dynamics reveal that the dark $A_\text{1u}$ state actively participates in the internal conversion process. RI-CC2 identifies the $Q_\text{9a}$ and $Q_\text{8a}$ vibrational modes as key drivers of the coherent population transfer between the $A_\text{1u}$ and $B_\text{3u}$ states. The on-the-fly dynamics yield a $B_\text{2u}$ population decay time of 26 fs, consistent with the measured value of $22\pm3$ fs. The high-quality dataset of energies, forces, and nonadiabatic couplings generated here provides a valuable resource for future machine-learning developments, while the stochastic variant sRI-CC2 promises to extend such dynamics to larger molecular systems.
Submitted 7 April, 2026;
originally announced April 2026.
-
AnyPro: Preference-Preserving Anycast Optimization based on Strategic AS-Path Prepending
Authors:
Minyuan Zhou,
Yuning Chen,
Jiaqi Zheng,
Yifei Xu,
Pan Hu,
Yongping Tang,
Wendong Yin,
Jie Lin,
Qingyan Yu,
Yuanchao Su,
Guihai Chen,
Wanchun Dou,
Songwu Lu,
Wan Du
Abstract:
Operating large-scale anycast networks is challenging because client-to-site mappings often misalign with operators' expectations due to opaque inter-domain routing. We present AnyPro, the first system to unlock the full potential of AS-path prepending (ASPP), efficiently deriving globally optimal configurations to steer clients toward performance-optimal sites at scale. AnyPro first employs an efficient polling mechanism to identify all clients sensitive to ASPP. By analyzing the routing changes during the process, the system derives a set of ASPP constraints that guide client traffic toward the desired sites. We then formulate the anycast optimization problem as a constraint-based program and compute optimal ASPP configurations. Extensive evaluation on a global testbed with 20 PoPs demonstrates the effectiveness of AnyPro: it reduces the 90th percentile latency by 37.7% compared to baseline configurations without ASPP. Furthermore, we show that AnyPro can be integrated with PoP-level anycast optimization techniques to achieve additional performance gains.
Submitted 22 March, 2026;
originally announced March 2026.
-
Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos
Authors:
Weijia Dou,
Wenzhao Zheng,
Weiliang Chen,
Yu Zheng,
Jie Zhou,
Jiwen Lu
Abstract:
Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each sub-region, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generative videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.
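As a rough illustration of the "divergence among camera poses" idea in the abstract above, the sketch below scores a set of per-region camera rotations by their mean pairwise geodesic angle. The abstract does not state which divergence measure SGC uses, so the choice of mean pairwise rotation angle, the restriction to rotations only, and the names `rotation_z`, `geodesic_angle`, and `pose_divergence` are all assumptions made for this example.

```python
import math
from itertools import combinations


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix about the z-axis (hypothetical test pose)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def geodesic_angle(r_a, r_b):
    """Angle (radians) of the relative rotation between two rotation matrices."""
    # trace(r_a^T r_b) equals the elementwise sum of products of the entries.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)


def pose_divergence(rotations):
    """Mean pairwise geodesic angle; 0.0 when fewer than two poses are given."""
    pairs = list(combinations(rotations, 2))
    if not pairs:
        return 0.0
    return sum(geodesic_angle(a, b) for a, b in pairs) / len(pairs)


# Sub-regions of a geometrically consistent background should agree on the pose.
consistent = [rotation_z(0.1), rotation_z(0.1), rotation_z(0.1)]
# Disagreeing local poses indicate a geometric inconsistency.
inconsistent = [rotation_z(0.0), rotation_z(0.1), rotation_z(0.2)]
print(pose_divergence(consistent), pose_divergence(inconsistent))
```

Under this assumed measure, a rigid static background gives a score near zero, and the score grows as the locally estimated poses drift apart.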
Submitted 19 March, 2026;
originally announced March 2026.
-
Generalized quantum master equation from memory kernel coupling theory
Authors:
Rui-Hao Bi,
Wei Liu,
Wenjie Dou
Abstract:
The generalized quantum master equation provides a powerful framework for non-Markovian dynamics of open quantum systems. However, the accurate and efficient evaluation of the memory kernel remains a challenge. In this work, we introduce a comprehensive tensorial extension to the Memory Kernel Coupling Theory (MKCT) to overcome this bottleneck. By elevating the original scalar formalism to a tensorial framework, the extended MKCT enables the calculation of general expectation values and cross-correlation functions. We demonstrate the numerical accuracy and efficiency of this method across multiple benchmark systems: capturing transient populations and coherences in the spin-boson model, resolving the excitonic absorption spectrum of the Fenna-Matthews-Olson complex, and simulating charge mobility in one-dimensional lattice models. These successful applications establish the tensorial MKCT as a highly efficient tool for investigating complex dynamics in open quantum systems.
Submitted 2 March, 2026;
originally announced March 2026.
-
Projection-Based Memory Kernel Coupling Theory for Quantum Dynamics: A Stable Framework for Non-Markovian Simulations
Authors:
Wei Liu,
Rui-Hao Bi,
Yu Su,
Limin Xu,
Zhennan Zhou,
Yao Wang,
Wenjie Dou
Abstract:
We present a projection-based, stability-preserving methodology for computing time correlation functions in open quantum systems governed by generalized quantum master equations with non-Markovian effects. Building upon the memory kernel coupling theory framework, our approach transforms the memory kernel hierarchy into a system of coupled linear differential equations through Mori-Zwanzig projection, followed by spectral projection onto stable eigenmodes to ensure numerical stability. By systematically eliminating unstable modes while preserving the physically relevant dynamics, our method guarantees long-time convergence without introducing artificial damping or ad hoc modifications. The theoretical framework maintains mathematical rigor through orthogonal projection operators and spectral decomposition. Benchmark calculations on the spin-boson model show excellent agreement with exact hierarchical equations of motion results while achieving significant computational efficiency. This approach provides a versatile and reliable framework for simulating non-Markovian dynamics in complex systems.
Submitted 11 February, 2026;
originally announced February 2026.
-
DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
Authors:
Fuliang Liu,
Xue Li,
Ketai Zhao,
Yinxi Gao,
Ziyan Zhou,
Zhonghui Zhang,
Zhibin Wang,
Wanchun Dou,
Sheng Zhong,
Chen Tian
Abstract:
Speculative decoding is an effective and lossless approach for accelerating LLM inference. However, existing widely adopted model-based draft designs, such as EAGLE3, improve accuracy at the cost of multi-step autoregressive inference, resulting in high drafting latency and ultimately rendering the drafting stage itself a performance bottleneck. Inspired by diffusion-based large language models (dLLMs), we propose DART, which leverages parallel generation to reduce drafting latency. DART predicts logits for multiple future masked positions in parallel within a single forward pass based on hidden states of the target model, thereby eliminating autoregressive rollouts in the draft model while preserving a lightweight design. Based on these parallel logit predictions, we further introduce an efficient tree pruning algorithm that constructs high-quality draft token trees with N-gram-enforced semantic continuity. DART substantially reduces draft-stage overhead while preserving high draft accuracy, leading to significantly improved end-to-end decoding speed. Experimental results demonstrate that DART achieves a 2.03x to 3.44x wall-clock time speedup across multiple datasets, surpassing EAGLE3 by 30% on average and offering a practical speculative decoding framework. Code is released at https://github.com/fvliang/DART.
Submitted 27 January, 2026;
originally announced January 2026.
-
RISE: Rule-Driven SQL Dialect Translation via Query Reduction
Authors:
Xudong Xie,
Yuwei Zhang,
Wensheng Dou,
Yu Gao,
Ziyu Cui,
Jiansen Song,
Rui Yang,
Jun Wei
Abstract:
Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation tools rely on manually-crafted rules, necessitating significant manual effort to support new RDBMSs and dialects. Although large language models (LLMs) can assist in translating SQL dialects, they often struggle with lengthy and complex SQL queries.
In this paper, we propose RISE, a novel LLM-based SQL dialect translation approach that can accurately handle lengthy and complex SQL queries. Given a complex source query $Q_c$ that contains a SQL dialect $d$, we first employ a dialect-aware query reduction technique to derive a simplified query $Q_{s}$ by removing $d$-irrelevant SQL elements from $Q_c$. Subsequently, we utilize LLMs to translate $Q_{s}$ into $Q_{s^{'}}$, and automatically extract the translation rule $r_d$ for dialect $d$ based on the relationship between $Q_{s}$ and $Q_{s^{'}}$. By applying $r_d$ to $Q_c$, we can effectively translate the dialect $d$ within $Q_c$, thereby bypassing the complexity of the source query $Q_c$. We evaluate RISE on two real-world benchmarks, i.e., TPC-DS and SQLProcBench, comparing its performance against both the traditional rule-based tools and the LLM-based approaches with respect to translation accuracy. RISE achieves accuracies of 97.98% on TPC-DS and 100% on SQLProcBench, outperforming the baselines by an average improvement of 24.62% and 238.41%, respectively.
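The reduce, translate, extract, and apply flow described above can be sketched with a deliberately tiny example. This is not the RISE implementation: the LLM translation step is replaced by a hard-coded pair of simplified queries, a "rule" is assumed to be a single token substitution, and the helper names `tokenize`, `extract_rule`, and `apply_rule` are hypothetical. The only dialect fact relied on is that MySQL's `IFNULL` can be expressed as `COALESCE` in PostgreSQL.

```python
import re


def tokenize(sql):
    """Split SQL into identifiers, integers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S", sql)


def extract_rule(simplified_src, simplified_dst):
    """Derive a one-token substitution rule from a simplified query pair."""
    src, dst = tokenize(simplified_src), tokenize(simplified_dst)
    if len(src) != len(dst):
        raise ValueError("this toy extractor only handles token-for-token rewrites")
    diffs = [(s, d) for s, d in zip(src, dst) if s != d]
    if len(diffs) != 1:
        raise ValueError("expected exactly one differing token")
    return diffs[0]


def apply_rule(rule, complex_query):
    """Apply the extracted rule to every matching token of the complex query."""
    old, new = rule
    # A function replacement avoids interpreting backslashes in `new`.
    return re.sub(rf"\b{re.escape(old)}\b", lambda _match: new, complex_query)


# Q_s and Q_s' stand in for the reduced query and its (assumed) LLM translation.
q_s = "SELECT IFNULL(a, 0) FROM t"
q_s_translated = "SELECT COALESCE(a, 0) FROM t"
rule = extract_rule(q_s, q_s_translated)

# Q_c is the original complex query; the rule is applied to it directly.
q_c = "SELECT IFNULL(x.price, 0) + IFNULL(y.tax, 0) FROM x JOIN y ON x.id = y.id"
print(apply_rule(rule, q_c))
```

The point of the design, as the abstract describes it, is that the translation model only ever sees the short query, while the long query is rewritten mechanically by the extracted rule.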
Submitted 9 January, 2026;
originally announced January 2026.
-
Two-Mode Floquet Fewest Switches Surface Hopping for Nonadiabatic Dynamics Driven by Two-Frequency Laser Fields
Authors:
Jiayue Han,
Vahid Mosallanejad,
Ruihao Bi,
Wenjie Dou
Abstract:
Two-frequency (two-color) laser fields provide a powerful and flexible means for steering molecular dynamics. However, quantitatively reliable and scalable theoretical tools for simulating laser-driven nonadiabatic processes under such fields remain limited. Here, we develop a two-mode Floquet fewest switches surface hopping (two-mode F-FSSH) approach for two-frequency driving within a mixed quantum-classical framework. We validate the algorithm on three driven one-dimensional two-state models: a Rabi model and two avoided-crossing scattering models. The electronic and nuclear dynamics are benchmarked against numerically exact results from split-operator calculations, showing good agreement across a broad range of field parameters and initial conditions. These results establish two-mode F-FSSH as a practical framework for simulating and designing two-frequency control protocols and motivate extensions to more realistic experimental settings.
Submitted 7 January, 2026;
originally announced January 2026.
-
Implicitly Restarted Lanczos Enables Chemically-Accurate Shallow Neural Quantum States
Authors:
Wei Liu,
Wenjie Dou
Abstract:
The variational optimization of high-dimensional neural network models, such as those used in neural quantum states (NQS), presents a significant challenge in machine intelligence. Conventional first-order stochastic methods (e.g., Adam) are plagued by slow convergence, sensitivity to hyperparameters, and numerical instability, preventing NQS from reaching the high accuracy required for fundamental science. We address this fundamental optimization bottleneck by introducing the implicitly restarted Lanczos (IRL) method as the core engine for NQS training. Our key innovation is an inherently stable second-order optimization framework that recasts the ill-conditioned parameter update problem into a small, well-posed Hermitian eigenvalue problem. By solving this problem efficiently and robustly with IRL, our approach automatically determines the optimal descent direction and step size, circumventing the need for demanding hyperparameter tuning and eliminating the numerical instabilities common in standard iterative solvers. We demonstrate that IRL enables shallow NQS architectures (with orders of magnitude fewer parameters) to consistently achieve extreme precision (1e-12 kcal/mol) in just 3 to 5 optimization steps. For the F2 molecule, this translates to an approximate 17,900-fold speed-up in total runtime compared to Adam. This work establishes IRL as a superior, robust, and efficient second-order optimization strategy for variational quantum models, paving the way for the practical, high-fidelity application of neural networks in quantum physics and chemistry.
Submitted 4 January, 2026;
originally announced January 2026.
-
Reference Recommendation based Membership Inference Attack against Hybrid-based Recommender Systems
Authors:
Xiaoxiao Chi,
Xuyun Zhang,
Yan Wang,
Hongsheng Hu,
Wanchun Dou
Abstract:
Recommender systems have been widely deployed across various domains such as e-commerce and social media, and intelligently suggest items like products and potential friends to users based on their preferences and interaction history, which are often privacy-sensitive. Recent studies have revealed that recommender systems are prone to membership inference attacks (MIAs), where an attacker aims to infer whether or not a user's data has been used for training a target recommender system. However, existing MIAs fail to exploit the unique characteristics of recommender systems, and are therefore only applicable to mixed recommender systems consisting of two recommendation algorithms. This leaves a gap in investigating MIAs against hybrid-based recommender systems, where a single algorithm uses both user-item historical interactions and the attributes of users and items to produce personalised recommendations. To investigate how the personalisation in hybrid-based recommender systems influences MIA, we propose a novel metric-based MIA. Specifically, we leverage the characteristic of personalisation to obtain a reference recommendation for any target user. Then, a relative membership metric is proposed to exploit a target user's historical interactions, target recommendation, and reference recommendation to infer the membership of the target user's data. Finally, we theoretically and empirically demonstrate the efficacy of the proposed metric-based MIA on hybrid-based recommender systems.
Submitted 10 December, 2025;
originally announced December 2025.
-
Orbital Surface Hopping with an Electron Thermostat Yields Accurate Dynamics and Detailed Balance
Authors:
Yongtao Ma,
Wenjie Dou
Abstract:
In mixed quantum-classical simulations of molecule-metal surface interactions, the discretization of the metallic electronic continuum typically results in a closed-system representation that fails to capture the open-system nature of the true physical process. This approximation can introduce significant artifacts, including deviations in the dynamical evolution and a violation of the principle of detailed balance. To address this fundamental challenge, we introduce an electronic thermostat into our previously developed orbital surface hopping (OSH) framework, generalizing the method to efficiently handle many discrete electronic states. We first outline the derivation of electronic thermostat orbital surface hopping, where the amplitude of the electronic thermostat is well justified. We then demonstrate that this method reproduces accurate dynamics and detailed balance at long times, whereas detailed balance is violated without the electronic thermostat. Thus, this method offers a reliable tool for studying nonadiabatic dynamics near metal surfaces.
Submitted 20 November, 2025;
originally announced November 2025.
-
First measurement of reactor neutrino oscillations at JUNO
Authors:
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
David Adey,
Shakeel Ahmad,
Rizwan Ahmed,
Timo Ahola,
Sebastiano Aiello,
Fengpeng An,
Guangpeng An,
Costas Andreopoulos,
Giuseppe Andronico,
João Pedro Athayde Marcondes de André,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
Didier Auguste,
Margherita Buizza Avanzini,
Andrej Babic,
Jingzhi Bai,
Weidong Bai,
Nikita Balashov,
Roberto Barbera,
Andrea Barresi
, et al. (1114 additional authors not shown)
Abstract:
Neutrino oscillations, a quantum effect manifesting at macroscopic scales, are governed by lepton flavor mixing angles and neutrino mass-squared differences that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavor framework, determining the mass ordering of neutrinos, and probing possible new physics. The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton liquid-scintillator detector located 52.5 km from multiple reactor cores, designed to resolve the interference pattern of reactor neutrinos with sub-percent precision. Here we report, using the first 59.1 days of data collected since detector completion in August 2025, the first simultaneous high-precision determination of two neutrino oscillation parameters, $\sin^2 θ_{12} = 0.3092\,\pm\,0.0087$ and $Δm^2_{21} = (7.50\,\pm\,0.12)\times10^{-5}\;{\rm eV}^2$ for the normal mass ordering scenario, improving the precision by a factor of 1.6 relative to the combination of all previous measurements. These results advance the basic understanding of neutrinos, validate the detector's design, and confirm JUNO's readiness for its primary goal of resolving the neutrino mass ordering with a larger dataset. The rapid achievement with a short exposure highlights JUNO's potential to push the frontiers of precision neutrino physics and paves the way for its broad scientific program.
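For orientation, the uncertainties quoted in the abstract above correspond to the following relative precisions. This is simple arithmetic on the stated central values and errors; the symbol $\sigma$ for the quoted uncertainty is notation introduced here, not taken from the abstract.

```latex
\frac{\sigma(\sin^2\theta_{12})}{\sin^2\theta_{12}} = \frac{0.0087}{0.3092} \approx 2.8\%,
\qquad
\frac{\sigma(\Delta m^2_{21})}{\Delta m^2_{21}} = \frac{0.12}{7.50} = 1.6\%
```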
Submitted 18 November, 2025;
originally announced November 2025.
-
Initial performance results of the JUNO detector
Authors:
Angel Abusleme,
Thomas Adam,
Kai Adamowicz,
David Adey,
Shakeel Ahmad,
Rizwan Ahmed,
Timo Ahola,
Sebastiano Aiello,
Fengpeng An,
Guangpeng An,
Costas Andreopoulos,
Giuseppe Andronico,
João Pedro Athayde Marcondes de André,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
Burin Asavapibhop,
Didier Auguste,
Margherita Buizza Avanzini,
Andrej Babic,
Jingzhi Bai,
Weidong Bai,
Nikita Balashov,
Roberto Barbera,
Andrea Barresi
, et al. (1114 additional authors not shown)
Abstract:
The Jiangmen Underground Neutrino Observatory (JUNO) started physics data taking on 26 August 2025. JUNO consists of a 20-kton liquid scintillator central detector, surrounded by a 35 kton water pool serving as a Cherenkov veto, and almost 1000 m$^2$ of plastic scintillator veto on top. The detector is located in a shallow underground laboratory with an overburden of 1800 m.w.e. This paper presents the performance results of the detector, extensively studied during the commissioning of the water phase, the subsequent liquid scintillator filling phase, and the first physics runs. The liquid scintillator achieved an attenuation length of 20.6 m at 430 nm, while the high coverage PMT system and scintillator together yielded about 1785 photoelectrons per MeV of energy deposit at the detector centre, measured using the 2.223 MeV $γ$ from neutron captures on hydrogen with an Am-C calibration source. The reconstructed energy resolution is 3.4% for two 0.511 MeV $γ$ at the detector centre and 2.9% for the 0.93 MeV quenched Po-214 alpha decays from natural radioactive sources. The energy nonlinearity is calibrated to better than 1%. Intrinsic contaminations of U-238 and Th-232 in the liquid scintillator are below 10$^{-16}$ g/g, assuming secular equilibrium. The water Cherenkov detector achieves a muon detection efficiency better than 99.9% for muons traversing the liquid scintillator volume. During the initial science runs, the data acquisition duty cycle exceeded 97.8%, demonstrating the excellent stability and readiness of JUNO for high-precision neutrino physics.
Submitted 18 November, 2025;
originally announced November 2025.
-
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Authors:
MiroMind Team,
Song Bai,
Lidong Bing,
Carson Chen,
Guanzheng Chen,
Yuntao Chen,
Zhe Chen,
Ziyi Chen,
Jifeng Dai,
Xuan Dong,
Wenhan Dou,
Yue Deng,
Yunjie Fu,
Junqi Ge,
Chenxia Han,
Tammy Huang,
Zhenhang Huang,
Jerry Jiao,
Shilei Jiang,
Tianyu Jiao,
Xiaoqi Jian,
Lei Lei,
Ruilin Li,
Ryan Luo,
Tiantong Li
, et al. (30 additional authors not shown)
Abstract:
We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Unlike LLM test-time scaling, which operates in isolation and risks degradation with longer reasoning chains, interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories. Through reinforcement learning, the model achieves efficient interaction scaling: with a 256K context window, it can perform up to 600 tool calls per task, enabling sustained multi-turn reasoning and complex real-world research workflows. Across four representative benchmarks (GAIA, HLE, BrowseComp, and BrowseComp-ZH), the 72B variant achieves up to 81.9%, 37.7%, 47.1%, and 55.6% accuracy, respectively, surpassing previous open-source agents and approaching commercial counterparts such as GPT-5-high. Our analysis reveals that MiroThinker benefits from interactive scaling consistently: research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions, demonstrating that interaction depth exhibits scaling behaviors analogous to model size and context length. These findings establish interaction scaling as a third critical dimension for building next-generation open research agents, complementing model capacity and context windows.
Submitted 18 November, 2025; v1 submitted 14 November, 2025;
originally announced November 2025.
-
P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models
Authors:
Mingliang Hou,
Yinuo Wang,
Teng Guo,
Zitao Liu,
Wenzhou Dou,
Jiaqi Zheng,
Renqiang Luo,
Mi Tian,
Weiqi Luo
Abstract:
Cognitive diagnosis models (CDMs) are pivotal for creating fine-grained learner profiles in modern intelligent education platforms. However, these models are trained on sensitive student data, raising significant privacy concerns. While membership inference attacks (MIA) have been studied in various domains, their application to CDMs remains a critical research gap, leaving their privacy risks unquantified. This paper is the first to systematically investigate MIA against CDMs. We introduce a novel and realistic grey box threat model that exploits the explainability features of these platforms, where a model's internal knowledge state vectors are exposed to users through visualizations such as radar charts. We demonstrate that these vectors can be accurately reverse-engineered from such visualizations, creating a potent attack surface. Based on this threat model, we propose a profile-based MIA (P-MIA) framework that leverages both the model's final prediction probabilities and the exposed internal knowledge state vectors as features. Extensive experiments on three real-world datasets against mainstream CDMs show that our grey-box attack significantly outperforms standard black-box baselines. Furthermore, we showcase the utility of P-MIA as an auditing tool by successfully evaluating the efficacy of machine unlearning techniques and revealing their limitations.
Submitted 5 November, 2025;
originally announced November 2025.
-
PrivacyCD: Hierarchical Unlearning for Protecting Student Privacy in Cognitive Diagnosis
Authors:
Mingliang Hou,
Yinuo Wang,
Teng Guo,
Zitao Liu,
Wenzhou Dou,
Jiaqi Zheng,
Renqiang Luo,
Mi Tian,
Weiqi Luo
Abstract:
The need to remove specific student data from cognitive diagnosis (CD) models has become a pressing requirement, driven by users' growing assertion of their "right to be forgotten". However, existing CD models are largely designed without privacy considerations and lack effective data unlearning mechanisms. Directly applying general-purpose unlearning algorithms is suboptimal, as they struggle to balance unlearning completeness, model utility, and efficiency when confronted with the unique heterogeneous structure of CD models. To address this, our paper presents the first systematic study of the data unlearning problem for CD models, proposing a novel and efficient algorithm: hierarchical importance-guided forgetting (HIF). Our key insight is that parameter importance in CD models exhibits distinct layer-wise characteristics. HIF leverages this via an innovative smoothing mechanism that combines individual- and layer-level importance, enabling a more precise distinction of parameters associated with the data to be unlearned. Experiments on three real-world datasets show that HIF significantly outperforms baselines on key metrics, offering the first effective solution for CD models to respond to user data removal requests and for deploying high-performance, privacy-preserving AI systems.
Submitted 5 November, 2025;
originally announced November 2025.
-
Investigating Production of TeV-scale Muons in Extensive Air Shower at 2400 Meters Underground
Authors:
Xinshun Zhang,
Shaomin Chen,
Wei Dou,
Haoyang Fu,
Guanghua Gong,
Lei Guo,
Ziyi Guo,
XiangPan Ji,
Jianmin Li,
Jinjing Li,
Bo Liang,
Ye Liang,
Qian Liu,
Wentai Luo,
Ming Qi,
Wenhui Shao,
Haozhe Sun,
Jian Tang,
Yuyi Wang,
Zhe Wang,
Changxu Wei,
Jun Weng,
Yiyang Wu,
Benda Xu,
Chuang Xu
, et al. (10 additional authors not shown)
Abstract:
Deep underground experiments present a new avenue to probe the first interactions in extensive air showers or hadronic interactions in the extreme forward phase space. The China Jinping Underground Laboratory, characterized by a vertical rock overburden of 2,400 m, provides an exceptionally effective shield against cosmic muons with energies below 3 TeV. The surviving high-energy muons, produced in the first interactions of extensive air showers, open a unique observational window into primary cosmic rays from tens of TeV up to the PeV scale and beyond. This distinctive feature also enables detailed studies of charged hadron production in the earliest stages of shower development. Using 1,338.6 live days of data collected with a one-ton prototype detector for the Jinping Neutrino Experiment, we measured the underground muon flux originating from air showers. The results show discrepancies of about 40%, corresponding to significances of more than 2$σ$, relative to predictions from several leading hadronic interaction models. We interpret these findings from two complementary perspectives: (i) by adopting the expected cosmic-ray spectra, we constrain the modeling of the first hadronic interactions in air showers and provide novel insights into resolving the long-standing muon puzzle; and (ii) by assuming specific hadronic interaction models, we infer the mass composition of cosmic rays, and our data favor a lighter component in the corresponding energy range. Our study demonstrates the potential of deep underground laboratories to provide new experimental insights into air shower physics and cosmic rays.
Submitted 1 April, 2026; v1 submitted 18 October, 2025;
originally announced October 2025.
-
ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling
Authors:
Yuxuan Jiang,
Zehua Chen,
Zeqian Ju,
Yusheng Dai,
Weibei Dou,
Jun Zhu
Abstract:
Text-to-audio (TTA) generation with fine-grained control signals, e.g., precise timing control or intelligible speech content, has been explored in recent works. However, constrained by data scarcity, their generation performance at scale is still compromised. In this study, we recast controllable TTA generation as a multi-task learning problem and introduce a progressive diffusion modeling approach, ControlAudio. Our method adeptly fits distributions conditioned on more fine-grained information, including text, timing, and phoneme features, through a step-by-step strategy. First, we propose a data construction method spanning both annotation and simulation, augmenting condition information in the sequence of text, timing, and phoneme. Second, at the model training stage, we pretrain a diffusion transformer (DiT) on large-scale text-audio pairs, achieving scalable TTA generation, and then incrementally integrate the timing and phoneme features with unified semantic representations, expanding controllability. Finally, at the inference stage, we propose progressively guided generation, which sequentially emphasizes more fine-grained information, aligning inherently with the coarse-to-fine sampling nature of DiT. Extensive experiments show that ControlAudio achieves state-of-the-art performance in terms of temporal accuracy and speech clarity, significantly outperforming existing methods on both objective and subjective evaluations. Demo samples are available at: https://control-audio.github.io/Control-Audio.
Submitted 25 December, 2025; v1 submitted 9 October, 2025;
originally announced October 2025.
-
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
Authors:
Weijia Dou,
Xu Zhang,
Yi Bin,
Jian Liu,
Bo Peng,
Guoqing Wang,
Yang Yang,
Heng Tao Shen
Abstract:
Recent attempts to transfer features from 2D Vision-Language Models (VLMs) to 3D semantic segmentation expose a persistent trade-off. Directly projecting 2D features into 3D yields noisy and fragmented predictions, whereas enforcing geometric coherence necessitates costly training pipelines and large-scale annotated 3D data. We argue that this limitation stems from the dominant segmentation-and-matching paradigm, which fails to reconcile 2D semantics with 3D geometric structure. The geometric cues are not eliminated during the 2D-to-3D transfer but remain latent within the noisy and view-aggregated features. To exploit this property, we propose GeoPurify, which applies a small Student Affinity Network to purify 2D VLM-generated 3D point features using geometric priors distilled from a 3D self-supervised teacher model. During inference, we devise a Geometry-Guided Pooling module to further denoise the point cloud and ensure semantic and structural consistency. Benefiting from latent geometric information and the learned affinity network, GeoPurify effectively mitigates the trade-off and achieves superior data efficiency. Extensive experiments on major 3D benchmarks demonstrate that GeoPurify achieves or surpasses state-of-the-art performance while utilizing only about 1.5% of the training data.
Submitted 11 February, 2026; v1 submitted 2 October, 2025;
originally announced October 2025.
-
Noise-reduced stochastic resolution of identity to CC2 for large-scale calculations via tensor hypercontraction
Authors:
Chongxiao Zhao,
Wenjie Dou
Abstract:
The stochastic resolution of identity (sRI) approximation significantly reduces the computational scaling of CC2 from O(N^5) to O(N^3), where N is a measure of system size. However, the inherent stochastic noise, while controllable, can introduce substantial errors in energy derivatives, limiting its reliability for molecular dynamics simulations. To mitigate this limitation, we introduce a noise-reduced approach, termed THC-sRI-CC2, which synergistically combines the sRI framework with tensor hypercontraction (THC). In this formulation, the expensive Coulomb term, which scales as O(N^4), is decoupled via THC, while the time-determining exchange term with an O(N^5) cost is addressed through the sRI scheme, collectively yielding an overall O(N^3) scaling. Benchmarks demonstrate that our THC-sRI-CC2 implementation achieves greater accuracy and markedly reduced stochastic noise compared to conventional sRI-CC2 with identical computational samplings. The resulting O(N^3) scaling substantially extends the applicability of CC2 for excited-state energy calculations and nonadiabatic dynamics simulations of large molecular systems. Furthermore, this work establishes a general THC-sRI hybrid strategy for the development of reduced-scaling electronic structure methods.
Submitted 30 September, 2025; v1 submitted 26 September, 2025;
originally announced September 2025.
-
Mixed Quantum-Classical Approaches to Spin Current and Polarization Dynamics in Chiral Molecular Junctions
Authors:
Yu Wang,
Ruihao Bi,
Wei Liu,
Jiayue Han,
Wenjie Dou
Abstract:
Chiral molecular junctions offer a promising platform for realizing chiral-induced spin selectivity (CISS), where spin filtering occurs without external magnetic fields. Here, we investigate spin transport in such junctions by combining quantum master equation (QME) methods for purely electronic dynamics with surface hopping (SH) and mean-field Ehrenfest (MF) approaches to incorporate electron-phonon coupling. Our results show that transient spin polarization arises but ultimately decays to zero at long times. We find that bias voltage, molecular length, and spin-orbit coupling (SOC) strongly influence the spin current dynamics: higher bias enhances spin current but reduces polarization, while longer molecules and stronger SOC amplify transient polarization. Including electron-phonon coupling modifies current-voltage characteristics, enhancing spin currents at intermediate bias but suppressing them at high bias, while leaving the polarization dynamics largely unchanged. These findings highlight the interplay between electronic and vibrational effects in CISS and provide guidance for designing molecular spintronic devices.
Submitted 10 September, 2025;
originally announced September 2025.
-
From higher-order moments to time correlation functions in strongly correlated systems: A DMRG-based memory kernel coupling theory
Authors:
Yunhao Liu,
Wenjie Dou
Abstract:
We introduce a hybrid approach for computing dynamical observables in strongly correlated systems using higher-order moments. This method integrates memory kernel coupling theory (MKCT) with the density matrix renormalization group (DMRG), extending our recent work on MKCT to strongly correlated systems. The method establishes that correlation functions can be derived from the moments. Within our framework, operators and wavefunctions are represented as matrix product operators (MPOs) and matrix product states (MPSs), respectively. Crucially, the repeated application of the Liouville operator is achieved through an iterative procedure analogous to the DMRG algorithm itself. We demonstrate the effectiveness and efficiency of MKCT-DMRG by computing the spectral function of the Hubbard model. Furthermore, we successfully apply the method to compute the electronic friction in the Hubbard-Holstein model. In all cases, the results show excellent agreement with time-dependent DMRG (TD-DMRG) benchmarks. The advantage of MKCT-DMRG over TD-DMRG is its computational efficiency: it avoids the expensive real-time propagation that TD-DMRG requires. These findings establish MKCT-DMRG as a promising and accurate framework for simulating challenging dynamical properties in strongly correlated quantum systems.
Submitted 16 September, 2025;
originally announced September 2025.
-
Chameleon: Taming Dynamic Operator Sequences for Memory-Intensive LLM Training
Authors:
Zibo Wang,
Yuhang Zhou,
Zhibin Wang,
Shipeng Li,
Xinjing Huang,
Chendong Cai,
Bingxu Mu,
Yuqing Sun,
Zhiheng Hu,
Bin She,
Shu You,
Guanghuan Fang,
Rong Gu,
Wanchun Dou,
Guihai Chen,
Chen Tian
Abstract:
The increasing size of large language models (LLMs) has led to a surge in memory requirements during training, often exceeding the capacity of high-bandwidth memory (HBM). Swap-based memory optimization incurs neither accuracy loss nor additional end-to-end overhead when effectively overlapped, thus being an attractive solution. However, existing swap methods assume consistent operator sequences, which is impractical in Eager Mode, where operator sequences can vary during training.
We propose Chameleon, which redesigns the end-to-end process of swap-based memory optimization and is the first work to consider varying operator sequences in Eager Mode. Chameleon (i) introduces a lightweight online profiler to enable continuous profiling for monitoring operator sequences, (ii) generates effective swap policies with limited operator information, and (iii) optimizes the policy execution module for accurate policy application and better performance. Experimental results demonstrate that Chameleon reduces profiling overhead by 84.25%, enables training models up to 4x larger than hardware memory while adapting to changes in operator sequences, and improves performance by up to 38.94% compared to recomputation or high-degree parallelism.
Submitted 13 September, 2025;
originally announced September 2025.
-
Stochastic resolution of identity to CC2 for large systems: Excited-state gradients and derivative couplings
Authors:
Chongxiao Zhao,
Chenyang Li,
Wenjie Dou
Abstract:
Excited-state gradients and derivative couplings are critical for simulating excited-state dynamics. However, their calculation is very expensive within the coupled-cluster framework due to its steep scaling. In this work, we present two implementations of stochastic resolution of identity to CC2 (sRI-CC2) for excited-state analytical gradients and derivative couplings. The first method employs sRI for both Coulomb and exchange terms, reducing the formal scaling to cubic. However, this method suffers from significant stochastic noise. Consequently, we introduce a substitute, termed partial sRI-CC2, which applies sRI selectively to the exchange terms only. The partial sRI-CC2 shows a quartic scaling with a modest prefactor, rendering it a practical alternative. Compared to conventional RI-CC2, the partial sRI-CC2 can handle systems with hundreds or even thousands of electrons. This work is an extension of our previous implementation of the sRI-CC2 method and provides essential ingredients for large-scale nonadiabatic dynamics.
Submitted 8 September, 2025;
originally announced September 2025.
-
Electronic frictional effects near metal surfaces with strong correlations
Authors:
Yunhao Liu,
Wenjie Dou
Abstract:
Electronic friction-Langevin dynamics (EF-LD) offers a simplified framework for describing nonadiabatic effects at metal surfaces, particularly in electrochemical and molecular electronic applications. We investigate the electronic friction behavior for the Hubbard-Holstein model using density matrix renormalization group (DMRG) theory. We show that electron-electron interactions lead to the formation of two energy levels in the impurity, resulting in two peaks in the electronic friction at the resonances of electron attachment or detachment with the metal's Fermi level. We further benchmark our results against mean field theory (MFT) and exact diagonalization (ED). The results calculated by ED and DMRG show strong agreement at high temperatures, suggesting the results from DMRG are reliable; however, at low temperatures, ED exhibits significant deviations relative to DMRG due to the finite-size limitations inherent in ED calculations. MFT completely fails to recover the Fermi resonance in electronic friction. Moreover, we investigate the dynamics of the electronic friction using EF-LD. Simulations reveal differences between the electronic population and kinetic energy dynamics predicted by the MFT and DMRG approaches, suggesting that the MFT approach is unreliable for nonadiabatic dynamics of strongly correlated systems.
Submitted 30 August, 2025;
originally announced September 2025.
-
Human-in-the-Loop Simulation for Real-Time Exploration of HVAC Demand Flexibility
Authors:
Xinlei Zhou,
Han Du,
Emily W. Yap,
Wanbin Dou,
Mingyang Huang,
Zhenjun Ma
Abstract:
The increasing integration of renewable energy into the power grid has highlighted the critical importance of demand-side flexibility. Among flexible loads, heating, ventilation, and air-conditioning (HVAC) systems are particularly significant due to their high energy consumption and controllability. This study presents the development of an interactive simulation platform that integrates a high-fidelity simulation engine with a user-facing dashboard, specifically designed to explore and demonstrate the demand flexibility capacity of HVAC systems. Unlike conventional simulations, where users are passive observers of simulation results with no ability to intervene in the embedded control during the simulation, this platform transforms them into active participants. Users can override system default control settings, such as zone temperature setpoints and HVAC schedules, at any point during the simulation runtime to implement demand response strategies of their choice. This human-in-the-loop capability enables real-time interaction and allows users to observe the immediate impact of their actions, emulating the practical decision-making process of a building or system operator. By exploring different demand flexibility scenarios and system behaviour in a manner that reflects real-world operation, users gain a deeper understanding of demand flexibility and its impacts. This interactive experience builds confidence and supports more informed decision-making in the practical adoption of demand-side flexibility. This paper presents the architecture of the simulation platform, the user-oriented dashboard design, and a use-case showcase. The introduced human-in-the-loop simulation paradigm offers a more intuitive and interactive means of engaging with grid-interactive building operations, extending beyond HVAC demand flexibility exploration.
Submitted 10 August, 2025;
originally announced August 2025.
-
Correcting Misperceptions at a Glance: Using Data Visualizations to Reduce Political Sectarianism
Authors:
Douglas Markant,
Subham Sah,
Alireza Karduni,
Milad Rogha,
My Thai,
Wenwen Dou
Abstract:
Political sectarianism is fueled in part by misperceptions of political opponents: People commonly overestimate the support for extreme policies among members of the other party. Research suggests that correcting partisan misperceptions by informing people about the actual views of outparty members may reduce one's own expressed support for political extremism, including partisan violence and anti-democratic actions. The present study investigated how correction effects depend on different representations of outparty views communicated through data visualizations. We conducted an experiment with U.S.-based participants from Prolific (N=239 Democrats, N=244 Republicans). Participants made predictions about support for political violence and undemocratic practices among members of their political outparty. They were then presented with data from an earlier survey on the actual views of outparty members. Some participants viewed only the average response (Mean-Only condition), while other groups were shown visual representations of the range of views from 75% of the outparty (Mean+Interval condition) or the full distribution of responses (Mean+Points condition). Compared to a control group that was not informed about outparty views, we observed the strongest correction effects among participants in the Mean-Only and Mean+Points conditions, while correction effects were weaker in the Mean+Interval condition. In addition, participants who observed the full distribution of outparty views (Mean+Points condition) were most accurate at later recalling the degree of support among the outparty. Our findings suggest that data visualizations can be an important tool for correcting pervasive distortions in beliefs about other groups. However, the way in which variability in outparty views is visualized can significantly shape how people interpret and respond to corrective information.
Submitted 31 July, 2025;
originally announced August 2025.
-
Rethink Domain Generalization in Heterogeneous Sequence MRI Segmentation
Authors:
Zheyuan Zhang,
Linkai Peng,
Wanying Dou,
Cuiling Sun,
Halil Ertugrul Aktas,
Andrea M. Bejar,
Elif Keles,
Gorkem Durak,
Ulas Bagci
Abstract:
Clinical magnetic-resonance (MR) protocols generate many T1 and T2 sequences whose appearance differs more than the acquisition sites that produce them. Existing domain-generalization benchmarks focus almost exclusively on cross-center shifts and overlook this dominant source of variability. Pancreas segmentation remains a major challenge in abdominal imaging: the gland is small, irregular, surrounded by organs and fat, and often suffers from low T1 contrast. State-of-the-art deep networks that already achieve >90% Dice on the liver or kidneys still miss 20-30% of the pancreas. The organ is also systematically under-represented in public cross-domain benchmarks, despite its clinical importance in early cancer detection, surgery, and diabetes research. To close this gap, we present PancreasDG, a large-scale multi-center 3D MRI pancreas segmentation dataset for investigating domain generalization in medical imaging. The dataset comprises 563 MRI scans from six institutions, spanning both venous phase and out-of-phase sequences, enabling study of both cross-center and cross-sequence variations with pixel-accurate pancreas masks created by a double-blind, two-pass protocol. Through comprehensive analysis, we reveal three insights: (i) limited sampling introduces significant variance that may be mistaken for distribution shifts, (ii) cross-center performance correlates with source domain performance for identical sequences, and (iii) cross-sequence shifts require specialized solutions. We also propose a semi-supervised approach that leverages anatomical invariances, significantly outperforming state-of-the-art domain generalization techniques with 61.63% Dice score improvements and 87.00% on two test centers for cross-sequence segmentation. PancreasDG sets a new benchmark for domain generalization in medical imaging. Dataset, code, and models will be available at https://pancreasdg.netlify.app.
Submitted 30 July, 2025;
originally announced July 2025.
-
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
Authors:
Gen Luo,
Wenhan Dou,
Wenhao Li,
Zhaokai Wang,
Xue Yang,
Changyao Tian,
Hao Li,
Weiyun Wang,
Wenhai Wang,
Xizhou Zhu,
Yu Qiao,
Jifeng Dai
Abstract:
This paper focuses on monolithic Multimodal Large Language Models (MLLMs), which integrate visual encoding and language decoding into a single model. Existing structures and pre-training strategies for monolithic MLLMs often suffer from unstable optimization and catastrophic forgetting. To address these challenges, our key idea is to embed a new visual parameter space into a pre-trained LLM, enabling stable learning of visual knowledge from noisy data via delta tuning. Based on this principle, we first introduce Mono-InternVL, an advanced monolithic MLLM that incorporates a set of visual experts through a multimodal mixture-of-experts architecture. In addition, we design an innovative Endogenous Visual Pre-training (EViP) for Mono-InternVL to maximize its visual capabilities via progressive learning. Mono-InternVL achieves competitive performance against existing MLLMs but also leads to relatively expensive data cost. Therefore, we further present Mono-InternVL-1.5, a cheaper and stronger monolithic MLLM equipped with an improved EViP (EViP++). EViP++ introduces additional visual attention experts to Mono-InternVL-1.5 and re-organizes the pre-training process in an efficient manner. During inference, it includes a fused CUDA kernel to speed up its MoE operations. With these designs, Mono-InternVL-1.5 significantly reduces training and inference costs, while still maintaining competitive performance with Mono-InternVL. To evaluate our approach, we conduct extensive experiments across 15 benchmarks. Results demonstrate that Mono-InternVL outperforms existing monolithic MLLMs on 12 out of 15 benchmarks, e.g., +114-point improvement over Emu3 on OCRBench. Compared to its modular counterpart, i.e., InternVL-1.5, Mono-InternVL-1.5 achieves similar multimodal performance while reducing first-token latency by up to 69%. Code and models are released at https://github.com/OpenGVLab/Mono-InternVL.
Submitted 16 July, 2025;
originally announced July 2025.
-
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation
Authors:
Yuxuan Jiang,
Zehua Chen,
Zeqian Ju,
Chang Li,
Weibei Dou,
Jun Zhu
Abstract:
Text-to-audio (T2A) generation has achieved promising results with the recent advances in generative models. However, because of the limited quality and quantity of temporally-aligned audio-text pairs, existing T2A methods struggle to handle the complex text prompts that contain precise timing control, e.g., "owl hooted at 2.4s-5.2s". Recent works have explored data augmentation techniques or introduced timing conditions as model inputs to enable timing-conditioned 10-second T2A generation, while their synthesis quality is still limited. In this work, we propose a novel training-free timing-controlled T2A framework, FreeAudio, making the first attempt to enable timing-controlled long-form T2A generation, e.g., "owl hooted at 2.4s-5.2s and crickets chirping at 0s-24s". Specifically, we first employ an LLM to plan non-overlapping time windows and recaption each with a refined natural language description, based on the input text and timing prompts. Then we introduce: 1) Decoupling and Aggregating Attention Control for precise timing control; 2) Contextual Latent Composition for local smoothness and Reference Guidance for global consistency. Extensive experiments show that: 1) FreeAudio achieves state-of-the-art timing-conditioned T2A synthesis quality among training-free methods and is comparable to leading training-based methods; 2) FreeAudio demonstrates comparable long-form generation quality with training-based Stable Audio and paves the way for timing-controlled long-form T2A synthesis. Demo samples are available at: https://freeaudio.github.io/FreeAudio/
Submitted 17 September, 2025; v1 submitted 11 July, 2025;
originally announced July 2025.
-
Scalable Neural Quantum State based Kernel Polynomial Method for Optical Properties from the First Principle
Authors:
Wei Liu,
Rui-Hao Bi,
Wenjie Dou
Abstract:
Variational optimization of neural-network quantum state representations has achieved FCI-level accuracy for ground state calculations, yet computing optical properties involving excited states remains challenging. In this work, we present a neural-network-based variational quantum Monte Carlo approach for ab-initio absorption spectra. We leverage parallel batch autoregressive sampling and GPU-supported local energy parallelism to efficiently compute ground states of complex systems. By integrating neural quantum ground states with the kernel polynomial method, our approach accurately calculates absorption spectra for large molecules with over 50 electrons, achieving FCI-level precision. The proposed algorithm demonstrates superior scalability and reduced runtime compared to FCI, marking a significant step forward in optical property calculations for large-scale quantum systems.
Submitted 9 June, 2025;
originally announced June 2025.
-
Tunable spin-phonon polarons in a chiral molecular qubit framework
Authors:
Aimei Zhou,
Ruihao Bi,
Zhenghan Zhang,
Luming Yang,
Xudong Tian,
Denan Li,
Yingchao Wang,
Mingshu Tan,
Weibin Ni,
Haozhou Sun,
Jinkun Guo,
Xiaohe Miao,
Xinxing Zhao,
Zhifu Shi,
Wei Tong,
Zhitao Zhang,
Jiandong Feng,
Jin-Hu Dou,
Feng Jin,
Shi Liu,
Mircea Dinca,
Tijana Rajh,
Jian Li,
Wenjie Dou,
Lei Sun
Abstract:
Chiral structures that produce asymmetric spin-phonon coupling can theoretically generate spin-phonon polarons -- quasiparticles exhibiting non-degenerate spin states with phonon displacements. These quasiparticles are speculated to be the origin of chirality-induced spin selectivity and presumably can display exotic dynamic behaviors. However, direct experimental evidence of spin-phonon polarons has been lacking. Using a chiral molecular qubit framework embedding stable semiquinone-like radicals, we report spin dynamic signatures that indicate the formation of spin-phonon polarons for the first time. Our non-adiabatic model reveals that these quasiparticles introduce an active spin relaxation channel when polaron reorganization energy approaches Zeeman splitting. This new channel manifests itself as anomalous, temperature-independent spin relaxation, which can be suppressed by high magnetic fields or pore-filling solvents (e.g. CH2Cl2, CS2). Such field- and guest-tunable relaxation is unattainable in conventional spin systems. Harnessing this mechanism could boost repetition rates in spin-based quantum information technologies without compromising coherence or quantum sensing performance.
Submitted 20 January, 2026; v1 submitted 5 June, 2025;
originally announced June 2025.
-
Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction
Authors:
Wei Shen,
Xiaonan He,
Chuheng Zhang,
Xuyun Zhang,
Xiaolong Xu,
Wanchun Dou
Abstract:
Reward-driven proactive dialogue agents require precise estimation of user satisfaction as an intrinsic reward signal to determine optimal interaction strategies. Specifically, this framework triggers clarification questions when detecting potential user dissatisfaction during interactions in the industrial dialogue system. Traditional approaches typically train a neural network on weak labels generated by a simple model trained on user actions after the current turn. However, existing methods suffer from two critical limitations in real-world scenarios: (1) Noisy Reward Supervision: dependence on weak labels derived from post-hoc user actions introduces bias, particularly failing to capture satisfaction signals in ASR-error-induced utterances; (2) Long-Tail Feedback Sparsity: the power-law distribution of user queries causes reward prediction accuracy to drop in low-frequency domains. Together, the noise in the weak labels and the power-law distribution of user utterances make it hard for the model to learn good representations of user utterances and sessions. To address these limitations, we propose two auxiliary tasks that improve the representation learning of user utterances and sessions and thereby enhance user satisfaction prediction. The first is a contrastive self-supervised learning task, which helps the model learn the representation of rare user utterances and identify ASR errors. The second is a domain-intent classification task, which helps the model learn the representation of user sessions from long-tailed domains and improves its performance on such domains. The proposed method is evaluated on DuerOS, demonstrating significant improvements in the accuracy of error recognition on rare user utterances and long-tailed domains.
Submitted 24 May, 2025;
originally announced May 2025.
-
Predicting Risk of Pulmonary Fibrosis Formation in PASC Patients
Authors:
Wanying Dou,
Gorkem Durak,
Koushik Biswas,
Ziliang Hong,
Andrea Mia Bejar,
Elif Keles,
Kaan Akin,
Sukru Mehmet Erturk,
Alpay Medetalibeyoglu,
Marc Sala,
Alexander Misharin,
Hatice Savas,
Mary Salvatore,
Sachin Jambawalikar,
Drew Torigian,
Jayaram K. Udupa,
Ulas Bagci
Abstract:
While the acute phase of the COVID-19 pandemic has subsided, its long-term effects persist through Post-Acute Sequelae of COVID-19 (PASC), commonly known as Long COVID. There remains substantial uncertainty regarding both its duration and optimal management strategies. PASC manifests as a diverse array of persistent or newly emerging symptoms--ranging from fatigue, dyspnea, and neurologic impairments (e.g., brain fog), to cardiovascular, pulmonary, and musculoskeletal abnormalities--that extend beyond the acute infection phase. This heterogeneous presentation poses substantial challenges for clinical assessment, diagnosis, and treatment planning. In this paper, we focus on imaging findings that may suggest fibrotic damage in the lungs, a critical manifestation characterized by scarring of lung tissue, which can potentially affect long-term respiratory function in patients with PASC. This study introduces a novel multi-center chest CT analysis framework that combines deep learning and radiomics for fibrosis prediction. Our approach leverages convolutional neural networks (CNNs) and interpretable feature extraction, achieving 82.2% accuracy and 85.5% AUC in classification tasks. We demonstrate the effectiveness of Grad-CAM visualization and radiomics-based feature analysis in providing clinically relevant insights for PASC-related lung fibrosis prediction. Our findings highlight the potential of deep learning-driven computational methods for early detection and risk assessment of PASC-related lung fibrosis--presented for the first time in the literature.
Submitted 15 May, 2025;
originally announced May 2025.
-
Proving Cypher Query Equivalence
Authors:
Lei Tang,
Wensheng Dou,
Yingying Zheng,
Lijie Xu,
Wei Wang,
Jun Wei,
Tao Huang
Abstract:
Graph database systems store graph data as nodes and relationships, and utilize graph query languages (e.g., Cypher) for efficiently querying graph data. Proving the equivalence of graph queries is an important foundation for optimizing graph query performance, ensuring graph query reliability, etc. Although researchers have proposed many SQL query equivalence provers for relational database systems, these provers cannot be directly applied to prove the equivalence of graph queries. The difficulty lies in the fact that graph query languages (e.g., Cypher) adopt significantly different data models (property graph model vs. relational model) and query patterns (graph pattern matching vs. tabular tuple calculus) from SQL.
In this paper, we propose GraphQE, an automated prover to determine whether two Cypher queries are semantically equivalent. We design a U-semiring based Cypher algebraic representation to model the semantics of Cypher queries. Our Cypher algebraic representation is built on the algebraic structure of unbounded semirings, and can sufficiently express nodes and relationships in property graphs and complex Cypher queries. Then, determining the equivalence of two Cypher queries is transformed into determining the equivalence of the corresponding Cypher algebraic representations, which can be verified by SMT solvers. To evaluate the effectiveness of GraphQE, we construct a dataset consisting of 148 pairs of equivalent Cypher queries. Among them, we have successfully proven 138 pairs of equivalent Cypher queries, demonstrating the effectiveness of GraphQE.
Submitted 22 April, 2025;
originally announced April 2025.
-
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Yufei Wang,
Wenhan Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Qiyu Rong,
Hongyuan Jing,
Mengmeng Zhang,
Jinglong Li,
Xiangyu Lu,
Yi Ren,
Yuting Liu,
Meng Zhang,
Xiang Chen,
Qiyuan Guan,
Jiangxin Dong,
Jinshan Pan,
Conglin Gou
, et al. (112 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which were developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which include day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. A total of 361 participants took part in the competition, and 32 teams submitted valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.
Submitted 19 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
Space-averaged non-equilibrium Green's function approach for quantum transport in 3D
Authors:
Vahid Mosallanejad,
Kuei-Lin Chiu,
Wenjie Dou
Abstract:
The non-equilibrium Green's function (NEGF) approach offers a practical framework for simulating various phenomena in mesoscopic systems. As the dimension of electronic devices shrinks to just a few nanometers, the need for new effective-mass based 3D implementations of NEGF has become increasingly apparent. This work extends our previous Finite-Volume implementation -- originally developed for the self-consistent solution of the Schrödinger and Poisson equations in 2D -- into a full 3D NEGF framework. Our implementation begins by exploring a few problems with the common textbook Finite Difference implementations of NEGF. We then concisely demonstrate how Finite-Volume discretization addresses a few key implementation challenges. Importantly, we explain how this type of discretization enables evaluating the self-energies, which account for the effects of reservoirs. The potential applications of this new method are illustrated through two examples. We anticipate that this implementation will be broadly applicable to open quantum systems, especially in cases where a fully three-dimensional domain is essential.
Submitted 8 April, 2025;
originally announced April 2025.
-
Universal Structure of Computing Moments for Exact Quantum Dynamics: Application to Arbitrary System-Bath Couplings
Authors:
Rui-Hao Bi,
Wei Liu,
Wenjie Dou
Abstract:
We introduce a general procedure for computing higher-order moments of correlation functions in open quantum systems, extending the scope of our recent work on Memory Kernel Coupling Theory (MKCT) [W. Liu, Y. Su, Y. Wang, and W. Dou, arXiv:2407.01923 (2024)]. This approach is demonstrated for arbitrary system-bath coupling that can be expressed as a polynomial, $H_{SB} = \hat{V} (α_0 + α_1 \hat{q} + α_2 \hat{q}^2+ \dots)$, where we show that the recursive commutators of a system operator obey a universal hierarchy. Exploiting this structure, the higher-order moments are obtained by evaluating the expectation values of the system and bath operators separately, with bath expectation values derived from the derivatives of a generating function. We further apply MKCT to compute the dipole autocorrelation function for the spin-boson model with both linear and quadratic coupling, achieving agreement with the hierarchical equations of motion approach. Our findings suggest a promising path toward accurate dynamics for complex open quantum systems.
Submitted 13 May, 2025; v1 submitted 1 April, 2025;
originally announced April 2025.
-
HALHF: a hybrid, asymmetric, linear Higgs factory using plasma- and RF-based acceleration. Backup Document
Authors:
Erik Adli,
Joshua Appleby,
Timothy L. Barklow,
Marica Biagini,
Jonas Björklund Svensson,
Mikael Berggren,
Simone Bettoni,
Stewart Boogert,
Philip Burrows,
Allen Caldwell,
Jian Bin Ben Chen,
Vera Cilento,
Laura Corner,
Richard D'Arcy,
Steffen Doebert,
Wang Dou,
Pierre Drobniak,
Calvin Dyson,
Sinead Farrington,
John Farmer,
Angeles Faus-Golfe,
Manuel Formela,
Arianne Formenti,
Louis Forrester,
Brian Foster
, et al. (37 additional authors not shown)
Abstract:
This document expands on the Comprehensive Summary submitted to the EPPSU 2026. It contains details on aspects of the HALHF project that could not be fitted into the Summary. Some sections contain work that is still preliminary and/or status reports on current progress.
Submitted 30 March, 2025;
originally announced March 2025.
-
Two-mode Floquet Redfield quantum master approach for quantum transport
Authors:
Vahid Mosallanejad,
Wenjie Dou
Abstract:
Simultaneous driving by two periodic oscillations yields a practical technique for further engineering quantum systems. For quantum transport through mesoscopic systems driven by two strong periodic terms, a non-perturbative Floquet-based quantum master equation (QME) approach is developed using a set of dissipative time-dependent terms and the reduced density matrix of the system. This work extends our previous Floquet approach for transport through quantum dots (at finite temperature and arbitrary bias) driven periodically by a single frequency. In a pedagogical way, we derive explicit time-dependent dissipative terms. Our theory begins with the derivation of the two-mode Floquet Liouville-von Neumann equation. We then explain the second-order Wangsness-Bloch-Redfield QME with a slightly modified definition of the interaction picture. Subsequently, the two-mode Shirley time evolution formula is applied, allowing for the integration of reservoir dynamics. Consequently, the established formalism has a wide range of applications in open quantum systems driven by two modes in the weak coupling regime. The formalism's potential applications are demonstrated through various examples.
Submitted 3 November, 2025; v1 submitted 25 March, 2025;
originally announced March 2025.
-
HALHF: a hybrid, asymmetric, linear Higgs factory using plasma- and RF-based acceleration
Authors:
Erik Adli,
Joshua Appleby,
Timothy L. Barklow,
Marica Biagini,
Jonas Björklund Svensson,
Mikael Berggren,
Simone Bettoni,
Stewart Boogert,
Philip Burrows,
Allen Caldwell,
Jian Bin Ben Chen,
Vera Cilento,
Laura Corner,
Richard D'Arcy,
Steffen Doebert,
Wang Dou,
Pierre Drobniak,
Calvin Dyson,
Sinead Farrington,
John Farmer,
Angeles Faus-Golfe,
Manuel Formela,
Arianne Formenti,
Louis Forrester,
Brian Foster
, et al. (37 additional authors not shown)
Abstract:
HALHF is a hybrid linear collider that uses electron-driven plasma-wakefield acceleration to accelerate electrons to high energy while using radio-frequency cavity technology to accelerate positrons. The most cost-effective solution collides low-energy positrons with high-energy electrons, producing a boost to the final state in the electron direction with $γ = 1.67$. The current HALHF baseline design produces a luminosity comparable to that of the baseline ILC but with a greatly reduced construction and carbon footprint and hence much lower cost than the mature linear-collider designs ILC and CLIC. Costs for HALHF are evaluated, together with those for the approximately 15-year R&D programme necessary to realise HALHF. Time scales and cost for the R&D are estimated. Upgrade paths for HALHF technology from a 250 GeV Higgs factory, through 380 and 550 GeV, up to 10 TeV are sketched.
Submitted 30 March, 2025; v1 submitted 25 March, 2025;
originally announced March 2025.
-
Stochastic resolution of identity to CC2 for large systems: Oscillator strength and ground state gradient calculations
Authors:
Chongxiao Zhao,
Qi Ou,
Chenyang Li,
Wenjie Dou
Abstract:
An implementation of the stochastic resolution of identity (sRI) approximation to CC2 oscillator strengths as well as ground state analytical gradients is presented. The essential 4-index electron repulsion integrals (ERIs) are contracted with a set of stochastic orbitals on the basis of the RI technique, and the orbital energy differences in the denominators are decoupled with the Laplace transform. These lead to a significant scaling reduction from O(N^5) to O(N^3) for oscillator strengths and gradients with the size of the basis set, N. The O(N^3) gradients need a large number of stochastic orbitals, so we provide an additional O(N^4) version with better accuracy and a smaller prefactor by adopting sRI partially. Such a steep computational acceleration, of nearly one or two orders of magnitude, is very attractive for large systems. This work is an extension of our previous implementations of sRI-CC2 ground and excited state energies and shows the feasibility of introducing sRI to CC2 properties beyond energies.
Submitted 13 March, 2025;
originally announced March 2025.
-
Nonperturbative Open Quantum Dynamics Bypass Influence Functional
Authors:
Yu Su,
Yao Wang,
Wenjie Dou
Abstract:
An ordered moment approach to exact open quantum dynamics is presented, which bypasses the Feynman-Vernon influence functional formalism. The hierarchical equations of motion are constructed using Wick's contraction, which follows specific orderings of the bath's creation and annihilation operators. Our approach moves beyond the traditional influence functional formalism, offering a more intuitive and direct framework, and extends the applicability of the theory to nonlinear system-bath coupling scenarios.
Submitted 28 April, 2025; v1 submitted 28 February, 2025;
originally announced March 2025.
-
CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge
Authors:
Yuwei Zhang,
Qingyuan Lu,
Kai Liu,
Wensheng Dou,
Jiaxin Zhu,
Li Qian,
Chunxi Zhang,
Zheng Lin,
Jun Wei
Abstract:
Unit testing plays a pivotal role in the software development lifecycle, as it ensures code quality. However, writing high-quality unit tests remains a time-consuming task for developers in practice. More recently, the application of large language models (LLMs) in automated unit test generation has demonstrated promising results. Existing approaches primarily focus on interpreted programming languages (e.g., Java), while mature solutions tailored to compiled programming languages like C++ are yet to be explored. The intricate language features of C++, such as pointers, templates, and virtual functions, pose particular challenges for LLMs in generating both executable and high-coverage unit tests. To tackle the aforementioned problems, this paper introduces CITYWALK, a novel LLM-based framework for C++ unit test generation. CITYWALK enhances LLMs by providing a comprehensive understanding of the dependency relationships within the project under test via program analysis. Furthermore, CITYWALK incorporates language-specific knowledge about C++ derived from project documentation and empirical observations, significantly improving the correctness of the LLM-generated unit tests. We implement CITYWALK by employing the widely popular LLM GPT-4o. The experimental results show that CITYWALK outperforms current state-of-the-art approaches on a collection of ten popular C++ projects. Our findings demonstrate the effectiveness of CITYWALK in generating high-quality C++ unit tests.
Submitted 11 August, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing
Authors:
Yuwei Zhang,
Zhi Jin,
Ying Xing,
Ge Li,
Fang Liu,
Jiaxin Zhu,
Wensheng Dou,
Jun Wei
Abstract:
Bug fixing holds significant importance in software development and maintenance. Recent research has made substantial strides in exploring the potential of large language models (LLMs) for automatically resolving software bugs. However, a noticeable gap in existing approaches lies in the oversight of collaborative facets intrinsic to bug resolution, treating the process as a single-stage endeavor. Moreover, most approaches solely take the buggy code snippet as input for LLMs during the patch generation stage. To mitigate the aforementioned limitations, we introduce a novel stage-wise framework named PATCH. Specifically, we first augment the buggy code snippet with corresponding dependence context and intent information to better guide LLMs in generating the correct candidate patches. Additionally, by taking inspiration from bug management practices, we decompose the bug-fixing task into four distinct stages: bug reporting, bug diagnosis, patch generation, and patch verification. These stages are performed interactively by LLMs, aiming to simulate the collaborative behavior of programmers during the resolution of software bugs. By harnessing these collective contributions, PATCH effectively enhances the bug-fixing capability of LLMs. We implement PATCH by employing the powerful dialogue-based LLM ChatGPT. Our evaluation on the widely used bug-fixing benchmark BFP demonstrates that PATCH has achieved better performance than state-of-the-art LLMs.
Submitted 16 February, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Manipulating nonadiabatic dynamics by plasmonic nanocavity
Authors:
Yu Wang,
Ruihao Bi,
Wenjie Dou
Abstract:
In recent years, plasmonic nanocavities have emerged as powerful tools for controlling and enhancing light-matter interactions at the nanoscale. This study explores the role of plasmonic nanocavities in manipulating nonadiabatic dynamics, particularly in systems where fast electronic transitions are crucial. By coupling molecular states to the plasmonic resonances of metallic nanocavities, we demonstrate that the local electromagnetic fields generated by plasmons can significantly influence the rates and pathways of nonadiabatic transitions, including electron transfer and excitation relaxation processes. Using the Floquet quantum master equation (FQME) and Floquet surface hopping (FSH) methods that we previously developed, we find that plasmonic nanocavities can enhance nonadiabatic effects by tuning the plasmonic coupling strength, the molecule-metal interaction strength, and the material properties. These approaches offer a new perspective for predicting molecular dynamics in ultrafast processes. Our findings pave the way for designing novel plasmonic devices capable of controlling electron and energy transfer in chemical reactions, optoelectronic applications, and quantum information processing.
Submitted 22 January, 2025;
originally announced January 2025.
-
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
Authors:
Zhaokai Wang,
Xizhou Zhu,
Xue Yang,
Gen Luo,
Hao Li,
Changyao Tian,
Wenhan Dou,
Junqi Ge,
Lewei Lu,
Yu Qiao,
Jifeng Dai
Abstract:
Image pyramids are widely adopted in top-performing methods to obtain multi-scale features for precise visual perception and understanding. However, current image pyramids use the same large-scale model to process multiple resolutions of images, leading to significant computational cost. To address this challenge, we propose a novel network architecture, called Parameter-Inverted Image Pyramid Networks (PIIP). Specifically, PIIP uses pretrained models (ViTs or CNNs) as branches to process multi-scale images, where images of higher resolutions are processed by smaller network branches to balance computational cost and performance. To integrate information from different spatial scales, we further propose a novel cross-branch feature interaction mechanism. To validate PIIP, we apply it to various perception models and a representative multimodal large language model called LLaVA, and conduct extensive experiments on various tasks such as object detection, segmentation, image classification and multimodal understanding. PIIP achieves superior performance compared to single-branch and existing multi-resolution approaches with lower computational cost. When applied to InternViT-6B, a large-scale vision foundation model, PIIP can improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation, finally achieving 60.0 box AP on MS COCO and 59.7 mIoU on ADE20K. For multimodal understanding, our PIIP-LLaVA achieves 73.0% accuracy on TextVQA and 74.5% on MMBench with only 2.8M training data. Our code is released at https://github.com/OpenGVLab/PIIP.
Submitted 13 January, 2025;
originally announced January 2025.
-
Orbital Surface Hopping from Orbital Quantum-Classical Liouville Equation for Nonadiabatic Dynamics of Many-electron Systems
Authors:
Yong-Tao Ma,
Rui-Hao Bi,
Wenjie Dou
Abstract:
Accurately simulating many-electron nonadiabatic dynamics at metal surfaces remains a significant challenge. In this work, we present an orbital surface hopping (OSH) algorithm rigorously derived from the orbital quantum-classical Liouville equation (o-QCLE) to treat nonadiabatic dynamics in many-electron systems. This OSH algorithm is closely connected with the popular Independent Electron Surface Hopping (IESH) method, which has shown remarkable success in addressing these nonadiabatic phenomena, except that electrons hop between orbitals. We compare OSH with the IESH approach and benchmark both algorithms against the surface hopping method with a full Configuration Interaction (FCI) wavefunction. Our approach shows strong agreement with IESH and FCI-SH results for molecular orbital populations and kinetic energy relaxation while remaining highly efficient, demonstrating the ability of the new OSH method to capture key aspects of many-electron nonadiabatic dynamics.
Submitted 26 December, 2024;
originally announced December 2024.
-
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Authors:
Hao Li,
Changyao Tian,
Jie Shao,
Xizhou Zhu,
Zhaokai Wang,
Jinguo Zhu,
Wenhan Dou,
Xiaogang Wang,
Hongsheng Li,
Lewei Lu,
Jifeng Dai
Abstract:
The remarkable success of Large Language Models (LLMs) has extended to the multimodal domain, achieving outstanding performance in image understanding and generation. Recent efforts to develop unified Multimodal Large Language Models (MLLMs) that integrate these capabilities have shown promising results. However, existing approaches often involve complex designs in model architecture or training pipeline, increasing the difficulty of model training and scaling. In this paper, we propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation. To address challenges identified in existing encoder-free unified MLLMs, we introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding while reducing training complexity. After being trained on large-scale mixed image-text data with a unified next-token prediction objective, SynerGen-VL achieves or surpasses the performance of existing encoder-free unified MLLMs with comparable or smaller parameter sizes, and narrows the gap with task-specific state-of-the-art models, highlighting a promising path toward future unified MLLMs. Our code and models shall be released.
Submitted 12 December, 2024;
originally announced December 2024.
-
DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices
Authors:
Yongzhe Jia,
Xuyun Zhang,
Hongsheng Hu,
Kim-Kwang Raymond Choo,
Lianyong Qi,
Xiaolong Xu,
Amin Beheshti,
Wanchun Dou
Abstract:
Federated learning (FL) has emerged as a prominent machine learning paradigm in edge computing environments, enabling edge devices to collaboratively optimize a global model without sharing their private data. However, existing FL frameworks suffer from efficacy deterioration due to the system heterogeneity inherent in edge computing, especially in the presence of domain shifts across local data. In this paper, we propose a heterogeneous FL framework, DapperFL, to enhance model performance across multiple domains. In DapperFL, we introduce a dedicated Model Fusion Pruning (MFP) module to produce personalized compact local models for clients to address the system heterogeneity challenges. The MFP module prunes local models with fused knowledge obtained from both local and remaining domains, ensuring robustness to domain shifts. Additionally, we design a Domain Adaptive Regularization (DAR) module to further improve the overall performance of DapperFL. The DAR module employs regularization generated by the pruned model, aiming to learn robust representations across domains. Furthermore, we introduce a specific aggregation algorithm for aggregating heterogeneous local models with tailored architectures and weights. We implement DapperFL on a real-world FL platform with heterogeneous clients. Experimental results on benchmark datasets with multiple domains demonstrate that DapperFL outperforms several state-of-the-art FL frameworks by up to 2.28%, while achieving significant model volume reductions ranging from 20% to 80%. Our code is available at: https://github.com/jyzgh/DapperFL.
Submitted 8 December, 2024;
originally announced December 2024.