-
Sub-micromolar imaging of intrinsic chromophores by two-photon photothermal microscopy captures mitochondrial response to chemotherapy
Authors:
Nathaniel Hai,
Chinmayee Vallabh Prabhu Dessai,
Dingcheng Sun,
Jianpeng Ao,
Pin-Tian Lyu,
Yifan Zhu,
Ji-Xin Cheng
Abstract:
Intracellular chromophores (e.g., NADH and FAD) play a central role in the regulation of cellular metabolism. Though autofluorescence has been extensively used for label-free mapping of chromophores inside a cell, its sensitivity and molecular specificity are constrained by low quantum yields and fluorescence spectral overlap. Here, we address these challenges by employing a photothermal approach that measures the optical absorption of chromophores rather than their autofluorescence. By combining near-infrared pump and visible probe beams, our two-photon photothermal (2PPT) microscope exploits localized thermal transients generated through two-photon absorption, enabling detection of chromophore-specific signatures beyond the reach of autofluorescence. We demonstrate sub-micromolar limits of detection for the metabolic coenzymes NADH and FAD of 0.87 μM and 0.99 μM, respectively. Such high sensitivity enables differentiating the influence of different mitochondrial shapes on metabolic activity. Importantly, the fluorescence-crosstalk-free 2PPT can identify the biomolecular source of contrast from cellular mitochondria in a label-free manner based on spectroscopy. 2PPT microscopy is utilized to study metabolic alterations of mitochondria in cancer under chemotherapy at the single-organelle level.
Submitted 15 April, 2026;
originally announced April 2026.
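The reported detection limits follow the standard 3σ criterion, LOD = 3·σ_blank / slope. A minimal sketch of that calculation, with illustrative noise and calibration-slope numbers rather than the paper's measured data:

```python
# Limit of detection via the 3-sigma criterion: LOD = 3 * sigma_blank / slope.
# The blank noise and calibration slope below are illustrative placeholders,
# not values from the paper.

def limit_of_detection(sigma_blank, slope):
    """LOD in the concentration units of the calibration slope."""
    return 3.0 * sigma_blank / slope

# Hypothetical calibration: 2.9 a.u. of photothermal signal per uM of NADH,
# with a blank standard deviation of 0.84 a.u.
lod_nadh = limit_of_detection(sigma_blank=0.84, slope=2.9)
print(f"LOD = {lod_nadh:.2f} uM")  # ~0.87 uM for these illustrative numbers
```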
-
Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models
Authors:
Lei Lin,
Jizhao Zhu,
Yong Liu,
Donghong Sun,
Hongbo He,
Yihua Du
Abstract:
This paper addresses two limitations of large language models (LLMs) in solving complex problems: (1) their reasoning processes exhibit Bayesian-like stochastic generation, where each token is sampled from a context-dependent probability distribution, leading to inherently random decision trajectories rather than deterministic planning; (2) the reasoning and decision-making mechanisms are statically decoupled, meaning dynamically retrieved domain knowledge fails to adjust the underlying reasoning strategy. These dual deficiencies result in initial decisions lacking strategic anchoring and reasoning chains often failing to converge on correct solutions, as stochastic generation lacks mechanisms for trajectory correction or knowledge-guided optimization during sequential reasoning. To resolve these issues, we propose a problem-solving method integrated into the LLM's generation process to guide reasoning. This method, compatible with numerous LLMs and featuring reusable solutions, is grounded in a novel Heuristic-Classification-of-Thoughts prompting schema (HCoT). HCoT synergizes the LLM's reasoning ability with a structured problem space via a heuristic classification model that controls the reasoning process and provides reusable abstract solutions. Evaluated on two complex inductive reasoning tasks with ill-defined search spaces, HCoT outperforms existing approaches such as Tree-of-Thoughts and Chain-of-Thought prompting. On the well-structured 24 Game task, HCoT demonstrates significantly higher token efficiency compared to the state-of-the-art Tree-of-Thoughts-Breadth-First-Search. In terms of both accuracy and token usage, HCoT achieves a Pareto frontier balance, offering a strong trade-off between performance and computational cost.
Submitted 14 April, 2026;
originally announced April 2026.
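The classify-then-reuse idea can be caricatured in a few lines: route a problem to a heuristic category, then instantiate that category's reusable abstract solution as the prompt scaffold. A toy sketch; the categories, templates, and the string-based classifier are invented for illustration and are not the paper's implementation:

```python
# Toy sketch of heuristic classification-of-thoughts: route a problem to a
# reusable abstract solution template before asking the model to reason.
# Categories, templates, and the trivial classifier are illustrative only.

SOLUTION_TEMPLATES = {
    "arithmetic-search": "Enumerate operator placements; prune partial results "
                         "that cannot reach the target.",
    "inductive-rule":    "Propose a candidate rule from examples; verify it on "
                         "all examples before answering.",
}

def classify(problem: str) -> str:
    """Heuristic classifier (a real system could learn this mapping)."""
    return "arithmetic-search" if any(ch.isdigit() for ch in problem) \
        else "inductive-rule"

def hcot_prompt(problem: str) -> str:
    strategy = SOLUTION_TEMPLATES[classify(problem)]
    return f"Strategy: {strategy}\nProblem: {problem}\nSolve step by step."

print(hcot_prompt("Use 4, 7, 8, 8 to make 24."))
```

The point of the sketch is the control flow: the classification step picks the abstract solution before generation starts, so the first tokens are already anchored to a strategy rather than sampled freely.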
-
R2G: A Multi-View Circuit Graph Benchmark Suite from RTL to GDSII
Authors:
Zewei Zhou,
Jiajun Zou,
Jiajia Zhang,
Ao Yang,
Ruichao He,
Haozheng Zhou,
Ao Liu,
Jiawei Liu,
Leilei Jin,
Shan Shen,
Daying Sun
Abstract:
Graph neural networks (GNNs) are increasingly applied to physical design tasks such as congestion prediction and wirelength estimation, yet progress is hindered by inconsistent circuit representations and the absence of controlled evaluation protocols. We present R2G (RTL-to-GDSII), a multi-view circuit-graph benchmark suite that standardizes five stage-aware views with information parity (every view encodes the same attribute set, differing only in where features attach) over 30 open-source IP cores (up to $10^6$ nodes/edges). R2G provides an end-to-end DEF-to-graph pipeline spanning synthesis, placement, and routing stages, together with loaders, unified splits, domain metrics, and reproducible baselines. By decoupling representation choice from model choice, R2G isolates a confound that prior EDA and graph-ML benchmarks leave uncontrolled. In systematic studies with GINE, GAT, and ResGatedGCN, we find: (i) view choice dominates model choice, with Test R$^2$ varying by more than 0.3 across representations for a fixed GNN; (ii) node-centric views generalize best across both placement and routing; and (iii) decoder-head depth (3--4 layers) is the primary accuracy driver, turning divergent training into near-perfect predictions (R$^2$$>$0.99). Code and datasets are available at https://github.com/ShenShan123/R2G.
Submitted 9 April, 2026;
originally announced April 2026.
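The "information parity" idea — every view encodes the same attribute set, differing only in where features attach — can be illustrated with a toy netlist. The attribute names and two-view structure below are invented for illustration, not R2G's actual schema:

```python
# Toy illustration of views with information parity: the same attribute set
# appears in every view, differing only in whether features attach to nodes
# or edges. Attribute names are invented, not R2G's actual schema.

cells = {"U1": {"area": 1.2}, "U2": {"area": 0.8}}
nets = [("U1", "U2", {"wirelength": 5.0})]

def node_centric_view():
    """Attach net attributes to the endpoint nodes."""
    nodes = {c: dict(attrs) for c, attrs in cells.items()}
    for u, v, a in nets:
        for n in (u, v):
            nodes[n]["wirelength"] = nodes[n].get("wirelength", 0.0) + a["wirelength"]
    return nodes, [(u, v) for u, v, _ in nets]

def edge_centric_view():
    """Keep net attributes on the edges themselves."""
    return dict(cells), [(u, v, dict(a)) for u, v, a in nets]

nodes, edges = node_centric_view()
print(nodes["U1"])  # area and aggregated wirelength both live on the node
```

Because both views carry the same information, any accuracy gap between them measures the representation choice itself — the confound R2G is designed to isolate.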
-
Reason Analogically via Cross-domain Prior Knowledge: An Empirical Study of Cross-domain Knowledge Transfer for In-Context Learning
Authors:
Le Liu,
Zhiming Li,
Jianzhi Yan,
Zike Yuan,
Shiwei Chen,
Youcheng Pan,
Buzhou Tang,
Qingcai Chen,
Yang Xiang,
Danny Dongning Sun
Abstract:
Despite its success, existing in-context learning (ICL) relies on in-domain expert demonstrations, limiting its applicability when expert annotations are scarce. We posit that different domains may share underlying reasoning structures, enabling source-domain demonstrations to improve target-domain inference despite semantic mismatch. To test this hypothesis, we conduct a comprehensive empirical study of different retrieval methods to validate the feasibility of achieving cross-domain knowledge transfer under the in-context learning setting. Our results demonstrate conditional positive transfer in cross-domain ICL. We identify a clear example absorption threshold: beyond it, positive transfer becomes more likely, and additional demonstrations yield larger gains. Further analysis suggests that these gains stem from reasoning structure repair by retrieved cross-domain examples, rather than semantic cues. Overall, our study validates the feasibility of leveraging cross-domain knowledge transfer to improve cross-domain ICL performance, motivating the community to explore designing more effective retrieval approaches for this novel direction. Our implementation is available at https://github.com/littlelaska/ICL-TF4LR
Submitted 6 April, 2026;
originally announced April 2026.
-
Towards Effective In-context Cross-domain Knowledge Transfer via Domain-invariant-neurons-based Retrieval
Authors:
Jianzhi Yan,
Zhiming Li,
Le Liu,
Zike Yuan,
Shiwei Chen,
Youcheng Pan,
Buzhou Tang,
Yang Xiang,
Danny Dongning Sun
Abstract:
Large language models (LLMs) have made notable progress in logical reasoning, yet still fall short of human-level performance. Current boosting strategies rely on expert-crafted in-domain demonstrations, limiting their applicability in expertise-scarce domains such as specialized mathematical reasoning, formal logic, or legal analysis. In this work, we demonstrate the feasibility of leveraging cross-domain demonstration examples to boost LLMs' reasoning performance. Despite substantial domain differences, many reusable implicit logical structures are shared across domains. To effectively retrieve cross-domain examples for unseen domains, we further propose an effective retrieval method called domain-invariant-neurons-based retrieval (DIN-Retrieval). Concisely, DIN-Retrieval first summarizes a hidden representation that is universal across different domains. Then, during the inference stage, we use the DIN vector to retrieve structurally compatible cross-domain demonstrations for in-context learning. Experimental results in multiple settings for the transfer of mathematical and logical reasoning demonstrate that our method achieves an average improvement of 1.8 over state-of-the-art methods. Our implementation is available at https://github.com/Leon221220/DIN-Retrieval
Submitted 6 April, 2026;
originally announced April 2026.
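The retrieval step can be sketched as nearest-neighbor search in a shared representation space. The embeddings below, and the way a "domain-invariant" query vector would be obtained, are placeholders for illustration, not the DIN-Retrieval implementation:

```python
# Sketch of retrieving cross-domain demonstrations by cosine similarity in a
# shared representation space. The embeddings are illustrative placeholders,
# not the DIN-Retrieval implementation.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Candidate demonstrations from a *different* domain, already embedded.
demos = {
    "modus-ponens example": [0.9, 0.1, 0.0],
    "unrelated trivia":     [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # embedding of the target-domain problem
best = max(demos, key=lambda k: cosine(query, demos[k]))
print(best)  # the structurally compatible demonstration wins
```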
-
Ultrafast Non-Volatile Weyl LuminoMem for Mid-Infrared In-Memory Computing
Authors:
Delang Liang,
Shiyu Wang,
Yan Wang,
Dong Li,
Yuchun Chen,
Bin Cheng,
Mingyang Qin,
Dehong Yang,
Jie Sheng,
Lin Li,
Changgan Zeng,
Dong Sun,
Anlian Pan,
Jing Liu
Abstract:
Integrated optoelectronic systems strive to combine the logic/memory density of electronics with the bandwidth of photonics, but monolithic realization is impeded by the inefficient electronic-to-photonic interface. Current architectures rely on separate readout circuitry and modulators, creating bottlenecks in energy and latency, while existing direct transduction methods often compromise on switching speed or non-volatility. Here, we report an ultrafast, non-volatile optoelectronic memory, named LuminoMem, that integrates electrical storage and mid-infrared light emission in a single device. The device utilizes a floating-gate architecture, in which the Weyl semiconductor tellurium serves simultaneously as a charge-trapping storage layer and an emissive medium. This design enables nanosecond-scale electrical programming of non-volatile photoluminescence at 3.4 μm, allowing direct optical access to stored states without external modulation. We demonstrate 4-bit (16-level) optical storage capacity and validate the device's performance through neural network simulations that achieve high accuracy on the Fashion-MNIST dataset. By effectively bridging the gap between electronic storage and mid-infrared photonics, the demonstrated mid-infrared LuminoMem provides a hardware foundation for improving computational efficiency and for intelligent platforms that co-integrate computing, memory, and sensing capabilities.
Submitted 5 April, 2026;
originally announced April 2026.
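Reading back a 4-bit (16-level) optical state amounts to quantizing an analog emission intensity into one of 16 bins. A minimal sketch; the normalized intensity scale and uniform binning are illustrative assumptions, not measured device behavior:

```python
# Minimal sketch of 4-bit (16-level) readout: quantize a normalized
# photoluminescence intensity into one of 16 stored states. The intensity
# scale and uniform thresholds are illustrative, not measured device values.

N_LEVELS = 16

def read_state(intensity, i_min=0.0, i_max=1.0):
    """Map a normalized PL intensity onto a 4-bit level (0..15)."""
    frac = (intensity - i_min) / (i_max - i_min)
    level = int(frac * N_LEVELS)
    return min(max(level, 0), N_LEVELS - 1)  # clamp to the valid range

print(read_state(0.50))  # mid-scale intensity -> level 8
print(read_state(1.00))  # full scale clamps to the top level, 15
```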
-
GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads
Authors:
Fanjiang Ye,
Zhangke Li,
Xinrui Zhong,
Ethan Ma,
Russell Chen,
Kaijian Wang,
Jingwei Zuo,
Desen Sun,
Ye Cao,
Triston Cao,
Myungjin Lee,
Arvind Krishnamurthy,
Yuke Wang
Abstract:
Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clusters while meeting stringent latency SLOs. Co-serving such heterogeneous workloads is challenging: T2I and T2V requests exhibit vastly different compute demands, parallelism characteristics, and latency requirements, leading to significant SLO violations in existing serving systems. We present GENSERVE, a co-serving system that leverages the inherent predictability of the diffusion process to optimize serving efficiency. A central insight is that diffusion inference proceeds in discrete, predictable steps and is naturally preemptible at step boundaries, opening a new design space for heterogeneity-aware resource management. GENSERVE introduces step-level resource adaptation through three coordinated mechanisms: intelligent video preemption, elastic sequence parallelism with dynamic batching, and an SLO-aware scheduler that jointly optimizes resource allocation across all concurrent requests. Experimental results show that GENSERVE improves the SLO attainment rate by up to 44% over the strongest baseline across diverse configurations.
Submitted 8 April, 2026; v1 submitted 5 April, 2026;
originally announced April 2026.
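The step-boundary preemption insight can be sketched with a toy scheduler: because each denoising step is a discrete unit, a long video job can yield to a latency-critical image job between steps. Job shapes and deadline numbers below are invented, and the deadline-driven policy is a simplification of GENSERVE's SLO-aware scheduler:

```python
# Toy sketch of step-level preemption: diffusion jobs advance one denoising
# step at a time, so a long T2V job can be preempted at a step boundary when
# a tighter-deadline T2I job arrives. Step counts and deadlines are invented.
import heapq

def schedule(jobs):
    """jobs: list of (deadline, name, steps). Runs the most urgent job one
    step at a time, re-deciding the allocation at every step boundary."""
    heap = list(jobs)
    heapq.heapify(heap)
    trace = []
    while heap:
        deadline, name, steps = heapq.heappop(heap)
        trace.append(name)                 # execute exactly one step
        if steps > 1:
            heapq.heappush(heap, (deadline, name, steps - 1))
    return trace

# A 3-step video job is in flight when a 2-step image job with a tighter
# deadline arrives: the image job runs to completion first.
trace = schedule([(100, "video", 3), (10, "image", 2)])
print(trace)  # ['image', 'image', 'video', 'video', 'video']
```

Without step-boundary preemption, the image job would wait behind all three video steps; with it, the tight SLO is met at the cost of a small delay to the looser one.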
-
Direct Photocurrent Detection of Optical Vortex Based on the Orbital Photo Galvanic Effect: Progress, Challenge and Perspective
Authors:
Jinluo Cheng,
Dehong Yang,
Weiming Wang,
Chang Xu,
Zipu Fan,
Dong Sun
Abstract:
A photodetector that can directly distinguish the orbital angular momentum (OAM) of light is highly desirable for integrated on-chip OAM detection and focal plane array devices. The recent development of OAM detectors based on the intrinsic orbital photo galvanic effect (OPGE) of materials provides a new route for direct OAM detection that is on-chip scalable with high resolution and speed. In this paper, we summarize the current progress in direct photodetection of OAM via the OPGE. We begin with a short review of the basic operation scheme of the OAM detector and provide a comprehensive symmetry analysis to sort out the favorable characteristics of candidate materials, incorporating considerations from device schemes based on various device performance characteristics and specific application circumstances. From there, we review the current experimental progress and technical challenges, then discuss possible solutions to these challenges and provide a perspective on the future opportunities of this OAM detection route.
Submitted 4 April, 2026;
originally announced April 2026.
-
Scaling Video Pretraining for Surgical Foundation Models
Authors:
Sicheng Lu,
Zikai Xiao,
Jianhui Wei,
Danyu Sun,
Qi Lu,
Keli Hu,
Yang Feng,
Jian Wu,
Zongxin Yang,
Zuozhu Liu
Abstract:
Surgical video understanding is essential for computer-assisted interventions, yet existing surgical foundation models remain constrained by limited data scale, procedural diversity, and inconsistent evaluation, often lacking a reproducible training pipeline. We propose SurgRec, a scalable and reproducible pretraining recipe for surgical video understanding, instantiated with two variants: SurgRec-MAE and SurgRec-JEPA. We curate a large multi-source corpus of 10,535 videos and 214.5M frames spanning endoscopy, laparoscopy, cataract, and robotic surgery. Building on this corpus, we develop a unified pretraining pipeline with balanced sampling and standardize a reproducible benchmark across 16 downstream datasets and four clinical domains with consistent data splits. Across extensive comparisons against SSL baselines and vision-language models, SurgRec consistently achieves superior performance across downstream datasets. In contrast, VLMs prove unreliable for fine-grained temporal recognition, exhibiting both performance gaps and sensitivity to prompt phrasing. Our work provides a reproducible, scalable foundation for the community to build more general surgical video models. All code, models, and data will be publicly released.
Submitted 2 April, 2026; v1 submitted 31 March, 2026;
originally announced March 2026.
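Balanced sampling across heterogeneous sources is commonly done by drawing from each source with probability proportional to size^α for some α < 1, so small domains are not drowned out by large ones. A sketch of that generic heuristic; the α value and source sizes are illustrative, not SurgRec's actual settings:

```python
# Sketch of balanced multi-source sampling: draw clips from each source with
# probability proportional to size**alpha (alpha < 1), upweighting small
# surgical domains relative to pure size-proportional sampling.
# Source sizes and alpha are illustrative, not SurgRec's actual settings.

sources = {"laparoscopy": 8000, "endoscopy": 2000, "cataract": 500, "robotic": 35}

def sampling_probs(sizes, alpha=0.5):
    weights = {k: n ** alpha for k, n in sizes.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

probs = sampling_probs(sources)
# The smallest source gets a far larger share than its raw size fraction.
print(probs["robotic"], sources["robotic"] / sum(sources.values()))
```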
-
Gate-Tunable Mid-Infrared Electroluminescence from Te/MoS2 p-n Heterojunctions
Authors:
Shiyu Wang,
Delang Liang,
Zhi Zheng,
Mingyang Qin,
Yuchun Chen,
Jie Sheng,
Shula Chen,
Lin Li,
Changgan Zeng,
Anlian Pan,
Jinluo Cheng,
Dong Sun
Abstract:
Mid-infrared (MIR) emitters are critical components in advanced photonic systems, driving progress in fields such as chemical sensing, environmental monitoring, medical diagnostics, thermal imaging and free-space communications. Conventional MIR emitters based on III-V heterostructures rely on complex epitaxial growth on rigid lattice-matched substrates and suffer from limited integration compatibility with CMOS or flexible platforms. The recent development of novel MIR emitters based on two-dimensional (2D) materials such as black phosphorus (BP) is more suitable for on-chip applications but faces challenges related to stability and emission efficiency. Based on the recently discovered highly efficient photoluminescence of Te, we demonstrate a gate-tunable mid-infrared light-emitting diode based on a van der Waals heterojunction formed by the multilayer transition metal dichalcogenide (TMD) MoS2 and tellurium (Te). The device emits polarized electroluminescence (EL) centered at 3.5 μm under forward bias at 25 K, and the EL persists up to 80 K with reduced intensity. Gate control of the MoS2 Fermi level modulates the band alignment and injection efficiency, enabling dynamic tuning of the EL intensity. The emission remains spectrally stable under varying bias and gating, indicating robust band-edge recombination. These results establish the Te/TMD heterostructure as a promising platform for integrated polarized mid-infrared optoelectronics.
Submitted 31 March, 2026;
originally announced March 2026.
-
Heuristic Self-Paced Learning for Domain Adaptive Semantic Segmentation under Adverse Conditions
Authors:
Shiqin Wang,
Haoyang Chen,
Huaizhou Huang,
Yinkan He,
Dongfang Sun,
Xiaoqing Chen,
Xingyu Liu,
Zheng Wang,
Kaiyan Zhao
Abstract:
The learning order of semantic classes significantly impacts unsupervised domain adaptation for semantic segmentation, especially under adverse weather conditions. Most existing curricula rely on handcrafted heuristics (e.g., fixed uncertainty metrics) and follow a static schedule, which fails to adapt to a model's evolving, high-dimensional training dynamics, leading to category bias. Inspired by reinforcement learning, we cast curriculum learning as a sequential decision problem and propose an autonomous class scheduler. This scheduler consists of two components: (i) a high-dimensional state encoder that maps the model's training status into a latent space and distills key features indicative of progress, and (ii) a category-fair policy-gradient objective that ensures balanced improvement across classes. Coupled with mixed source-target supervision, the learned class rankings direct the network's focus to the most informative classes at each stage, enabling more adaptive and dynamic learning. Notably, our method achieves state-of-the-art performance on three widely used benchmarks (ACDC, Dark Zurich, and Nighttime Driving) and shows generalization ability in synthetic-to-real semantic segmentation.
Submitted 25 March, 2026;
originally announced March 2026.
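The curriculum-as-RL idea can be caricatured as a softmax policy over classes updated by a REINFORCE-style rule: classes whose selection improved validation quality get higher selection probability. Everything below (the classes, the constant reward) is a toy stand-in, not the paper's scheduler:

```python
# Toy sketch of curriculum learning as a sequential decision problem: a
# softmax policy over semantic classes, updated with a REINFORCE-style rule
# that rewards classes whose selection improved validation IoU.
# Classes and rewards here are invented stand-ins.
import math

classes = ["road", "sky", "pole"]
logits = {c: 0.0 for c in classes}

def policy():
    exps = {c: math.exp(l) for c, l in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def update(chosen, reward, lr=0.5):
    """REINFORCE: raise the log-probability of choices that paid off."""
    probs = policy()
    for c in classes:
        grad = (1.0 if c == chosen else 0.0) - probs[c]
        logits[c] += lr * reward * grad

# Pretend focusing on "pole" (a rare class) keeps yielding IoU gains.
for _ in range(20):
    update("pole", reward=1.0)

probs = policy()
print(max(probs, key=probs.get))  # the scheduler learns to favor "pole"
```

A category-fair objective like the paper's would additionally penalize the policy for letting any class's probability collapse; this sketch shows only the basic policy-gradient update.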
-
Compensating Visual Insufficiency with Stratified Language Guidance for Long-Tail Class Incremental Learning
Authors:
Xi Wang,
Xu Yang,
Donghao Sun,
Cheng Deng
Abstract:
Long-tail class incremental learning (LT CIL) remains highly challenging because the scarcity of samples in tail classes not only hampers their learning but also exacerbates catastrophic forgetting under continuously evolving and imbalanced data distributions. To tackle these issues, we exploit the informativeness and scalability of language knowledge. Specifically, we analyze the LT CIL data distribution to guide large language models (LLMs) in generating a stratified language tree that hierarchically organizes semantic information from coarse to fine granularity. Building upon this structure, we introduce stratified adaptive language guidance, which leverages learnable weights to merge multi-scale semantic representations, thereby enabling dynamic supervisory adjustment for tail classes and alleviating the impact of data imbalance. Furthermore, we introduce stratified alignment language guidance, which exploits the structural stability of the language tree to constrain optimization and reinforce semantic-visual alignment, thereby alleviating catastrophic forgetting. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance.
Submitted 23 March, 2026;
originally announced March 2026.
-
Coupled Ferroelectricity and Phonon Chirality
Authors:
Xiang-Bin Han,
Cong Yang,
Rui Sun,
Xiaotong Zhang,
Thuc Mai,
Zhengze Xu,
Aryan Jouneghaninaseri,
Xiaoning Jiang,
Rahul Rao,
Yi Xia,
Dali Sun,
Jun Liu,
Xiaotong Li
Abstract:
The ability to control chirality and chiral phonons offers a route to manipulate the direction of spin and angular-momentum transport. In materials with rigid structural chirality, such as quartz, phonon chirality is fixed by the handedness and cannot be switched. By contrast, ferroelectric materials host a spontaneous polarization that can be reversibly switched by an external electric field. When chirality is coupled to this ferroelectric polarization, it enables electrical switching of crystal chirality and the associated phonon angular momentum, which is compatible with solid-state spintronic architectures, enabling control over chirality-dependent quantum states. Here, we report the experimental demonstration of the coupling between ferroelectricity and phonon chirality in the molecular ferroelectric triglycine sulfate. By electrically switching the crystal chirality, we achieve reversible and device-compatible control of phonon chirality, as revealed by in situ time-resolved magneto-optical Kerr effect measurements. The Kerr rotation reverses with electric-field switching, while phonon chirality vanishes in the paraelectric phase and is tunable in the racemic ferroelectric state. Furthermore, density functional theory calculations and circularly polarized Raman spectroscopy corroborate the opposite circular phonon motions. These results establish an electrically addressable coupling pathway linking ferroelectricity, structural chirality, chiral phonons, and spin, opening a route toward chiral-phonon-enabled spin and phonon control technologies based on ferroelectric materials.
Submitted 13 March, 2026;
originally announced March 2026.
-
Domain Walls Stabilized by Intrinsic Phonon Modes and Engineered Defects Enable Robust Ferroelectricity in HfO2
Authors:
Chenxi Yu,
Jiajia Zhang,
Xujin Song,
Dijiang Sun,
Shangze Li,
Fei Liu,
Xiaoyan Liu,
Wei Xi,
Jinfeng Kang
Abstract:
Ferroelectric $\mathrm{HfO}_2$ has attracted extensive research interest for its applications in the AI era. Domain walls play a crucial role in the phase stabilization and polarization switching of ferroelectric $\mathrm{HfO}_2$; however, a thorough understanding is still lacking. Here, we developed a unified framework based on phonon mode expansion to systematically study the effects of phonon modes and defects on domain wall structures. Using this approach combined with first-principles calculations, we revealed that interface phonon modes play a key role in the stability of domain walls; defects pin and stabilize ferroelectric domains, which in turn stabilizes the metastable orthorhombic phase and facilitates polarization switching. This provides insight into the microscopic physical origin of the enhanced ferroelectricity in $\mathrm{HfO}_2$ achieved by doping and defect engineering. Furthermore, the theoretically predicted domain structures and defect distributions were observed in La-doped $\mathrm{HfO}_2$ ferroelectric films by EELS and STEM experiments, confirming the validity of our findings.
Submitted 16 March, 2026;
originally announced March 2026.
-
OpenACMv2: An Accuracy-Constrained Co-Optimization Framework for Approximate DCiM
Authors:
Yiqi Zhou,
Yue Yuan,
Yikai Wang,
Bohao Liu,
Qinxin Mei,
Zhuohua Liu,
Shan Shen,
Wei Xing,
Daying Sun,
Li Li,
Guozhu Liu
Abstract:
Digital Compute-in-Memory (DCiM) accelerates neural networks by reducing data movement. Approximate DCiM can further improve power-performance-area (PPA), but demands accuracy-constrained co-optimization across coupled architecture and transistor-level choices. Building on OpenYield, we introduce Accuracy-Constrained Co-Optimization (ACCO) and present OpenACMv2, an open framework that operationalizes ACCO via two-level optimization: (1) accuracy-constrained architecture search of compressor combinations and SRAM macro parameters, driven by a fast GNN-based surrogate for PPA and error; and (2) variation- and PVT-aware transistor sizing for standard cells and SRAM bitcells using Monte Carlo. By decoupling ACCO into architecture-level exploration and circuit-level sizing, OpenACMv2 integrates classic single- and multi-objective optimizers to deliver strong PPA-accuracy tradeoffs and robust convergence. The workflow is compatible with FreePDK45 and OpenROAD, supporting reproducible evaluation and easy adoption. Experiments demonstrate significant PPA improvements under controlled accuracy budgets, enabling rapid "what-if" exploration for approximate DCiM. The framework is available on https://github.com/ShenShan123/OpenACM.
Submitted 13 March, 2026;
originally announced March 2026.
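At its core, accuracy-constrained co-optimization means: among candidate configurations, keep only those meeting the accuracy budget, then pick the best PPA. A toy sketch of that selection step; the candidate table is invented and stands in for OpenACMv2's surrogate-driven search space:

```python
# Toy sketch of accuracy-constrained architecture selection: filter candidate
# approximate-compressor configurations by an accuracy budget, then pick the
# best power-performance-area score. The candidate table is invented.

candidates = [
    # (name, accuracy, ppa_score)  -- higher ppa_score is better
    ("exact",    0.994, 1.00),
    ("approx-A", 0.991, 1.35),
    ("approx-B", 0.982, 1.60),
    ("approx-C", 0.950, 2.10),
]

def acco_select(cands, accuracy_budget):
    """Best-PPA candidate among those satisfying the accuracy constraint."""
    feasible = [c for c in cands if c[1] >= accuracy_budget]
    return max(feasible, key=lambda c: c[2]) if feasible else None

print(acco_select(candidates, accuracy_budget=0.98)[0])  # -> approx-B
```

Loosening the budget admits more aggressive approximation (better PPA); tightening it pushes the choice back toward the exact design, which is the tradeoff the framework explores automatically.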
-
A Level Set Method with Secant Iterations for the Least-Squares Constrained Nuclear Norm Minimization
Authors:
Chiyu Ma,
Jiaming Ma,
Defeng Sun
Abstract:
We present an efficient algorithm for least-squares constrained nuclear norm minimization, a computationally challenging problem with broad applications. Our approach combines a level set method with secant iterations and a proximal generation method. As a key theoretical contribution, we establish the nonsingularity of the Clarke generalized Jacobian for a general class of projection norm functions over closed convex sets. This property and the (strong) semismoothness of our value function yield fast local convergence of the secant method. For the resulting nuclear norm regularized subproblems, we develop a proximal generation method that exploits low-rank structures without compromising convergence. Extensive numerical experiments demonstrate the superior performance of our approach compared to state-of-the-art methods.
Submitted 13 March, 2026;
originally announced March 2026.
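To make the level-set idea concrete, here is a toy sketch (not the authors' code; all names are illustrative) of secant iterations on the value function v(τ) = min over ||X||_* ≤ τ of ||A(X) − b||_F − δ, specialized to the case where the least-squares operator is the identity, so the inner subproblem reduces to an SVD followed by a projection of the singular values onto an l1 ball:

```python
import numpy as np

def project_l1_ball(s, tau):
    # Euclidean projection of a nonnegative vector s onto the l1 ball of
    # radius tau; applied to singular values, this projects a matrix onto
    # the nuclear-norm ball of radius tau.
    if tau <= 0:
        return np.zeros_like(s)
    if s.sum() <= tau:
        return s
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (css[k] - tau) / (k + 1)
    return np.maximum(s - theta, 0.0)

def value_function(B, tau, delta):
    # v(tau) = min_{||X||_* <= tau} ||X - B||_F - delta  (identity operator).
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    X = (U * project_l1_ball(s, tau)) @ Vt
    return np.linalg.norm(X - B) - delta

def secant_level_set(B, delta, tau0=0.0, tau1=1.0, tol=1e-8, max_iter=100):
    # Secant iterations on the root of v(tau) = 0; v is convex and
    # decreasing in tau, so the iterates approach the root monotonically
    # from the left.
    v0, v1 = value_function(B, tau0, delta), value_function(B, tau1, delta)
    for _ in range(max_iter):
        if abs(v1) < tol or v1 == v0:
            break
        tau0, tau1 = tau1, tau1 - v1 * (tau1 - tau0) / (v1 - v0)
        v0, v1 = v1, value_function(B, tau1, delta)
    return tau1
```

In this smooth toy case plain secant converges quickly; the paper's contribution is establishing the fast local convergence in the general semismooth setting and solving the regularized subproblems with a low-rank proximal method rather than a full SVD.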
-
UniPINN: A Unified PINN Framework for Multi-task Learning of Diverse Navier-Stokes Equations
Authors:
Dengdi Sun,
Jie Chen,
Xiao Wang,
Jin Tang
Abstract:
Physics-Informed Neural Networks (PINNs) have shown promise in solving incompressible Navier-Stokes equations, yet existing approaches are predominantly designed for single-flow settings. When extended to multi-flow scenarios, these methods face three key challenges: (1) difficulty in simultaneously capturing both shared physical principles and flow-specific characteristics, (2) susceptibility to…
▽ More
Physics-Informed Neural Networks (PINNs) have shown promise in solving incompressible Navier-Stokes equations, yet existing approaches are predominantly designed for single-flow settings. When extended to multi-flow scenarios, these methods face three key challenges: (1) difficulty in simultaneously capturing both shared physical principles and flow-specific characteristics, (2) susceptibility to inter-task negative transfer that degrades prediction accuracy, and (3) unstable training dynamics caused by disparate loss magnitudes across heterogeneous flow regimes. To address these limitations, we propose UniPINN, a unified multi-flow PINN framework that integrates three complementary components: a shared-specialized architecture that disentangles universal physical laws from flow-specific features, a cross-flow attention mechanism that selectively reinforces relevant patterns while suppressing task-irrelevant interference, and a dynamic weight allocation strategy that adaptively balances loss contributions to stabilize multi-objective optimization. Extensive experiments on three canonical flows demonstrate that UniPINN effectively unifies multi-flow learning, achieving superior prediction accuracy and balanced performance across heterogeneous regimes while successfully mitigating negative transfer. The source code of this paper will be released at https://github.com/Event-AHU/OpenFusion.
△ Less
Submitted 11 March, 2026;
originally announced March 2026.
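As an illustration of the kind of dynamic weight allocation the abstract describes, here is a minimal sketch; the concrete scheme (inverse weighting by an exponential moving average of each loss's magnitude) is an assumption for illustration, not UniPINN's actual strategy:

```python
class DynamicLossBalancer:
    """Weight each task loss by the inverse of an exponential moving average
    of its magnitude, so tasks with disparate loss scales contribute
    comparably to the total objective. Hypothetical sketch, not UniPINN."""

    def __init__(self, task_names, beta=0.9):
        self.beta = beta
        self.ema = {name: None for name in task_names}

    def step(self, losses):
        # losses: dict mapping task name -> current scalar loss value.
        weights = {}
        for name, value in losses.items():
            prev = self.ema[name]
            self.ema[name] = (value if prev is None
                              else self.beta * prev + (1.0 - self.beta) * value)
            weights[name] = 1.0 / (self.ema[name] + 1e-12)
        # Normalize so the weights sum to the number of tasks.
        norm = sum(weights.values())
        weights = {k: len(losses) * w / norm for k, w in weights.items()}
        total = sum(weights[name] * value for name, value in losses.items())
        return total, weights
```

On the first step each task's weighted contribution is equalized, which is the behavior that keeps one flow regime's large residual loss from dominating the multi-objective update.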
-
Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought
Authors:
Yuling Jiao,
Yanming Lai,
Huazhen Lin,
Wensen Ma,
Houduo Qi,
Defeng Sun
Abstract:
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these phenomena remain poorly understood. This study dives into the foundations of these observations by…
▽ More
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these phenomena remain poorly understood. This study dives into the foundations of these observations by addressing three critical questions: (1) How do LLMs accurately decode prompt semantics despite being trained solely on a next-token prediction objective? (2) Through what mechanism does ICL facilitate performance gains without explicit parameter updates? and (3) Why do intermediate reasoning steps in CoT prompting effectively unlock capabilities for complex, multi-step problems?
Our results demonstrate that, through the autoregressive process, LLMs are capable of exactly inferring the transition probabilities between tokens across distinct tasks using provided prompts. We show that ICL enhances performance by reducing prompt ambiguity and facilitating posterior concentration on the intended task. Furthermore, we find that CoT prompting activates the model's capacity for task decomposition, breaking complex problems into a sequence of simpler sub-tasks that the model has mastered during the pretraining phase. By comparing their individual error bounds, we provide novel theoretical insights into the statistical superiority of advanced prompt engineering techniques.
△ Less
Submitted 12 March, 2026; v1 submitted 16 February, 2026;
originally announced March 2026.
-
Structure and Progress Aware Diffusion for Medical Image Segmentation
Authors:
Siyuan Song,
Guyue Hu,
Chenglong Li,
Dengdi Sun,
Zhe Jin,
Jin Tang
Abstract:
Medical image segmentation is crucial for computer-aided diagnosis, which necessitates understanding both coarse morphological and semantic structures, as well as carving fine boundaries. The morphological and semantic structures in medical images are beneficial and stable clues for target understanding. While the fine boundaries of medical targets (like tumors and lesions) are usually ambiguous a…
▽ More
Medical image segmentation is crucial for computer-aided diagnosis, which necessitates understanding both coarse morphological and semantic structures and carving fine boundaries. The morphological and semantic structures in medical images are beneficial and stable clues for target understanding, whereas the fine boundaries of medical targets (such as tumors and lesions) are usually ambiguous and noisy due to lesion overlap, annotation uncertainty, and other factors, making them unreliable as early supervision. Nevertheless, existing methods learn coarse structures and fine boundaries simultaneously throughout the training process. In this paper, we propose a structure- and progress-aware diffusion (SPAD) for medical image segmentation, which consists of a semantic-concentrated diffusion (ScD) and a boundary-centralized diffusion (BcD) modulated by a progress-aware scheduler (PaS). Specifically, the semantic-concentrated diffusion introduces anchor-preserved target perturbation, which perturbs pixels within a medical target but preserves unaltered areas as semantic anchors, encouraging the model to infer noisy target areas from the surrounding semantic context. The boundary-centralized diffusion introduces progress-aware boundary noise, which blurs unreliable and ambiguous boundaries, thus compelling the model to focus on coarse but stable anatomical morphology and global semantics. Furthermore, the progress-aware scheduler gradually modulates the noise intensity of the ScD and BcD, forming a coarse-to-fine diffusion paradigm that encourages the model to focus on coarse morphological and semantic structures during early target-understanding stages and gradually shift to fine target boundaries during later contour-adjustment stages.
△ Less
Submitted 8 March, 2026;
originally announced March 2026.
-
HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
Authors:
Desen Sun,
Jason Hon,
Jintao Zhang,
Sihang Liu
Abstract:
Diffusion models have demonstrated a remarkable ability in Text-to-Image (T2I) generation applications. Despite the advanced generation output, they suffer from heavy computation overhead, especially for large models that contain tens of billions of parameters. Prior work has illustrated that replacing part of the denoising steps with a smaller model still maintains the generation quality. However…
▽ More
Diffusion models have demonstrated a remarkable ability in Text-to-Image (T2I) generation applications. Despite their advanced generation quality, they suffer from heavy computational overhead, especially for large models that contain tens of billions of parameters. Prior work has illustrated that replacing part of the denoising steps with a smaller model still maintains the generation quality. However, these methods only focus on saving computation for some timesteps, ignoring the difference in compute demand within one timestep. In this work, we propose HybridStitch, a new T2I generation paradigm that treats generation like editing. Specifically, we introduce a hybrid stage that jointly incorporates both the large model and the small model. HybridStitch separates the entire image into two regions: one that is relatively easy to render, enabling an early transition to the smaller model, and another that is more complex and therefore requires refinement by the large model. HybridStitch employs the small model to construct a coarse sketch while exploiting the large model to edit and refine the complex regions. According to our evaluation, HybridStitch achieves a 1.83$\times$ speedup on Stable Diffusion 3, which is faster than all existing mixture-of-models methods.
△ Less
Submitted 8 March, 2026;
originally announced March 2026.
-
Mid-wave infrared photothermal microscopy for molecular and metabolic imaging in deep tissues and spheroids
Authors:
Mingsheng Li,
Yuhao Yuan,
Guangrui Ding,
Hongli Ni,
Biwen Gao,
Dashan Dong,
Qinshu He,
Hongjian He,
Xinyan Teng,
Yuwei Sun,
Dingcheng Sun,
Qing Xia,
Thao Pham,
Ji-Xin Cheng
Abstract:
High-resolution chemical imaging within deep tissues and intact spheroids remains a grand challenge. Here, we introduce mid-wave infrared photothermal (MWIP) microscopy operating in the underexplored 2000-2500 nm spectral window for submicron-resolution molecular and metabolic imaging in intact tumor spheroids and deep tissues. A dark-field photothermal detection scheme significantly suppresses wa…
▽ More
High-resolution chemical imaging within deep tissues and intact spheroids remains a grand challenge. Here, we introduce mid-wave infrared photothermal (MWIP) microscopy operating in the underexplored 2000-2500 nm spectral window for submicron-resolution molecular and metabolic imaging in intact tumor spheroids and deep tissues. A dark-field photothermal detection scheme significantly suppresses water background and enhances contrast. By accessing strong carbon-hydrogen combination absorptions, a detection limit of 0.12% for dimethyl sulfoxide is achieved, comparable to stimulated Raman scattering microscopy. Depth-resolved imaging of endogenous biomolecules up to 500 micrometers in excised mouse skin and brain tissues is demonstrated. MWIP further enables depth-resolved tracking of transdermal drug transport via carbon-deuterium overtone absorption. Using deuterium metabolic probes, fatty-acid metabolism is imaged at 200 micrometers deep within intact tumor spheroids through carbon-deuterium overtone and combination bands. Collectively, MWIP offers a platform for functional imaging of 3D biological systems in their native environments.
△ Less
Submitted 5 March, 2026;
originally announced March 2026.
-
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval
Authors:
Maojun Sun,
Yue Wu,
Yifei Xie,
Ruijian Han,
Binyan Jiang,
Defeng Sun,
Yancheng Yuan,
Jian Huang
Abstract:
Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore data distribution, producing suboptimal matches. We propose DARE (Distribution-Aware Retrieval Embeddin…
▽ More
Large Language Model (LLM) agents can automate data-science workflows, but many rigorous statistical methods implemented in R remain underused because LLMs struggle with statistical knowledge and tool retrieval. Existing retrieval-augmented approaches focus on function-level semantics and ignore data distribution, producing suboptimal matches. We propose DARE (Distribution-Aware Retrieval Embedding), a lightweight, plug-and-play retrieval model that incorporates data distribution information into function representations for R package retrieval. Our main contributions are: (i) RPKB, a curated R Package Knowledge Base derived from 8,191 high-quality CRAN packages; (ii) DARE, an embedding model that fuses distributional features with function metadata to improve retrieval relevance; and (iii) RCodingAgent, an R-oriented LLM agent for reliable R code generation and a suite of statistical analysis tasks for systematically evaluating LLM agents in realistic analytical scenarios. Empirically, DARE achieves an NDCG@10 of 93.47%, outperforming state-of-the-art open-source embedding models by up to 17% on package retrieval while using substantially fewer parameters. Integrating DARE into RCodingAgent yields significant gains on downstream analysis tasks. This work helps narrow the gap between LLM automation and the mature R statistical ecosystem.
△ Less
Submitted 4 March, 2026;
originally announced March 2026.
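For reference, the NDCG@10 figure quoted above is the standard graded-relevance ranking metric; a minimal implementation of its textbook definition (not code from the paper):

```python
import math

def ndcg_at_k(relevances, k=10):
    """Normalized Discounted Cumulative Gain at rank k for a ranked list
    of graded relevance scores (standard definition)."""
    def dcg(rels):
        # Discounted cumulative gain: relevance discounted by log2(rank + 1).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; any misordering of relevant items pushes the score below 1.0.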
-
CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models
Authors:
Zhehao Tan,
Yihan Jiao,
Dan Yang,
Junjie Wang,
Duolin Sun,
Jie Feng,
Xidong Wang,
Lei Liu,
Yue Shen,
Jian Wang,
Jinjie Gu
Abstract:
With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly important. Existing RAG-oriented reinforcement learning (RL) methods rely on external rewards that often fail to evaluate document faithfulness, and may misjudge similar answers in open-domain settings. In addition, there is no RAG-bas…
▽ More
With the growing use of Retrieval-Augmented Generation (RAG), training large language models (LLMs) for context-sensitive reasoning and faithfulness is increasingly important. Existing RAG-oriented reinforcement learning (RL) methods rely on external rewards that often fail to evaluate document faithfulness, and may misjudge similar answers in open-domain settings. In addition, there is no RAG-based self-reward mechanism. Moreover, although such a mechanism could in principle estimate answer confidence given documents, the absence of objective feedback in self-judgment can cause hallucination accumulation and eventual model collapse. To tackle these issues, we propose a novel "internal-external" hybrid reward framework centered on a Contrastive Likelihood Reward (CLR). CLR directly optimizes the log-likelihood gap between responses conditioned on prompts with and without supporting evidence. This encourages the model to extract relevant evidence and increases its confidence when grounded in a specific context. Experiments show that our method (used alone or combined with external correctness rewards) achieves strong performance on single-hop, multi-hop, vertical-domain, and faithfulness benchmarks. Our training code and models are coming soon.
△ Less
Submitted 2 February, 2026;
originally announced March 2026.
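The core quantity in the abstract, the log-likelihood gap between the same answer scored with and without supporting evidence, can be sketched as follows (the token probabilities here are stand-ins for what a real implementation would read off the LLM's logits; this is an illustration, not the authors' code):

```python
import math

def sequence_log_prob(token_probs):
    # Log-likelihood of a generated answer: sum of per-token log-probabilities.
    return sum(math.log(p) for p in token_probs)

def contrastive_likelihood_reward(probs_with_evidence, probs_without_evidence):
    # CLR sketch: the gap between the answer's log-likelihood conditioned on
    # the prompt augmented with supporting documents and on the bare prompt.
    # A positive reward means the evidence genuinely raises model confidence.
    return (sequence_log_prob(probs_with_evidence)
            - sequence_log_prob(probs_without_evidence))
```

Because the reward is a difference of likelihoods from the model itself, it needs no external judge, which is the "internal" half of the hybrid framework.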
-
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
Authors:
Junyi Zhang,
Charles Herrmann,
Junhwa Hur,
Chen Sun,
Ming-Hsuan Yang,
Forrester Cole,
Trevor Darrell,
Deqing Sun
Abstract:
Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR…
▽ More
Feedforward geometric foundation models achieve strong short-window reconstruction, yet scaling them to minutes-long videos is bottlenecked by quadratic attention complexity or limited effective memory in recurrent designs. We present LoGeR (Long-context Geometric Reconstruction), a novel architecture that scales dense 3D reconstruction to extremely long sequences without post-optimization. LoGeR processes video streams in chunks, leveraging strong bidirectional priors for high-fidelity intra-chunk reasoning. To manage the critical challenge of coherence across chunk boundaries, we propose a learning-based hybrid memory module. This dual-component system combines a parametric Test-Time Training (TTT) memory to anchor the global coordinate frame and prevent scale drift, alongside a non-parametric Sliding Window Attention (SWA) mechanism to preserve uncompressed context for high-precision adjacent alignment. Remarkably, this memory architecture enables LoGeR to be trained on sequences of 128 frames and to generalize to thousands of frames during inference. Evaluated across standard benchmarks and a newly repurposed VBR dataset with sequences of up to 19k frames, LoGeR substantially outperforms prior state-of-the-art feedforward methods--reducing ATE on KITTI by over 74%--and achieves robust, globally consistent reconstruction over unprecedented horizons.
△ Less
Submitted 3 March, 2026;
originally announced March 2026.
-
OneRanker: Unified Generation and Ranking with One Model in Industrial Advertising Recommendation
Authors:
Dekai Sun,
Yiming Liu,
Jiafan Zhou,
Xun Liu,
Chenchen Yu,
Yi Li,
Jun Zhang,
Huan Yu,
Jie Jiang
Abstract:
The end-to-end generative paradigm is revolutionizing advertising recommendation systems, driving a shift from traditional cascaded architectures towards unified modeling. However, practical deployment faces three core challenges: the misalignment between interest objectives and business value, the target-agnostic limitation of generative processes, and the disconnection between generation and ran…
▽ More
The end-to-end generative paradigm is revolutionizing advertising recommendation systems, driving a shift from traditional cascaded architectures towards unified modeling. However, practical deployment faces three core challenges: the misalignment between interest objectives and business value, the target-agnostic limitation of generative processes, and the disconnection between generation and ranking stages. Existing solutions often fall into a dilemma where single-stage fusion induces optimization tension, while stage decoupling causes irreversible information loss. To address this, we propose OneRanker, achieving architectural-level deep integration of generation and ranking. First, we design a value-aware multi-task decoupling architecture. By leveraging task token sequences and causal mask, we separate interest coverage and value optimization spaces within shared representations, effectively alleviating target conflicts. Second, we construct a coarse-to-fine collaborative target awareness mechanism, utilizing Fake Item Tokens for implicit awareness during generation and a ranking decoder for explicit value alignment at the candidate level. Finally, we propose input-output dual-side consistency guarantees. Through Key/Value pass-through mechanisms and Distribution Consistency (DC) Constraint Loss, we achieve end-to-end collaborative optimization between generation and ranking. The full deployment on Tencent's WeiXin channels advertising system has shown a significant improvement in key business metrics (GMV - Normal +1.34%), providing a new paradigm with industrial feasibility for generative advertising recommendations.
△ Less
Submitted 12 March, 2026; v1 submitted 3 March, 2026;
originally announced March 2026.
-
Modulating Surface Acoustic Wave Generation through Superconductivity
Authors:
Andrew Christy,
Yuzan Xiong,
Rui Sun,
Yi Li,
Kenneth O. Chua,
Andrew H. Comstock,
Junming Wu,
Sidong Lei,
Frank Tsui,
Megan N. Jackson,
Dali Sun,
Valentine Novosad,
James F. Cahoon,
Wei Zhang
Abstract:
Surface acoustic waves (SAWs), with their five orders-of-magnitude slower propagation velocity, allow for considerably shorter wavelengths at the same frequency compared to electromagnetic waves. The short wavelengths allow for device miniaturization and on-chip integration. The generic design of these devices involve piezoelectric substrates with comblike arrays of Al or Au electrodes known as in…
▽ More
Surface acoustic waves (SAWs), with their five orders-of-magnitude slower propagation velocity, allow for considerably shorter wavelengths at the same frequency compared to electromagnetic waves. The short wavelengths allow for device miniaturization and on-chip integration. The generic design of these devices involves piezoelectric substrates with comb-like arrays of Al or Au electrodes known as interdigitated transducers deposited on the surface. However, Al and Au both have shortcomings at the cryogenic temperatures required for quantum applications, namely the formation of two-level systems and the lack of superconductivity perpetuating Ohmic losses, respectively. In this work, SAWs are generated in the high-MHz to low-GHz range using niobium nitride (NbN) interdigitated transducers (IDTs) and Bragg reflectors. We demonstrate the fabrication of acoustic devices through photolithography and reactive ion etching (RIE). The sharp transition between superconducting and normal states and the corresponding change in SAW transmission allows for fine control of the 'on' (superconducting) and 'off' (normal) states of NbN, with a ΔT = K separating the transmission minimum and maximum. We demonstrate a 16x difference in transmission between the 'on' and 'off' states of the device. The SAW transmission behavior mirrors the change in resistance of NbN at its Tc. These findings open up new possibilities for the integration of NbN SAW resonators into existing quantum architectures based on NbN and a method for adjusting transmission properties independent of applied voltage.
△ Less
Submitted 2 March, 2026;
originally announced March 2026.
-
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context
Authors:
Zidi Xiu,
David Q. Sun,
Kevin Cheng,
Maitrik Patel,
Josh Date,
Yizhe Zhang,
Jiarui Lu,
Omar Attia,
Raviteja Vemulapalli,
Oncel Tuzel,
Meng Cao,
Samy Bengio
Abstract:
Next-generation AI must manage vast personal data, diverse tools, and multi-step reasoning, yet most benchmarks remain context-free and single-turn. We present ASTRA-bench (Assistant Skills in Tool-use, Reasoning & Action-planning), a benchmark that uniquely unifies time-evolving personal context with an interactive toolbox and complex user intents. Our event-driven pipeline generates 2,413 scena…
▽ More
Next-generation AI must manage vast personal data, diverse tools, and multi-step reasoning, yet most benchmarks remain context-free and single-turn. We present ASTRA-bench (Assistant Skills in Tool-use, Reasoning & Action-planning), a benchmark that uniquely unifies time-evolving personal context with an interactive toolbox and complex user intents. Our event-driven pipeline generates 2,413 scenarios across four protagonists, grounded in longitudinal life events and annotated by referential, functional, and informational complexity. Evaluation of state-of-the-art models (e.g., Claude-4.5-Opus, DeepSeek-V3.2) reveals significant performance degradation under high-complexity conditions, with argument generation emerging as the primary bottleneck. These findings expose critical limitations in current agents' ability to ground reasoning within messy personal context and orchestrate reliable multi-step plans. We release ASTRA-bench with a full execution environment and evaluation scripts to provide a diagnostic testbed for developing truly context-aware AI assistants.
△ Less
Submitted 1 March, 2026;
originally announced March 2026.
-
UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
Authors:
Junhwa Hur,
Charles Herrmann,
Songyou Peng,
Philipp Henzler,
Zeyu Ma,
Todd Zickler,
Deqing Sun
Abstract:
Dense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specific feedforward models. We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussian Splats, enabling the joint and…
▽ More
Dense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specific feedforward models. We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussian Splats, enabling the joint and consistent estimation of 3D geometry, 3D motion, and camera pose in a feedforward manner. Our core insight is that differentiably rendering multiple signals from a single Dynamic 3D Gaussian representation offers major training advantages. This approach enables a self-supervised image synthesis loss while tightly coupling appearance, depth, and motion. Since all modalities share the same geometric primitives, supervising one inherently regularizes and improves the others. This synergy overcomes data scarcity, allowing UFO-4D to outperform prior work by up to 3 times in joint geometry, motion, and camera pose estimation. Our representation also enables high-fidelity 4D interpolation across novel views and time. Please visit our project page for visual results: https://ufo-4d.github.io/
△ Less
Submitted 5 March, 2026; v1 submitted 27 February, 2026;
originally announced February 2026.
-
NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training
Authors:
Dengdi Sun,
Xiaoya Zhou,
Xiao Wang,
Hao Si,
Wanli Lyu,
Jin Tang,
Bin Luo
Abstract:
Neural operators have emerged as an efficient paradigm for solving PDEs, overcoming the limitations of traditional numerical methods and significantly improving computational efficiency. However, due to the diversity and complexity of PDE systems, existing neural operators typically rely on a single network architecture, which limits their capacity to fully capture heterogeneous features and compl…
▽ More
Neural operators have emerged as an efficient paradigm for solving PDEs, overcoming the limitations of traditional numerical methods and significantly improving computational efficiency. However, due to the diversity and complexity of PDE systems, existing neural operators typically rely on a single network architecture, which limits their capacity to fully capture heterogeneous features and complex system dependencies. This constraint poses a bottleneck for large-scale PDE pre-training based on neural operators. To address these challenges, we propose a large-scale PDE pre-trained neural operator based on a nested Mixture-of-Experts (MoE) framework. In particular, the image-level MoE is designed to capture global dependencies, while the token-level Sub-MoE focuses on local dependencies. Our model can selectively activate the most suitable expert networks for a given input, thereby enhancing generalization and transferability. We conduct large-scale pre-training on twelve PDE datasets from diverse sources and successfully transfer the model to downstream tasks. Extensive experiments demonstrate the effectiveness of our approach.
△ Less
Submitted 25 February, 2026;
originally announced February 2026.
-
Standard Transformers Achieve the Minimax Rate in Nonparametric Regression with $C^{s,λ}$ Targets
Authors:
Yanming Lai,
Defeng Sun
Abstract:
The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can approximate Hölder functions $C^{s,λ}([0,1]^{d\times n})$ ($s\in\mathbb{N}_{\geq 0}$, $0<λ\leq 1$) under the $L^t$ distance (…
▽ More
The tremendous success of Transformer models in fields such as large language models and computer vision necessitates a rigorous theoretical investigation. To the best of our knowledge, this paper is the first work proving that standard Transformers can approximate Hölder functions $C^{s,λ}([0,1]^{d\times n})$ ($s\in\mathbb{N}_{\geq 0}$, $0<λ\leq 1$) under the $L^t$ distance ($t \in [1, \infty]$) with arbitrary precision. Building upon this approximation result, we demonstrate that standard Transformers achieve the minimax optimal rate in nonparametric regression for Hölder target functions. It is worth mentioning that, by introducing two metrics: the size tuple and the dimension vector, we provide a fine-grained characterization of Transformer structures, which facilitates future research on the generalization and optimization errors of Transformers with different structures. As intermediate results, we also derive upper bounds for the Lipschitz constant of standard Transformers and their memorization capacity, which may be of independent interest. These findings provide theoretical justification for the powerful capabilities of Transformer models.
△ Less
Submitted 24 February, 2026;
originally announced February 2026.
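For orientation, the minimax rate in question is the classical one for nonparametric regression over a Hölder class (a standard fact from nonparametric statistics, stated here with the abstract's smoothness $s+λ$ and input dimension $d\times n$; it is not quoted from the paper):

```latex
\inf_{\hat f}\ \sup_{f \in C^{s,\lambda}([0,1]^{d\times n})}
\mathbb{E}\,\bigl\| \hat f - f \bigr\|_{L^2}^2
\;\asymp\; N^{-\frac{2(s+\lambda)}{2(s+\lambda)+dn}},
```

where $N$ is the number of training samples; the paper's contribution, per the abstract, is showing that standard Transformers attain this rate.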
-
Modeling Multivariate Missingness with Tree Graphs and Conjugate Odds
Authors:
Daniel Suen,
Yen-Chi Chen
Abstract:
In this paper, we analyze a specific class of missing not at random (MNAR) assumptions called tree graphs, extending upon the work of pattern graphs. We build off previous work by introducing the idea of a conjugate odds family in which certain parametric models on the selection odds can preserve the data distribution family across all missing data patterns. Under a conjugate odds family and a tre…
▽ More
In this paper, we analyze a specific class of missing not at random (MNAR) assumptions called tree graphs, extending the work on pattern graphs. We build on previous work by introducing the idea of a conjugate odds family, in which certain parametric models on the selection odds can preserve the data distribution family across all missing data patterns. Under a conjugate odds family and a tree graph assumption, we are able to model the full data distribution elegantly in the sense that for the observed data, we obtain a model that is conjugate to the complete-data model, and for the missing entries, we create a simple imputation model. In addition, we investigate the problems of graph selection, sensitivity analysis, and statistical inference. Using both simulations and real data, we illustrate the applicability of our method.
△ Less
Submitted 18 February, 2026;
originally announced February 2026.
-
FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery
Authors:
Yanlong Wang,
Jian Xu,
Hongkang Zhang,
Shao-Lun Huang,
Danny Dongning Sun,
Xiao-Ping Zhang
Abstract:
Formulaic alpha factor mining is a critical yet challenging task in quantitative investment, characterized by a vast search space and the need for domain-informed, interpretable signals. However, finding novel signals becomes increasingly difficult as the library grows due to high redundancy. We propose FactorMiner, a lightweight and flexible self-evolving agent framework designed to navigate this…
▽ More
Formulaic alpha factor mining is a critical yet challenging task in quantitative investment, characterized by a vast search space and the need for domain-informed, interpretable signals. However, finding novel signals becomes increasingly difficult as the library grows due to high redundancy. We propose FactorMiner, a lightweight and flexible self-evolving agent framework designed to navigate this complex landscape through continuous knowledge accumulation. FactorMiner combines a Modular Skill Architecture that encapsulates systematic financial evaluation into executable tools with a structured Experience Memory that distills historical mining trials into actionable insights (successful patterns and failure constraints). By instantiating the Ralph Loop paradigm -- retrieve, generate, evaluate, and distill -- FactorMiner iteratively uses memory priors to guide exploration, reducing redundant search while focusing on promising directions. Experiments on multiple datasets across different assets and markets show that FactorMiner constructs a diverse library of high-quality factors with competitive performance, while maintaining low redundancy among factors as the library scales. Overall, FactorMiner provides a practical approach to scalable discovery of interpretable formulaic alpha factors under the "Correlation Red Sea" constraint.
△ Less
Submitted 16 February, 2026;
originally announced February 2026.
-
LHAASO observation of Mrk 421 during 2021 March - 2024 March: a comprehensive VHE catalog of multi-timescale outbursts and its time average behavior
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
J. Blunier,
A. V. Bukevich,
C. M. Cai,
Y. Y. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
E. S. Chen,
G. H. Chen,
H. K. Chen,
L. F. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen
, et al. (303 additional authors not shown)
Abstract:
The Large High Altitude Air Shower Observatory (LHAASO) monitors sources within its field of view for up to 7 hours daily, achieving a duty cycle exceeding 98% and an annual point-source sensitivity of 1.5% Crab Units (CU) in the very high energy (VHE) band. This unbiased sky-survey mode facilitates systematic monitoring and investigation of outburst phenomena. In this paper, we present results from an unprecedented three-year monitoring campaign (March 2021--March 2024) of Mrk 421 using LHAASO, spanning energies from 0.4 TeV to 20 TeV. We find that the blazar stayed in a quiescent state in 2021 and became active in 2022, with a total of 23 VHE outburst events identified; the highest observed daily significance reaches $20\,σ$, with a flux equivalent to approximately 3.3~CU. LHAASO's continuous monitoring suggests a flaring occupancy for Mrk~421 of around 14%. During long-term monitoring, multiwavelength (MWL) variability and correlation analyses are conducted using complementary data from Fermi-LAT, MAXI-GSC, Swift-XRT, and ZTF. A significant correlation ($>3\,σ$) is observed between the X-ray and VHE bands with no detectable time lag, while the correlation between the GeV and TeV bands is weaker. The flux distribution of the TeV emission during the quiescent state differs from that in the active state, implying the existence of two modes of energy dissipation in the blazar jet. Using simultaneous MWL data, we also analyze both the long-term and outburst-period spectral energy distributions (SEDs) and discuss the possible origin of the outburst events.
Submitted 13 February, 2026;
originally announced February 2026.
-
WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning
Authors:
Junjie Wang,
Zequn Xie,
Dan Yang,
Jie Feng,
Yue Shen,
Duolin Sun,
Meixiu Long,
Yihan Jiao,
Zhehao Tan,
Jian Wang,
Peng Wei,
Jinjie Gu
Abstract:
Deep Research systems based on web agents have shown strong potential in solving complex information-seeking tasks, yet their search efficiency remains underexplored. We observe that many state-of-the-art open-source web agents rely on long tool-call trajectories with cyclic reasoning loops and exploration of unproductive branches. To address this, we propose WebClipper, a framework that compresses web agent trajectories via graph-based pruning. Concretely, we model the agent's search process as a state graph and cast trajectory optimization as a minimum-necessary Directed Acyclic Graph (DAG) mining problem, yielding pruned trajectories that preserve essential reasoning while eliminating redundant steps. Continued training on these refined trajectories enables the agent to evolve toward more efficient search patterns and reduces tool-call rounds by about 20% while improving accuracy. Furthermore, we introduce a new metric, the F-AE Score, to measure a model's overall performance in balancing accuracy and efficiency. Experiments demonstrate that WebClipper compresses tool-call rounds while maintaining excellent performance, providing practical insight into balancing effectiveness and efficiency in web agent design.
Submitted 13 February, 2026;
originally announced February 2026.
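The trajectory-pruning idea in the WebClipper abstract can be illustrated with a toy sketch (not the paper's algorithm; all function and state names here are hypothetical): model tool-call states as a directed graph and keep only states that lie on some path from the start state to the final-answer state, discarding dead-end branches.

```python
from collections import defaultdict, deque

def prune_trajectory(edges, start, goal):
    """Keep only edges between states that lie on some start->goal path."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)   # forward adjacency
        bwd[v].add(u)   # reverse adjacency

    def reach(src, adj):
        # Standard BFS reachability from src under the given adjacency.
        seen, queue = {src}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    # Useful states: reachable from start AND able to reach goal.
    useful = reach(start, fwd) & reach(goal, bwd)
    return [(u, v) for u, v in edges if u in useful and v in useful]

# A tiny trajectory with one unproductive branch (a -> x -> y):
edges = [("s", "a"), ("a", "g"), ("a", "x"), ("x", "y")]
pruned = prune_trajectory(edges, "s", "g")  # keeps only ("s","a"), ("a","g")
```

Note that this sketch only removes unproductive branches; the minimum-necessary DAG mining described in the abstract would additionally have to break cyclic reasoning loops among the surviving states.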
-
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Authors:
Ailin Huang,
Ang Li,
Aobo Kong,
Bin Wang,
Binxing Jiao,
Bo Dong,
Bojun Wang,
Boyu Chen,
Brian Li,
Buyun Ma,
Chang Su,
Changxin Miao,
Changyi Wan,
Chao Lou,
Chen Hu,
Chen Xu,
Chenfeng Yu,
Chengting Feng,
Chengyuan Yao,
Chunrui Han,
Dan Ma,
Dapeng Shi,
Daxin Jiang,
Dehua Ma,
Deshan Sun
, et al. (191 additional authors not shown)
Abstract:
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks, achieving 85.4% on IMO-AnswerBench, 86.4% on LiveCodeBench-v6 (2024.08-2025.05), 88.2% on tau2-Bench, 69.0% on BrowseComp (with context management), and 51.0% on Terminal-Bench 2.0, comparable to frontier models such as GPT-5.2 xHigh and Gemini 3.0 Pro. By redefining the efficiency frontier, Step 3.5 Flash provides a high-density foundation for deploying sophisticated agents in real-world industrial environments.
Submitted 23 February, 2026; v1 submitted 11 February, 2026;
originally announced February 2026.
-
Micromolar chemical imaging by high-energy low-photodamage Coherent Anti-Stokes Raman Scattering (HELP-CARS)
Authors:
Guangrui Ding,
Dingcheng Sun,
Yifan Zhu,
Rong Tang,
Hongli Ni,
Yuhao Yuan,
Haonan Lin,
Ji-Xin Cheng
Abstract:
Coherent anti-Stokes Raman scattering (CARS) microscopy offers label-free chemical imaging capabilities, but its performance is constrained by the small Raman scattering cross-section, strong non-resonant background (NRB), and limited signal-to-noise ratio (SNR). Here, we introduce a high-energy, low-photodamage CARS (HELP-CARS) platform designed to overcome these physical limitations. By employing a 1-MHz non-collinear optical parametric amplifier (NOPA) with extensive pulse chirping, HELP-CARS increases the coherent Raman excitation efficiency by ~300 times and improves the signal-to-nonresonant-background ratio by 11 times, while inducing negligible damage during live-cell imaging. Furthermore, to remove non-independent noise and the physically entangled non-resonant background, we incorporate self-supervised deep-learning denoising and background removal based on the Kramers-Kronig relationship, improving sensitivity by an additional order of magnitude. Together, these advances provide micromolar imaging sensitivity (160 uM for dimethyl sulfoxide-d6), corresponding to 1000 molecules in the focal volume. Such high sensitivity enables high-fidelity chemical imaging in both the fingerprint and silent windows. Hyperspectral HELP-CARS imaging of deuterated fatty acids allowed the first observation of chemical separation within a single lipid droplet. Overall, HELP-CARS offers a powerful and generalizable approach for ultrasensitive and quantitative vibrational imaging of biological systems.
Submitted 9 February, 2026;
originally announced February 2026.
-
ERNIE 5.0 Technical Report
Authors:
Haifeng Wang,
Hua Wu,
Tian Wu,
Yu Sun,
Jing Liu,
Dianhai Yu,
Yanjun Ma,
Jingzhou He,
Zhongjun He,
Dou Hong,
Qiwen Liu,
Shuohuan Wang,
Junyuan Shang,
Zhenyu Zhang,
Yuchen Ding,
Jinle Zeng,
Jiabin Yang,
Liang Shen,
Ruibiao Chen,
Weichong Yin,
Siyu Ding,
Dai Dai,
Shikun Feng,
Siqi Bao,
Bolei He
, et al. (413 additional authors not shown)
Abstract:
In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, thereby guaranteeing efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside comprehensive empirical analysis of elastic training, aiming to offer profound insights to the community.
Submitted 4 February, 2026;
originally announced February 2026.
-
Exploring Hyperon Skyrme Forces in Multi-$Λ$ Hypernuclei and Neutron Star Matter
Authors:
X. D. Sun,
S. C. Han,
J. N. Hu,
A. Li
Abstract:
A major source of uncertainty in modeling the strangeness-rich interiors of neutron stars arises from the poorly constrained two-body and three-body interactions among hyperons and nucleons. We perform a comprehensive Bayesian analysis of the $ΛΛ$ and $ΛΛN$ interaction parameters within the Skyrme Hartree-Fock framework, constrained by both hypernuclei experimental data and astrophysical observations. Our results show that the parameter space of the $ΛΛ$ interaction is tightly constrained by combining nuclear and astrophysical data, while the parameters of the $ΛΛN$ three-body interaction remain sensitive to astrophysical inputs alone. Specifically, the local, momentum-independent two-body interaction parameter $λ_0$ is tightly constrained and predominantly attractive, while the momentum-dependent parameters $λ_1$ and $λ_2$ contribute repulsive effects at high densities. A key role is played by the $ΛΛ$ potential depth in pure $Λ$ matter, which effectively constrains the two-body $ΛΛ$ interaction and governs the balance between attraction at low densities and repulsion at high densities. The repulsive components of $ΛΛ$ interactions then decrease hyperon fractions and reconcile hyperon-rich equations of state with the observed $\sim2\,M_{\odot}$ neutron stars, increasing the maximum mass by up to 22\%. The inclusion of $ΛΛN$ three-body forces further stiffens the EOS, raising the maximum mass by up to $\sim 0.1\,M_{\odot}$. Our study represents a promising step toward a complete, experimentally grounded description of dense matter across a wide range of densities and strangeness compositions.
Submitted 3 February, 2026;
originally announced February 2026.
-
FinEvo: From Isolated Backtests to Ecological Market Games for Multi-Agent Financial Strategy Evolution
Authors:
Mingxi Zou,
Jiaxiang Chen,
Aotian Luo,
Jingyi Dai,
Chi Zhang,
Dongning Sun,
Zenglin Xu
Abstract:
Conventional financial strategy evaluation relies on isolated backtests in static environments. Such evaluations assess each policy independently, overlook correlations and interactions, and fail to explain why strategies ultimately persist or vanish in evolving markets. We shift to an ecological perspective, where trading strategies are modeled as adaptive agents that interact and learn within a shared market. Instead of proposing a new strategy, we present FinEvo, an ecological game formalism for studying the evolutionary dynamics of multi-agent financial strategies. At the individual level, heterogeneous ML-based traders (rule-based, deep learning, reinforcement learning, and large language model (LLM) agents) adapt using signals such as historical prices and external news. At the population level, strategy distributions evolve through three designed mechanisms (selection, innovation, and environmental perturbation), capturing the dynamic forces of real markets. Together, these two layers of adaptation link evolutionary game theory with modern learning dynamics, providing a principled environment for studying strategic behavior. Experiments with external shocks and real-world news streams show that FinEvo is both stable for reproducibility and expressive in revealing context-dependent outcomes. Strategies may dominate, collapse, or form coalitions depending on their competitors; such patterns are invisible to static backtests. By reframing strategy evaluation as an ecological game formalism, FinEvo provides a unified, mechanism-level protocol for analyzing robustness, adaptation, and emergent dynamics in multi-agent financial markets, and may offer a means to explore the potential impact of macroeconomic policies and financial regulations on price evolution and equilibrium.
Submitted 31 January, 2026;
originally announced February 2026.
-
EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents
Authors:
Ying Mo,
Yu Bai,
Dapeng Sun,
Yuqian Shi,
Yukai Miao,
Li Chen,
Dan Li
Abstract:
Recent advances in Multimodal Large Language Models (MLLMs) have enabled agents to operate in open-ended web and operating system environments. However, existing benchmarks predominantly target consumer-oriented scenarios (e.g., e-commerce and travel booking), failing to capture the complexity and rigor of professional enterprise workflows. Enterprise systems pose distinct challenges, including high-density user interfaces, strict business logic constraints, and a strong reliance on precise, state-consistent information retrieval, settings in which current generalist agents often struggle. To address this gap, we introduce EntWorld, a large-scale benchmark consisting of 1,756 tasks across six representative enterprise domains, including customer relationship management (CRM), information technology infrastructure library (ITIL), and enterprise resource planning (ERP) systems. Unlike previous datasets that depend on fragile execution traces or extensive manual annotation, EntWorld adopts a schema-grounded task generation framework that directly reverse-engineers business logic from underlying database schemas, enabling the synthesis of realistic, long-horizon workflows. Moreover, we propose a SQL-based deterministic verification mechanism for dataset construction that replaces ambiguous visual matching with rigorous state-transition validation. Experimental results demonstrate that state-of-the-art models (e.g., GPT-4.1) achieve a 47.61% success rate on EntWorld, substantially lower than human performance, highlighting a pronounced enterprise gap in current agentic capabilities and the necessity of developing domain-specific agents. We release EntWorld as a rigorous testbed to facilitate the development and evaluation of the next generation of enterprise-ready digital agents.
Submitted 25 January, 2026;
originally announced January 2026.
-
Robustness of Mixtures of Experts to Feature Noise
Authors:
Dong Sun,
Rahul Nittala,
Rebekka Burkholz
Abstract:
Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.
Submitted 21 January, 2026;
originally announced January 2026.
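The sparse-activation mechanism discussed in the abstract above can be made concrete with a minimal top-k MoE forward pass (a generic sketch, not the authors' construction; all names are illustrative): the gate routes each input only to the highest-scoring expert(s), so off-route experts never contribute to the output.

```python
import numpy as np

def moe_forward(x, W_gate, experts, k=1):
    """Route input x (shape (d,)) to the top-k of n experts.

    W_gate: (n_experts, d) gating matrix; experts: list of (d, d) matrices.
    """
    scores = W_gate @ x                      # one score per expert
    top = np.argsort(scores)[-k:]            # indices of the k highest scores
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                             # softmax over the selected experts only
    # Only the selected experts are evaluated; the rest are skipped entirely.
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

# Toy example: two experts, gate aligned with the coordinate axes.
W_gate = np.eye(2)
experts = [2.0 * np.eye(2), 3.0 * np.eye(2)]
x = np.array([1.0, 0.1])       # mostly "module 0" signal plus a small perturbation
y = moe_forward(x, W_gate, experts, k=1)   # gate picks expert 0, so y = 2 * x
```

With k=1 the softmax weight is exactly 1, so the output is just the selected expert applied to the input; this is the sparsity that, per the paper's argument, keeps noise routed away from non-matching experts.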
-
DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems
Authors:
Maojun Sun,
Yifei Xie,
Yue Wu,
Ruijian Han,
Binyan Jiang,
Defeng Sun,
Yancheng Yuan,
Jian Huang
Abstract:
Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., vision and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 11 advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, GPT-5.2 is the most efficient, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data analysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions to advance the development of data science agents.
Submitted 19 January, 2026;
originally announced January 2026.
-
The global well-posedness for master equations of mean field games of controls
Authors:
Shuhui Liu,
Xintian Liu,
Chenchen Mou,
Defeng Sun
Abstract:
In this manuscript, we establish the global well-posedness for master equations of mean field games of controls, where the interaction is through the joint law of the state and control. Our results are proved under two different conditions: the Lasry-Lions monotonicity and the displacement $λ$-monotonicity, both considered in their integral forms. We provide a detailed analysis of both the differential and integral versions of these monotonicity conditions for the corresponding nonseparable Hamiltonian and examine their relation. The proof of global well-posedness relies on the propagation of these monotonicity conditions in their integral forms and a priori uniform Lipschitz continuity of the solution with respect to the measure variable.
Submitted 4 January, 2026;
originally announced January 2026.
-
OpenACM: An Open-Source SRAM-Based Approximate CiM Compiler
Authors:
Yiqi Zhou,
JunHao Ma,
Xingyang Li,
Yule Sheng,
Yue Yuan,
Yikai Wang,
Bochang Wang,
Yiheng Wu,
Shan Shen,
Wei Xing,
Daying Sun,
Li Li,
Zhiqiang Xiao
Abstract:
The rise of data-intensive AI workloads has exacerbated the "memory wall" bottleneck. Digital Compute-in-Memory (DCiM) using SRAM offers a scalable solution, but its vast design space makes manual design impractical, creating a need for automated compilers. A key opportunity lies in approximate computing, which leverages the error tolerance of AI applications for significant energy savings. However, existing DCiM compilers focus on exact arithmetic, failing to exploit this optimization. This paper introduces OpenACM, the first open-source, accuracy-aware compiler for SRAM-based approximate DCiM architectures. OpenACM bridges the gap between application error tolerance and hardware automation. Its key contribution is an integrated library of accuracy-configurable multipliers (exact, tunable approximate, and logarithmic), enabling designers to make fine-grained accuracy-energy trade-offs. The compiler automates the generation of the DCiM architecture, integrating a transistor-level customizable SRAM macro with variation-aware characterization into a complete, open-source physical design flow based on OpenROAD and the FreePDK45 library. This ensures full reproducibility and accessibility, removing dependencies on proprietary tools. Experimental results on representative convolutional neural networks (CNNs) demonstrate that OpenACM achieves energy savings of up to 64% with negligible loss in application accuracy. The framework is available at https://github.com/ShenShan123/OpenACM.
Submitted 16 January, 2026;
originally announced January 2026.
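As one example of the accuracy-energy trade-off behind the logarithmic multipliers mentioned in the OpenACM abstract, Mitchell's classic logarithmic multiplier replaces multiplication with addition in the log domain; it never overestimates, and its relative error is bounded by about 11.1%. This is a generic textbook sketch in software, not OpenACM's hardware implementation.

```python
def mitchell_mul(a: int, b: int) -> float:
    """Mitchell's logarithmic approximate multiplication of non-negative ints.

    Approximates log2(x) by k + f, where k = floor(log2(x)) and
    f = x / 2^k - 1 is the fractional part, then adds the logs and
    takes an approximate antilog.
    """
    if a == 0 or b == 0:
        return 0.0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1   # floor(log2(a)), floor(log2(b))
    fa = a / (1 << ka) - 1.0                          # fractional parts in [0, 1)
    fb = b / (1 << kb) - 1.0
    s = fa + fb                                       # log2(a*b) ~= ka + kb + s
    if s < 1.0:
        return (1 << (ka + kb)) * (1.0 + s)           # antilog approximation, no carry
    return (1 << (ka + kb + 1)) * s                   # carry into the integer part

# Exact on powers of two (fractional parts are zero):
mitchell_mul(8, 8)   # 64.0
# Worst case at f = 0.5 for both operands:
mitchell_mul(3, 3)   # 8.0 instead of 9 (11.1% error)
```

The appeal in hardware is that the log-domain adder is far cheaper than an array multiplier, which is exactly the kind of energy saving an accuracy-aware compiler can trade against application accuracy.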
-
Transient Large-Scale Anisotropy in TeV Cosmic Rays due to an Interplanetary Coronal Mass Ejection
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen
, et al. (291 additional authors not shown)
Abstract:
Large- or medium-scale cosmic ray anisotropy at TeV energies has not previously been confirmed to vary with time. Transient anisotropy changes have been observed below 150 GeV, especially near the passage of an interplanetary shock and coronal mass ejection containing a magnetic flux rope ejected by a solar storm, which can trigger a geomagnetic storm with practical consequences. In such events, cosmic rays provide remote sensing of the magnetic field properties. Here we report the observation of transient large-scale anisotropy in TeV cosmic ray ions using data from the Large High Altitude Air Shower Observatory (LHAASO). We analyze hourly skymaps of the transient cosmic ray intensity excess or deficit, the gradient of which indicates the direction and magnitude of transient large-scale anisotropy across the field of view. We observe enhanced anisotropy above typical hourly fluctuations with $>$5$σ$ significance during some hours of November 4, 2021, in separate data sets for four primary cosmic ray energy ranges of median energy from $E$=0.7 to 3.1 TeV. The gradient varies with energy as $E^γ$, where $γ\approx-0.5$. At a median energy $\leq$1.0 TeV, this gradient corresponds to a dipole anisotropy of at least 1\%, or possibly a weaker anisotropy of higher order. This new type of observation opens the opportunity to study interplanetary magnetic structures using air shower arrays around the world, complementing existing in situ and remote measurements of plasma properties.
Submitted 6 January, 2026;
originally announced January 2026.
-
HyperNetWalk: A Unified Framework for Personalized and Population-Level Cancer Driver Gene Identification via Multi-Network Hypergraph Diffusion
Authors:
Xueqing Xu,
Yonghang Gao,
Duanchen Sun,
Ling-Yun Wu
Abstract:
Identifying cancer driver genes is crucial for understanding tumor biology and developing precision therapies. However, existing computational methods often rely on single biological networks or population-level mutation patterns, limiting their ability to identify patient-specific drivers and leverage the complementary information from multiple network types. Here, we present HyperNetWalk, a novel computational framework that integrates multiple biological networks and hypergraph diffusion to identify driver genes at both personalized and cohort levels. In the first stage, HyperNetWalk integrates protein-protein interaction networks, gene regulatory networks, and dynamic co-expression networks through sample-independent random walks on patient-specific subnetworks to capture topological importance and expression perturbation effects. In the second stage, it refines predictions through hypergraph-based random walks that leverage cross-sample information while preserving individual mutational contexts. Comprehensive evaluation on 12 TCGA cancer types demonstrates that HyperNetWalk achieves superior or competitive performance compared to state-of-the-art methods in both personalized and cohort-level predictions. Notably, HyperNetWalk successfully identifies known driver genes with high precision while revealing cancer type-specific drivers that reflect distinct biological mechanisms. Our framework provides a unified solution for personalized and population-based driver gene identification, offering valuable insights for precision oncology and therapeutic target discovery.
Submitted 3 January, 2026;
originally announced January 2026.
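The network-propagation primitive underlying random-walk methods like the one in the HyperNetWalk abstract can be sketched as a random walk with restart (personalized PageRank) on a gene network. This is an illustrative sketch under simplifying assumptions (connected, unweighted graph; uniform transition probabilities), not the paper's multi-network or hypergraph implementation; all names are hypothetical.

```python
import numpy as np

def random_walk_restart(A, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seed`.

    A: (n, n) adjacency matrix of a connected graph (no isolated nodes).
    seed: (n,) non-negative restart/personalization vector (e.g., mutated genes).
    """
    P = A / A.sum(axis=0, keepdims=True)   # column-stochastic transition matrix
    e = seed / seed.sum()                  # normalized restart distribution
    p = e.copy()
    for _ in range(max_iter):
        # With prob. (1 - restart) take a network step; with prob. restart jump home.
        p_next = (1 - restart) * (P @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

# Toy "network": a triangle of three genes, seeded at gene 0.
A = np.ones((3, 3)) - np.eye(3)
scores = random_walk_restart(A, np.array([1.0, 0.0, 0.0]))
```

Genes are then ranked by their stationary probability; the seeded (e.g., mutated) gene and its close network neighbors receive the highest scores.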
-
GeCo: A Differentiable Geometric Consistency Metric for Video Generation
Authors:
Leslie Gu,
Junhwa Hur,
Charles Herrmann,
Fangneng Zhan,
Todd Zickler,
Deqing Sun,
Hanspeter Pfister
Abstract:
We introduce GeCo, a geometry-grounded metric for jointly detecting geometric deformation and occlusion-inconsistency artifacts in static scenes. By fusing residual motion and depth priors, GeCo produces interpretable, dense consistency maps that reveal these artifacts. We use GeCo to systematically benchmark recent video generation models, uncovering common failure modes, and further employ it as a training-free guidance loss to reduce deformation artifacts during video generation.
Submitted 24 December, 2025;
originally announced December 2025.
-
Energy-Dependent Shifts of Medium-Scale Anisotropies in Very-High-Energy Cosmic Rays Observed by LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
G. H. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (292 additional authors not shown)
Abstract:
Small deviations from isotropy in the arrival directions of Galactic cosmic rays serve as a unique probe of the local magnetic environment. In this Letter, we report observations of medium-scale anisotropies (MSA) at energies above 10 TeV using the LHAASO-KM2A array. Our analysis identifies four regions of excess and four regions of deficit, each spanning angular scales of approximately ten degrees. Crucially, we detect significant energy-dependent shifts in the centroids of two excess regions: Region B and the newly identified Region $\mathrm{\widetilde{D}}$. We also characterize the energy evolution of the fractional relative intensity across both excess and deficit regions. These findings imply that the observed anisotropies are shaped by the specific realization of the local turbulent magnetic field within the cosmic ray scattering length. Such energy-dependent behaviors impose strict constraints on local turbulence models and cosmic ray propagation theories.
Submitted 5 January, 2026; v1 submitted 20 December, 2025;
originally announced December 2025.
-
Cygnus X-3: A variable petaelectronvolt gamma-ray source
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
J. Blunier,
A. V. Bukevich,
C. M. Cai,
Y. Y. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
E. S. Chen,
G. H. Chen,
H. K. Chen,
L. F. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen
, et al. (306 additional authors not shown)
Abstract:
We report the discovery of variable $γ$-rays up to petaelectronvolt energies from Cygnus X-3, an iconic X-ray binary. The $γ$-ray signal was detected with a statistical significance of approximately 10 $σ$ by the Large High Altitude Air Shower Observatory (LHAASO). Its intrinsic spectral energy distribution (SED), extending from 0.06 to 3.7 PeV, shows a pronounced rise toward 1 PeV after accounting for absorption by the cosmic microwave background radiation. We find variability on month-long timescales at a significance of $8.6 σ$, coinciding with a high state of the GeV gamma-ray flux detected by the Fermi-LAT. This, together with 3.2$σ$ evidence for orbital modulation, suggests that the PeV $γ$-rays originate within, or in close proximity to, the binary system itself. The observed energy spectrum and temporal modulation can be naturally explained by $γ$-ray production through photomeson processes in the innermost region of the relativistic jet, where protons need to be accelerated to tens of PeV.
Submitted 12 April, 2026; v1 submitted 18 December, 2025;
originally announced December 2025.
-
Chirality-induced magnetoresistance in hybrid organic-inorganic perovskite semiconductors
Authors:
Md Azimul Haque,
Pius Markus Theiler,
Ian A. Leahy,
Steven P. Harvey,
Jeiwan Tan,
Matthew P Hautzinger,
Margherita Taddei,
Aeron McConnell,
Andrew Greider,
Andrew H. Comstock,
Yifan Dong,
Kirstin Alberi,
Yuan Ping,
Peter C. Sercel,
Joseph M. Luther,
Dali Sun,
Matthew C. Beard
Abstract:
The combination of semiconducting properties and synthetically tunable chirality in chiral metal halide semiconductors (CMHS) offers a compelling platform for room-temperature control over electronic spin properties, leveraging effects such as chirality-induced spin selectivity (CISS) for the development of new opto-spintronic functionalities. We report room-temperature CISS-induced magnetoresistance (CISS-MR) exceeding 100% for spin valves in a configuration consisting of a ferromagnet (FM), a tunneling barrier, and a CMHS. The high CISS-MR is attributed to an interfacial spin-selective tunneling barrier induced by the chirality, which can produce current dissymmetry factors that surpass the limit imposed by the Jullière model, governed by the intrinsic spin polarization of the adjacent FM contact. The CISS-MR exhibits a strong dependence on the CMHS composition, revealing a structure-property relationship between CISS and structural chirality. The observed, exceptionally large tunneling MR response is distinct from the subtle anisotropic MR arising from the proximity effect at the FM/CMHS interface in the absence of a tunneling barrier. Our study provides insights into charge-to-spin interconversion in chiral semiconductors, offering materials design principles to control and enhance the CISS response and utilize it in functional platforms.
Submitted 8 December, 2025;
originally announced December 2025.
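The Jullière limit invoked in the abstract follows from the standard two-contact tunneling formula, TMR = 2 P₁P₂ / (1 − P₁P₂). A quick numeric check makes the ">100%" claim concrete; the polarization values below are illustrative, not taken from the paper.

```python
def julliere_tmr(p1, p2):
    """Optimistic tunneling MR in the Julliere model for two electrodes
    with spin polarizations p1, p2 in [0, 1): TMR = 2*p1*p2 / (1 - p1*p2)."""
    return 2 * p1 * p2 / (1 - p1 * p2)

# Two contacts at a typical ~40% FM polarization stay well below 100% MR,
# so a CISS-MR above 100% exceeds what this model alone would allow.
tmr = julliere_tmr(0.4, 0.4)
print(f"{tmr:.1%}")  # ~38.1%
```

Reaching TMR > 100% with one FM electrode at 40% polarization would require the chiral barrier to act as an effective second polarizer far stronger than the FM itself, which is the sense in which the reported response surpasses the Jullière bound.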