-
SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
Authors:
Xixun Lin,
Yang Liu,
Yancheng Chen,
Yongxuan Wu,
Yucheng Ning,
Yilong Liu,
Nan Sun,
Shun Zhang,
Bin Chong,
Chuan Zhou,
Yanan Cao,
Li Guo
Abstract:
The performance of large language model (LLM) agents depends critically on the execution harness, the system layer that orchestrates tool use, context management, and state persistence. Yet this same architectural centrality makes the harness a high-value attack surface: a single compromise at the harness level can cascade through the entire execution pipeline. We observe that existing security approaches suffer from a structural mismatch that leaves them blind to harness-internal state and unable to coordinate across the different phases of agent operation. In this paper, we introduce SafeHarness, a security architecture that addresses these limitations by weaving four defense layers directly into the agent lifecycle: adversarial context filtering at input processing, tiered causal verification at decision making, privilege-separated tool control at action execution, and safe rollback with adaptive degradation at state update. Cross-layer mechanisms tie these layers together, escalating verification rigor, triggering rollbacks, and tightening tool privileges whenever sustained anomalies are detected. We evaluate SafeHarness on benchmark datasets across diverse harness configurations, comparing it against four security baselines under five attack scenarios spanning six threat categories. Compared with the unprotected baseline, SafeHarness reduces the unsafe behavior rate (UBR) by approximately 38% and the attack success rate (ASR) by approximately 42% on average, while preserving core task utility.
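The cross-layer idea (sustained anomalies simultaneously raise verification rigor, shrink tool privileges, and trigger a rollback) can be illustrated with a small controller. This is a hypothetical sketch, not the SafeHarness implementation: the names `HarnessGuard`, `VERIFY_TIERS`, `TOOL_TIERS`, the tool names, and the "N consecutive anomalies" rule are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical tiers: list index = escalation level (0 = normal operation).
VERIFY_TIERS = ["lightweight", "causal", "full"]
TOOL_TIERS = [
    {"read_file", "write_file", "shell", "http"},
    {"read_file", "write_file"},
    {"read_file"},
]


@dataclass
class HarnessGuard:
    """Toy cross-layer controller: sustained anomalies escalate every layer at once."""
    threshold: int = 3   # consecutive anomalous steps treated as "sustained" (assumed)
    level: int = 0       # shared escalation level read by all layers
    streak: int = 0
    checkpoints: list = field(default_factory=list)

    def verification_tier(self) -> str:
        return VERIFY_TIERS[min(self.level, len(VERIFY_TIERS) - 1)]

    def allowed_tools(self) -> set:
        return TOOL_TIERS[min(self.level, len(TOOL_TIERS) - 1)]

    def observe(self, anomalous: bool, state: dict) -> dict:
        """Record one step's anomaly signal and return the state to continue from."""
        if not anomalous:
            self.streak = 0
            # Only clean states become rollback targets; deep-copy so later
            # mutations of the live state cannot corrupt the snapshot.
            self.checkpoints.append(copy.deepcopy(state))
            return state
        self.streak += 1
        if self.streak >= self.threshold:
            # Sustained anomaly: tighten verification and privileges, then roll back.
            self.level += 1
            self.streak = 0
            if self.checkpoints:
                return copy.deepcopy(self.checkpoints[-1])
        return state
```

With `threshold=2`, two consecutive anomalous steps move the guard to the "causal" verification tier, drop the shell and network tools, and return the last clean checkpoint. Keeping a single shared level is one simple way to make the layers escalate together rather than independently.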
Submitted 15 April, 2026;
originally announced April 2026.
-
Exploring Concept Subspace for Self-explainable Text-Attributed Graph Learning
Authors:
Xiaoxue Han,
Libo Zhang,
Zining Zhu,
Yue Ning
Abstract:
We introduce Graph Concept Bottleneck (GCB) as a new paradigm for self-explainable text-attributed graph learning. GCB maps graphs into a subspace, the concept bottleneck, in which each concept is a meaningful phrase, and makes predictions based on the activations of these concepts. Unlike existing interpretable graph learning methods, which primarily rely on subgraphs as explanations, the concept bottleneck provides a new form of interpretation. To refine the concept space, we apply the information bottleneck principle to focus on the most relevant concepts. This not only yields more concise and faithful explanations but also explicitly guides the model to "think" toward the correct decision. We empirically show that GCB achieves intrinsic interpretability with accuracy on par with black-box Graph Neural Networks. Moreover, benefiting from concept-guided prediction, it performs better under distribution shifts and data perturbations, showing improved robustness and generalizability.
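The core of a concept bottleneck is that the classifier sees only concept activations, so each prediction decomposes into per-concept contributions. The sketch below assumes the graph embedding and the concept-phrase embeddings are already computed by some encoder, uses cosine similarity as the activation and a linear head on top; these are illustrative choices, not the GCB architecture, and the information-bottleneck refinement is not shown.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_bottleneck_predict(graph_emb, concept_embs, weights, bias):
    """Predict a class from concept activations only and explain it.

    graph_emb: embedding of the text-attributed graph (assumed precomputed).
    concept_embs: dict phrase -> embedding of that concept phrase (assumed).
    weights: dict class -> dict phrase -> weight of the concept for that class.
    bias: dict class -> bias term (missing classes default to 0).
    """
    # Bottleneck: the graph is represented solely by its concept activations.
    acts = {p: cosine(graph_emb, e) for p, e in concept_embs.items()}
    scores, contributions = {}, {}
    for cls, w in weights.items():
        contrib = {p: w.get(p, 0.0) * a for p, a in acts.items()}
        contributions[cls] = contrib
        scores[cls] = bias.get(cls, 0.0) + sum(contrib.values())
    pred = max(scores, key=scores.get)
    # Explanation: concepts ranked by their contribution to the predicted class.
    explanation = sorted(contributions[pred].items(), key=lambda kv: -kv[1])
    return pred, acts, explanation
```

Because the head is linear in the activations, the ranked contribution list is an exact decomposition of the predicted class score (minus its bias), which is what makes the explanation faithful by construction in this toy setting.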
Submitted 13 April, 2026;
originally announced April 2026.
-
Kramers-Kronig causality in integrated photonics: The spectral tension between ultraviolet transition and mid-infrared absorption
Authors:
Yue Hu,
Zhenyuan Shang,
Chenxi Zhang,
Yuanjie Ning,
Weiqin Zheng,
Dengke Chen,
Sanli Huang,
Baoqi Shi,
Zeying Zhong,
Hao Tan,
Wei Sun,
Yi-Han Luo,
Xinmao Yin,
Zhi-Chuan Niu,
Junqiu Liu
Abstract:
Dispersion engineering via geometric confinement is essential to integrated photonics, enabling phenomena such as soliton microcombs, supercontinua, parametric oscillators, and entangled photons. However, prevailing methodologies rely on semi-empirical Sellmeier models that assume idealized material purity, neglecting the pronounced dispersion shifts induced by residual impurities such as hydrogen-related bonds. Here, we demonstrate that these residual bonds fundamentally alter the dispersion landscape from the ultraviolet (UV) to the mid-infrared (MIR). Specifically, they introduce MIR vibrational absorption while simultaneously modifying the UV electronic transition, shifting the bandgap and the UV pole. We show that the spectral tension between these UV and MIR modifications dictates the group velocity dispersion from the visible to the near-infrared (NIR) via Kramers-Kronig causality. We experimentally validate this phenomenon through systematic characterization of broadband loss and dispersion in ultralow-loss silicon nitride photonic integrated circuits. By rigorously incorporating these effects, we bridge the gap between empirical fitting and predictive physical modelling. Our study resolves long-standing discrepancies in dispersion engineering, providing the precise control essential for next-generation integrated photonics.
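For reference, the relations this abstract contrasts can be written in their standard textbook forms (generic expressions, not the authors' fitted model): the Sellmeier expansion with resonance terms $C_i$, the Kramers-Kronig relation linking the extinction coefficient $\kappa$ to the refractive index $n$, and the group velocity dispersion $\beta_2$ obtained from the propagation constant $\beta$ of a mode with effective index $n_{\mathrm{eff}}$.

```latex
n^2(\lambda) = 1 + \sum_i \frac{B_i \lambda^2}{\lambda^2 - C_i}

n(\omega) - 1 = \frac{2}{\pi}\,\mathcal{P}\!\int_0^{\infty}
  \frac{\omega' \kappa(\omega')}{\omega'^2 - \omega^2}\, d\omega'

\beta(\omega) = \frac{n_{\mathrm{eff}}(\omega)\,\omega}{c},
\qquad
\beta_2 = \frac{d^2 \beta}{d\omega^2}
```

In this picture, absorption in the UV (frequencies above the visible/NIR window) and in the MIR (frequencies below it) both enter the principal-value integral; as in the familiar Sellmeier picture, the UV pole tends to contribute normal and the MIR pole anomalous material dispersion in between, which is one way to read the "spectral tension" described in the abstract.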
Submitted 31 March, 2026; v1 submitted 30 March, 2026;
originally announced March 2026.
-
Dependable Exploitation of High-Dimensional Unlabeled Data in an Assumption-Lean Framework
Authors:
Chao Ying,
Siyi Deng,
Yang Ning,
Jiwei Zhao,
Heping Zhang
Abstract:
Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data.
In this paper, we examine statistical inference in an assumption-lean framework involving a high-dimensional regression parameter, defined as the minimizer of the least-squares criterion, within the context of semi-supervised learning.
We investigate when and how unlabeled data can enhance the estimation efficiency of a regression parameter functional.
First, we demonstrate that a straightforward debiased estimator can only be more efficient than its supervised counterpart if the unknown conditional mean function can be consistently estimated at an appropriate rate.
Otherwise, incorporating unlabeled data can actually be counterproductive.
To address this vulnerability, we propose a novel estimator guaranteed to be at least as efficient as the supervised baseline, even when the conditional mean function is misspecified.
This ensures the dependable use of unlabeled data for statistical inference.
Finally, we extend our approach to the general M-estimation framework, and demonstrate the effectiveness of our methodology through comprehensive simulation studies and a real data application.
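The phenomenon described here, that a naive use of unlabeled data can hurt while a suitably constructed estimator never does worse than the supervised one, has a classic low-dimensional analogue: estimating E[Y] with a control-variate correction built from unlabeled covariates. The sketch below is that textbook analogue, not the authors' high-dimensional estimator. When the correction coefficient is the least-squares slope, the estimator's asymptotic variance is no larger than that of the labeled-sample mean whether or not E[Y|X] is linear, because the correction term has mean approximately zero in either case.

```python
from statistics import fmean


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def supervised_mean(y_lab):
    """Baseline: ignores the unlabeled data entirely."""
    return fmean(y_lab)


def safe_ss_mean(x_lab, y_lab, x_unlab):
    """Semi-supervised estimate of E[Y]: labeled mean plus a control-variate correction.

    The fitted slope is used only as a coefficient on (mean of labeled X - mean of all X),
    a term whose expectation is zero, so a misspecified linear model cannot bias it.
    """
    beta = ols_slope(x_lab, y_lab)
    x_all_mean = fmean(list(x_lab) + list(x_unlab))
    return fmean(y_lab) - beta * (fmean(x_lab) - x_all_mean)
```

With exactly linear data the correction recovers the mean implied by all covariates: for `x_lab = [0, 1, 2]`, `y_lab = [1, 3, 5]` (so y = 2x + 1) and `x_unlab = [3, 4, 5, 6, 7]`, `safe_ss_mean` returns 8.0, whereas the labeled-only mean is 3.0.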
Submitted 29 March, 2026;
originally announced March 2026.
-
Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters
Authors:
Nan Cui,
Wendy Hui Wang,
Yue Ning
Abstract:
Large Language Models (LLMs) have introduced new capabilities to recommender systems, enabling dynamic, context-aware, and conversational recommendations. However, LLM-based recommender systems inherit, and may amplify, social biases embedded in their pre-training data, especially when demographic cues are present. Existing fairness solutions either require fine-tuning additional parameters or suffer from optimization instability. We propose a lightweight and scalable bias mitigation method that combines kernelized Iterative Null-space Projection (INLP) with a gated Mixture-of-Experts (MoE) adapter. Our approach estimates a closed-form projection that removes one or more sensitive attributes from LLM representations without any additional trainable parameters. To preserve task utility, we introduce a two-level MoE adapter that selectively restores useful signals without reintroducing bias. Experiments on two public datasets show that our method reduces attribute leakage across multiple protected variables while maintaining competitive recommendation accuracy.
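The null-space projection idea can be sketched in its simplest linear form: find a direction that predicts a sensitive attribute, project every representation onto that direction's orthogonal complement, and repeat, one attribute after another. This is a simplified, non-kernelized illustration: the predictive direction is taken to be the difference of group means rather than a trained classifier's weight vector, and the gated MoE adapter is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def attribute_direction(X, labels):
    """Difference of group means: a crude linear predictor of a binary attribute."""
    g1 = [x for x, s in zip(X, labels) if s == 1]
    g0 = [x for x, s in zip(X, labels) if s == 0]
    return [a - b for a, b in zip(mean_vec(g1), mean_vec(g0))]


def project_out(X, w):
    """Apply P = I - w w^T / (w^T w) to every row, removing the component along w."""
    ww = dot(w, w)
    if ww < 1e-12:
        return [list(x) for x in X]
    return [[xi - dot(x, w) / ww * wi for xi, wi in zip(x, w)] for x in X]


def null_space_debias(X, attribute_labels, n_iter=3, tol=1e-9):
    """Iteratively remove linear directions predictive of each sensitive attribute.

    attribute_labels: one list of 0/1 labels per sensitive attribute.
    """
    X = [list(x) for x in X]
    for labels in attribute_labels:
        for _ in range(n_iter):
            w = attribute_direction(X, labels)
            if dot(w, w) < tol:
                break  # no linear mean gap left for this attribute
            X = project_out(X, w)
    return X
```

Because each projection is linear, removing a later attribute's direction keeps earlier group-mean gaps at zero, which is why the attributes can be handled sequentially. With the mean-difference direction a single projection per attribute already closes the gap, so the inner loop mainly matters when a trained classifier supplies the direction.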
Submitted 24 March, 2026;
originally announced March 2026.
-
Ultrafast microwave sensing and automatic recognition of dynamic objects in open world using programmable surface plasmonic neural networks
Authors:
Qian Ma,
Ze Gu,
Zi Rui Feng,
Qian Wen Wu,
Yu Ming Ning,
Zhi Qiao Han,
Rui Si Li,
Xinxin Gao,
Tie Jun Cui
Abstract:
The evolution toward next-generation intelligent sensing requires microwave systems to move beyond static detection and achieve high-speed, adaptive perception of dynamic scenes. However, existing microwave sensing systems are bottlenecked by their sequential digital processing chains, which limit refresh rates to hundreds of hertz, while existing integrated microwave processors lack the programmability and scalability needed for robust open-world deployment. To break these bottlenecks, here we report a programmable surface plasmonic neural network (P-SPNN) that enables real-time microwave sensing and automatic recognition of dynamic objects in open-world environments. With a perception latency of 25 ns and a refresh rate exceeding 10 kHz, the P-SPNN system operates more than two orders of magnitude faster than conventional millimeter-wave sensors, while achieving an energy efficiency of 17 TOPS/W. With 288 programmable phase-modulated neurons, we demonstrate real-time, robust classification of persons and cars with 91-97% accuracy in open-road scenarios. By further integrating a beam-scanning function, P-SPNN enables multi-dimensional spatial-temporal-frequency sensing without digital preprocessing. These results establish P-SPNN as a programmable, scalable, and low-power platform for high-speed perception tasks in real-world settings, with broad implications for autonomous driving, intelligent sensing, and next-generation artificial intelligence hardware.
Submitted 22 March, 2026;
originally announced March 2026.
-
Grade and Cohen-Macaulayness for DG-modules
Authors:
Yuancheng Ning,
Xiaoyan Yang
Abstract:
We establish an inequality relating the projective dimension of a DG-module in $\mathrm{D}^\mathrm{b}_\mathrm{f}(A)$ to its grade and introduce the concept of perfect DG-modules as a natural generalization of perfect modules. It is proved that a DG-module $M$ over a local Cohen-Macaulay DG-ring with constant amplitude is Cohen-Macaulay if and only if $M$ is perfect and $\mathrm{amp}\,M \leq \mathrm{amp}\,\mathrm{R}\Gamma_{\bar{\mathfrak{m}}}(M)$. An affirmative answer is provided to Conjecture 2.11 of Yoshida [J. Pure Appl. Algebra 123 (1998) 313--326]. We also study the grade of DG-modules with finite injective dimension and examine the preservation of Cohen-Macaulayness under tensor products.
Submitted 24 February, 2026;
originally announced February 2026.
-
Exploring Accurate and Transparent Domain Adaptation in Predictive Healthcare via Concept-Grounded Orthogonal Inference
Authors:
Pengfei Hu,
Chang Lu,
Feifan Liu,
Yue Ning
Abstract:
Deep learning models for clinical event prediction on electronic health records (EHR) often suffer performance degradation when deployed under different data distributions. While domain adaptation (DA) methods can mitigate such shifts, their "black-box" nature prevents widespread adoption in clinical practice, where transparency is essential for trust and safety. We propose ExtraCare, which decomposes patient representations into invariant and covariant components. By supervising these two components and enforcing their orthogonality during training, our model preserves label information while simultaneously exposing domain-specific variation, yielding more accurate predictions than most feature alignment models. More importantly, it offers human-understandable explanations by mapping sparse latent dimensions to medical concepts and quantifying their contributions via targeted ablations. ExtraCare is evaluated on two real-world EHR datasets across multiple domain partition settings, demonstrating superior performance along with enhanced transparency, as evidenced by its accurate predictions and by explanations from extensive case studies.
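Two mechanisms mentioned here are easy to make concrete: an orthogonality penalty between the invariant and covariant components, and ablation-based contributions of latent dimensions mapped to concept names. The sketch below is an assumption-laden illustration, not ExtraCare's code: the squared inner product as the penalty, zeroing a dimension as the ablation, and the `predict` callable and concept names are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(z_inv_batch, z_cov_batch):
    """Mean squared inner product between paired invariant and covariant codes.

    Zero exactly when every pair is orthogonal; adding it to the training loss
    pushes the two components to carry non-overlapping information.
    """
    pairs = list(zip(z_inv_batch, z_cov_batch))
    return sum(dot(a, b) ** 2 for a, b in pairs) / len(pairs)


def ablation_contributions(z, predict, concept_names):
    """Score drop when each latent dimension is zeroed, keyed by its concept name.

    z: latent code for one patient; predict: callable mapping a code to a score;
    concept_names: one (hypothetical) medical concept per latent dimension.
    """
    base = predict(z)
    contributions = {}
    for i, name in enumerate(concept_names):
        z_ablated = list(z)
        z_ablated[i] = 0.0
        contributions[name] = base - predict(z_ablated)
    return contributions
```

For a linear scorer such as `2.0 * z[0] + 0.5 * z[1]` at `z = [1.0, 1.0]`, the contributions are 2.0 and 0.5, so the first concept is reported as the main driver of the prediction. Sparse latent codes keep such lists short enough to read.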
Submitted 12 February, 2026;
originally announced February 2026.
-
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
Authors:
Yuting Ning,
Jaylen Jones,
Zhehao Zhang,
Chentao Ye,
Weitong Ruan,
Junyi Li,
Rahul Gupta,
Huan Sun
Abstract:
Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned actions may arise from external attacks (e.g., indirect prompt injection) or from internal limitations (e.g., erroneous reasoning). They not only expose CUAs to safety risks but also degrade task efficiency and reliability. This work makes the first effort to define and study misaligned action detection in CUAs, with comprehensive coverage of both externally induced and internally arising misaligned actions. We further identify three common categories of misaligned actions in real-world CUA deployment and construct MisActBench, a benchmark of realistic trajectories with human-annotated, action-level alignment labels. Moreover, we propose DeAction, a practical and universal guardrail that detects misaligned actions before execution and iteratively corrects them through structured feedback. DeAction outperforms all existing baselines in both offline and online evaluations with moderate latency overhead: (1) on MisActBench, it outperforms baselines by over 15% absolute in F1 score; (2) in online evaluation, it reduces the attack success rate by over 90% under adversarial settings while preserving or even improving the task success rate in benign environments.
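The "detect before execution, then correct through feedback" pattern can be sketched as a bounded loop around the agent's action proposal. This is a hypothetical illustration of the control flow only: `guarded_step`, the `propose` and `check` callables, and the feedback wording are assumptions, and DeAction's actual detector and feedback structure are not reproduced here.

```python
from typing import Callable, List, Optional, Tuple


def guarded_step(
    intent: str,
    propose: Callable[[str, Optional[str]], str],
    check: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Tuple[str, bool, str]]]:
    """Return an action judged aligned with `intent` before it is executed.

    propose(intent, feedback) -> candidate action (feedback is None on the first try).
    check(intent, action) -> (is_aligned, reason).
    Returns (None, log) if no aligned action is found within max_rounds,
    in which case the caller should block execution.
    """
    feedback: Optional[str] = None
    log: List[Tuple[str, bool, str]] = []
    for _ in range(max_rounds):
        action = propose(intent, feedback)
        aligned, reason = check(intent, action)
        log.append((action, aligned, reason))
        if aligned:
            return action, log
        # Structured feedback tells the agent why the action was rejected.
        feedback = f"Action '{action}' rejected: {reason}. Stay on the user's task: {intent}."
    return None, log
```

Bounding the number of correction rounds keeps the added latency predictable, which matters for a guardrail that runs before every action.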
Submitted 9 February, 2026;
originally announced February 2026.
-
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
Authors:
Jaylen Jones,
Zhehao Zhang,
Yuting Ning,
Eric Fosler-Lussier,
Pierre-Luc St-Charles,
Yoshua Bengio,
Dawn Song,
Yu Su,
Huan Sun
Abstract:
Although computer-use agents (CUAs) hold significant potential to automate increasingly complex OS workflows, they can exhibit unsafe unintended behaviors that deviate from expected outcomes even under benign input contexts. However, exploration of this risk remains largely anecdotal, lacking concrete characterization and automated methods to proactively surface long-tail unintended behaviors in realistic CUA scenarios. To fill this gap, we introduce the first conceptual and methodological framework for unintended CUA behaviors, defining their key characteristics, automatically eliciting them, and analyzing how they arise from benign inputs. We propose AutoElicit: an agentic framework that iteratively perturbs benign instructions using CUA execution feedback and elicits severe harms while keeping perturbations realistic and benign. Using AutoElicit, we surface hundreds of harmful unintended behaviors from state-of-the-art CUAs such as Claude 4.5 Haiku and Opus. We further evaluate the transferability of human-verified successful perturbations, identifying persistent susceptibility to unintended behaviors across various other frontier CUAs. This work establishes a foundation for systematically analyzing unintended behaviors in realistic computer-use settings.
Submitted 8 February, 2026;
originally announced February 2026.
-
Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning
Authors:
Yansong Ning,
Jun Fang,
Naiqiang Tan,
Hao Liu
Abstract:
Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat entire interaction trajectories uniformly, overlooking the fact that thought necessity and observation utility vary across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, covering both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. We then introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by KL divergence. Experimental results on five agent benchmarks show that our Agent-Omit-8B achieves performance comparable to seven frontier LLM agents and the best effectiveness-efficiency trade-off compared with seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.
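The abstract does not give the form of the omission reward or of the KL bound, so the following is only a hypothetical illustration of the general shape such a reward might take, together with the discrete KL divergence in which bounds of this kind are typically stated. The linear token-saving bonus, the success gate, and the weight `lam` are assumptions, not Agent-Omit's reward.

```python
import math


def omission_reward(task_success: bool, tokens_full: int, tokens_used: int,
                    lam: float = 0.5) -> float:
    """Hypothetical omit-aware reward: task reward plus a bonus for tokens saved.

    The bonus is granted only on success, so the policy cannot profit from
    omitting thoughts or observations it actually needed.
    """
    if not task_success:
        return 0.0
    saved_fraction = max(tokens_full - tokens_used, 0) / tokens_full
    return 1.0 + lam * saved_fraction


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Terms with p_i = 0 contribute nothing; q_i must be positive wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Gating the efficiency bonus on task success is one simple way to encode the effectiveness-efficiency trade-off: a trajectory that omits 40% of its tokens and still succeeds earns 1.2 with `lam=0.5`, while a failed trajectory earns nothing regardless of how short it is.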
Submitted 4 February, 2026;
originally announced February 2026.
-
Long-range orbital transport and inverse orbital Hall effect in Co/Ru-based terahertz emitters
Authors:
Zhou Chao,
Zhang Shaohua,
Hao Lei,
Jin Yaxuan,
Jiang Xianguo,
Yang Ning,
Zheng Li,
Meng Hao,
Lu Chao,
Huang Wendeng,
Wu Yizheng,
Zhou Yan,
Jia Xu
Abstract:
Terahertz (THz) emission spectroscopy of femtosecond-photoexcited spintronic heterostructures has emerged as a versatile tool for investigating ultrafast spin transport in a noncontact and non-invasive manner. However, the investigation of ultrafast orbital transport is still at an early stage. Here, we experimentally demonstrate orbital-to-charge current conversion in Co/Ru heterostructures. Time-domain measurements reveal delayed and broadened terahertz waveforms with increasing Ru thickness, consistent with long-range orbital transport. In Co/Pt/Ru trilayers, the terahertz emission is further enhanced through constructive interference between the inverse spin Hall effect (ISHE) in Pt and the inverse orbital Hall effect (IOHE) in Ru, while reversed stack structures show suppressed output. Ferromagnetic resonance (FMR) measurements reveal a strong correlation between damping and THz amplitude, highlighting efficient angular momentum conversion. These results position Co/Ru as a promising orbitronic platform for tunable ultrafast THz emission. Our results not only deepen the physical understanding of orbital transport in condensed matter but also pave the way for designing promising spin-orbitronic devices and terahertz emitters.
Submitted 3 February, 2026;
originally announced February 2026.
-
The R2Pub Telescopes for Surveying: An Overview and Performance Evaluation of the System
Authors:
Xuan Song,
Xiaofeng Wang,
Jin Zhu,
Jian Li,
Jincheng Guo,
Danfeng Xiang,
Xin Li,
Cheng Liu,
Yuanhang Ning,
Zhishuai Ge,
Zhenzhen Shao,
Xiaochen Zheng,
Yi Yang,
Lei Zhang,
Yaqing Shi,
Dongyao Zhao,
Xiangyun Zeng,
Jun Mo,
Tengfei Song,
Yufeng Fan,
Yu Liu,
Jingxing Wang,
Shousheng He,
Ciren Wangdui,
Jujia Zhang
, et al. (7 additional authors not shown)
Abstract:
The R2Pub telescope, built by the Beijing Planetarium, is a 60 cm equatorial binocular telescope located at the Daocheng site of Yunnan Observatories in China, at an altitude of about 4700 m. This paper presents an overview of the R2Pub telescope system, including its design, instrumentation, and survey capabilities, and reports an initial evaluation of its system performance. R2Pub is a prime-focus binocular system, with each optical tube covering a field of view of approximately 18 square degrees. It is designed to detect a wide range of transient and variable sources in the local universe, such as variable stars, eclipsing binaries, supernovae, gamma-ray burst afterglows, tidal disruption events, active galactic nuclei, and other unknown transients. The observatory infrastructure, including the dome, equatorial mount, optical tubes, and associated subsystems, has been fully constructed and installed, and the system has entered the commissioning phase. Owing to the high-altitude location, good seeing conditions, and dark sky background at the Daocheng site, performance tests during commissioning show that the R2Pub system can reach a 5-sigma limiting magnitude of about 18.7 mag in the Pan-STARRS r' band with a 60 s exposure. Ongoing observations with R2Pub are expected to contribute to studies of variable and transient phenomena and to enhance public outreach in astronomy. The binocular design enables simultaneous dual-band observations, providing instantaneous color information for transient sources and aiding their classification and the physical characterization of their properties and evolution.
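A 5-sigma limiting magnitude is usually derived from the standard CCD signal-to-noise equation: find the source signal S for which S / sqrt(S + background variance) equals 5, then convert S per second to a magnitude with the photometric zero point. The sketch below implements that generic calculation; the aperture size, sky, dark, read noise, and zero-point values a caller would pass are placeholders, not R2Pub's measured parameters, and the zero point is assumed to be defined as the magnitude yielding 1 e-/s.

```python
import math


def snr(source_e, n_pix, sky_e_per_pix, dark_e_per_pix, read_noise_e):
    """Standard CCD equation; all counts in electrons integrated over the exposure."""
    noise_var = source_e + n_pix * (sky_e_per_pix + dark_e_per_pix + read_noise_e ** 2)
    return source_e / math.sqrt(noise_var)


def source_counts_at_snr(target_snr, n_pix, sky, dark, read_noise):
    """Solve S / sqrt(S + B) = k for S, where B is the aperture background variance.

    Squaring gives S^2 - k^2 S - k^2 B = 0; the positive root is returned.
    """
    k2 = target_snr ** 2
    b = n_pix * (sky + dark + read_noise ** 2)
    return (k2 + math.sqrt(k2 * k2 + 4 * k2 * b)) / 2


def limiting_magnitude(zero_point, exptime_s, target_snr=5.0, **noise):
    """Magnitude of a point source detected at target_snr.

    zero_point: magnitude that yields 1 e-/s (assumed convention).
    noise: keyword arguments n_pix, sky, dark, read_noise for the exposure.
    """
    s = source_counts_at_snr(target_snr, **noise)
    return zero_point - 2.5 * math.log10(s / exptime_s)
```

Lower sky background, as at a dark high-altitude site, shrinks B and therefore the signal needed for a 5-sigma detection, which pushes the limiting magnitude fainter for the same exposure time.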
Submitted 19 January, 2026;
originally announced January 2026.
-
An efficient solver based on low-rank approximation and Neumann matrix series for unsteady diffusion-type partial differential equations with random coefficients
Authors:
Yujun Zhu,
Min Li,
Yulan Ning,
Ju Ming
Abstract:
In this paper, we develop an efficient numerical solver for unsteady diffusion-type partial differential equations with random coefficients. A major computational challenge in such problems lies in repeatedly handling large-scale linear systems arising from spatial and temporal discretizations under uncertainty. To address this issue, we propose a novel generalized low-rank matrix approximation to represent the stochastic stiffness matrices, and approximate their inverses using the Neumann matrix series expansion. This approach transforms high-dimensional matrix inversion into a sequence of low-dimensional matrix multiplications. Therefore, the solver significantly reduces the computational cost and storage requirements while maintaining high numerical accuracy. The error analysis of the proposed solver is also provided. Finally, we apply the method to two classic uncertainty quantification problems: unsteady stochastic diffusion equations and the associated distributed optimal control problems. Numerical results demonstrate the feasibility and effectiveness of the proposed solver.
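The Neumann-series step can be illustrated on a matrix split as A = A0 + dA, where A0 is a fixed part that is factorized once and dA = U V^T is a low-rank random perturbation. This split is an assumption for illustration; the paper's generalized low-rank approximation may be structured differently. The identity used is (A0 + dA)^{-1} = sum_k (-A0^{-1} dA)^k A0^{-1}, valid when the spectral radius of A0^{-1} dA is below 1, and it replaces one large inversion with repeated solves against A0 and cheap low-rank products.

```python
def matvec_lowrank(U, V, x):
    """Compute (U V^T) x as U (V^T x): O(n r) work instead of O(n^2).

    U, V: n-by-r matrices given as lists of rows; x: length-n vector.
    """
    r = len(U[0])
    vt_x = [sum(V[i][j] * x[i] for i in range(len(x))) for j in range(r)]
    return [sum(U[i][j] * vt_x[j] for j in range(r)) for i in range(len(U))]


def neumann_solve(solve_a0, U, V, b, n_terms=30):
    """Approximate (A0 + U V^T)^{-1} b with a truncated Neumann series.

    solve_a0(y) must return A0^{-1} y (e.g. from a factorization computed once).
    Term k is (-A0^{-1} U V^T)^k A0^{-1} b; the series converges when the
    spectral radius of A0^{-1} U V^T is below 1.
    """
    term = solve_a0(b)
    x = list(term)
    for _ in range(1, n_terms):
        # Next term: apply -A0^{-1} dA to the previous term.
        term = [-t for t in solve_a0(matvec_lowrank(U, V, term))]
        x = [xi + ti for xi, ti in zip(x, term)]
    return x
```

For example, with A0 = 4I (so `solve_a0` divides by 4), U = V = [[1], [1]], and b = [6, 6], the exact system [[5, 1], [1, 5]] x = b has solution [1, 1]; here A0^{-1} dA has spectral radius 0.5, so the truncation error shrinks by half with every additional term.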
Submitted 16 January, 2026;
originally announced January 2026.
-
Toward Global Large Language Models in Medicine
Authors:
Rui Yang,
Huitao Li,
Weihao Xuan,
Heli Qi,
Xin Li,
Kunyu Yu,
Yingjian Chen,
Rongrong Wang,
Jacques Behmoaras,
Tianxi Cai,
Bibhas Chakraborty,
Qingyu Chen,
Lionel Tim-Ee Cheng,
Marie-Louise Damwanza,
Chido Dzinotyiwei,
Aosong Feng,
Chuan Hong,
Yusuke Iwasawa,
Yuhe Ke,
Linah Kitala,
Taehoon Ko,
Jisan Lee,
Irene Li,
Jonathan Chong Kai Liew,
Hongfang Liu
, et al. (25 additional authors not shown)
Abstract:
Despite continuous advances in medical technology, the global distribution of health care resources remains uneven. The development of large language models (LLMs) has transformed the landscape of medicine and holds promise for improving health care quality and expanding access to medical information globally. However, existing LLMs are primarily trained on high-resource languages, limiting their applicability in global medical scenarios. To address this gap, we constructed GlobMed, a large multilingual medical dataset, containing over 500,000 entries spanning 12 languages, including four low-resource languages. Building on this, we established GlobMed-Bench, which systematically assesses 56 state-of-the-art proprietary and open-weight LLMs across multiple multilingual medical tasks, revealing significant performance disparities across languages, particularly for low-resource languages. Additionally, we introduced GlobMed-LLMs, a suite of multilingual medical LLMs trained on GlobMed, with parameters ranging from 1.7B to 8B. GlobMed-LLMs achieved an average performance improvement of over 40% relative to baseline models, with a more than threefold increase in performance on low-resource languages. Together, these resources provide an important foundation for advancing the equitable development and application of LLMs globally, enabling broader language communities to benefit from technological advances.
Submitted 5 January, 2026;
originally announced January 2026.
-
Towards Understanding and Characterizing Vulnerabilities in Intelligent Connected Vehicles through Real-World Exploits
Authors:
Yuelin Wang,
Yuqiao Ning,
Yanbang Sun,
Xiaofei Xie,
Zhihua Xie,
Yang Chen,
Zhen Guo,
Shihao Xue,
Junjie Wang,
Sen Chen
Abstract:
Intelligent Connected Vehicles (ICVs) are a core component of modern transportation systems, and their security is crucial because it directly affects user safety. Despite prior research, most existing studies focus only on specific sub-components of ICVs due to their inherent complexity. As a result, a systematic understanding of ICV vulnerabilities is lacking. Moreover, much of the current literature relies on subjective human analysis, such as surveys and interviews, which tends to be high-level and unvalidated, leaving a significant gap between theoretical findings and real-world attacks. To address this issue, we conducted the first large-scale empirical study of ICV vulnerabilities. We began by analyzing existing ICV security literature and summarizing the prevailing taxonomies of vulnerability locations and types. To evaluate their real-world relevance, we collected a total of 649 exploitable vulnerabilities: 592 came from eight ICV vulnerability discovery competitions (Anonymous Cup) held between January 2023 and April 2024, covering 48 different vehicles, and the remaining 57 were submitted by researchers on a day-to-day basis. Based on this dataset, we assessed the coverage of existing taxonomies and identified several gaps, discovering one new vulnerability location and 13 new vulnerability types. We further categorized these vulnerabilities into 6 threat types (e.g., privacy data breach) and 4 risk levels (ranging from low to critical) and analyzed participants' skills and the types of ICVs involved in the competitions. This study provides a comprehensive, data-driven analysis of ICV vulnerabilities, offering actionable insights for researchers, industry practitioners, and policymakers. To support future research, we have made our vulnerability dataset publicly available.
Submitted 2 January, 2026;
originally announced January 2026.
-
Targeted learning via probabilistic subpopulation matching
Authors:
Xiaokang Liu,
Jie Hu,
Naimin Jing,
Yang Ning,
Cheng Yong Tang,
Runze Li,
Yong Chen
Abstract:
In biomedical research, leveraging information from multiple similar source studies has proven useful for obtaining more accurate predictions in a target study. However, in many biomedical applications based on real-world data, the populations under consideration in different studies (e.g., clinical sites) can be heterogeneous, which makes it challenging to borrow information properly for the target study. State-of-the-art methods are typically based on study-level matching to identify source studies that are similar to the target study, so all samples from source studies that differ significantly from the target study are dropped at the study level, which can lead to substantial loss of information. We consider a general situation where all studies are sampled from a super-population composed of distinct subpopulations, and propose a novel framework of targeted learning via subpopulation matching. In contrast to existing study-level matching methods, measuring similarities between subpopulations can effectively decompose both within- and between-study heterogeneity, allowing information from all source studies to be incorporated instead of dropping samples as existing methods do. We devise the proposed framework as a two-step procedure, where a finite mixture model is first fitted jointly across all studies to provide subject-wise probabilistic subpopulation information, followed by a step of within-subpopulation information transfer from the source studies to the target study for each identified subpopulation. We establish the non-asymptotic properties of our estimator and demonstrate via simulation studies that our method improves prediction in the target study.
Submitted 25 December, 2025;
originally announced December 2025.
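The two-step procedure of targeted learning via probabilistic subpopulation matching can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture over a covariate whose parameters have already been fitted jointly across studies (all values are hypothetical). Step 1 converts each subject's covariate into posterior subpopulation probabilities by Bayes' rule; step 2 is simplified here to a probability-weighted average of outcomes within each subpopulation, pooling target and source samples alike. The function names are illustrative, not the authors' code.

```python
import math
from typing import List, Tuple

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_memberships(x: float, weights: List[float], means: List[float],
                          sds: List[float]) -> List[float]:
    """Step 1: probability that a subject with covariate x belongs to each subpopulation."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def weighted_subpop_means(samples: List[Tuple[float, float]], weights: List[float],
                          means: List[float], sds: List[float]) -> List[float]:
    """Step 2 (simplified): soft-assign every (x, y) sample to all subpopulations.

    No sample is discarded; each one contributes to subpopulation c in
    proportion to its posterior membership probability for c.
    """
    k = len(weights)
    num, den = [0.0] * k, [0.0] * k
    for x, y in samples:
        for c, p in enumerate(posterior_memberships(x, weights, means, sds)):
            num[c] += p * y
            den[c] += p
    return [n / d for n, d in zip(num, den)]

# Hypothetical jointly fitted mixture with two subpopulations.
WEIGHTS, MEANS, SDS = [0.4, 0.6], [0.0, 5.0], [1.0, 1.0]
target = [(0.2, 1.1), (4.8, 2.9)]               # (covariate x, outcome y) pairs
source = [(-0.5, 0.9), (5.3, 3.2), (2.4, 2.0)]  # source samples are kept, not dropped
print(posterior_memberships(2.4, WEIGHTS, MEANS, SDS))
print(weighted_subpop_means(target + source, WEIGHTS, MEANS, SDS))
```

The ambiguous source sample at x = 2.4 contributes to both subpopulations instead of being discarded, which is the contrast with study-level matching drawn in the abstract.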
-
Distributed inference for heterogeneous mixture models using multi-site data
Authors:
Xiaokang Liu,
Rui Duan,
Raymond J. Carroll,
Yang Ning,
Yong Chen
Abstract:
Mixture models postulate the overall population as a finite mixture of subpopulations with unobserved membership. Fitting mixture models usually requires large sample sizes, so combining data from multiple sites can be beneficial. However, sharing individual participant data across sites is often impractical due to constraints such as data privacy concerns. Moreover, substantial heterogeneity may exist across sites, and locally identified latent classes may not be comparable across sites. We propose a unified modeling framework in which a common definition of the latent classes is shared across sites, while heterogeneous mixing proportions of the latent classes are allowed to account for between-site heterogeneity. To fit the heterogeneous mixture model on multi-site data, we propose a novel distributed Expectation-Maximization (EM) algorithm in which, at each iteration, a density-ratio-tilted surrogate Q function is constructed to approximate the standard Q function of the EM algorithm as if the data from multiple sites could be pooled. Theoretical analysis shows that our estimator achieves the same contraction property as estimators derived from the EM algorithm on the pooled data.
Submitted 18 December, 2025;
originally announced December 2025.
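To make the modeling setup concrete, here is a minimal sketch of EM for a heterogeneous two-component, one-dimensional Gaussian mixture in which the component means and a common variance are shared across sites while each site keeps its own mixing proportion. It is not the paper's density-ratio-tilted surrogate Q approach: here each site sends only aggregate sufficient statistics, which for this simple exponential-family model reproduces the pooled-data EM update exactly. The data, initial values, and function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

def local_e_step(data: List[float], pi: float, means: List[float], var: float):
    """E-step at one site; only aggregate sufficient statistics leave the site.

    pi is this site's own mixing proportion for component 0; means and var are shared.
    The common Gaussian normalizing constant cancels because var is shared.
    """
    s0, s1, s2 = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]  # sum r, sum r*x, sum r*x^2
    props = [pi, 1.0 - pi]
    for x in data:
        dens = [p * math.exp(-0.5 * (x - m) ** 2 / var) for p, m in zip(props, means)]
        total = sum(dens)
        for c in range(2):
            r = dens[c] / total  # responsibility of component c for x
            s0[c] += r
            s1[c] += r * x
            s2[c] += r * x * x
    return s0, s1, s2

def distributed_em(sites: List[List[float]], iters: int = 50) -> Tuple[List[float], List[float], float]:
    pis = [0.5] * len(sites)        # site-specific mixing proportions
    means, var = [-1.0, 1.0], 1.0   # shared component parameters
    for _ in range(iters):
        stats = [local_e_step(d, p, means, var) for d, p in zip(sites, pis)]
        # Site-level M-step: each site updates its own mixing proportion.
        pis = [st[0][0] / len(d) for st, d in zip(stats, sites)]
        # Central M-step: aggregate sufficient statistics across sites.
        S0 = [sum(st[0][c] for st in stats) for c in range(2)]
        S1 = [sum(st[1][c] for st in stats) for c in range(2)]
        S2 = [sum(st[2][c] for st in stats) for c in range(2)]
        means = [S1[c] / S0[c] for c in range(2)]
        var = sum(S2[c] - S1[c] ** 2 / S0[c] for c in range(2)) / sum(S0)
    return pis, means, var

def simulate_site(rng: random.Random, n: int, pi0: float, centers=(-3.0, 3.0)) -> List[float]:
    """Draw n points: component 0 with probability pi0, else component 1 (unit variance)."""
    return [rng.gauss(centers[0] if rng.random() < pi0 else centers[1], 1.0) for _ in range(n)]

rng = random.Random(0)
sites = [simulate_site(rng, 500, 0.2), simulate_site(rng, 500, 0.7)]
pis, means, var = distributed_em(sites)
print(pis, means, var)
```

The shared means and variance are learned from all sites, while the two sites recover different mixing proportions, mirroring the "common latent classes, heterogeneous proportions" structure described above.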
-
AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints
Authors:
Shuowei Cai,
Yansong Ning,
Hao Liu
Abstract:
Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications such as web search, social network analytics, and online customer support, where cost-effectiveness is increasingly the primary constraint for large-scale deployment. While recent work improves MAS cost-effectiveness by shaping inter-agent communication topologies and selecting agent backbones, it rarely models and optimizes under explicit token-cost and latency budgets that reflect deployment constraints. This often leads to topology-first designs and suboptimal cost-effectiveness when budgets are binding. We present AgentBalance, a framework for constructing cost-effective MAS under explicit token-cost and latency budgets via a backbone-then-topology design. AgentBalance first performs backbone-oriented agent generation, constructing agents with heterogeneous backbones through LLM pool construction, pool selection, and role-backbone matching. It then performs adaptive MAS topology generation, guiding inter-agent communication via agent representation learning, gating, and latency-aware topology synthesis. Experiments on benchmarks with 14 candidate LLM backbones show that AgentBalance achieves up to 10% and 22% performance gains under matched token-cost and latency budgets, respectively, and yields strong AUC on performance-versus-budget curves across benchmarks. AgentBalance also functions as a plug-in for existing MAS, improving performance under the same token-cost and latency constraints, and it generalizes well to unseen LLMs for practical, budget-aware deployment. Code: https://github.com/usail-hkust/AgentBalance
Submitted 12 December, 2025;
originally announced December 2025.
-
Checkerboard-type Zhang-Rice States in Overdoped Cuprate Superconductors
Authors:
Xiongfang Liu,
Kun Han,
Yan Peng,
Yuanjie Ning,
Jing Wu,
Zhaoyang Luo,
Difan Zhou,
Zhigang Zeng,
Qian He,
Chuanbing Cai,
Mark B. H. Breese,
Ariando Ariando,
Chi Sin Tang,
George A. Sawatzky,
Mi Jiang,
Xinmao Yin
Abstract:
Cuprate superconductors remain central to condensed matter physics due to their technological relevance and unconventional, incompletely understood electronic behavior. While the canonical phase diagram and low-energy models have been shaped largely by studies of underdoped and moderately doped cuprates, the overdoped regime has received comparatively limited attention. Here, we track the evolution of the electronic structure from optimal to heavy overdoping in La2-xSrxCuO4 (LSCO) using broadband optical spectroscopy across x = 0.15-0.60. The measured spectral changes, including the redistribution of Zhang-Rice-related spectral weight, are in qualitative agreement with determinant quantum Monte Carlo simulations of the three-orbital Emery model; together, they indicate a pronounced reconstruction of the electronic structure for hole concentrations x > 0.2. Guided by these observations, we propose a spontaneous checkerboard-type Zhang-Rice electronic configuration that captures the coexistence of itinerant and localized carriers characteristic of the heavily overdoped state. Our results refine the doping-dependent Zhang-Rice-based framework for cuprates, illuminate how correlations persist deep into the overdoped regime, and provide new constraints on microscopic mechanisms of high-temperature superconductivity, with broader implications for correlated transition-metal oxides.
Submitted 10 December, 2025;
originally announced December 2025.
-
HydroDCM: Hydrological Domain-Conditioned Modulation for Cross-Reservoir Inflow Prediction
Authors:
Pengfei Hu,
Fan Ming,
Xiaoxue Han,
Chang Lu,
Yue Ning,
Dan Lu
Abstract:
Deep learning models have shown promise in reservoir inflow prediction, yet their performance often deteriorates when applied to different reservoirs due to distributional differences, referred to as the domain shift problem. Domain generalization (DG) solutions aim to address this issue by extracting domain-invariant representations that mitigate errors in unseen domains. However, in hydrological settings, each reservoir exhibits unique inflow patterns, while metadata beyond the observations, such as spatial information, exerts an indirect but significant influence. This mismatch limits the applicability of conventional DG techniques to many-domain hydrological systems. To overcome these challenges, we propose HydroDCM, a scalable DG framework for cross-reservoir inflow forecasting. Spatial metadata of reservoirs is used to construct pseudo-domain labels that guide adversarial learning of invariant temporal features. During inference, HydroDCM adapts these features through lightweight conditioning layers informed by the target reservoir's metadata, reconciling DG's invariance with location-specific adaptation. Experimental results on 30 real-world reservoirs in the Upper Colorado River Basin demonstrate that our method substantially outperforms state-of-the-art DG baselines under many-domain conditions and remains computationally efficient.
Submitted 2 December, 2025;
originally announced December 2025.
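The metadata-conditioned adaptation in HydroDCM can be pictured with a feature-wise linear modulation (FiLM)-style sketch. This is an assumption about what "lightweight conditioning layers" could look like, not the paper's architecture: a reservoir's metadata vector is mapped to a per-feature scale (gamma) and shift (beta) that adjust the domain-invariant temporal features. All weights, dimensions, and metadata values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def modulate(features: Vector, metadata: Vector,
             w_gamma: Matrix, b_gamma: Vector, w_beta: Matrix, b_beta: Vector) -> Vector:
    """FiLM-style conditioning: h'_i = gamma_i(m) * h_i + beta_i(m) for each feature i."""
    gamma = linear(w_gamma, b_gamma, metadata)  # metadata -> per-feature scale
    beta = linear(w_beta, b_beta, metadata)     # metadata -> per-feature shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]

# Hypothetical: two normalized metadata fields (e.g. latitude, elevation)
# conditioning two invariant temporal features produced by a shared encoder.
features = [1.0, 2.0]
metadata = [0.5, -1.0]
print(modulate(features, metadata,
               w_gamma=[[0.2, 0.0], [0.0, 0.1]], b_gamma=[1.0, 1.0],
               w_beta=[[0.0, 0.0], [0.0, 0.0]], b_beta=[0.0, 0.5]))
```

With gamma fixed at 1 and beta at 0 the layer is the identity, so the invariant features pass through unchanged; the metadata only nudges them per reservoir, which keeps the adaptation cheap.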
-
Quantum Chromatic Number of Subgraphs of Orthogonality Graphs and the Distance-2 Hamming Graph
Authors:
Tao Luo,
Yu Ning,
Xiande Zhang
Abstract:
The determination of the quantum chromatic number of graphs has attracted considerable attention recently. However, there are few families of graphs whose quantum chromatic numbers are determined. A notable exception is the family of orthogonality graphs, whose quantum chromatic numbers are fully determined. In this paper, we extend these results by determining the exact quantum chromatic number of several subgraphs of the orthogonality graphs. Using the technique of combinatorial designs, we also determine the quantum chromatic number of the distance-2 Hamming graph, whose vertices are binary vectors and whose edges join pairs of vectors at Hamming distance 2, for infinitely many lengths.
Submitted 30 November, 2025;
originally announced December 2025.
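For readers unfamiliar with the distance-2 Hamming graph, the following sketch builds it for small lengths n: vertices are binary vectors of length n (encoded as n-bit integers), and two vertices are adjacent exactly when they differ in two coordinates. It only constructs the graph and checks elementary facts; it does not compute any classical or quantum chromatic number.

```python
from itertools import combinations

def distance2_hamming_graph(n: int):
    """Return (vertices, edges) of the distance-2 Hamming graph on {0,1}^n.

    Vertex u is adjacent to v iff u XOR v has exactly two bits set.
    """
    vertices = list(range(2 ** n))
    edges = set()
    for v in vertices:
        for i, j in combinations(range(n), 2):
            u = v ^ (1 << i) ^ (1 << j)          # flip coordinates i and j
            edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    return vertices, edges

def count_components(vertices, edges) -> int:
    """Count connected components with a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

vertices, edges = distance2_hamming_graph(4)
print(len(edges))                          # 48 = 2^4 * C(4,2) / 2
print(count_components(vertices, edges))  # 2: flipping two bits preserves weight parity
```

The graph is regular of degree C(n, 2) and splits into even-weight and odd-weight halves, which is a useful sanity check when experimenting with colorings.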
-
Attention-Guided Fair AI Modeling for Skin Cancer Diagnosis
Authors:
Mingcheng Zhu,
Mingxuan Liu,
Han Yuan,
Yilin Ning,
Zhiyao Luo,
Tingting Zhu,
Nan Liu
Abstract:
Artificial intelligence (AI) has shown remarkable promise in dermatology, offering accurate and non-invasive diagnosis of skin cancer. While extensive research has addressed skin tone-related bias, gender bias in dermatologic AI remains underexplored, leading to unequal care and reinforcing existing gender disparities. In this study, we developed LesionAttn, a fairness-aware algorithm that integrates clinical knowledge into model design by directing attention toward lesion regions, mirroring the diagnostic focus of clinicians. Combined with Pareto-frontier optimization for dual-objective model selection, LesionAttn balances fairness and predictive accuracy. Validated on two large-scale dermatological datasets, LesionAttn significantly mitigates gender bias while maintaining high diagnostic performance, outperforming existing bias mitigation algorithms. Our study highlights the potential of embedding clinical knowledge into AI development to advance both model performance and fairness, and further to foster interdisciplinary collaboration between clinicians and AI developers.
Submitted 26 November, 2025;
originally announced November 2025.
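Dual-objective model selection of the kind described for LesionAttn typically keeps only Pareto-optimal candidates. The sketch below assumes each candidate checkpoint is summarized by an accuracy metric (higher is better) and a gender-gap metric (lower is better); the candidate names and numbers are hypothetical, and the paper's own metrics and final tie-breaking rule may differ.

```python
from typing import List, NamedTuple

class Candidate(NamedTuple):
    name: str
    accuracy: float  # e.g. AUROC; higher is better (hypothetical metric choice)
    gap: float       # e.g. |metric_female - metric_male|; lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.gap <= b.gap
    strictly_better = a.accuracy > b.accuracy or a.gap < b.gap
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]

cands = [
    Candidate("epoch_10", 0.91, 0.08),
    Candidate("epoch_20", 0.93, 0.05),
    Candidate("epoch_30", 0.92, 0.02),
    Candidate("epoch_40", 0.90, 0.06),
]
print([c.name for c in pareto_front(cands)])  # ['epoch_20', 'epoch_30']
```

A final choice among the frontier models (here trading 0.01 accuracy for a 0.03 smaller gap) still requires a preference, which is where clinical judgment enters.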
-
Adaptive Graph Learning with Transformer for Multi-Reservoir Inflow Prediction
Authors:
Pengfei Hu,
Ming Fan,
Xiaoxue Han,
Chang Lu,
Wei Zhang,
Hyun Kang,
Yue Ning,
Dan Lu
Abstract:
Reservoir inflow prediction is crucial for water resource management, yet existing approaches mainly focus on single-reservoir models that ignore spatial dependencies among interconnected reservoirs. We introduce AdaTrip as an adaptive, time-varying graph learning framework for multi-reservoir inflow forecasting. AdaTrip constructs dynamic graphs where reservoirs are nodes with directed edges reflecting hydrological connections, employing attention mechanisms to automatically identify crucial spatial and temporal dependencies. Evaluation on thirty reservoirs in the Upper Colorado River Basin demonstrates superiority over existing baselines, with improved performance for reservoirs with limited records through parameter sharing. Additionally, AdaTrip provides interpretable attention maps at edge and time-step levels, offering insights into hydrological controls to support operational decision-making. Our code is available at https://github.com/humphreyhuu/AdaTrip.
Submitted 10 November, 2025;
originally announced November 2025.
-
Machine-Learning-Assisted Comparison of Regression Functions
Authors:
Jian Yan,
Zhuoxi Li,
Yang Ning,
Yong Chen
Abstract:
We revisit the classical problem of comparing regression functions, a fundamental question in statistical inference with broad relevance to modern applications such as data integration, transfer learning, and causal inference. Existing approaches typically rely on smoothing techniques and are thus hindered by the curse of dimensionality. We propose a generalized notion of kernel-based conditional mean dependence that provides a new characterization of the null hypothesis of equal regression functions. Building on this reformulation, we develop two novel tests that leverage modern machine learning methods for flexible estimation. We establish the asymptotic properties of the test statistics, which hold under both fixed- and high-dimensional regimes. Unlike existing methods that often require restrictive distributional assumptions, our framework only imposes mild moment conditions. The efficacy of the proposed tests is demonstrated through extensive numerical studies.
Submitted 28 October, 2025;
originally announced October 2025.
-
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs
Authors:
Yucheng Ning,
Xixun Lin,
Fang Fang,
Yanan Cao
Abstract:
The widespread adoption of Large Language Models (LLMs) raises critical concerns about the factual accuracy of their outputs, especially in high-risk domains such as biomedicine, law, and education. Existing evaluation methods for short texts often fail on long-form content due to complex reasoning chains, intertwined perspectives, and cumulative information. To address this, we propose a systematic approach integrating large-scale long-form datasets, multi-agent verification mechanisms, and weighted evaluation metrics. We construct LongHalluQA, a Chinese long-form factuality dataset, and develop MAD-Fact, a debate-based multi-agent verification system. We introduce a fact importance hierarchy to capture the varying significance of claims in long-form texts. Experiments on two benchmarks show that larger LLMs generally maintain higher factual consistency, while domestic models excel on Chinese content. Our work provides a structured framework for evaluating and enhancing factual reliability in long-form LLM outputs, guiding their safe deployment in sensitive domains.
Submitted 29 October, 2025; v1 submitted 26 October, 2025;
originally announced October 2025.
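A fact importance hierarchy suggests an importance-weighted factuality score in which errors on central claims cost more than errors on peripheral ones. The sketch assumes three importance levels with hypothetical weights and a per-claim verdict already produced by the verification agents; it is an illustration of weighting, not MAD-Fact's exact metric.

```python
from typing import Dict, List, Tuple

# Hypothetical importance levels and weights; the paper's hierarchy may differ.
IMPORTANCE_WEIGHTS: Dict[str, float] = {"core": 3.0, "supporting": 2.0, "peripheral": 1.0}

def weighted_factuality(claims: List[Tuple[str, bool]]) -> float:
    """Share of total importance weight carried by claims judged factually supported.

    Each claim is (importance_level, is_supported); an empty answer scores 0.0.
    """
    total = sum(IMPORTANCE_WEIGHTS[level] for level, _ in claims)
    supported = sum(IMPORTANCE_WEIGHTS[level] for level, ok in claims if ok)
    return supported / total if total else 0.0

claims = [("core", True), ("core", False), ("supporting", True), ("peripheral", True)]
print(weighted_factuality(claims))  # 0.6666666666666666, i.e. (3 + 2 + 1) / (3 + 3 + 2 + 1)
```

An unweighted claim-level precision would score the same answer at 3/4 = 0.75; weighting makes the single wrong core claim count for more.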
-
Equitable Survival Prediction: A Fairness-Aware Survival Modeling (FASM) Approach
Authors:
Mingxuan Liu,
Yilin Ning,
Haoyuan Wang,
Chuan Hong,
Matthew Engelhard,
Danielle S. Bitterman,
William G. La Cava,
Nan Liu
Abstract:
As machine learning models become increasingly integrated into healthcare, structural inequities and social biases embedded in clinical data can be perpetuated or even amplified by data-driven models. In survival analysis, censoring and time dynamics can further add complexity to fair model development. Additionally, algorithmic fairness approaches often overlook disparities in cross-group rankings, e.g., high-risk Black patients may be ranked below lower-risk White patients who do not experience the event of mortality. Such misranking can reinforce biological essentialism and undermine equitable care. We propose Fairness-Aware Survival Modeling (FASM), an approach designed to mitigate algorithmic bias in both intra-group and cross-group risk rankings over time. Using breast cancer prognosis as a representative case and applying FASM to SEER breast cancer data, we show that FASM substantially improves fairness while preserving discrimination performance comparable to fairness-unaware survival models. Time-stratified evaluations show that FASM maintains stable fairness over a 10-year horizon, with the greatest improvements observed during the mid-term of follow-up. Our approach enables the development of survival models that prioritize both accuracy and equity in clinical decision-making, advancing fairness as a core principle in clinical care.
Submitted 23 October, 2025;
originally announced October 2025.
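The cross-group misranking described for FASM can be quantified with a concordance index restricted to comparable pairs drawn from two different groups: when a patient in group A has an observed event before a patient in group B is last seen, a well-ranked model assigns the group-A patient the higher risk. The sketch below is a generic Harrell-style cross-group C-index written under that assumption, with hypothetical data; it is an evaluation aid, not the FASM training objective.

```python
from typing import List, NamedTuple

class Patient(NamedTuple):
    time: float   # follow-up time
    event: bool   # True if the event (e.g. death) was observed at `time`
    risk: float   # model risk score; higher means higher predicted risk

def cross_group_cindex(group_a: List[Patient], group_b: List[Patient]) -> float:
    """Concordance over pairs whose earlier-event patient is in A and whose other patient is in B."""
    concordant, comparable = 0.0, 0
    for i in group_a:
        if not i.event:
            continue                   # censored patients cannot anchor a pair
        for j in group_b:
            if i.time < j.time:        # j was still event-free when i had the event
                comparable += 1
                if i.risk > j.risk:
                    concordant += 1.0
                elif i.risk == j.risk:
                    concordant += 0.5  # ties count as half
    return concordant / comparable if comparable else float("nan")

group_a = [Patient(2.0, True, 0.3), Patient(5.0, False, 0.9)]
group_b = [Patient(4.0, True, 0.6), Patient(8.0, False, 0.2)]
print(cross_group_cindex(group_a, group_b))  # 0.5
print(cross_group_cindex(group_b, group_a))  # 0.0
```

Computing the index in both directions exposes asymmetric misranking: here group B's early-event patient is always ranked below a longer-surviving group-A patient.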
-
A Conditional Diffusion Model for Probabilistic Prediction of Battery Capacity Degradation
Authors:
Hequn Li,
Zhongwei Deng,
Chunlin Jiang,
Yvxin He,
Zhansheng Ning
Abstract:
Accurate prediction of lithium-ion battery capacity and its associated uncertainty is essential for reliable battery management but remains challenging due to the stochastic nature of aging. This paper presents a novel method, termed the Condition Diffusion U-Net with Attention (CDUA), which integrates feature engineering and deep learning to address this challenge. The proposed approach employs a diffusion-based generative model for time-series forecasting and incorporates attention mechanisms to enhance predictive performance. Battery capacity is first derived from real-world vehicle operation data. The most relevant features are then identified using the Pearson correlation coefficient and the XGBoost algorithm. These features are used to train the CDUA model, which comprises two core components: (1) a contextual U-Net with self-attention to capture complex temporal dependencies, and (2) a denoising network to reconstruct accurate capacity values from noisy observations. Experimental validation on the real-world vehicle data demonstrates that the proposed CDUA model achieves a relative Mean Absolute Error (MAE) of 0.94% and a relative Root Mean Square Error (RMSE) of 1.14%, with a narrow 95% confidence interval of 3.74% in relative width. These results confirm that CDUA provides both accurate capacity estimation and reliable uncertainty quantification. Comparative experiments further verify its robustness and superior performance over existing mainstream approaches.
Submitted 20 October, 2025;
originally announced October 2025.
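The Pearson-correlation stage of the CDUA feature selection can be sketched as ranking candidate indicators by their absolute correlation with capacity and keeping those above a threshold. The feature names, values, and the 0.8 threshold are hypothetical, and the subsequent XGBoost-importance step is omitted.

```python
import math
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features: Dict[str, List[float]], capacity: List[float],
                    threshold: float = 0.8) -> List[str]:
    """Keep features whose |r| with capacity exceeds `threshold`, strongest first."""
    scored = {name: abs(pearson(vals, capacity)) for name, vals in features.items()}
    return sorted((n for n, r in scored.items() if r > threshold),
                  key=lambda n: scored[n], reverse=True)

capacity = [50.0, 49.2, 48.5, 47.9, 47.0]            # Ah, fading with age (hypothetical)
features = {
    "charge_time": [3.00, 2.95, 2.90, 2.86, 2.80],   # falls with capacity
    "mileage": [1.0, 2.1, 2.9, 4.2, 5.0],            # rises as capacity fades
    "ambient_temp": [20.0, 25.0, 18.0, 26.0, 21.0],  # essentially unrelated
}
print(select_features(features, capacity))  # ['charge_time', 'mileage']
```

Using |r| keeps strongly negatively correlated indicators such as mileage; a linear filter like this cannot detect nonlinear relevance, which is one reason to follow it with a tree-based importance step.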
-
Improved Extended Kalman Filter-Based Disturbance Observers for Exoskeletons
Authors:
Shilei Li,
Dawei Shi,
Makoto Iwasaki,
Yan Ning,
Hongpeng Zhou,
Ling Shi
Abstract:
The nominal performance of mechanical systems is often degraded by unknown disturbances. A two-degree-of-freedom control structure can decouple nominal performance from disturbance rejection. However, perfect disturbance rejection is unattainable when the disturbance dynamics are unknown. In this work, we reveal an inherent trade-off in disturbance estimation between tracking speed and tracking uncertainty. We then propose two novel methods to enhance disturbance estimation: an interacting multiple model extended Kalman filter-based disturbance observer and a multi-kernel correntropy extended Kalman filter-based disturbance observer. Experiments on an exoskeleton in a time-varying interaction force scenario verify that, compared with the extended Kalman filter-based disturbance observer, the two proposed methods improve tracking accuracy by 36.3% and 16.2% in hip joint error, and by 46.3% and 24.4% in knee joint error, respectively, demonstrating the superiority of the proposed methods.
Submitted 17 October, 2025;
originally announced October 2025.
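For orientation, the sketch below shows a plain Kalman-filter disturbance observer, the kind of baseline that the two proposed methods refine (it contains neither the interacting-multiple-model nor the correntropy extension). It assumes a hypothetical one-joint model m dv/dt = u + d with measured velocity v and a random-walk disturbance d. Because this toy model is linear, the extended Kalman filter reduces to the standard Kalman filter. The mass, time step, and noise covariances are illustrative assumptions; the random-walk variance q_d plays the role of the speed-versus-uncertainty knob discussed in the abstract.

```python
from typing import List, Tuple

Mat = List[List[float]]

def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]

def kf_disturbance_observer(measured_v: List[float], inputs: List[float], m: float = 1.0,
                            dt: float = 0.01, q_d: float = 1.0, r: float = 1e-4) -> List[float]:
    """Estimate d from velocity measurements with state x = [v, d] (hypothetical tuning)."""
    F = [[1.0, dt / m], [0.0, 1.0]]  # v_{k+1} = v_k + dt/m (u_k + d_k); d_{k+1} = d_k + w_k
    Q = [[1e-6, 0.0], [0.0, q_d]]    # larger q_d: faster tracking, noisier estimate
    x = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    estimates = []
    for z, u in zip(measured_v, inputs):
        # Measurement update with H = [1, 0]: only velocity is measured.
        s = P[0][0] + r
        k = [P[0][0] / s, P[1][0] / s]
        innov = z - x[0]
        x = [x[0] + k[0] * innov, x[1] + k[1] * innov]
        P = [[P[0][0] - k[0] * P[0][0], P[0][1] - k[0] * P[0][1]],
             [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]
        estimates.append(x[1])
        # Time update: predict the next state with the known input u.
        x = [x[0] + dt / m * (u + x[1]), x[1]]
        P = matmul(matmul(F, P), transpose(F))
        P = [[P[i][j] + Q[i][j] for j in range(2)] for i in range(2)]
    return estimates

def simulate(n_steps: int, d_true: float, m: float = 1.0,
             dt: float = 0.01) -> Tuple[List[float], List[float]]:
    """Noise-free velocity trace under a square-wave command and a constant disturbance."""
    v, vs, us = 0.0, [], []
    for k in range(n_steps):
        u = 0.5 if (k // 100) % 2 == 0 else -0.5
        vs.append(v)
        us.append(u)
        v = v + dt / m * (u + d_true)
    return vs, us

vs, us = simulate(1000, d_true=2.0)
d_hat = kf_disturbance_observer(vs, us)
print(round(d_hat[-1], 3))
```

Because the simulated measurements are noise-free and the filter model matches the simulator, the estimate settles at the true disturbance of 2.0; with real, noisy data, raising q_d speeds up tracking of time-varying disturbances at the cost of a noisier estimate.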
-
MedTrust-RAG: Evidence Verification and Trust Alignment for Biomedical Question Answering
Authors:
Yingpeng Ning,
Yuanyuan Sun,
Ling Luo,
Yanhua Wang,
Yuchen Pan,
Hongfei Lin
Abstract:
Biomedical question answering (QA) requires accurate interpretation of complex medical knowledge. Large language models (LLMs) have shown promising capabilities in this domain, with retrieval-augmented generation (RAG) systems enhancing performance by incorporating external medical literature. However, RAG-based approaches in biomedical QA suffer from hallucinations due to post-retrieval noise and insufficient verification of retrieved evidence, undermining response reliability. We propose MedTrust-Guided Iterative RAG, a framework designed to enhance factual consistency and mitigate hallucinations in medical QA. Our method introduces three key innovations. First, it enforces citation-aware reasoning by requiring all generated content to be explicitly grounded in retrieved medical documents, with structured Negative Knowledge Assertions used when evidence is insufficient. Second, it employs an iterative retrieval-verification process, where a verification agent assesses evidence adequacy and refines queries through Medical Gap Analysis until reliable information is obtained. Third, it integrates the MedTrust-Align Module (MTAM) that combines verified positive examples with hallucination-aware negative samples, leveraging Direct Preference Optimization to reinforce citation-grounded reasoning while penalizing hallucination-prone response patterns.
Submitted 18 October, 2025; v1 submitted 16 October, 2025;
originally announced October 2025.
-
Gender Bias in Large Language Models for Healthcare: Assignment Consistency and Clinical Implications
Authors:
Mingxuan Liu,
Yuhe Ke,
Wentao Zhu,
Mayli Mertens,
Yilin Ning,
Jingchi Liao,
Chuan Hong,
Daniel Shu Wei Ting,
Yifan Peng,
Danielle S. Bitterman,
Marcus Eng Hock Ong,
Nan Liu
Abstract:
The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Using case studies from the New England Journal of Medicine (NEJM) Challenge, we assigned genders (female, male, or unspecified) to multiple open-source and proprietary LLMs. We evaluated the consistency of their responses across LLM-gender assignments for both LLM-based diagnosis and the models' judgments on the clinical relevance or necessity of patient gender. We found that diagnoses were relatively consistent across LLM genders for most models. However, in judging the relevance and necessity of patient gender for diagnosis, all models showed substantial inconsistency across LLM genders, particularly in relevance judgments. Some models even displayed a systematic female-male disparity in their interpretation of patient gender. These findings reveal an underexplored bias that could undermine the reliability of LLMs in clinical practice, underscoring the need for routine checks of identity-assignment consistency when interacting with LLMs to ensure reliable and equitable AI-supported clinical care.
Submitted 7 October, 2025;
originally announced October 2025.
-
Agent Learning via Early Experience
Authors:
Kai Zhang,
Xiangchao Chen,
Bo Liu,
Tianci Xue,
Zeyi Liao,
Zhihan Liu,
Xiyao Wang,
Yuting Ning,
Zhaorun Chen,
Xiaohan Fu,
Jian Xie,
Yuxuan Sun,
Boyu Gou,
Qi Qi,
Zihang Meng,
Jianwei Yang,
Ning Zhang,
Xian Li,
Ashish Shah,
Dat Huynh,
Hengduo Li,
Zi Yang,
Sara Cao,
Lawrence Jang,
Shuyan Zhou
, et al. (5 additional authors not shown)
Abstract:
A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm we study two strategies of using such data: (1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. We evaluate across eight diverse environments and multiple model families. Our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.
Submitted 13 October, 2025; v1 submitted 9 October, 2025;
originally announced October 2025.
-
LVLMs as inspectors: an agentic framework for category-level structural defect annotation
Authors:
Sheng Jiang,
Yuanmin Ning,
Bingxi Huang,
Peiyin Chen,
Zhaohui Chen
Abstract:
Automated structural defect annotation is essential for ensuring infrastructure safety while minimizing the high costs and inefficiencies of manual labeling. A novel agentic annotation framework, Agent-based Defect Pattern Tagger (ADPT), is introduced that integrates Large Vision-Language Models (LVLMs) with a semantic pattern matching module and an iterative self-questioning refinement mechanism. By leveraging optimized domain-specific prompting and a recursive verification process, ADPT transforms raw visual data into high-quality, semantically labeled defect datasets without any manual supervision. Experimental results demonstrate that ADPT achieves up to 98% accuracy in distinguishing defective from non-defective images, and 85%-98% annotation accuracy across four defect categories under class-balanced settings, with 80%-92% accuracy on class-imbalanced datasets. The framework offers a scalable and cost-effective solution for high-fidelity dataset construction, providing strong support for downstream tasks such as transfer learning and domain adaptation in structural damage assessment.
Submitted 1 October, 2025;
originally announced October 2025.
-
DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents
Authors:
Yansong Ning,
Rui Liu,
Jun Wang,
Kai Chen,
Wei Li,
Jun Fang,
Kan Zheng,
Naiqiang Tan,
Hao Liu
Abstract:
Travel planning (TP) agents have recently emerged as a building block that interacts with external tools and resources to generate travel itineraries and ensure an enjoyable user experience. Despite their benefits, existing studies rely on handcrafted prompts and fixed agent workflows, hindering more flexible and autonomous TP agents. This paper proposes DeepTravel, an end-to-end agentic reinforcement learning framework for building autonomous travel planning agents capable of autonomously planning, executing tools, and reflecting on tool responses to explore, verify, and refine intermediate actions in multi-step reasoning. To achieve this, we first construct a robust sandbox environment by caching transportation, accommodation, and POI data, facilitating TP agent training without being constrained by real-world API limitations (e.g., inconsistent outputs). Moreover, we develop a hierarchical reward modeling system, where a trajectory-level verifier first checks spatiotemporal feasibility and filters out unsatisfactory travel itineraries, and a turn-level verifier then validates the consistency of itinerary details with tool responses, enabling efficient and precise reward services. Finally, we propose a replay-augmented reinforcement learning method that enables the TP agent to periodically replay from a failure experience buffer, giving rise to notable agentic capabilities. We deploy the trained TP agent on the DiDi Enterprise Solutions App and conduct comprehensive online and offline evaluations, demonstrating that DeepTravel enables small LLMs (e.g., Qwen3 32B) to significantly outperform existing frontier LLMs such as OpenAI o1, o3, and DeepSeek R1 in travel planning tasks.
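The two-stage reward described above can be sketched as a gate followed by a finer check. This is a minimal sketch under stated assumptions: the verifiers are passed in as hypothetical callables, and scoring a feasible itinerary by the fraction of consistent turns is our assumption, not the paper's reward definition.

```python
from typing import Callable, Sequence


def hierarchical_reward(
    itinerary: dict,
    turns: Sequence[dict],
    trajectory_ok: Callable[[dict], bool],
    turn_ok: Callable[[dict], bool],
) -> float:
    """Trajectory-level gate, then turn-level consistency (hypothetical scoring)."""
    # Stage 1: itineraries failing the coarse spatiotemporal feasibility check get zero reward.
    if not trajectory_ok(itinerary):
        return 0.0
    # Stage 2: only surviving itineraries pay for the finer turn-by-turn verification.
    if not turns:
        return 1.0
    passed = sum(1 for t in turns if turn_ok(t))
    return passed / len(turns)
```

Gating first means the cheaper trajectory check filters out most bad rollouts before any detailed comparison against tool responses is performed.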
Submitted 26 September, 2025;
originally announced September 2025.
-
Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety
Authors:
Chi-Shian Dai,
Chao Ying,
Yang Ning,
Jiwei Zhao
Abstract:
Randomized controlled trials (RCTs) are widely regarded as the gold standard for causal inference in biomedical research. For instance, when estimating the average treatment effect on the treated (ATT), a doubly robust estimation procedure can be applied, requiring either the propensity score model or the control outcome model to be correctly specified. In this paper, we address scenarios where external control data, often with a much larger sample size, are available. Such data are typically easier to obtain from historical records or third-party sources. However, we find that incorporating external controls into the standard doubly robust estimator for ATT may paradoxically result in reduced efficiency compared to using the estimator without external controls. This counterintuitive outcome suggests that the naive incorporation of external controls could be detrimental to estimation efficiency. To resolve this issue, we propose a novel doubly robust estimator that guarantees higher efficiency than the standard approach without external controls, even under model misspecification. When all models are correctly specified, this estimator aligns with the standard doubly robust estimator that incorporates external controls and achieves semiparametric efficiency. The asymptotic theory developed in this work applies to high-dimensional confounder settings, which are increasingly common with the growing prevalence of electronic health record data. We demonstrate the effectiveness of our methodology through extensive simulation studies and a real-world data application.
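For reference, one common form of the *standard* doubly robust ATT estimator that the abstract starts from (not the authors' new estimator with external controls) is $\hat\tau = \frac{1}{n_1}\sum_i [T_i(Y_i-\hat\mu_0(X_i)) - (1-T_i)\frac{\hat e(X_i)}{1-\hat e(X_i)}(Y_i-\hat\mu_0(X_i))]$. Here $\hat\mu_0$ is the control outcome model, $\hat e$ the propensity score, and $n_1$ the number of treated units. The estimator remains consistent if either model is correctly specified. The sketch below takes both models as given callables, which is a simplifying assumption, since in practice they are fitted.

```python
from typing import Callable, Sequence


def dr_att(
    x: Sequence[float],
    t: Sequence[int],
    y: Sequence[float],
    mu0: Callable[[float], float],
    e: Callable[[float], float],
) -> float:
    """Doubly robust ATT: (1/n1) * sum[T*(Y - mu0) - (1-T) * e/(1-e) * (Y - mu0)]."""
    n1 = sum(t)
    if n1 == 0:
        raise ValueError("need at least one treated unit")
    total = 0.0
    for xi, ti, yi in zip(x, t, y):
        resid = yi - mu0(xi)  # residual against the control outcome model
        if ti == 1:
            total += resid
        else:
            # Propensity odds reweight controls to resemble the treated population.
            odds = e(xi) / (1.0 - e(xi))
            total -= odds * resid
    return total / n1
```

With a correct outcome model, the control residuals vanish and the estimate reduces to the average treated residual. For example, `dr_att([0, 1, 2, 3], [1, 0, 1, 0], [3, 2, 7, 6], lambda v: 2 * v, lambda v: 0.5)` recovers the built-in effect of 3.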
Submitted 24 September, 2025;
originally announced September 2025.
-
MARS: A Malignity-Aware Backdoor Defense in Federated Learning
Authors:
Wei Wan,
Yuxuan Ning,
Zhicong Huang,
Cheng Hong,
Shengshan Hu,
Ziqi Zhou,
Yechao Zhang,
Tianqing Zhu,
Wanlei Zhou,
Leo Yu Zhang
Abstract:
Federated Learning (FL) is a distributed paradigm aimed at protecting participant data privacy by exchanging model parameters to achieve high-quality model training. However, this distributed nature also makes FL highly vulnerable to backdoor attacks. Notably, the recently proposed state-of-the-art (SOTA) attack, 3DFed (SP2023), uses an indicator mechanism to determine whether the backdoor models have been accepted by the defender and adaptively optimizes backdoor models, rendering existing defenses ineffective. In this paper, we first reveal that the failure of existing defenses lies in the employment of empirical statistical measures that are loosely coupled with backdoor attacks. Motivated by this, we propose a Malignity-Aware backdooR defenSe (MARS) that leverages backdoor energy (BE) to indicate the malicious extent of each neuron. To amplify malignity, we further extract the most prominent BE values from each model to form a concentrated backdoor energy (CBE). Finally, a novel Wasserstein distance-based clustering method is introduced to effectively identify backdoor models. Extensive experiments demonstrate that MARS can defend against SOTA backdoor attacks and significantly outperforms existing defenses.
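The pipeline outlined above (per-neuron backdoor energy, a concentrated summary per model, then Wasserstein-distance clustering) can be sketched as follows, assuming the BE values are already computed. Several choices here are our assumptions and are not confirmed by the abstract: taking the top-k values as the CBE, using exhaustive 2-medoid clustering, and treating the higher-energy cluster as malicious.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def concentrated_energy(be_values: Sequence[float], k: int) -> List[float]:
    # Hypothetical CBE: the k largest backdoor-energy values of one model.
    return sorted(be_values, reverse=True)[:k]


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W1 distance between two equal-size 1-D empirical distributions."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def two_medoid_labels(points: List[List[float]], dist: Callable) -> List[int]:
    """Exhaustive 2-medoid clustering; fine for a handful of clients."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    best_cost, best_labels = float("inf"), [0] * n
    for m0, m1 in combinations(range(n), 2):
        cost = sum(min(d[i][m0], d[i][m1]) for i in range(n))
        if cost < best_cost:
            best_cost = cost
            best_labels = [0 if d[i][m0] <= d[i][m1] else 1 for i in range(n)]
    return best_labels


def flag_backdoored(model_be: List[Sequence[float]], k: int) -> List[int]:
    cbes = [concentrated_energy(v, k) for v in model_be]
    labels = two_medoid_labels(cbes, wasserstein_1d)
    groups = {c: [i for i, lab in enumerate(labels) if lab == c] for c in (0, 1)}
    if not groups[0] or not groups[1]:
        return []  # no split found: nothing flagged

    def mean_energy(idx: List[int]) -> float:
        return sum(sum(cbes[i]) for i in idx) / (k * len(idx))

    # Assumption: the cluster with the higher mean energy holds the backdoored models.
    bad = 0 if mean_energy(groups[0]) > mean_energy(groups[1]) else 1
    return groups[bad]
```

Keeping only the top values discards the many near-zero neurons that dilute the signal, which is the intuition the abstract gives for concentrating the energy.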
Submitted 21 September, 2025;
originally announced September 2025.
-
LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions
Authors:
Xixun Lin,
Yucheng Ning,
Jingwen Zhang,
Yan Dong,
Yilong Liu,
Yongxuan Wu,
Xiaohua Qi,
Nan Sun,
Yanmin Shang,
Kun Wang,
Pengfei Cao,
Qingyue Wang,
Lixin Zou,
Xu Chen,
Chuan Zhou,
Jia Wu,
Peng Zhang,
Qingsong Wen,
Shirui Pan,
Bin Wang,
Yanan Cao,
Kai Chen,
Songlin Hu,
Li Guo
Abstract:
Driven by the rapid advancements of Large Language Models (LLMs), LLM-based agents have emerged as powerful intelligent systems capable of human-like cognition, reasoning, and interaction. These agents are increasingly being deployed across diverse real-world applications, including student education, scientific research, and financial analysis. However, despite their remarkable potential, LLM-based agents remain vulnerable to hallucination issues, which can result in erroneous task execution and undermine the reliability of the overall system design. Addressing this critical challenge requires a deep understanding and a systematic consolidation of recent advances on LLM-based agents. To this end, we present the first comprehensive survey of hallucinations in LLM-based agents. By carefully analyzing the complete workflow of agents, we propose a new taxonomy that identifies different types of agent hallucinations occurring at different stages. Furthermore, we conduct an in-depth examination of eighteen triggering causes underlying the emergence of agent hallucinations. Through a detailed review of a large number of existing studies, we summarize approaches for hallucination mitigation and detection, and highlight promising directions for future research. We hope this survey will inspire further efforts toward addressing hallucinations in LLM-based agents, ultimately contributing to the development of more robust and reliable agent systems.
Submitted 18 November, 2025; v1 submitted 23 September, 2025;
originally announced September 2025.
-
Fundamental theorem of transposed Poisson $(A,H)$-Hopf modules
Authors:
Yan Ning,
Daowei Lu,
Dingguo Wang
Abstract:
Transposed Poisson algebra was introduced as a dual notion of the Poisson algebra by switching the roles played by the commutative associative operation and Lie operation in the Leibniz rule defining the Poisson algebra. Let $H$ be a Hopf algebra with a bijective antipode and $A$ an $H$-comodule transposed Poisson algebra. Assume that there exists an $H$-colinear map which is also an algebra map from $H$ to the transposed Poisson center of $A$. In this paper we generalize the fundamental theorem of $(A, H)$-Hopf modules to transposed Poisson $(A, H)$-Hopf modules and deduce relative projectivity in the category of transposed Poisson $(A, H)$-Hopf modules.
Submitted 10 September, 2025;
originally announced September 2025.
-
RDIT: Residual-based Diffusion Implicit Models for Probabilistic Time Series Forecasting
Authors:
Chih-Yu Lai,
Yu-Chien Ning,
Duane S. Boning
Abstract:
Probabilistic Time Series Forecasting (PTSF) plays a critical role in domains requiring accurate and uncertainty-aware predictions for decision-making. However, existing methods offer suboptimal distribution modeling and suffer from a mismatch between training and evaluation metrics. Surprisingly, we found that augmenting a strong point estimator with a zero-mean Gaussian, whose standard deviation matches its training error, can yield state-of-the-art performance in PTSF. In this work, we propose RDIT, a plug-and-play framework that combines point estimation and residual-based conditional diffusion with a bidirectional Mamba network. We theoretically prove that the Continuous Ranked Probability Score (CRPS) can be minimized by adjusting to an optimal standard deviation and then derive algorithms to achieve distribution matching. Evaluations on eight multivariate datasets across varied forecasting horizons demonstrate that RDIT achieves lower CRPS, rapid inference, and improved coverage compared to strong baselines.
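The observation that a point forecast plus a zero-mean Gaussian with a tuned standard deviation can score well is easy to illustrate with the standard closed-form CRPS of a Gaussian. The sketch below picks the σ that minimizes mean CRPS over training residuals. The bisection on the derivative $2\varphi(e/\sigma) - 1/\sqrt{\pi}$ is our own derivation and is not RDIT's algorithm. All names are illustrative.

```python
import math
from typing import Sequence


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(y: float, mu: float, sigma: float) -> float:
    """Closed-form CRPS of the forecast N(mu, sigma^2) against observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0) + 2.0 * norm_pdf(z) - 1.0 / math.sqrt(math.pi))


def optimal_sigma(residuals: Sequence[float], tol: float = 1e-10) -> float:
    """Sigma minimizing mean CRPS over residuals e = y - mu.

    For each residual, dCRPS/dsigma = 2*phi(e/sigma) - 1/sqrt(pi), which is
    non-decreasing in sigma, so the mean CRPS is convex in sigma and bisection
    on the mean derivative finds the minimizer.
    """
    if not any(residuals):
        raise ValueError("need at least one nonzero residual")

    def grad(s: float) -> float:
        return sum(2.0 * norm_pdf(e / s) for e in residuals) / len(residuals) - 1.0 / math.sqrt(math.pi)

    lo, hi = 1e-12, 10.0 * max(abs(e) for e in residuals)
    if grad(lo) >= 0.0:  # mostly exact-zero residuals: the optimum collapses toward 0
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if grad(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For residuals of ±1, the optimum is σ = 1/√(ln 2) ≈ 1.20, slightly larger than the residual magnitude itself.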
Submitted 2 September, 2025;
originally announced September 2025.
-
G-HIVE: Parameter Estimation and Approximate Inference for Multivariate Response Generalized Linear Models with Hidden Variables
Authors:
Inbeom Lee,
Yang Ning
Abstract:
In practice, there often exist unobserved variables, also termed hidden variables, associated with both the response and covariates. Existing works in the literature mostly focus on linear regression with hidden variables. However, when the regression model is non-linear, the presence of hidden variables leads to new challenges in parameter identification, estimation, and statistical inference. This paper studies multivariate response generalized linear models (GLMs) with hidden variables. We propose a unified framework for parameter estimation and statistical inference called G-HIVE, short for 'G'eneralized - 'HI'dden 'V'ariable adjusted 'E'stimation. Specifically, based on factor model assumptions, we propose a modified quasi-likelihood approach to estimate an intermediate parameter, defined through a set of reweighted estimating equations. The key to our approach is to construct the proper weight, so that the first-order asymptotic bias of the estimator can be removed by orthogonal projection. Moreover, we propose an approximate inference framework for uncertainty quantification. Theoretically, we establish the first-order and second-order asymptotic bias and the convergence rate of our estimator. In addition, we characterize the accuracy of the Gaussian approximation of our estimator via the Berry-Esseen bound, which justifies the validity of the proposed approximate inference approach. Extensive simulations and real data analysis results show that G-HIVE is feasibly implementable and can outperform the baseline method that ignores hidden variables.
Submitted 29 August, 2025;
originally announced September 2025.
-
Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation
Authors:
Xiaohan Wang,
Yang Ning
Abstract:
In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the geometric transformation of the Bayes decision boundary, our method reformulates the problem as a low-dimensional empirical risk minimization problem. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. Moreover, we illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules. Extensive simulation studies and analyses of real-world data further demonstrate both superior performance and robustness of our approach.
Submitted 28 August, 2025;
originally announced August 2025.
-
Atomistic understanding of hydrogen bubble-induced embrittlement in tungsten enabled by machine learning molecular dynamics
Authors:
Yu Bao,
Keke Song,
Jiahui Liu,
Yanzhou Wang,
Yifei Ning,
Penghua Ying,
Ping Qian
Abstract:
Hydrogen bubble formation within nanoscale voids is a critical mechanism underlying the embrittlement of metallic materials, yet its atomistic origins remain elusive. Here, we present an accurate and transferable machine-learned potential (MLP) for the tungsten-hydrogen binary system within the neuroevolution potential (NEP) framework, trained through active learning on extensive density functional theory data. The developed NEP-WH model reproduces a wide range of lattice and defect properties in tungsten systems, as well as hydrogen solubility, with near first-principles accuracy, while retaining the efficiency of empirical potentials. Crucially, it is the first MLP capable of capturing hydrogen trapping and H$_2$ formation in nanovoids with quantitative fidelity. Large-scale machine-learning molecular dynamics simulations reveal a distinct aggregation pathway in which planar hydrogen clusters nucleate and grow along {100} planes near voids, with hexagonal close-packed structures emerging at their intersections. Under uniaxial tension, these aggregates promote bubble fracture and the development of regular {100} cracks, suppressing dislocation activity and resulting in brittle fracture behavior. This work provides detailed atomistic insights into hydrogen bubble evolution and fracture in nanovoids, enables predictive modeling of structural degradation in extreme environments, and advances fundamental understanding of hydrogen-induced damage in structural metals.
Submitted 27 August, 2025;
originally announced August 2025.
-
FedSODA: Federated Fine-tuning of LLMs via Similarity Group Pruning and Orchestrated Distillation Alignment
Authors:
Manning Zhu,
Songtao Guo,
Pengzhan Zhou,
Yansong Ning,
Chang Han,
Dewen Qiao
Abstract:
Federated fine-tuning (FFT) of large language models (LLMs) has recently emerged as a promising solution to enable domain-specific adaptation while preserving data privacy. Despite its benefits, FFT on resource-constrained clients is hindered by the high computational and memory demands of full-model fine-tuning, which limits its potential. This paper presents FedSODA, a resource-efficient FFT framework that enables clients to adapt LLMs without accessing or storing the full model. Specifically, we first propose a similarity group pruning (SGP) module, which prunes redundant layers from the full LLM while retaining the most critical layers to preserve the model performance. Moreover, we introduce an orchestrated distillation alignment (ODA) module to reduce gradient divergence between the sub-LLM and the full LLM during FFT. Through the use of QLoRA, clients only need to deploy quantized sub-LLMs and fine-tune lightweight adapters, significantly reducing local resource requirements. We conduct extensive experiments on three open-source LLMs across a variety of downstream tasks. The experimental results demonstrate that FedSODA reduces communication overhead by an average of 70.6%, decreases storage usage by 75.6%, and improves task accuracy by 3.1%, making it highly suitable for practical FFT applications under resource constraints.
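The abstract does not state the SGP grouping criterion, so the following is only a hypothetical illustration of similarity-based layer pruning. Consecutive layers whose representative hidden-state vectors stay above a cosine-similarity threshold are grouped, and only the first layer of each group is kept. The vectors, threshold, and keep-first rule are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_groups(layer_repr: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group consecutive layers whose representations stay close to the group's first layer."""
    if not layer_repr:
        return []
    groups = [[0]]
    for i in range(1, len(layer_repr)):
        if cosine(layer_repr[groups[-1][0]], layer_repr[i]) >= threshold:
            groups[-1].append(i)  # treated as redundant with the group's first layer
        else:
            groups.append([i])  # representation changed enough: start a new group
    return groups


def layers_to_keep(layer_repr: List[Sequence[float]], threshold: float) -> List[int]:
    # Hypothetical rule: keep one representative (the first) layer per similarity group.
    return [g[0] for g in similarity_groups(layer_repr, threshold)]
```

A higher threshold yields more, smaller groups and therefore a larger sub-LLM. This trade-off mirrors the balance between resource savings and preserved performance that the abstract describes.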
Submitted 18 August, 2025;
originally announced August 2025.
-
A dual AGN at z = 5.4 associated with a Lyman-alpha Nebula in the Center of a Cosmic Filament
Authors:
Qiong Li,
Christopher J. Conselice,
Qiao Duan,
Duncan Austin,
Tom Harvey,
Nathan Adams,
George Bendo,
Lewi Westcott,
Vadim Rusakov,
Zheng Cai,
Yuanhang Ning,
Shiwu Zhang
Abstract:
Predictions from current theories and simulations suggest that dual AGN systems are exceedingly rare at high redshifts. The intense radiation and powerful outflows from AGNs regulate star formation, heat the interstellar medium, and drive massive gas outflows that shape the host galaxy and its surroundings. One manifestation of AGN feedback is the creation of extended Ly$α$ nebulae. However, identifying these systems at high-$z$ is challenging. Here, we report a remarkable dual AGN candidate at $z \sim 5.4$ using JWST NIRCam and NIRSpec, with a separation of $\sim1.7$ arcseconds ($\sim10.4$ pkpc). This is one of the highest-redshift spectroscopically confirmed dual AGNs discovered to date. Photometric SED fitting shows excellent agreement with AGN templates, strongly suggesting a rare dual AGN system. BPT diagrams and high-ionisation lines further support the presence of AGNs. VLT/MUSE observations reveal strong extended Ly$α$ emission, extending to $>22$ kpc, making it one of the most extended Ly$α$ nebulae at $z \sim 6$. This provides observational evidence of anisotropic AGN-driven photoionization or shocks. The high Ly$α$ escape fraction also indicates an AGN outflow. This dual AGN candidate is also associated with a well-defined overdensity, potentially at the center of a $z \sim 5.4$ protocluster or filamentary structure node. Further analysis indicates that the fraction of dual AGNs is significantly higher than theoretically expected at high redshifts. This discovery provides a new opportunity to study dual AGN interactions and their impact on the circumgalactic medium and cosmic structure evolution.
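The conversion from an angular separation (arcseconds) to a physical one (pkpc) depends on the angular-diameter distance at the source redshift. This sketch computes that scale by numerically integrating the comoving distance. It assumes a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3 and ignores radiation. These parameters are assumptions, since the cosmology the paper adopted is not given here.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]


def kpc_per_arcsec(z: float, h0: float = 70.0, omega_m: float = 0.3, steps: int = 2000) -> float:
    """Proper (physical) kpc per arcsecond in flat LambdaCDM, radiation ignored."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    omega_l = 1.0 - omega_m

    def inv_e(zp: float) -> float:
        # 1 / E(z), with E(z) = H(z) / H0
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Composite Simpson's rule for the comoving-distance integral from 0 to z.
    h = z / steps
    s = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * inv_e(i * h)
    d_c_mpc = (C_KM_S / h0) * s * h / 3.0
    d_a_mpc = d_c_mpc / (1.0 + z)  # angular-diameter distance
    arcsec_in_rad = math.pi / (180.0 * 3600.0)
    return d_a_mpc * 1000.0 * arcsec_in_rad


scale = kpc_per_arcsec(5.4)
print(f"{scale:.2f} pkpc/arcsec -> {1.7 * scale:.1f} pkpc for 1.7 arcsec")
```

With these assumed parameters the scale is about 6 pkpc per arcsecond at z = 5.4. The resulting separation is therefore close to the quoted ~10.4 pkpc, and the exact figure shifts slightly with the chosen cosmology.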
Submitted 29 August, 2025; v1 submitted 13 August, 2025;
originally announced August 2025.
-
Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering
Authors:
Yunfeng Ning,
Mayi Xu,
Jintao Wen,
Qiankun Pi,
Yuanyuan Zhu,
Ming Zhong,
Jiawei Jiang,
Tieyun Qian
Abstract:
LLMs often suffer from hallucinations and outdated or incomplete knowledge. RAG has been proposed to address these issues by integrating external knowledge, such as that in KGs, into LLMs. However, leveraging private KGs in RAG systems poses significant privacy risks due to the black-box nature of LLMs and potentially insecure data transmission, especially when using third-party LLM APIs that lack transparency and control. In this paper, we investigate the privacy-protected RAG scenario for the first time, where entities in KGs are anonymous to LLMs, thus preventing them from accessing entity semantics. Due to this loss of entity semantics, previous RAG systems cannot retrieve question-relevant knowledge from KGs by matching questions with the meaningless identifiers of anonymous entities. To realize an effective RAG system in this scenario, two key challenges must be addressed: (1) How can anonymous entities be converted into retrievable information? (2) How can question-relevant anonymous entities be retrieved? Hence, we propose a novel ARoG framework comprising relation-centric abstraction and structure-oriented abstraction strategies. For challenge (1), the first strategy abstracts entities into high-level concepts by dynamically capturing the semantics of their adjacent relations. This supplies meaningful semantics that further support the retrieval process. For challenge (2), the second strategy transforms unstructured natural language questions into structured abstract concept paths. These paths can be more effectively aligned with the abstracted concepts in KGs, thereby improving retrieval performance. While guiding LLMs to retrieve knowledge from KGs effectively, the two strategies strictly prevent private information from being exposed to LLMs. Experiments on three datasets demonstrate that ARoG achieves strong performance and privacy-robustness.
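The following toy sketch illustrates the relation-centric idea. Anonymous entity IDs carry no semantics, so each entity is described only by the relations attached to it, and that relation signature is mapped to a coarse concept. In this sketch, a hand-written rule table stands in for the LLM-driven, dynamic abstraction that the abstract describes. All identifiers, relations, and rules are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Signature = Set[Tuple[str, str]]  # {(direction, relation), ...}


def relation_signatures(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Signature]:
    """Describe each anonymous entity only by the relations around it."""
    sig: Dict[str, Signature] = defaultdict(set)
    for head, rel, tail in triples:
        sig[head].add(("out", rel))  # entity appears as the subject of rel
        sig[tail].add(("in", rel))   # entity appears as the object of rel
    return sig


def abstract_entity(signature: Signature, concept_rules: List[Tuple[str, Signature]]) -> str:
    """Return the first concept whose required relations all occur (hypothetical rules)."""
    for concept, required in concept_rules:
        if required <= signature:
            return concept
    return "Entity"  # fallback when no rule matches
```

Because only relation labels are inspected, the entity identifiers themselves never need to reveal any private semantics.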
Submitted 3 December, 2025; v1 submitted 12 August, 2025;
originally announced August 2025.
-
A low-rank solver for the Stokes-Darcy model with random hydraulic conductivity and Beavers-Joseph condition
Authors:
Yujun Zhu,
Yulan Ning,
Zhipeng Yang,
Xiaoming He,
Ju Ming
Abstract:
This paper proposes, analyzes, and demonstrates an efficient low-rank solver for the stochastic Stokes-Darcy interface model with a random hydraulic conductivity both in the porous media domain and on the interface. We consider three interface conditions with randomness, including the Beavers-Joseph interface condition with the random hydraulic conductivity, on the interface between the free flow and the porous media flow. Our solver employs a novel generalized low-rank approximation of the large-scale stiffness matrices, which can significantly cut down the computational costs and memory requirements associated with matrix inversion without losing accuracy. Therefore, by adopting a suitable data compression ratio, the low-rank solver can maintain a high numerical precision with relatively low computational and space complexities. We also propose a strategy to determine the best choice of data compression ratios. Furthermore, we carry out the error analysis of the generalized low-rank matrix approximation algorithm and the low-rank solver. Finally, numerical experiments are conducted to validate the proposed algorithms and the theoretical conclusions.
Submitted 7 August, 2025;
originally announced August 2025.
-
Post-Hopf group algebras, Hopf group braces and Rota-Baxter operators on Hopf group algebras
Authors:
Yan Ning,
Xing Wang,
Daowei Lu
Abstract:
In this paper, we introduce the notions of Hopf group braces, post-Hopf group algebras and Rota-Baxter Hopf group algebras as important generalizations of Hopf braces, post-Hopf algebras and Rota-Baxter Hopf algebras, respectively. We also discuss their relationships. Explicitly, under the condition of cocommutativity, Hopf group braces and post-Hopf group algebras can be obtained from each other, and Rota-Baxter Hopf group algebras can give rise to Hopf group braces.
Submitted 1 August, 2025; v1 submitted 27 July, 2025;
originally announced July 2025.
-
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
Authors:
Xiaohu Li,
Yunfeng Ning,
Zepeng Bao,
Mayi Xu,
Jianhao Chen,
Tieyun Qian
Abstract:
Security alignment enables the Large Language Model (LLM) to gain protection against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have treated LLM jailbreak attacks and defenses in isolation. We analyze the security protection mechanism of the LLM and propose a framework that combines attack and defense. Our method is based on the linear separability of LLM intermediate-layer embeddings, as well as on the essence of jailbreak attacks, which aim to take the embeddings of harmful queries and shift them into the safe region. We use a generative adversarial network (GAN) to learn the security judgment boundary inside the LLM to achieve efficient jailbreak attack and defense. The experimental results indicate that our method achieves an average jailbreak success rate of 88.85% across three popular LLMs, while the defense success rate on the state-of-the-art jailbreak dataset reaches an average of 84.17%. This not only validates the effectiveness of our approach but also sheds light on the internal security mechanisms of LLMs, offering new insights for enhancing model security. The code and data are available at https://github.com/NLPGM/CAVGAN.
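The premise that intermediate-layer embeddings of harmful and benign queries are linearly separable can be checked with a simple linear probe. In this sketch, a classic perceptron is trained on toy 2-D vectors that stand in for real LLM embeddings. It is not CAVGAN's GAN-based method. It only illustrates the separability assumption that the method builds on.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    xs: List[Sequence[float]], ys: List[int], epochs: int = 100
) -> Tuple[List[float], float]:
    """Classic perceptron; labels are +1 (e.g. harmful) or -1 (e.g. benign)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified (or on the boundary): move the hyperplane toward x.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged: w.x + b = 0 separates the two classes
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

The perceptron converges only if the data are linearly separable, so a zero-mistake epoch on real embeddings would be direct evidence for the property the abstract relies on.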
Submitted 6 August, 2025; v1 submitted 8 July, 2025;
originally announced July 2025.
-
On subcodes of the generalized Reed-Solomon codes
Authors:
Yu Ning
Abstract:
In this paper, we study a class of subcodes of codimension $1$ in the $[n,k+1]_q$ generalized Reed-Solomon (GRS) codes, whose generator matrix is derived by removing the row of degree $k-r$ from the generator matrix of the $[n,k+1]_q$ GRS codes, where $1 \le r \le k-1$. We show equivalent characterizations for this class of subcodes of the GRS codes being self-dual or near-MDS, which extends the results for $r=1$ in the literature. Along with these characterizations, families of self-dual near-MDS subcodes of the GRS codes are also proposed. Finally, for $r = 1,2$, the dual codes of the subcodes of the GRS codes are determined. In some cases, the subcodes of the GRS codes are closed under taking dual codes. In other cases, the dual codes turn out to be the twisted GRS codes.
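To make the construction concrete, this sketch builds the generator matrix over a prime field GF(p) from distinct evaluation points $a_j$ and nonzero column multipliers $v_j$. Row $i$ is $(v_j a_j^i)_j$ for $i = 0, \dots, k$. The sketch drops the row of degree $k-r$ and checks that the result has dimension $k$. Restricting to a prime field is a simplifying assumption, since the paper works over general GF(q), and the function names are illustrative.

```python
from typing import List


def grs_subcode_generator(a: List[int], v: List[int], k: int, r: int, p: int) -> List[List[int]]:
    """Rows v_j * a_j^i (i = 0..k) of an [n, k+1] GRS code over GF(p), minus the degree k-r row."""
    if len(a) != len(v):
        raise ValueError("need one multiplier per evaluation point")
    if len({x % p for x in a}) != len(a):
        raise ValueError("evaluation points must be distinct")
    if any(x % p == 0 for x in v):
        raise ValueError("column multipliers must be nonzero")
    if not 1 <= r <= k - 1:
        raise ValueError("need 1 <= r <= k-1")
    return [[vj * pow(aj, i, p) % p for aj, vj in zip(a, v)] for i in range(k + 1) if i != k - r]


def rank_mod_p(rows: List[List[int]], p: int) -> int:
    """Rank over GF(p) by Gauss-Jordan elimination (p must be prime)."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # Fermat inverse of the pivot
        m[rank] = [x * inv % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank
```

For example, with all seven points of GF(7), unit multipliers, k = 3 and r = 1, the degree-2 row is removed. The remaining rows (degrees 0, 1, 3) have rank 3, so the subcode has codimension 1 in the [7, 4] GRS code.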
Submitted 7 July, 2025;
originally announced July 2025.
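To make the construction above concrete, the sketch below builds the subcode's generator matrix over a small prime field GF(p), assuming the standard GRS convention in which row $i$ ($0 \le i \le k$) is $(v_1\alpha_1^i, \dots, v_n\alpha_n^i)$ for distinct evaluation points $\alpha_j$ and nonzero column multipliers $v_j$. The helper names and the parameters ($p=7$, $n=6$, $k=3$, $r=1$) are hypothetical choices for illustration, and the brute-force distance search is only feasible for tiny codes.

```python
from itertools import product


def grs_generator(alphas, vs, k, p):
    """Generator matrix of the [n, k+1] GRS code over GF(p), p prime.

    Row i (0 <= i <= k) is (v_j * alpha_j**i mod p) for j = 1..n.
    """
    return [[(v * pow(a, i, p)) % p for a, v in zip(alphas, vs)]
            for i in range(k + 1)]


def subcode_generator(alphas, vs, k, r, p):
    """Drop the row of degree k - r (1 <= r <= k - 1), leaving a k x n matrix."""
    if not 1 <= r <= k - 1:
        raise ValueError("need 1 <= r <= k - 1")
    rows = grs_generator(alphas, vs, k, p)
    return [row for i, row in enumerate(rows) if i != k - r]


def rank_mod_p(M, p):
    """Rank over GF(p) via Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][c] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)  # inverse by Fermat's little theorem
        M[rank] = [(x * inv) % p for x in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c] % p:
                f = M[i][c]
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[rank])]
        rank += 1
    return rank


def min_distance(G, p):
    """Minimum Hamming weight over all nonzero codewords (brute force)."""
    n = len(G[0])
    best = n
    for msg in product(range(p), repeat=len(G)):
        if any(msg):
            word = [sum(m * row[j] for m, row in zip(msg, G)) % p
                    for j in range(n)]
            best = min(best, sum(1 for x in word if x))
    return best


p, k, r = 7, 3, 1
alphas = [1, 2, 3, 4, 5, 6]  # distinct nonzero points of GF(7)
vs = [1] * len(alphas)       # column multipliers, all 1 here
G = subcode_generator(alphas, vs, k, r, p)
print(len(G), rank_mod_p(G, p), min_distance(G, p))  # → 3 3 3
```

Here the subcode is spanned by the evaluations of $1, x, x^3$. Its minimum distance is $3 = n-k$, one below the Singleton bound $n-k+1 = 4$; a minimum-weight codeword comes from $x^3 - 1$, which vanishes at $\alpha = 1, 2, 4$ in GF(7). Whether such a subcode is near-MDS also depends on the minimum distance of its dual, which this sketch does not compute.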
-
Large Language Model Powered Intelligent Urban Agents: Concepts, Capabilities, and Applications
Authors:
Jindong Han,
Yansong Ning,
Zirui Yuan,
Hang Ni,
Fan Liu,
Tengfei Lyu,
Hao Liu
Abstract:
The long-standing vision of intelligent cities is to create efficient, livable, and sustainable urban environments using big data and artificial intelligence technologies. Recently, the advent of Large Language Models (LLMs) has opened new avenues toward realizing this vision. With powerful semantic understanding and reasoning capabilities, LLMs can be deployed as intelligent agents capable of autonomously solving complex problems across domains. In this article, we focus on Urban LLM Agents, which are LLM-powered agents that are semi-embodied within the hybrid cyber-physical-social space of cities and used for system-level urban decision-making. First, we introduce the concept of urban LLM agents, discussing their unique capabilities and features. Second, we survey the current research landscape from the perspective of agent workflows, encompassing urban sensing, memory management, reasoning, execution, and learning. Third, we categorize the application domains of urban LLM agents into five groups: urban planning, transportation, environment, public safety, and urban society, presenting representative works in each group. Finally, we discuss trustworthiness and evaluation issues that are critical for real-world deployment, and identify several open problems for future research. This survey aims to establish a foundation for the emerging field of urban LLM agents and to provide a roadmap for advancing the intersection of LLMs and urban intelligence. A curated list of relevant papers and open-source resources is maintained and continuously updated at https://github.com/usail-hkust/Awesome-Urban-LLM-Agents.
Submitted 1 July, 2025;
originally announced July 2025.