-
RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
Authors:
Liang Yao,
Shengxiang Xu,
Fan Liu,
Chuanyi Zhang,
Bishun Yao,
Rui Min,
Yongjun Li,
Chaoqian Ouyang,
Shimin Di,
Min-Ling Zhang
Abstract:
Earth Observation (EO) systems are essentially designed to support domain experts who often express their requirements through vague natural language rather than precise, machine-friendly instructions. Depending on the specific application scenario, these vague queries can demand vastly different levels of visual precision. Consequently, a practical EO AI system must bridge the gap between ambiguous human queries and the appropriate multi-granularity visual analysis tasks, ranging from holistic image interpretation to fine-grained pixel-wise predictions. While Multi-modal Large Language Models (MLLMs) demonstrate strong semantic understanding, their text-based output format is inherently ill-suited for dense, precision-critical spatial predictions. Existing agentic frameworks address this limitation by delegating tasks to external tools, but indiscriminate tool invocation is computationally inefficient and underutilizes the MLLM's native capabilities. To this end, we propose RemoteAgent, an agentic framework that strategically respects the intrinsic capability boundaries of MLLMs. To empower this framework to understand real user intents, we construct VagueEO, a human-centric instruction dataset pairing EO tasks with simulated vague natural-language queries. By leveraging VagueEO for reinforcement fine-tuning, we align an MLLM into a robust cognitive core that directly resolves image- and sparse region-level tasks. Consequently, RemoteAgent processes suitable tasks internally while intelligently orchestrating specialized tools via the Model Context Protocol exclusively for dense predictions. Extensive experiments demonstrate that RemoteAgent achieves robust intent recognition capabilities while delivering highly competitive performance across diverse EO tasks.
Submitted 8 April, 2026;
originally announced April 2026.
-
Modeling and Analysis for Joint Design of Communication and Control
Authors:
Xu Gan,
Chongjun Ouyang,
Yuanwei Liu
Abstract:
A unified analytical framework for joint design of communication and control (JDCC) is proposed. Within this framework, communication transmission delay and steady-state control variance are derived as the two fundamental JDCC performance metrics. The Pareto boundary is then established to characterize the optimal communication-control trade-off in JDCC systems. To further obtain closed-form expressions, their performance regions are derived under maximum-ratio transmission (MRT) and zero-forcing (ZF) beamforming. For system reliability evaluation, the communication-only and control-only outage probabilities are first derived. Based on these, the JDCC outage probability is defined to quantify the probability that the communication-delay and control-error requirements cannot be simultaneously satisfied. Its analytical expressions are then derived under both MRT and ZF schemes. Finally, numerical results validate the theoretical results and reveal that: (1) the Pareto boundary characterizes the trade-off frontier and performance limit of JDCC systems and (2) the JDCC reliability is jointly determined by the uplink-downlink closed-loop control and its coupling with communication.
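The MRT and ZF beamforming schemes referenced above are standard constructions. A minimal numpy sketch (not the paper's code; power allocation and the delay/variance metrics are omitted) of the two beamformers:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 8  # users, transmit antennas
# Rayleigh-fading channel matrix: row k is user k's channel.
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

# Maximum-ratio transmission: each unit-norm beam matches its own user's channel.
W_mrt = (H.conj() / np.linalg.norm(H, axis=1, keepdims=True)).T  # N x K

# Zero-forcing: right pseudo-inverse nulls inter-user interference, then normalize columns.
W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)
W_zf /= np.linalg.norm(W_zf, axis=0, keepdims=True)

G_mrt = np.abs(H @ W_mrt)  # effective channel gains
G_zf = np.abs(H @ W_zf)    # off-diagonal entries are ~0 under ZF
```

MRT maximizes each user's own gain but tolerates interference; ZF trades some gain for exact interference nulling, which is why the paper derives the performance region under both.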
Submitted 8 April, 2026;
originally announced April 2026.
-
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
Authors:
Hang Xu,
Ling Yue,
Chaoqian Ouyang,
Yuchen Liu,
Libin Zheng,
Shaowu Pan,
Shimin Di,
Min-Ling Zhang
Abstract:
Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes their outputs sensitive to presentation quality and leaves them weak when the evidence needed for review lies in related work or released code. We present FactReview, an evidence-grounded reviewing system that combines claim extraction, literature positioning, and execution-based claim verification. Given a submission, FactReview identifies major claims and reported results, retrieves nearby work to clarify the paper's technical position, and, when code is available, executes the released repository under bounded budgets to test central empirical claims. It then produces a concise review and an evidence report that assigns each major claim one of five labels: Supported, Supported by the paper, Partially supported, In conflict, or Inconclusive. In a case study on CompGCN, FactReview reproduces results that closely match those reported for link prediction and node classification, yet also shows that the paper's broader performance claim across tasks is not fully sustained: on MUTAG graph classification, the reproduced result is 88.4%, whereas the strongest baseline reported in the paper remains 92.6%. The claim is therefore only partially supported. More broadly, this case suggests that AI is most useful in peer review not as a final decision-maker, but as a tool for gathering evidence and helping reviewers produce more evidence-grounded assessments. The code is public at https://github.com/DEFENSE-SEU/Review-Assistant.
Submitted 7 April, 2026; v1 submitted 5 April, 2026;
originally announced April 2026.
-
Low-complexity tuning of pinching-antenna systems for integrated sensing and communication
Authors:
Saba Asaad,
Chongjun Ouyang,
Zhiguo Ding,
Ali Bereyhi
Abstract:
Pinching antenna systems (PASSs) can dynamically adapt their transmit and receive arrays for sensing and communication in wireless systems. This work explores the potential of PASSs for integrated sensing and communication (ISAC) by proposing a novel PASS-aided ISAC design, in which pinching locations are adaptively adjusted to enable simultaneous sensing and data transmission with minimal interference. The proposed design introduces a bi-partitioning strategy that allocates sensing power and tunes pinching locations with remarkably low computational complexity, allowing dynamic PASS tuning at high update rates. Numerical results demonstrate that the proposed approach achieves a significantly larger sensing-communication rate region compared to baseline designs at no noticeable cost.
Submitted 16 March, 2026;
originally announced March 2026.
-
ToolRosetta: Scalable Tool Access for Open-World Scientific Agents
Authors:
Shimin Di,
Xujie Yuan,
Hanghui Guo,
Chaoqian Ouyang,
Yongxu Liu,
Ling Yue,
Zhangze Chen,
Libin Zheng,
Jia Zhu,
Shaowu Pan,
Jian Yin,
Yong Rui,
Min-Ling Zhang
Abstract:
Large Language Model (LLM)-based agent systems are increasingly being used for scientific discovery, yet their practical capability remains constrained by a narrow and manually curated tool layer. Much scientific computational capability already exists in open-source repositories, software packages and APIs, but these resources remain difficult to standardize, operationalize and invoke reliably. Here we present ToolRosetta, a framework that equips LLM-based agent systems with scalable, open-world computational access by automatically transforming heterogeneous computational programs into validated, callable tools. ToolRosetta integrates repository retrieval, tool standardization, execution testing, iterative repair and security-aware governance. Across 122 GitHub repositories spanning 35 subdisciplines in 6 domains, ToolRosetta standardizes 1,580 callable tools. These tools support an average verified task success rate of 84.0% across domains and substantially enhance existing agentic AI systems, including OpenClaw, particularly on out-of-distribution tasks beyond fixed curated tool inventories.
Submitted 10 April, 2026; v1 submitted 10 March, 2026;
originally announced March 2026.
-
IOTEL: A Tool for Generating IoT-enriched Object-Centric Event Logs
Authors:
Jia Wei,
Xin Su,
Chun Ouyang
Abstract:
Integrating Internet of Things (IoT) data with business process event logs is crucial for analysing IoT-enhanced processes, yet remains challenging due to differences in abstraction levels and the separation of data sources. Simply incorporating raw IoT data increases the size and complexity of the resulting log, often requiring additional processing before process analysis can be performed. While tools for generating IoT-enriched event logs exist, they either rely on specialised schemas or focus on extracting event logs from sensor data, offering limited support for integrating process-relevant IoT data into existing event logs. To address this gap, we present IOTEL, a tool for systematically generating IoT-enriched object-centric event logs (OCEL). By building on the OCEL schema, IOTEL enables structured IoT data integration compatible with existing process mining tools. It supports practitioners and researchers in analysing IoT-enhanced business processes, as demonstrated in a real-world scenario. A video demonstrating the tool is available online.
Submitted 8 March, 2026;
originally announced March 2026.
-
Compressed Proximal Federated Learning for Non-Convex Composite Optimization on Heterogeneous Data
Authors:
Pu Qiu,
Chen Ouyang,
Yongyang Xiong,
Keyou You,
Wanquan Liu,
Yang Shi
Abstract:
Federated Composite Optimization (FCO) has emerged as a promising framework for training models with structural constraints (e.g., sparsity) in distributed edge networks. However, simultaneously achieving communication efficiency and convergence robustness remains a significant challenge, particularly when dealing with non-smooth regularizers, statistical heterogeneity, and the restrictions of biased compression. To address these issues, we propose FedCEF (Federated Composite Error Feedback), a novel algorithm tailored for non-convex FCO. FedCEF introduces a decoupled proximal update scheme that separates the proximal operator from communication, enabling clients to handle non-smooth terms locally while transmitting compressed information. To mitigate the noise from aggressive quantization and the bias from non-IID data, FedCEF integrates a rigorous error feedback mechanism with control variates. Furthermore, we design a communication-efficient pre-proximal downlink strategy that allows clients to exactly reconstruct global control variables without explicit transmission. We theoretically establish that FedCEF achieves sublinear convergence to a bounded residual error under general non-convexity, which is controllable via the step size and batch size. Extensive experiments on real datasets validate that FedCEF maintains competitive model accuracy even under extreme compression ratios (e.g., 1%), significantly reducing the total communication volume compared to uncompressed baselines.
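The error feedback mechanism the abstract relies on is a well-known remedy for biased compressors such as top-k sparsification: coordinates dropped in one round are carried over as a residual and retransmitted later. A minimal single-client sketch (not the FedCEF algorithm itself; the proximal step and control variates are omitted, and `topk_compress`/`ef_step` are hypothetical helper names):

```python
import numpy as np

def topk_compress(v, k):
    """Biased top-k sparsifier: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(grad, error, k):
    """Error feedback: compress gradient plus the carried-over residual."""
    corrected = grad + error
    msg = topk_compress(corrected, k)
    return msg, corrected - msg  # (transmitted update, new residual)

# Toy run: the residual guarantees no gradient mass is lost forever.
rng = np.random.default_rng(1)
error = np.zeros(10)
total_sent, total_grad = np.zeros(10), np.zeros(10)
for _ in range(200):
    g = rng.standard_normal(10)
    msg, error = ef_step(g, error, k=2)  # only 2 of 10 coordinates sent per round
    total_sent += msg
    total_grad += g
```

By construction, `total_sent + error == total_grad` at every iteration, which is the telescoping property that error-feedback convergence analyses exploit.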
Submitted 8 March, 2026;
originally announced March 2026.
-
On the Secrecy Performance of Continuous-Aperture Arrays Over Fading Channels
Authors:
Xuan Yang,
Chongjun Ouyang,
Dongming Li,
Yuanwei Liu
Abstract:
The secrecy performance of continuous-aperture array (CAPA)-based wiretap channels in terms of secrecy rate and secrecy outage probability (SOP) is analyzed. First, the system models of CAPA systems with maximum-ratio transmission under a Rayleigh fading channel are established, and approximate probability density functions for the legitimate user Bob's signal-to-noise ratio (SNR) and the eavesdropper Eve's SNR are derived using Mercer's theorem and Landau's eigenvalue theorem. Three scenarios are considered, including a single Eve, multiple independent Eves, and multiple collaborative Eves. Next, the expressions of the secrecy rate and SOP under these three scenarios are derived, and the high-SNR slope, high-SNR power offset, diversity order, and array gain in Bob's high-SNR region are obtained. It is then theoretically proven that, in all three scenarios, the CAPA system achieves the same high-SNR slope and the same diversity order, with the latter being equal to the spatial degrees of freedom. Moreover, the CAPA system with a single Eve has the smallest high-SNR offset and the highest array gain, whereas the CAPA system with multiple collaborative Eves exhibits the largest high-SNR offset and the lowest array gain. Finally, the theoretical analyses of the secrecy rate, SOP, and high-SNR performance are validated by the simulation results, and a higher secrecy rate and a lower SOP are achieved by the CAPA systems compared to the spatially-discrete array systems with half-wavelength antenna spacing.
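The two metrics analyzed above follow the standard wiretap-channel definitions: the secrecy rate is [log2(1+SNR_B) - log2(1+SNR_E)]^+, and the SOP is the probability that this rate falls below a target. A minimal sketch of both definitions (the function names are illustrative, not from the paper, and the CAPA-specific SNR distributions are not modeled here):

```python
import numpy as np

def secrecy_rate(snr_bob, snr_eve):
    """Secrecy rate [log2(1+SNR_B) - log2(1+SNR_E)]^+ in bits/s/Hz."""
    return max(np.log2(1 + snr_bob) - np.log2(1 + snr_eve), 0.0)

def sop_mc(snr_bob_samples, snr_eve_samples, target_rate):
    """Monte-Carlo secrecy outage probability: P(secrecy rate < target_rate).

    Inputs are arrays of fading realizations of Bob's and Eve's SNRs.
    """
    cs = np.maximum(
        np.log2(1 + snr_bob_samples) - np.log2(1 + snr_eve_samples), 0.0
    )
    return float(np.mean(cs < target_rate))
```

For example, with SNR_B = 15 and SNR_E = 3 the secrecy rate is log2(16) - log2(4) = 2 bits/s/Hz, and it is clipped to zero whenever Eve's SNR exceeds Bob's.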
Submitted 6 March, 2026;
originally announced March 2026.
-
A Novel Modular Cable-Driven Soft Robotic Arm with Multi-Segment Reconfigurability
Authors:
Moeen Ul Islam,
Cheng Ouyang,
Xinda Qi,
Azlan Zahid,
Xiaobo Tan,
Dong Chen
Abstract:
This paper presents a novel, modular, cable-driven soft robotic arm featuring multi-segment reconfigurability. The proposed architecture enables a stackable system with independent segment control, allowing scalable adaptation to diverse structural and application requirements. The system is fabricated from soft silicone material and incorporates embedded tendon-routing channels with a protective dual-helical tendon structure. Experimental results showed that modular stacking substantially expanded the reachable workspace: relative to the single-segment arm, the three-segment configuration achieved up to a 13-fold increase in planar workspace area and a 38.9-fold increase in workspace volume. Furthermore, this study investigated the effect of silicone stiffness on actuator performance. The results revealed a clear trade-off between compliance and stiffness: softer silicone improved bending flexibility, while stiffer silicone improved structural rigidity and load-bearing stability. These results highlight the potential of stiffness tuning to balance compliance and strength for configuring scalable, reconfigurable soft robotic arms.
Submitted 4 March, 2026; v1 submitted 2 March, 2026;
originally announced March 2026.
-
CLCR: Cross-Level Semantic Collaborative Representation for Multimodal Learning
Authors:
Chunlei Meng,
Guanhong Huang,
Rong Fu,
Runmin Jian,
Zhongxue Gan,
Chun Ouyang
Abstract:
Multimodal learning aims to capture both shared and private information from multiple modalities. However, existing methods that project all modalities into a single latent space for fusion often overlook the asynchronous, multi-level semantic structure of multimodal data. This oversight induces semantic misalignment and error propagation, thereby degrading representation quality. To address this issue, we propose Cross-Level Co-Representation (CLCR), which explicitly organizes each modality's features into a three-level semantic hierarchy and specifies level-wise constraints for cross-modal interactions. First, a semantic hierarchy encoder aligns shallow, mid, and deep features across modalities, establishing a common basis for interaction. Then, at each level, an Intra-Level Co-Exchange Domain (IntraCED) factorizes features into shared and private subspaces and restricts cross-modal attention to the shared subspace via a learnable token budget. This design ensures that only shared semantics are exchanged and prevents leakage from private channels. To integrate information across levels, the Inter-Level Co-Aggregation Domain (InterCAD) synchronizes semantic scales using learned anchors, selectively fuses the shared representations, and gates private cues to form a compact task representation. We further introduce regularization terms to enforce separation of shared and private features and to minimize cross-level interference. Experiments on six benchmarks spanning emotion recognition, event localization, sentiment analysis, and action recognition show that CLCR achieves strong performance and generalizes well across tasks.
Submitted 23 February, 2026;
originally announced February 2026.
-
Tri-Subspaces Disentanglement for Multimodal Sentiment Analysis
Authors:
Chunlei Meng,
Jiabin Luo,
Zhenglin Yan,
Zhenyu Yu,
Rong Fu,
Zhongxue Gan,
Chun Ouyang
Abstract:
Multimodal Sentiment Analysis (MSA) integrates language, visual, and acoustic modalities to infer human sentiment. Most existing methods either focus on globally shared representations or modality-specific features, while overlooking signals that are shared only by certain modality pairs. This limits the expressiveness and discriminative power of multimodal representations. To address this limitation, we propose a Tri-Subspace Disentanglement (TSD) framework that explicitly factorizes features into three complementary subspaces: a common subspace capturing global consistency, submodally-shared subspaces modeling pairwise cross-modal synergies, and private subspaces preserving modality-specific cues. To keep these subspaces pure and independent, we introduce a decoupling supervisor together with structured regularization losses. We further design a Subspace-Aware Cross-Attention (SACA) fusion module that adaptively models and integrates information from the three subspaces to obtain richer and more robust representations. Experiments on CMU-MOSI and CMU-MOSEI demonstrate that TSD achieves state-of-the-art performance across all key metrics, reaching 0.691 MAE on CMU-MOSI and 54.9% ACC-7 on CMU-MOSEI, and also transfers well to multimodal intent recognition tasks. Ablation studies confirm that tri-subspace disentanglement and SACA jointly enhance the modeling of multi-granular cross-modal sentiment cues.
Submitted 23 February, 2026;
originally announced February 2026.
-
Deep Time-Series Models Meet Volatility: Multi-Horizon Electricity Price Forecasting in the Australian National Electricity Market
Authors:
Mohammed Osman Gani,
Zhipeng He,
Chun Ouyang,
Sara Khalifa
Abstract:
Accurate electricity price forecasting (EPF) is increasingly difficult in markets characterised by extreme volatility, frequent price spikes, and rapid structural shifts. Deep learning (DL) has been increasingly adopted in EPF due to its ability to achieve high forecasting accuracy. Recently, state-of-the-art (SOTA) deep time-series models have demonstrated promising performance across general forecasting tasks. Yet, their effectiveness in highly volatile electricity markets remains underexplored. Moreover, existing EPF studies rarely assess how model accuracy varies across intraday periods, leaving model sensitivity to market conditions unexplored. To address these gaps, this paper proposes an EPF framework that systematically evaluates SOTA deep time-series models using a direct multi-horizon forecasting approach across day-ahead and two-day-ahead settings. We conduct a comprehensive empirical study across all five regions of the Australian National Electricity Market using contemporary, high-volatility data. The results reveal a clear gap between time-series benchmark expectations and observed performance under real-world price volatility: recent deep time-series models often fail to surpass standard DL baselines. All models experience substantial degradation under extreme and negative prices, yet DL baselines often remain competitive. Intraday performance analysis further reveals that all evaluated models are consistently vulnerable to prevailing market conditions, where absolute errors peak during evening ramps, relative errors escalate during midday negative-price periods, and directional accuracy deteriorates sharply during abrupt shifts in price direction. These findings emphasise the need for volatility-aware modelling strategies and richer feature representations to advance EPF.
Submitted 13 February, 2026; v1 submitted 1 February, 2026;
originally announced February 2026.
-
Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis
Authors:
Chunlei Meng,
Ziyang Zhou,
Lucas He,
Xiaojing Du,
Chun Ouyang,
Zhongxue Gan
Abstract:
Multimodal Sentiment Analysis integrates linguistic, visual, and acoustic modalities. Mainstream approaches based on modality-invariant and modality-specific factorization, or on complex fusion, still rely on spatiotemporally mixed modeling. This ignores spatiotemporal heterogeneity, leading to spatiotemporal information asymmetry and thus limited performance. Hence, we propose TSDA, Temporal-Spatial Decouple before Act, which explicitly decouples each modality into temporal dynamics and spatial structural context before any interaction. For every modality, a temporal encoder and a spatial encoder project signals into separate temporal and spatial representations. Factor-Consistent Cross-Modal Alignment then aligns temporal features only with their temporal counterparts across modalities, and spatial features only with their spatial counterparts. Factor-specific supervision and decorrelation regularization reduce cross-factor leakage while preserving complementarity. A Gated Recouple module subsequently recouples the aligned streams for the downstream task. Extensive experiments show that TSDA outperforms baselines, and ablation studies confirm the necessity and interpretability of the design.
Submitted 20 January, 2026;
originally announced January 2026.
-
Pheromone-Focused Ant Colony Optimization algorithm for path planning
Authors:
Yi Liu,
Hongda Zhang,
Zhongxue Gan,
Yuning Chen,
Ziqing Zhou,
Chunlei Meng,
Chun Ouyang
Abstract:
Ant Colony Optimization (ACO) is a prominent swarm intelligence algorithm extensively applied to path planning. However, traditional ACO methods often exhibit shortcomings, such as blind search behavior and slow convergence within complex environments. To address these challenges, this paper proposes the Pheromone-Focused Ant Colony Optimization (PFACO) algorithm, which introduces three key strategies to enhance the problem-solving ability of the ant colony. First, the initial pheromone distribution is concentrated in more promising regions based on the Euclidean distances of nodes to the start and end points, balancing the trade-off between exploration and exploitation. Second, promising solutions are reinforced during colony iterations to intensify pheromone deposition along high-quality paths, accelerating convergence while maintaining solution diversity. Third, a forward-looking mechanism is implemented to penalize redundant path turns, promoting smoother and more efficient solutions. These strategies collectively produce the focused pheromones to guide the ant colony's search, which enhances the global optimization capabilities of the PFACO algorithm, significantly improving convergence speed and solution quality across diverse optimization problems. The experimental results demonstrate that PFACO consistently outperforms comparative ACO algorithms in terms of convergence speed and solution quality.
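The first strategy above, concentrating initial pheromone in promising regions via Euclidean distances to the start and end points, can be illustrated with a small sketch. This is one plausible reading of the abstract, not the paper's exact formula: a node whose detour d(start, n) + d(n, goal) is close to the direct start-goal distance receives more initial pheromone, and `beta` is a hypothetical sharpness knob.

```python
import numpy as np

def focused_initial_pheromone(nodes, start, goal, tau0=1.0, beta=2.0):
    """Concentrate initial pheromone on nodes near the start-goal corridor.

    Each node's detour length d(start, n) + d(n, goal) is compared with the
    direct start-goal distance; smaller detours get pheromone closer to tau0.
    """
    nodes = np.asarray(nodes, dtype=float)
    direct = np.linalg.norm(np.asarray(goal, dtype=float) - np.asarray(start, dtype=float))
    detour = (np.linalg.norm(nodes - np.asarray(start, dtype=float), axis=1)
              + np.linalg.norm(nodes - np.asarray(goal, dtype=float), axis=1))
    # Ratio is 1 on the straight line (detour == direct) and decays off it.
    return tau0 * (direct / detour) ** beta

tau = focused_initial_pheromone([(5, 0), (5, 5)], start=(0, 0), goal=(10, 0))
```

Here the node (5, 0) lies on the start-goal line and gets the full tau0, while (5, 5) gets less, biasing early ants toward the corridor without forbidding exploration elsewhere.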
Submitted 12 January, 2026;
originally announced January 2026.
-
Secure Multiuser Beamforming With Movable Antenna Arrays
Authors:
Zhenqiao Cheng,
Chongjun Ouyang,
Boqun Zhao,
Xingqi Zhang
Abstract:
A movable antennas (MAs)-enabled secure multiuser transmission framework is developed to enhance physical-layer security. Novel expressions are derived to characterize the achievable sum secrecy rate based on the secure channel coding theorem. On this basis, a joint optimization algorithm for digital beamforming and MA placement is proposed to maximize the sum secrecy rate via fractional programming and block coordinate descent. In each iteration, every variable admits either a closed-form update or a low-complexity one-dimensional or bisection search, which yields an efficient implementation. Numerical results demonstrate the effectiveness of the proposed method and show that the MA-enabled design achieves higher secrecy rates than conventional fixed-position antenna arrays.
Submitted 9 January, 2026;
originally announced January 2026.
-
Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database
Authors:
Zi Wang,
Mingkai Huang,
Zhang Shi,
Hongjie Hu,
Lan Lan,
Hui Zhang,
Yan Li,
Xi Hu,
Qing Lu,
Zongming Zhu,
Qiong Yao,
Yuxiang Dai,
Fanwen Wang,
Yinzhe Wu,
Jun Lyu,
Qianqian Gao,
Guangming Xu,
Zhenxuan Zhang,
Haosen Zhang,
Qing Li,
Guangming Wang,
Tianxing He,
Lizhen Lan,
Siyue Li,
Le Xue
, et al. (39 additional authors not shown)
Abstract:
Multimodal cardiovascular magnetic resonance (CMR) imaging provides comprehensive and non-invasive insights into cardiovascular disease (CVD) diagnosis and underlying mechanisms. Despite decades of advancements, its widespread clinical adoption remains constrained by prolonged scan times and heterogeneity across medical environments. This underscores the urgent need for a generalist reconstruction foundation model for ultra-fast CMR imaging, one capable of adapting across diverse imaging scenarios and serving as the essential substrate for all downstream analyses. To enable this goal, we curate MMCMR-427K, the largest and most comprehensive multimodal CMR k-space database to date, comprising 427,465 multi-coil k-space data paired with structured metadata across 13 international centers, 12 CMR modalities, 15 scanners, and 17 CVD categories in populations across three continents. Building on this unprecedented resource, we introduce CardioMM, a generalist reconstruction foundation model capable of dynamically adapting to heterogeneous fast CMR imaging scenarios. CardioMM unifies semantic contextual understanding with physics-informed data consistency to deliver robust reconstructions across varied scanners, protocols, and patient presentations. Comprehensive evaluations demonstrate that CardioMM achieves state-of-the-art performance in the internal centers and exhibits strong zero-shot generalization to unseen external settings. Even at imaging acceleration up to 24x, CardioMM reliably preserves key cardiac phenotypes, quantitative myocardial biomarkers, and diagnostic image quality, enabling a substantial increase in CMR examination throughput without compromising clinical integrity. Together, our open-access MMCMR-427K database and CardioMM framework establish a scalable pathway toward high-throughput, high-quality, and clinically accessible cardiovascular imaging.
Submitted 25 December, 2025;
originally announced December 2025.
-
Dynamic and Static Energy Efficient Design of Pinching Antenna Systems
Authors:
Saba Asaad,
Chongjun Ouyang,
Ali Bereyhi,
Zhiguo Ding
Abstract:
We study the energy efficiency of pinching-antenna systems (PASSs) by developing a consistent formulation for power distribution in these systems. The per-antenna power distribution in PASSs is not controlled explicitly by a power allocation policy, but rather implicitly through tuning of pinching couplings and locations. Both factors are tunable: (i) pinching locations are adjusted using movable elements, and (ii) couplings are adjusted by varying the effective coupling length of the pinching elements. While the former can be tuned dynamically in settings with low user mobility, the latter cannot be updated at a high rate. We thus develop a class of hybrid dynamic-static algorithms that maximize energy efficiency by updating the two sets of parameters at different rates. Our experimental results show that dynamic tuning of pinching locations can significantly boost the energy efficiency of PASSs.
Submitted 15 February, 2026; v1 submitted 11 November, 2025;
originally announced November 2025.
-
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Authors:
Cheng Ouyang,
Moeen Ul Islam,
Dong Chen,
Kaixiang Zhang,
Zhaojian Li,
Xiaobo Tan
Abstract:
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
Submitted 19 March, 2026; v1 submitted 9 October, 2025;
originally announced October 2025.
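The data-driven machinery behind DeePC can be sketched compactly: past input-output data are arranged into a block-Hankel matrix whose column span acts as the system model, and an SVD-based truncation shrinks the decision variable. A minimal NumPy sketch, with illustrative function names and dimensions that are assumptions rather than the paper's implementation:

```python
import numpy as np

def block_hankel(w, L):
    """Arrange a T x m data sequence into a block-Hankel matrix whose
    columns are length-L sliding windows (stacked into L*m rows)."""
    T, m = w.shape
    return np.column_stack([w[i:i + L].ravel() for i in range(T - L + 1)])

def svd_reduce(H, r):
    """SVD-based dimension reduction: keep the top-r left singular
    directions, replacing H by U_r @ diag(s_r), so the DeePC decision
    variable has r entries instead of T-L+1."""
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :r] * s[:r]

# Example: a 2-channel signal of length 60, windows of depth 5
w = np.random.default_rng(0).normal(size=(60, 2))
H = block_hankel(w, 5)     # shape (10, 56)
H_r = svd_reduce(H, 3)     # shape (10, 3)
```

The reduced factor spans the dominant directions of the raw Hankel matrix, which is what makes the downstream predictive-control problem cheaper without discarding the informative part of the data.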
-
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Authors:
Ziliang Wang,
Kang An,
Xuhui Zheng,
Faqiang Qian,
Weikun Zhang,
Cijun Ouyang,
Jialu Cai,
Yuhang Wang,
Yichao Wu
Abstract:
While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; retrieval misses, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure in any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a novel framework that transforms fragile reasoning into a robust process. ERL explicitly identifies faulty steps, erases them, and regenerates reasoning in place, preventing defective logic from propagating through the reasoning chain. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle, with the 3B model achieving +8.48% EM and +11.56% F1, and the 7B model achieving +5.38% EM and +7.22% F1 over previous state-of-the-art (SOTA) results. These findings suggest that erasable reinforcement learning provides a powerful paradigm shift for robust multi-step reasoning in LLMs.
Submitted 1 October, 2025;
originally announced October 2025.
-
FedAgentBench: Towards Automating Real-world Federated Medical Image Analysis with Server-Client LLM Agents
Authors:
Pramit Saha,
Joshua Strong,
Divyanshu Mishra,
Cheng Ouyang,
J. Alison Noble
Abstract:
Federated learning (FL) allows collaborative model training across healthcare sites without sharing sensitive patient data. However, real-world FL deployment is often hindered by complex operational challenges that demand substantial human effort. These include: (a) selecting appropriate clients (hospitals), (b) coordinating between the central server and clients, (c) client-level data pre-processing, (d) harmonizing non-standardized data and labels across clients, and (e) selecting FL algorithms based on user instructions and cross-client data characteristics. Existing FL work, however, overlooks these practical orchestration challenges. These operational bottlenecks motivate the need for autonomous, agent-driven FL systems, where intelligent agents at each hospital client and the central server agent collaboratively manage FL setup and model training with minimal human intervention. To this end, we first introduce an agent-driven FL framework that captures key phases of real-world FL workflows, from client selection to training completion, and a benchmark dubbed FedAgentBench that evaluates the ability of LLM agents to autonomously coordinate healthcare FL. Our framework incorporates 40 FL algorithms, each tailored to address diverse task-specific requirements and cross-client characteristics. Furthermore, we introduce a diverse set of complex tasks across 201 carefully curated datasets, simulating 6 modality-specific real-world healthcare environments, viz., Dermatoscopy, Ultrasound, Fundus, Histopathology, MRI, and X-Ray. We assess the agentic performance of 14 open-source and 10 proprietary LLMs spanning small, medium, and large model scales. While some agent cores such as GPT-4.1 and DeepSeek V3 can automate various stages of the FL pipeline, our results reveal that more complex, interdependent tasks based on implicit goals remain challenging for even the strongest models.
Submitted 28 September, 2025;
originally announced September 2025.
-
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation
Authors:
Biwen Lei,
Yang Li,
Xinhai Liu,
Shuhui Yang,
Lixin Xu,
Jingwei Huang,
Ruining Tang,
Haohan Weng,
Jian Liu,
Jing Xu,
Zhen Zhou,
Yiling Zhu,
Jiankai Xing,
Jiachen Xu,
Changfeng Ma,
Xinhao Yan,
Yunhan Yang,
Chunshi Wang,
Duoteng Xu,
Xueqi Ma,
Yuguang Chen,
Jing Li,
Mingxin Yang,
Sheng Zhang,
Yifei Feng
, et al. (75 additional authors not shown)
Abstract:
The creation of high-quality 3D assets, a cornerstone of modern game development, has long been characterized by labor-intensive and specialized workflows. This paper presents Hunyuan3D Studio, an end-to-end AI-powered content creation platform designed to revolutionize the game production pipeline by automating and streamlining the generation of game-ready 3D assets. At its core, Hunyuan3D Studio integrates a suite of advanced neural modules (such as Part-level 3D Generation, Polygon Generation, Semantic UV, etc.) into a cohesive and user-friendly system. This unified framework allows for the rapid transformation of a single concept image or textual description into a fully-realized, production-quality 3D model complete with optimized geometry and high-fidelity PBR textures. We demonstrate that assets generated by Hunyuan3D Studio are not only visually compelling but also adhere to the stringent technical requirements of contemporary game engines, significantly reducing iteration time and lowering the barrier to entry for 3D content creation. By providing a seamless bridge from creative intent to technical asset, Hunyuan3D Studio represents a significant leap forward for AI-assisted workflows in game development and interactive media.
Submitted 16 September, 2025;
originally announced September 2025.
-
Does DINOv3 Set a New Medical Vision Standard? Benchmarking 2D and 3D Classification, Segmentation, and Registration
Authors:
Che Liu,
Yinda Chen,
Haoyuan Shi,
Jinpeng Lu,
Bailiang Jian,
Jiazhen Pan,
Linghan Cai,
Jiayi Wang,
Jieming Yu,
Ziqi Gao,
Xiaoran Zhang,
Long Bai,
Yundi Zhang,
Jun Li,
Cosmin I. Bercea,
Cheng Ouyang,
Chen Chen,
Zhiwei Xiong,
Benedikt Wiestler,
Christian Wachinger,
James S. Duncan,
Daniel Rueckert,
Wenjia Bai,
Rossella Arcucci
Abstract:
The advent of large-scale vision foundation models, pre-trained on diverse natural images, has marked a paradigm shift in computer vision. However, how well the capabilities of frontier vision foundation models transfer to specialised domains such as medical imaging remains an open question. This report investigates whether DINOv3, a state-of-the-art self-supervised vision transformer (ViT) pre-trained on natural images, can directly serve as a powerful, unified encoder for medical vision tasks without domain-specific fine-tuning. To answer this, we benchmark DINOv3 across common medical vision tasks, including 2D and 3D classification, segmentation, and registration on a wide range of medical imaging modalities. We systematically analyse its scalability by varying model sizes and input image resolutions. Our findings reveal that DINOv3 shows impressive performance and establishes a formidable new baseline. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks, despite being trained solely on natural images. However, we identify clear limitations: the model's features degrade in scenarios requiring deep domain specialisation, such as in whole-slide images (WSIs), electron microscopy (EM), and positron emission tomography (PET). Furthermore, we observe that DINOv3 does not consistently follow the scaling law in the medical domain. Its performance does not reliably increase with larger models or finer feature resolutions, showing diverse scaling behaviours across tasks. Overall, our work establishes DINOv3 as a strong baseline, whose powerful visual features can serve as a robust prior for multiple medical tasks. This opens promising future directions, such as leveraging its features to enforce multiview consistency in 3D reconstruction.
Submitted 17 January, 2026; v1 submitted 8 September, 2025;
originally announced September 2025.
-
Code2MCP: Transforming Code Repositories into MCP Services
Authors:
Chaoqian Ouyang,
Ling Yue,
Shimin Di,
Libin Zheng,
Linan Yue,
Shaowu Pan,
Jian Yin,
Min-Ling Zhang
Abstract:
The Model Context Protocol (MCP) aims to create a standard for how Large Language Models use tools. However, most current research focuses on selecting tools from an existing pool. A more fundamental, yet largely overlooked, problem is how to populate this pool by converting the vast number of existing software projects into MCP-compatible services. To bridge this gap, we introduce Code2MCP, an agent-based framework that automatically transforms a GitHub repository into a functional MCP service with minimal human intervention. Code2MCP employs a multi-agent workflow for code analysis, environment setup, tool function design, and service generation, enhanced by a self-correcting loop to ensure reliability. We demonstrate that Code2MCP successfully transforms open-source computing libraries in scientific fields such as bioinformatics, mathematics, and fluid dynamics that are not available in existing MCP servers. By providing a novel automated pathway to unlock GitHub, the world's largest code repository, for the MCP ecosystem, Code2MCP serves as a catalyst to significantly accelerate the protocol's adoption and practical application. The code is public at https://github.com/DEFENSE-SEU/Code2MCP.
Submitted 10 February, 2026; v1 submitted 7 September, 2025;
originally announced September 2025.
-
Multiport Network Modeling and Optimization for Reconfigurable Pinching-Antenna Systems
Authors:
Zhaolin Wang,
Jiaqi Xu,
Chongjun Ouyang,
Xidong Mu,
Yuanwei Liu
Abstract:
A reconfigurable pinching-antenna system (PASS) is presented, endowing pinching antennas (PAs) with both amplitude- and phase-controllable radiation beyond conventional implementations. To characterize this feature, a general and physically consistent model is established for PASS via multiport network theory. Within this model, the fundamental constraint of ideal reconfigurability of PAs is identified, allowing the full control of signal amplitudes and phases. A practical directional-coupler (DC)-based PA model is then proposed, enabling both amplitude-only control and amplitude-constrained phase control. Beamforming optimization is investigated for both ideal and practical cases: an optimal solution is obtained for ideal PAs, whereas a high-quality iterative algorithm is developed for DC-based PAs. Numerical results suggest that in single-user scenarios: (i) with optimized PA positions, performance gains arise primarily from amplitude reconfigurability and DC-based PAs approach ideal performance, and (ii) with fixed PA positions, both amplitude and phase reconfigurability are critical and DC-based PAs incur non-negligible loss.
Submitted 6 September, 2025;
originally announced September 2025.
-
A Multi-stage Low-latency Enhancement System for Hearing Aids
Authors:
Chengwei Ouyang,
Kexin Fei,
Haoshuai Zhou,
Congxi Lu,
Linkai Li
Abstract:
This paper proposes an end-to-end system for the ICASSP 2023 Clarity Challenge. In this work, we introduce four major novelties: (1) a novel multi-stage system in both the magnitude and complex domains to better utilize phase information; (2) an asymmetric window pair that achieves higher frequency resolution under the 5 ms latency constraint; (3) the integration of head rotation information with the mixture signals to achieve better enhancement; and (4) a post-processing module that achieves higher hearing aid speech perception index (HASPI) scores with the hearing aid amplification stage provided by the baseline system.
Submitted 6 August, 2025;
originally announced August 2025.
-
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models
Authors:
Jiazhen Pan,
Bailiang Jian,
Paul Hager,
Yundi Zhang,
Che Liu,
Friedrike Jungmann,
Hongwei Bran Li,
Chenyu You,
Junde Wu,
Jiayuan Zhu,
Fenglin Liu,
Yuyuan Liu,
Niklas Bubeck,
Christian Wachinger,
Chen Chen,
Zhenyu Gong,
Cheng Ouyang,
Georgios Kaissis,
Benedikt Wiestler,
Daniel Rueckert
Abstract:
Ensuring the safety and reliability of large language models (LLMs) in clinical practice is critical to prevent patient harm. However, LLMs are advancing so rapidly that static benchmarks quickly become obsolete or prone to overfitting, yielding a misleading picture of model trustworthiness. Here we introduce a Dynamic, Automatic, and Systematic (DAS) red-teaming framework that continuously stress-tests LLMs across four safety-critical axes: robustness, privacy, bias/fairness, and hallucination. Validated against board-certified clinicians with high concordance, a suite of adversarial agents autonomously mutates clinical test cases to uncover vulnerabilities in real time. Applying DAS to 15 proprietary and open-source LLMs revealed a profound gap between high static benchmark performance and low dynamic reliability - the "Benchmarking Gap". Despite median MedQA accuracy exceeding 80%, 94% of previously correct answers failed our dynamic robustness tests. Crucially, this brittleness generalized to the realistic, open-ended HealthBench dataset, where top-tier models exhibited failure rates exceeding 70% and stark shifts in model rankings across evaluations, suggesting that high scores on established static benchmarks may reflect superficial memorization. We observed similarly high failure rates across other domains: privacy leaks were elicited in 86% of scenarios, cognitive-bias priming altered clinical recommendations in 81% of fairness tests, and we identified hallucination rates exceeding 74% in widely used models. By converting medical LLM safety analysis from a static checklist into a dynamic stress-test, DAS provides a foundational, scalable, and living platform to surface the latent risks that must be addressed before the next generation of medical AI can be safely deployed.
Submitted 9 March, 2026; v1 submitted 30 July, 2025;
originally announced August 2025.
-
Topology Optimization in Medical Image Segmentation with Fast Euler Characteristic
Authors:
Liu Li,
Qiang Ma,
Cheng Ouyang,
Johannes C. Paetzold,
Daniel Rueckert,
Bernhard Kainz
Abstract:
Deep learning-based medical image segmentation techniques have shown promising results when evaluated based on conventional metrics such as the Dice score or Intersection-over-Union. However, these fully automatic methods often fail to meet clinically acceptable accuracy, especially when topological constraints should be observed, e.g., continuous boundaries or closed surfaces. In medical image segmentation, the correctness of a segmentation in terms of the required topological genus is sometimes even more important than the pixel-wise accuracy. Existing topology-aware approaches commonly estimate and constrain the topological structure via the concept of persistent homology (PH). However, these methods are difficult to implement for high-dimensional data due to their polynomial computational complexity. To overcome this problem, we propose a novel and fast approach for topology-aware segmentation based on the Euler characteristic ($\chi$). First, we propose a fast formulation for $\chi$ computation in both 2D and 3D. The scalar $\chi$ error between the prediction and ground truth serves as the topological evaluation metric. Then we estimate the spatial topology correctness of any segmentation network via a so-called topological violation map, i.e., a detailed map that highlights regions with $\chi$ errors. Finally, the segmentation results from an arbitrary network are refined based on the topological violation maps by a topology-aware correction network. Our experiments are conducted on both 2D and 3D datasets and show that our method can significantly improve topological correctness while preserving pixel-wise segmentation accuracy.
Submitted 5 August, 2025; v1 submitted 31 July, 2025;
originally announced July 2025.
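The Euler characteristic the abstract builds on can be illustrated with the standard cubical-complex count χ = V − E + F: each foreground pixel contributes one face, four edges, and four corner vertices, with shared elements counted once. A small NumPy sketch of that count (an illustration of the textbook formula, not the authors' fast formulation):

```python
import numpy as np

def euler_characteristic_2d(img):
    """chi = V - E + F for a binary image viewed as a cubical complex:
    each foreground pixel is a unit square; shared corners and edges
    are deduplicated via sets."""
    verts, edges = set(), set()
    faces = 0
    for y, x in zip(*np.nonzero(img)):
        faces += 1
        for dy, dx in ((0, 0), (0, 1), (1, 0), (1, 1)):
            verts.add((y + dy, x + dx))
        # 'h' = horizontal edge above row y; 'v' = vertical edge left of col x
        edges.update({('h', y, x), ('h', y + 1, x),
                      ('v', y, x), ('v', y, x + 1)})
    return len(verts) - len(edges) + faces

# A filled square has chi = 1; punching a hole in it drops chi to 0
ring = np.ones((3, 3), dtype=int)
ring[1, 1] = 0
print(euler_characteristic_2d(np.ones((3, 3), dtype=int)))  # 1
print(euler_characteristic_2d(ring))                        # 0
```

A per-region version of this scalar is exactly the kind of signal a topological violation map can localize: wherever the predicted and ground-truth counts disagree, a topology error (extra hole or component) is present.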
-
Extreme Cardiac MRI Analysis under Respiratory Motion: Results of the CMRxMotion Challenge
Authors:
Kang Wang,
Chen Qin,
Zhang Shi,
Haoran Wang,
Xiwen Zhang,
Chen Chen,
Cheng Ouyang,
Chengliang Dai,
Yuanhan Mo,
Chenchen Dai,
Xutong Kuang,
Ruizhe Li,
Xin Chen,
Xiuzheng Yue,
Song Tian,
Alejandro Mora-Rubio,
Kumaradevan Punithakumar,
Shizhan Gong,
Qi Dou,
Sina Amirrajab,
Yasmina Al Khalil,
Cian M. Scannell,
Lexiaozi Fan,
Huili Yang,
Xiaowu Sun
, et al. (24 additional authors not shown)
Abstract:
Deep learning models have achieved state-of-the-art performance in automated Cardiac Magnetic Resonance (CMR) analysis. However, the efficacy of these models is highly dependent on the availability of high-quality, artifact-free images. In clinical practice, CMR acquisitions are frequently degraded by respiratory motion, yet the robustness of deep learning models against such artifacts remains an underexplored problem. To promote research in this domain, we organized the MICCAI CMRxMotion challenge. We curated and publicly released a dataset of 320 CMR cine series from 40 healthy volunteers who performed specific breathing protocols to induce a controlled spectrum of motion artifacts. The challenge comprised two tasks: 1) automated image quality assessment to classify images based on motion severity, and 2) robust myocardial segmentation in the presence of motion artifacts. A total of 22 algorithms were submitted and evaluated on the two designated tasks. This paper presents a comprehensive overview of the challenge design and dataset, reports the evaluation results for the top-performing methods, and further investigates the impact of motion artifacts on five clinically relevant biomarkers. All resources and code are publicly available at: https://github.com/CMRxMotion
Submitted 25 July, 2025;
originally announced July 2025.
-
Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data
Authors:
Zhipeng He,
Alexander Stevens,
Chun Ouyang,
Johannes De Smedt,
Alistair Barros,
Catarina Moreira
Abstract:
Adversarial attacks on tabular data present unique challenges due to the heterogeneous nature of mixed categorical and numerical features. Unlike images, where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise $\ell_p$-norm constraints, often producing adversarial examples that deviate from the original data distributions. To address this, we propose a latent-space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate statistically consistent adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We introduce the In-Distribution Success Rate (IDSR) to jointly evaluate attack effectiveness and distributional alignment. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates, higher IDSR, and more consistent performance than traditional input-space attacks and other VAE-based methods adapted from image-domain approaches. Our comprehensive analyses of hyperparameter sensitivity, sparsity control, and generative architecture demonstrate that the effectiveness of VAE-based attacks depends strongly on reconstruction quality and the availability of sufficient training data. When these conditions are met, the proposed framework achieves superior practical utility and stability compared with input-space methods. This work underscores the importance of maintaining on-manifold perturbations for generating realistic and robust adversarial examples in tabular domains.
Submitted 21 November, 2025; v1 submitted 15 July, 2025;
originally announced July 2025.
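The core mechanics of a latent-space attack can be sketched with toy linear stand-ins for the decoder and classifier (everything below, W_dec, w_clf, and the step rule, is an illustrative assumption, not the paper's mixed-input VAE): perturb the latent code until the decoded example flips the classifier, so the result lies on the decoder's output manifold by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
W_dec = rng.normal(size=(4, 2))  # toy linear "decoder": 2-D latent -> 4 features
w_clf = rng.normal(size=4)       # toy linear classifier; sign(score) = label

decode = lambda z: W_dec @ z
predict = lambda x: np.sign(w_clf @ x)

def latent_attack(z0, target, step=0.05, iters=500):
    """Walk the latent code against the classifier-score gradient
    (taken through the decoder) until the decoded point is classified
    as `target`. Perturbing z instead of x keeps the adversarial
    example on the decoder's manifold."""
    g = W_dec.T @ w_clf                 # d(score)/dz for the linear stand-ins
    g = g / np.linalg.norm(g)
    z = z0.astype(float).copy()
    for _ in range(iters):
        if predict(decode(z)) == target:
            break
        z -= step * np.sign(w_clf @ decode(z)) * g
    return z

z0 = np.array([1.0, 1.0])
z_adv = latent_attack(z0, target=-predict(decode(z0)))
```

With a real VAE the gradient would be taken through the trained decoder by automatic differentiation, and the perturbation norm in latent space plays the role of the imperceptibility budget.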
-
Photonic Rails in ML Datacenters
Authors:
Eric Ding,
Chuhan Ouyang,
Rachee Singh
Abstract:
Rail-optimized network fabrics have become the de facto datacenter scale-out fabric for large-scale ML training. However, the use of high-radix electrical switches to provide all-to-all connectivity in rails imposes massive power, cost, and complexity overheads. We propose a rethinking of the rail abstraction by retaining its communication semantics, but realizing it using optical circuit switches. The key challenge is that optical switches support only one-to-one connectivity at a time, limiting the fan-out of traffic in ML workloads using hybrid parallelisms. We introduce parallelism-driven rail reconfiguration as a solution that leverages the sequential ordering between traffic from different parallelisms. We design a control plane, Opus, to enable time-multiplexed emulation of electrical rail switches using optical switches. More broadly, our work discusses a new research agenda: datacenter fabrics that co-evolve with the model parallelism dimensions within each job, as opposed to the prevailing mindset of reconfiguring networks before a job begins.
Submitted 10 July, 2025;
originally announced July 2025.
-
Capacity Characterization of Pinching-Antenna Systems
Authors:
Chongjun Ouyang,
Zhaolin Wang,
Yuanwei Liu,
Zhiguo Ding
Abstract:
Unlike conventional systems using a fixed-location antenna, the channel capacity of the pinching-antenna system (PASS) is determined by the activated positions of pinching antennas. This article characterizes the capacity region of multiuser PASS, where a single pinched waveguide is deployed to enable both uplink and downlink communications. The capacity region of the uplink channel is first characterized. (i) For the single-pinch case, closed-form expressions are derived for the optimal antenna activation position, along with the corresponding capacity region and the achievable data rate regions under time-division multiple access (TDMA) and frequency-division multiple access (FDMA). It is proven that the capacity region of PASS encompasses that of conventional fixed-antenna systems, and that the FDMA rate region contains the TDMA rate region. (ii) For the multiple-pinch case, inner and outer bounds on the capacity region are derived using an element-wise alternating antenna position optimization technique and the Cauchy-Schwarz inequality, respectively. The achievable FDMA rate region is also derived using the same optimization framework, while the TDMA rate region is obtained through an antenna position refinement approach. The analysis is then extended to the downlink PASS using the uplink-downlink duality framework. It is proven that the relationships among the downlink capacity and rate regions are consistent with those in the uplink case. Numerical results demonstrate that: (i) the derived bounds closely approximate the exact capacity region, (ii) PASS yields a significantly enlarged capacity region compared to conventional fixed-antenna systems, and (iii) in the multiple-pinch case, TDMA and FDMA are capable of approaching the channel capacity limit.
Submitted 17 June, 2025;
originally announced June 2025.
-
Spectral Efficiency Maximization for DMA-enabled Multiuser MISO with Statistical CSI
Authors:
Hao Xu,
Boyu Ning,
Chongjun Ouyang,
Hongwen Yang
Abstract:
Dynamic metasurface antennas (DMAs) offer the potential to achieve large-scale antenna arrays with low power consumption and reduced hardware costs, making them a promising technology for future communication systems. This paper investigates the spectral efficiency (SE) of DMA-enabled multiuser multiple-input single-output (MISO) systems in both uplink and downlink transmissions, using only statistical channel state information (CSI) to maximize the ergodic sum rate of multiple users. For the uplink system, we consider two decoding rules: minimum mean square error (MMSE) with and without successive interference cancellation (SIC). For both decoders, we derive closed-form surrogates to substitute the original expressions of ergodic sum rate and formulate tractable optimization problems for designing DMA weights. Then, a weighted MMSE (WMMSE)-based algorithm is proposed to maximize the ergodic sum rate. For the downlink system, we derive an approximate expression for the ergodic sum rate and formulate a hybrid analog/digital beamforming optimization problem that jointly optimizes the digital precoder and DMA weights. A penalty dual decomposition (PDD)-based algorithm is proposed by leveraging the fractional programming framework. Numerical results validate the accuracy of the derived surrogates and highlight the superiority of the proposed algorithms over baseline schemes. It is shown that these algorithms are effective across various DMA settings and are particularly well-suited for system design in fast time-varying channels.
Submitted 11 June, 2025;
originally announced June 2025.
-
TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data
Authors:
Zhipeng He,
Chun Ouyang,
Lijie Wen,
Cong Liu,
Catarina Moreira
Abstract:
Adversarial attacks pose a significant threat to machine learning models by inducing incorrect predictions through imperceptible perturbations to input data. While these attacks are well studied in unstructured domains such as images, their behaviour on tabular data remains underexplored due to mixed feature types and complex inter-feature dependencies. This study introduces a comprehensive benchmark that evaluates adversarial attacks on tabular datasets with respect to both effectiveness and imperceptibility. We assess five white-box attack algorithms (FGSM, BIM, PGD, DeepFool, and C&W) across four representative models (LR, MLP, TabTransformer and FT-Transformer) using eleven datasets spanning finance, energy, and healthcare domains. The benchmark employs four quantitative imperceptibility metrics (proximity, sparsity, deviation, and sensitivity) to characterise perturbation realism. The analysis quantifies the trade-off between these two aspects and reveals consistent differences between attack types, with $\ell_\infty$-based attacks achieving higher success but lower subtlety, and $\ell_2$-based attacks offering more realistic perturbations. The benchmark findings offer actionable insights for designing more imperceptible adversarial attacks, advancing the understanding of adversarial vulnerability in tabular machine learning.
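To make the attack family concrete, here is a minimal sketch of a single-step $\ell_\infty$ (FGSM-style) perturbation against a logistic-regression model on tabular features; the toy weights, inputs, and continuous-feature mask are invented for illustration and are not the benchmark's code.

```python
import numpy as np

def fgsm_tabular(w, b, x, y, eps, cont_mask):
    """One-step l_inf attack: x' = x + eps * sign(grad_x loss), continuous features only."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid probability of the positive class
    grad_x = (p - y) * w           # gradient of binary cross-entropy w.r.t. the input
    delta = eps * np.sign(grad_x)
    delta[~cont_mask] = 0.0        # leave categorical features untouched
    return x + delta

# toy model with four features; the last one is treated as categorical
w = np.array([1.5, -2.0, 0.5, 1.0])
x = np.array([0.2, 0.1, -0.3, 1.0])
cont_mask = np.array([True, True, True, False])

x_adv = fgsm_tabular(w, b=0.0, x=x, y=1.0, eps=0.1, cont_mask=cont_mask)
print(np.round(x_adv, 2))  # each continuous feature moved by at most eps
```

Imperceptibility metrics such as proximity and sparsity would then score `x_adv` against the original `x`.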
Submitted 12 October, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Curation and Analysis of MIMICEL -- An Event Log for MIMIC-IV Emergency Department
Authors:
Jia Wei,
Chun Ouyang,
Bemali Wickramanayake,
Zhipeng He,
Keshara Perera,
Catarina Moreira
Abstract:
The global issue of overcrowding in emergency departments (EDs) necessitates the analysis of patient flow through the ED to enhance efficiency and alleviate overcrowding. However, traditional analytical methods are time-consuming and costly. The healthcare industry is embracing process mining tools to analyse healthcare processes and patient flows. Process mining aims to discover, monitor, and enhance processes by obtaining knowledge from event log data. However, the availability of event logs is a prerequisite for applying process mining techniques. Hence, this paper aims to generate an event log for analysing processes in the ED. In this study, we extract an event log from the MIMIC-IV-ED dataset and name it MIMICEL. MIMICEL captures a patient's journey through the ED, allowing for the analysis of patient flows and the improvement of ED efficiency. We present analyses conducted using MIMICEL to demonstrate the utility of the dataset. The curation of MIMICEL facilitates extensive use of MIMIC-IV-ED data for ED analysis using process mining techniques, while also providing the process mining research community with a valuable dataset for study.
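As a toy illustration of what such an event log enables, the sketch below groups ED events into per-stay traces, the basic input of process discovery; the stay IDs, activities, and timestamps are invented examples, not actual MIMIC-IV-ED records.

```python
# Hypothetical event-log rows: one case (ED stay) per stay_id.
records = [
    {"stay_id": 101, "activity": "triage",     "timestamp": "2180-01-01T10:00"},
    {"stay_id": 101, "activity": "medication", "timestamp": "2180-01-01T10:45"},
    {"stay_id": 101, "activity": "discharge",  "timestamp": "2180-01-01T13:30"},
    {"stay_id": 102, "activity": "triage",     "timestamp": "2180-01-02T09:10"},
]

def traces(log):
    """Group events by case id into time-ordered activity traces, as process mining expects."""
    out = {}
    for ev in sorted(log, key=lambda e: (e["stay_id"], e["timestamp"])):
        out.setdefault(ev["stay_id"], []).append(ev["activity"])
    return out

print(traces(records))
```

Process discovery algorithms then mine control-flow models from such traces.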
Submitted 25 May, 2025;
originally announced May 2025.
-
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
Authors:
Ziliang Wang,
Xuhui Zheng,
Kang An,
Cijun Ouyang,
Jialu Cai,
Yuhang Wang,
Yichao Wu
Abstract:
Efficient multi-hop reasoning requires agents based on Large Language Models (LLMs) to acquire high-value external knowledge iteratively. Previous work has explored reinforcement learning (RL) to train LLMs to perform search-based document retrieval, achieving notable improvements in QA performance; however, these methods underperform on complex, multi-hop QA because they rely on sparse rewards derived from a global signal only. To address this gap, we introduce StepSearch, a framework for search-capable LLMs trained with a step-wise proximal policy optimization method. It provides richer, more detailed intermediate search rewards and token-level process supervision based on information gain and redundancy penalties to better guide each search step. We construct a fine-grained question-answering dataset containing sub-question-level search trajectories, built from open-source datasets through a dedicated data pipeline. On standard multi-hop QA benchmarks, StepSearch significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models over various RL-based search baselines using only 19k training samples, demonstrating the effectiveness of fine-grained, step-wise supervision in optimizing deep search LLMs. Our code will be released at https://github.com/Zillwang/StepSearch.
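A hedged sketch of a step-wise search reward in the spirit described above (novel-evidence gain minus a redundancy penalty); the exact formula and the weight `lam` are illustrative assumptions, not StepSearch's released reward.

```python
def step_reward(retrieved, gold_evidence, seen, lam=0.5):
    """Reward one search step by novel gold evidence found, penalizing re-retrieval."""
    new_hits = (retrieved & gold_evidence) - seen   # information gain: novel gold docs
    redundant = retrieved & seen                    # docs already retrieved earlier
    gain = len(new_hits) / max(len(gold_evidence), 1)
    penalty = lam * len(redundant) / max(len(retrieved), 1)
    return gain - penalty, seen | retrieved

gold = {"d1", "d2"}
seen = set()
r1, seen = step_reward({"d1", "d9"}, gold, seen)   # finds one of two gold docs
r2, seen = step_reward({"d1", "d2"}, gold, seen)   # one novel hit, one redundant doc
print(round(r1, 2), round(r2, 2))
```

Such per-step signals are denser than a single end-of-episode QA reward, which is the gap the abstract targets.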
Submitted 26 May, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Assessing the Robustness and Reducibility of Multiplex Networks with Embedding-Aided Interlayer Similarities
Authors:
Haoran Nan,
Senquan Wang,
Chun Ouyang,
Yanchen Zhou,
Weiwei Gu
Abstract:
The study of interlayer similarity of multiplex networks helps to understand the intrinsic structure of complex systems, revealing how changes in one layer can propagate and affect others, thus enabling broad implications for transportation, social, and biological systems. Existing algorithms that measure similarity between network layers typically encode only partial information, which limits their effectiveness in capturing the full complexity inherent in multiplex networks. To address this limitation, we propose a novel interlayer similarity measuring approach named Embedding Aided inTerlayer Similarity (EATSim). EATSim concurrently incorporates intralayer structural similarity and cross-layer anchor node alignment consistency, providing a more comprehensive framework for analyzing interconnected systems. Extensive experiments on both synthetic and real-world networks demonstrate that EATSim effectively captures the underlying geometric similarities between interconnected networks, significantly improving the accuracy of interlayer similarity measurement. Moreover, EATSim achieves state-of-the-art performance in two downstream applications: predicting network robustness and network reducibility, showing its great potential in enhancing the understanding and management of complex systems.
Submitted 11 May, 2025;
originally announced May 2025.
-
Dataset Distillation with Probabilistic Latent Features
Authors:
Zhe Li,
Sarah Cechnicka,
Cheng Ouyang,
Katharina Breininger,
Peter Schüffler,
Bernhard Kainz
Abstract:
As deep learning models grow in complexity and the volume of training data increases, reducing storage and computational costs becomes increasingly important. Dataset distillation addresses this challenge by synthesizing a compact set of synthetic data that can effectively replace the original dataset in downstream classification tasks. While existing methods typically rely on mapping data from pixel space to the latent space of a generative model, we propose a novel stochastic approach that models the joint distribution of latent features. This allows our method to better capture spatial structures and produce diverse synthetic samples, which benefits model training. Specifically, we introduce a low-rank multivariate normal distribution parameterized by a lightweight network. This design maintains low computational complexity and is compatible with various matching networks used in dataset distillation. After distillation, synthetic images are generated by feeding the learned latent features into a pretrained generator. These synthetic images are then used to train classification models, and performance is evaluated on the real test set. We validate our method on several benchmarks, including ImageNet subsets, CIFAR-10, and the MedMNIST histopathological dataset. Our approach achieves state-of-the-art cross-architecture performance across a range of backbone architectures, demonstrating its generality and effectiveness.
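The low-rank parameterization can be sketched as a covariance of the form $FF^\top + \mathrm{diag}(d^2)$ sampled by reparameterization; the dimensions and the fixed `mu`, `F`, and `d` below are illustrative stand-ins for a lightweight network's outputs, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, rank = 8, 2

mu = np.zeros(latent_dim)                      # mean (fixed here; a network would predict it)
F = 0.5 * rng.normal(size=(latent_dim, rank))  # low-rank covariance factor
d = 0.1 * np.ones(latent_dim)                  # per-dimension noise scale

def sample_latents(n):
    """Reparameterized draw: z = mu + F u + d * eps, with u ~ N(0, I_r), eps ~ N(0, I)."""
    u = rng.normal(size=(n, rank))
    eps = rng.normal(size=(n, latent_dim))
    return mu + u @ F.T + eps * d

z = sample_latents(10000)
emp_cov = np.cov(z.T)
true_cov = F @ F.T + np.diag(d**2)   # the covariance the samples should match
print(z.shape)
```

The draws `z` would then be decoded by a pretrained generator into synthetic training images.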
Submitted 17 May, 2025; v1 submitted 10 May, 2025;
originally announced May 2025.
-
Uplink Sum Rate Maximization for Pinching Antenna-Assisted Multiuser MISO
Authors:
Jiarui Zhang,
Hao Xu,
Chongjun Ouyang,
Qiuyun Zou,
Hongwen Yang
Abstract:
This article investigates the application of pinching-antenna systems (PASS) in multiuser multiple-input single-output (MISO) communications. Two sum-rate maximization problems are formulated under minimum mean square error (MMSE) decoding, with and without successive interference cancellation (SIC). To address the joint optimization of pinching antenna locations and user transmit powers, a fractional programming-based approach is proposed. Numerical results validate the effectiveness of the proposed method and show that PASS can significantly enhance uplink sum-rate performance compared to conventional fixed-antenna designs.
Submitted 23 April, 2025;
originally announced April 2025.
-
ViG3D-UNet: Volumetric Vascular Connectivity-Aware Segmentation via 3D Vision Graph Representation
Authors:
Bowen Liu,
Chunlei Meng,
Wei Lin,
Hongda Zhang,
Ziqing Zhou,
Zhongxue Gan,
Chun Ouyang
Abstract:
Accurate vascular segmentation is essential for coronary visualization and the diagnosis of coronary heart disease. This task involves the extraction of sparse tree-like vascular branches from the volumetric space. However, existing methods have faced significant challenges due to discontinuous vascular segmentation and missing endpoints. To address this issue, a 3D vision graph neural network framework, named ViG3D-UNet, was introduced. This method integrates 3D graph representation and aggregation within a U-shaped architecture to facilitate continuous vascular segmentation. The ViG3D module captures volumetric vascular connectivity and topology, while the convolutional module extracts fine vascular details. These two branches are combined through channel attention to form the encoder feature. Subsequently, a paperclip-shaped offset decoder minimizes redundant computations in the sparse feature space and restores the feature map size to match the original input dimensions. To evaluate the effectiveness of the proposed approach for continuous vascular segmentation, evaluations were performed on two public datasets, ASOCA and ImageCAS. The segmentation results show that the ViG3D-UNet surpassed competing methods in maintaining vascular segmentation connectivity while achieving high segmentation accuracy. Our code will be available soon.
Submitted 18 April, 2025;
originally announced April 2025.
-
Beamforming Design for Continuous Aperture Array (CAPA)-Based MIMO Systems
Authors:
Zhaolin Wang,
Chongjun Ouyang,
Yuanwei Liu
Abstract:
An efficient beamforming design is proposed for continuous aperture array (CAPA)-based point-to-point multiple-input multiple-output (MIMO) systems. In contrast to conventional spatially discrete array (SPDA)-MIMO systems, whose optimal beamforming can be obtained using singular-value decomposition, CAPA-MIMO systems require solving the eigendecomposition of a Hermitian kernel operator, which is computationally prohibitive. To address this challenge, an explicit closed-form expression for the achievable rate of CAPA-MIMO systems is first derived as a function of the continuous transmit beamformer. Subsequently, an iterative weighted minimum mean-squared error (WMMSE) algorithm is proposed, directly addressing the CAPA-MIMO beamforming optimization without discretization approximation. Closed-form updates for each iteration of the WMMSE algorithm are derived via the calculus of variations (CoV) method. For low-complexity implementation, an equivalent matrix-based iterative solution is introduced using Gauss-Legendre quadrature. Our numerical results demonstrate that 1) CAPA-MIMO achieves substantial performance gain over the SPDA-MIMO, 2) the proposed WMMSE algorithm enhances performance while significantly reducing computational complexity compared to state-of-the-art Fourier-based approaches, and 3) the proposed WMMSE algorithm enables practical realization of parallel, non-interfering transmissions.
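The quadrature step can be illustrated as follows: continuous inner products over the aperture become weighted finite sums at Gauss-Legendre nodes. The interval and the Fourier-mode test functions below are illustrative, not the paper's kernels.

```python
import numpy as np

def gl_inner_product(f, g, a, b, n=32):
    """Approximate the integral of f(x) * conj(g(x)) over [a, b] with n-point Gauss-Legendre."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)   # map nodes from [-1, 1] to [a, b]
    w = 0.5 * (b - a) * weights
    return np.sum(w * f(x) * np.conj(g(x)))

# orthonormal Fourier modes on [0, 1], as one might use to discretize a kernel operator
f = lambda x: np.exp(2j * np.pi * x)
g = lambda x: np.exp(4j * np.pi * x)

print(abs(gl_inner_product(f, f, 0.0, 1.0)))   # close to 1 (unit norm)
print(abs(gl_inner_product(f, g, 0.0, 1.0)))   # close to 0 (orthogonal)
```

Replacing integrals by such sums is what turns the continuous operator updates into the equivalent matrix-based iteration.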
Submitted 18 July, 2025; v1 submitted 31 March, 2025;
originally announced April 2025.
-
Federated Continual 3D Segmentation With Single-round Communication
Authors:
Can Peng,
Qianhui Men,
Pramit Saha,
Qianye Yang,
Cheng Ouyang,
J. Alison Noble
Abstract:
Federated learning seeks to foster collaboration among distributed clients while preserving the privacy of their local data. Traditionally, federated learning methods assume a fixed setting in which client data and learning objectives remain constant. However, in real-world scenarios, new clients may join, and existing clients may expand the segmentation label set as task requirements evolve. In such a dynamic federated analysis setup, the conventional federated communication strategy of model aggregation per communication round is suboptimal. As new clients join, this strategy requires retraining, linearly increasing communication and computation overhead. It also imposes requirements for synchronized communication, which is difficult to achieve among distributed clients. In this paper, we propose a federated continual learning strategy that employs a one-time model aggregation at the server through multi-model distillation. This approach builds and updates the global model while eliminating the need for frequent server communication. When integrating new data streams or onboarding new clients, this approach efficiently reuses previous client models, avoiding the need to retrain the global model across the entire federation. By minimizing communication load and bypassing the need to put unchanged clients online, our approach relaxes synchronization requirements among clients, providing an efficient and scalable federated analysis framework suited for real-world applications. Using multi-class 3D abdominal CT segmentation as an application task, we demonstrate the effectiveness of the proposed approach.
Submitted 16 November, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
MIMO-PASS: Uplink and Downlink Transmission via MIMO Pinching-Antenna Systems
Authors:
Ali Bereyhi,
Chongjun Ouyang,
Saba Asaad,
Zhiguo Ding,
H. Vincent Poor
Abstract:
Pinching-antenna systems (PASSs) are a recent flexible-antenna technology that is realized by attaching simple components, referred to as pinching elements, to dielectric waveguides. This work explores the potential of deploying PASS for uplink and downlink transmission in multiuser MIMO settings. For downlink PASS-aided communication, we formulate the optimal hybrid beamforming, in which the digital precoding matrix at the access point and the location of pinching elements on the waveguides are jointly optimized to maximize the achievable weighted sum-rate. Invoking fractional programming and the Gauss-Seidel approach, we propose two low-complexity algorithms to iteratively update the precoding matrix and activated locations of the pinching elements. We further study uplink transmission aided by a PASS, where an iterative scheme is designed to address the underlying hybrid multiuser detection problem. We validate the proposed schemes through extensive numerical experiments. The results demonstrate that using a PASS, the throughput in both uplink and downlink is boosted significantly as compared with baseline MIMO architectures, such as massive MIMO and classical hybrid analog-digital designs. This highlights the great potential of PASSs as a promising reconfigurable antenna technology for next-generation wireless systems.
Submitted 4 March, 2025;
originally announced March 2025.
-
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Authors:
Jiazhen Pan,
Che Liu,
Junde Wu,
Fenglin Liu,
Jiayuan Zhu,
Hongwei Bran Li,
Chen Chen,
Cheng Ouyang,
Daniel Rueckert
Abstract:
Reasoning is a critical frontier for advancing medical image analysis, where transparency and trustworthiness play a central role in both clinician trust and regulatory approval. Although Medical Visual Language Models (VLMs) show promise for radiological tasks, most existing VLMs merely produce final answers without revealing the underlying reasoning. To address this gap, we introduce MedVLM-R1, a medical VLM that explicitly generates natural language reasoning to enhance transparency and trustworthiness. Instead of relying on supervised fine-tuning (SFT), which often suffers from overfitting to training distributions and fails to foster genuine reasoning, MedVLM-R1 employs a reinforcement learning framework that incentivizes the model to discover human-interpretable reasoning paths without using any reasoning references. Despite limited training data (600 visual question answering samples) and model parameters (2B), MedVLM-R1 boosts accuracy from 55.11% to 78.22% across MRI, CT, and X-ray benchmarks, outperforming larger models trained on over a million samples. It also demonstrates robust domain generalization under out-of-distribution tasks. By unifying medical image analysis with explicit reasoning, MedVLM-R1 marks a pivotal step toward trustworthy and interpretable AI in clinical practice. Inference model is available at: https://huggingface.co/JZPeterPan/MedVLM-R1.
Submitted 19 March, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs
Authors:
Che Liu,
Cheng Ouyang,
Zhongwei Wan,
Haozhe Wang,
Wenjia Bai,
Rossella Arcucci
Abstract:
Recent advances in multimodal ECG representation learning center on aligning ECG signals with paired free-text reports. However, suboptimal alignment persists due to the complexity of medical language and the reliance on a full 12-lead setup, which is often unavailable in under-resourced settings. To tackle these issues, we propose **K-MERL**, a knowledge-enhanced multimodal ECG representation learning framework. **K-MERL** leverages large language models to extract structured knowledge from free-text reports and employs a lead-aware ECG encoder with dynamic lead masking to accommodate arbitrary lead inputs. Evaluations on six external ECG datasets show that **K-MERL** achieves state-of-the-art performance in zero-shot classification and linear probing tasks, while delivering an average **16%** AUC improvement over existing methods in partial-lead zero-shot classification.
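The lead-masking idea can be sketched as pooling only over leads that are present, so absent leads never contribute to the representation; the shapes and the mean-pool stand-in for the lead-aware encoder are illustrative assumptions, not K-MERL's implementation.

```python
import numpy as np

N_LEADS, EMB = 12, 16

def encode_with_mask(lead_embeddings, available):
    """Pool only the leads that are present; missing leads are excluded entirely."""
    mask = np.zeros(N_LEADS, dtype=bool)
    mask[available] = True
    return lead_embeddings[mask].mean(axis=0)   # masked mean-pooling stand-in for attention

rng = np.random.default_rng(1)
emb = rng.normal(size=(N_LEADS, EMB))           # one embedding per ECG lead

full = encode_with_mask(emb, available=list(range(N_LEADS)))
partial = encode_with_mask(emb, available=[0, 1])   # e.g., only leads I and II recorded
print(partial.shape)
```

Because the output dimension is independent of which leads are present, the same encoder handles arbitrary-lead inputs.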
Submitted 25 February, 2025;
originally announced February 2025.
-
Identity-Free Deferral For Unseen Experts
Authors:
Joshua Strong,
Pramit Saha,
Yasin Ibrahim,
Cheng Ouyang,
Alison Noble
Abstract:
Learning to Defer (L2D) improves AI reliability in decision-critical environments by training AI to either make its own prediction or defer the decision to a human expert. A key challenge is adapting to unseen experts at test time, whose competence can differ from the training population. Current methods for this task, however, can falter when unseen experts are out-of-distribution (OOD) relative to the training population. We identify a core architectural flaw as the cause: they learn identity-conditioned policies by processing class-indexed signals in fixed coordinates, creating shortcuts that violate the problem's inherent permutation symmetry. We introduce Identity-Free Deferral (IFD), an architecture that enforces this symmetry by construction. From a few-shot context, IFD builds a query-independent Bayesian competence profile for each expert. It then supplies the deferral rejector with a low-dimensional, role-indexed state containing only structural information, such as the model's confidence in its top-ranked class and the expert's estimated skill for that same role, which obscures absolute class identities. We train IFD using an uncertainty-aware, context-only objective that removes the need for expensive query-time expert labels. We formally prove the permutation invariance of our approach, contrasting it with the generic non-invariance of standard population encoders. Experiments on medical imaging benchmarks and ImageNet-16H with real human annotators show that IFD consistently improves generalisation to unseen experts, with gains in OOD settings, all while using fewer annotations than alternative methods.
Submitted 2 March, 2026; v1 submitted 14 February, 2025;
originally announced February 2025.
-
Downlink and Uplink ISAC in Continuous-Aperture Array (CAPA) Systems
Authors:
Boqun Zhao,
Chongjun Ouyang,
Xingqi Zhang,
Hyundong Shin,
Yuanwei Liu
Abstract:
A continuous-aperture array (CAPA)-based integrated sensing and communications (ISAC) framework is proposed for both downlink and uplink scenarios. Within this framework, continuous operator-based signal models are employed to describe the sensing and communication processes. The performance of communication and sensing is analyzed using two information-theoretic metrics: the communication rate (CR) and the sensing rate (SR). 1) For downlink ISAC, three continuous beamforming designs are proposed: i) the communications-centric (C-C) design that maximizes the CR, ii) the sensing-centric (S-C) design that maximizes the SR, and iii) the Pareto-optimal design that characterizes the Pareto boundary of the CR-SR region. A low-complexity signal subspace-based approach is proposed to derive the closed-form optimal beamformers for the considered designs. On this basis, closed-form expressions are derived for the achievable CRs and SRs, and the downlink rate region achieved by CAPAs is characterized. 2) For uplink ISAC, the C-C and S-C successive interference cancellation-based methods are proposed to manage inter-functionality interference. Using the subspace approach, closed-form expressions for the optimal detectors as well as the achievable CRs and SRs are derived. The uplink SR-CR region is characterized based on the time-sharing technique. Numerical results demonstrate that, for both downlink and uplink, CAPA-based ISAC achieves higher CRs and SRs as well as larger CR-SR regions compared to conventional spatially discrete array-based ISAC.
Submitted 20 July, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Modeling and Beamforming Optimization for Pinching-Antenna Systems
Authors:
Zhaolin Wang,
Chongjun Ouyang,
Xidong Mu,
Yuanwei Liu,
Zhiguo Ding
Abstract:
The Pinching-Antenna SyStem (PASS) is a revolutionary flexible antenna technology designed to enhance wireless communication by establishing strong line-of-sight (LoS) links, reducing free-space path loss and enabling antenna array reconfigurability. PASS uses dielectric waveguides with low propagation loss for signal transmission, radiating via a passive pinching antenna, which is a small dielectric element applied to the waveguide. This paper first proposes a physics-based hardware model for PASS, where the pinching antenna is modeled as an open-ended directional coupler, and the electromagnetic field behavior is analyzed using coupled-mode theory. A simplified signal model characterizes the coupling effect between multiple antennas on the same waveguide. Based on this, two power models are proposed: the equal power and proportional power models. Additionally, a transmit power minimization problem is formulated and studied for the joint optimization of transmit and pinching beamforming under both continuous and discrete pinching antenna activations. Two algorithms are proposed to solve this multimodal optimization problem: a penalty-based alternating optimization algorithm and a low-complexity zero-forcing (ZF)-based algorithm. Numerical results show that 1) the ZF-based low-complexity algorithm performs similarly to the penalty-based algorithm, 2) PASS reduces transmit power by over 95% compared to conventional and massive MIMO, 3) discrete activation causes minimal performance loss but requires a dense antenna set to match continuous activation, and 4) the proportional power model yields performance comparable to the equal power model.
Submitted 12 June, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
Downlink Beamforming with Pinching-Antenna Assisted MIMO Systems
Authors:
Ali Bereyhi,
Saba Asaad,
Chongjun Ouyang,
Zhiguo Ding,
H. Vincent Poor
Abstract:
Pinching antennas have been recently proposed as a promising flexible-antenna technology, which can be implemented by attaching low-cost pinching elements to dielectric waveguides. This work explores the potential of employing pinching antenna systems (PASs) for downlink transmission in a multiuser MIMO setting. We consider the problem of hybrid beamforming, where the digital precoder at the access point and the activated locations of the pinching elements are jointly optimized to maximize the achievable weighted sum-rate. Invoking fractional programming, a novel low-complexity algorithm is developed to iteratively update the precoding matrix and the locations of the pinching antennas. We validate the proposed scheme through extensive numerical experiments. Our investigations demonstrate that, using a PAS, the system throughput can be significantly boosted compared with conventional fixed-location antenna systems, highlighting the potential of PAS as an enabling candidate for next-generation wireless networks.
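As a toy illustration of why the activated pinching locations matter for throughput, the sketch below places a single pinching element on a waveguide and picks its activation point from a discrete grid to maximize a sum-rate. The geometry, free-space path-loss model, and exhaustive grid search are assumptions for illustration only; the paper's actual method jointly updates a digital precoder and the locations via fractional programming.

```python
import numpy as np

# Toy placement problem: one pinching antenna on a waveguide along the x-axis
# serves several users; its activation point x_pin is chosen from a grid to
# maximize the sum-rate. Geometry and channel model are assumed, not the paper's.
users = np.array([[2.0, 1.0], [6.0, 2.0], [9.0, 1.5]])  # (x, y) user positions, m

def sum_rate(x_pin, p=1.0, sigma2=1e-3):
    d = np.hypot(users[:, 0] - x_pin, users[:, 1])  # LoS distance to each user
    gain = 1.0 / d**2                                # free-space power gain
    return float(np.sum(np.log2(1.0 + p * gain / sigma2)))

grid = np.linspace(0.0, 10.0, 201)  # candidate activation points on the guide
best_x = max(grid, key=sum_rate)
```

The grid search stands in for the discrete-activation case; the continuous case would optimize `x_pin` directly, alternating with the precoder update.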
Submitted 3 February, 2025;
originally announced February 2025.
-
On the Performance of Physical Layer Security for Continuous-Aperture Array (CAPA) Systems
Authors:
Boqun Zhao,
Chongjun Ouyang,
Xingqi Zhang,
Yuanwei Liu
Abstract:
A continuous-aperture array (CAPA)-based secure transmission framework is proposed to enhance physical layer security. Continuous current distributions, or beamformers, are designed to maximize the secrecy transmission rate under a power constraint and to minimize the required transmission power for achieving a specific target secrecy rate. On this basis, the fundamental secrecy performance limits achieved by CAPAs are analyzed by deriving closed-form expressions for the maximum secrecy rate (MSR) and minimum required power (MRP), along with the corresponding optimal current distributions. To provide further insights, asymptotic analyses are performed for the MSR and MRP, which reveal that i) for the MSR, the optimal current distribution simplifies to maximal ratio transmission (MRT) beamforming in the low-SNR regime and to zero-forcing (ZF) beamforming in the high-SNR regime; ii) for the MRP, the optimal current distribution simplifies to ZF beamforming in the high-SNR regime. The derived results are specialized to typical array structures, e.g., planar CAPAs and planar spatially discrete arrays (SPDAs). The rate and power scaling laws are further analyzed by assuming an infinitely large CAPA. Numerical results demonstrate that: i) the proposed secure continuous beamforming design outperforms MRT and ZF beamforming in terms of both achievable secrecy rate and power efficiency; ii) CAPAs achieve superior secrecy performance compared to conventional SPDAs.
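The MRT-versus-ZF asymptotics can be illustrated with a small numerical sketch: a discretized aperture (an SPDA-style approximation of a CAPA, which is an assumption here) transmits to a legitimate user while an eavesdropper listens, and the secrecy rate of MRT and eavesdropper-nulling ZF beamformers is compared. Channels, sizes, and SNRs are all illustrative assumptions.

```python
import numpy as np

# Secrecy-rate comparison of MRT vs. ZF beamforming over a discretized
# aperture (stand-in for a CAPA). Channel draws and SNR are assumed values.
rng = np.random.default_rng(1)
N = 64  # discretized aperture points (assumed)
h_b = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # legitimate channel
h_e = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # eavesdropper channel

def secrecy_rate(w, p, sigma2=1.0):
    """Secrecy rate [log2(1+SNR_Bob) - log2(1+SNR_Eve)]^+ for unit-norm w."""
    snr_b = p * np.abs(h_b.conj() @ w) ** 2 / sigma2
    snr_e = p * np.abs(h_e.conj() @ w) ** 2 / sigma2
    return max(0.0, np.log2(1.0 + snr_b) - np.log2(1.0 + snr_e))

w_mrt = h_b / np.linalg.norm(h_b)  # maximal ratio transmission toward Bob
proj = h_b - (h_e.conj() @ h_b) / (h_e.conj() @ h_e) * h_e
w_zf = proj / np.linalg.norm(proj)  # project out Eve's channel: zero leakage
```

At high transmit power, ZF's secrecy rate keeps growing because the eavesdropper receives nothing, while MRT's saturates at the ratio of the two channel gains, matching the high-SNR asymptotic in the abstract.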
Submitted 2 April, 2026; v1 submitted 18 December, 2024;
originally announced December 2024.
-
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Authors:
Jiayuan Zhu,
Junde Wu,
Cheng Ouyang,
Konstantinos Kamnitsas,
J. Alison Noble
Abstract:
Medical image segmentation data inherently contain uncertainty. This can stem from both imperfect image quality and variability in labeling preferences on ambiguous pixels, which depend on annotator expertise and the clinical context of the annotations. For instance, a boundary pixel might be labeled as tumor in diagnosis to avoid underestimation of severity, but as normal tissue in radiotherapy to prevent damage to sensitive structures. As segmentation preferences vary across downstream applications, it is often desirable for an image segmentation model to offer user-adaptable predictions rather than a fixed output. While prior uncertainty-aware and interactive methods offer adaptability, they are inefficient at test time: uncertainty-aware models require users to choose from numerous similar outputs, while interactive models demand significant user input through click or box prompts to refine segmentation. To address these challenges, we propose SPA, a new Segmentation Preference Alignment framework that efficiently adapts to diverse test-time preferences with minimal human interaction. By presenting users with a select few distinct segmentation candidates that best capture uncertainties, it reduces the user workload needed to reach the preferred segmentation. To accommodate user preference, we introduce a probabilistic mechanism that leverages user feedback to adapt a model's segmentation preference. The proposed framework is evaluated on several medical image segmentation tasks: color fundus images, lung lesion and kidney CT scans, and MRI scans of the brain and prostate. SPA shows 1) a significant reduction in user time and effort compared to existing interactive segmentation approaches, 2) strong adaptability based on human feedback, and 3) state-of-the-art image segmentation performance across different imaging modalities and semantic labels.
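The two-step idea of the abstract (show a few distinct candidates, then update a preference distribution from the user's pick) can be sketched as follows. Everything here is an assumed, generic mechanism: random arrays stand in for sampled segmentations, farthest-point selection in IoU space stands in for candidate diversification, and a Bayes-style weight update stands in for SPA's actual probabilistic mechanism.

```python
import numpy as np

# Hypothetical sketch of preference alignment: sample many plausible masks,
# show a few diverse candidates, and update a distribution over latent
# "annotation styles" from the user's choice. Not SPA's actual algorithm.
rng = np.random.default_rng(2)
n_samples, H, W = 32, 16, 16
masks = rng.random((n_samples, H, W)) > 0.5  # stand-in for sampled segmentations

def iou_dist(a, b):
    """1 - IoU between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 1.0 - inter / union if union else 0.0

def pick_diverse(masks, k=3):
    """Farthest-point selection in IoU space: a few distinct candidates."""
    chosen = [0]
    while len(chosen) < k:
        d = [min(iou_dist(m, masks[c]) for c in chosen) for m in masks]
        chosen.append(int(np.argmax(d)))
    return chosen

def update_preference(prior, likelihood_of_choice):
    """Bayes-style update of style weights after the user picks a candidate."""
    post = prior * likelihood_of_choice
    return post / post.sum()

cands = pick_diverse(masks, k=3)
prior = np.full(3, 1.0 / 3)
# Assume the user's pick is most consistent with the second style.
posterior = update_preference(prior, np.array([0.1, 0.8, 0.1]))
```

Showing only a handful of mutually distant candidates is what cuts the user workload relative to browsing many near-duplicate samples, and the posterior carries the learned preference into later interactions.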
Submitted 16 July, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.