-
Reducing Pilots in Channel Estimation With Predictive Foundation Models
Authors:
Xingyu Zhou,
Le Liang,
Hao Ye,
Jing Zhang,
Chao-Kai Wen,
Shi Jin
Abstract:
Accurate channel state information (CSI) acquisition is essential for modern wireless systems, yet it becomes increasingly difficult under large antenna arrays, strict pilot overhead constraints, and diverse deployment environments. Existing artificial intelligence-based solutions often lack robustness and fail to generalize across scenarios. To address this limitation, this paper introduces a predictive-foundation-model-based channel estimation framework that enables accurate, low-overhead, and generalizable CSI acquisition. The proposed framework employs a predictive foundation model trained on large-scale cross-domain CSI data to extract universal channel representations and provide predictive priors with strong cross-scenario transferability. A pilot processing network based on a vision transformer architecture is further designed to capture spatial, temporal, and frequency correlations from pilot observations. An efficient fusion mechanism integrates predictive priors with real-time measurements, enabling reliable CSI reconstruction even under sparse or noisy conditions. Extensive evaluations across diverse configurations demonstrate that the proposed estimator significantly outperforms both classical and data-driven baselines in accuracy, robustness, and generalization capability.
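The abstract does not give implementation details, but fusing a predictive prior with pilot-based measurements can be pictured with a minimal sketch. The gated fusion module below is a hypothetical stand-in; its names, tensor shapes, and gating design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedPriorFusion(nn.Module):
    """Toy fusion of a predicted CSI prior with a pilot-based estimate.

    Shapes are hypothetical: both inputs are (batch, 2, antennas, subcarriers),
    with the leading channel dimension holding real/imaginary parts.
    """
    def __init__(self, channels: int = 2):
        super().__init__()
        # The gate looks at both inputs and decides, per element,
        # how much to trust the prior versus the pilot observation.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, prior: torch.Tensor, pilot_est: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([prior, pilot_est], dim=1))
        return g * prior + (1.0 - g) * pilot_est

# Usage with random tensors standing in for CSI data.
fusion = GatedPriorFusion()
prior = torch.randn(4, 2, 32, 64)      # prediction from the foundation model
pilot_est = torch.randn(4, 2, 32, 64)  # rough estimate from sparse pilots
fused = fusion(prior, pilot_est)
print(fused.shape)  # torch.Size([4, 2, 32, 64])
```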
Submitted 17 December, 2025;
originally announced December 2025.
-
Cross-Modal Semantic Communication for Heterogeneous Collaborative Perception
Authors:
Mingyi Lu,
Guowei Liu,
Le Liang,
Chongtao Guo,
Hao Ye,
Shi Jin
Abstract:
Collaborative perception, an emerging paradigm in autonomous driving, has been introduced to mitigate the limitations of single-vehicle systems, such as limited sensor range and occlusion. To improve the robustness of inter-vehicle data sharing, semantic communication has recently been integrated into collaborative perception systems to enhance overall performance. However, practical deployment of such systems is challenged by the heterogeneity of sensors across different connected autonomous vehicles (CAVs). This diversity in perceptual data complicates the design of a unified communication framework and impedes the effective fusion of shared information. To address this challenge, we propose a novel cross-modal semantic communication (CMSC) framework to facilitate effective collaboration among CAVs with disparate sensor configurations. Specifically, the framework first transforms heterogeneous perceptual features from different sensor modalities into a unified and standardized semantic space. Subsequently, encoding, transmission, and decoding are performed within this semantic space, enabling seamless and effective information fusion. Extensive experiments demonstrate that CMSC achieves significantly stronger perception performance than existing methods, particularly in low signal-to-noise ratio (SNR) regimes.
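As a rough illustration of mapping heterogeneous sensor features into a shared semantic space before encoding and transmission, the sketch below uses per-modality adapters. The module names, dimensions, and the element-wise fusion are illustrative assumptions rather than the CMSC design.

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Projects one sensor modality's features into a shared semantic space."""
    def __init__(self, in_dim: int, sem_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, sem_dim), nn.ReLU(), nn.Linear(sem_dim, sem_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# Each CAV maps its own modality into the same space before encoding/transmission.
lidar_adapter = ModalityAdapter(in_dim=512)    # e.g., LiDAR BEV features
camera_adapter = ModalityAdapter(in_dim=384)   # e.g., camera features
z_lidar = lidar_adapter(torch.randn(10, 512))
z_cam = camera_adapter(torch.randn(10, 384))

# Once in a common space, features from different vehicles can be fused directly.
fused = torch.maximum(z_lidar, z_cam)          # simple element-wise fusion
print(fused.shape)  # torch.Size([10, 256])
```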
Submitted 25 November, 2025;
originally announced November 2025.
-
Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication
Authors:
Tianhao Mao,
Le Liang,
Jie Yang,
Hao Ye,
Shi Jin,
Geoffrey Ye Li
Abstract:
This paper presents Multimodal-Wireless, an open-source multimodal sensing dataset designed for wireless communication research. The dataset is generated through an integrated and customizable data pipeline built upon the CARLA simulator and Sionna framework. It contains approximately 160,000 frames collected across four virtual towns, sixteen communication scenarios, and three weather conditions, encompassing multiple sensing modalities: communication channel, light detection and ranging, RGB and depth cameras, inertial measurement unit, and radar. This paper provides a comprehensive overview of the dataset, outlining its key features, overall framework, and technical implementation details. In addition, it explores potential research applications concerning communication and collaborative perception, exemplified by beam prediction using a multimodal large language model. The dataset is openly available at https://le-liang.github.io/mmw/.
Submitted 5 November, 2025;
originally announced November 2025.
-
Ordinal Label-Distribution Learning with Constrained Asymmetric Priors for Imbalanced Retinal Grading
Authors:
Nagur Shareef Shaik,
Teja Krishna Cherukuri,
Adnan Masood,
Ehsan Adeli,
Dong Hye Ye
Abstract:
Diabetic retinopathy grading is inherently ordinal and long-tailed, with minority stages being scarce, heterogeneous, and clinically critical to detect accurately. Conventional methods often rely on isotropic Gaussian priors and symmetric loss functions, misaligning latent representations with the task's asymmetric nature. We propose the Constrained Asymmetric Prior Wasserstein Autoencoder (CAP-WAE), a novel framework that addresses these challenges through three key innovations. Our approach employs a Wasserstein Autoencoder (WAE) that aligns its aggregate posterior with an asymmetric prior, preserving the heavy-tailed and skewed structure of minority classes. The latent space is further structured by a Margin-Aware Orthogonality and Compactness (MAOC) loss to ensure grade-ordered separability. At the supervision level, we introduce a direction-aware ordinal loss, where a lightweight head predicts asymmetric dispersions to generate soft labels that reflect clinical priorities by penalizing under-grading more severely. Stabilized by an adaptive multi-task weighting scheme, our end-to-end model requires minimal tuning. Across public DR benchmarks, CAP-WAE consistently achieves state-of-the-art Quadratic Weighted Kappa, accuracy, and macro-F1, surpassing both ordinal classification and latent generative baselines. t-SNE visualizations further reveal that our method reshapes the latent manifold into compact, grade-ordered clusters with reduced overlap.
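One way to picture direction-aware soft labels is to place an asymmetric discrete distribution around the true grade, with a tighter dispersion on the lower-grade side so that probability mass below the true grade (under-grading) is discouraged under a cross-entropy-style loss. The construction below is a simplified, hypothetical sketch, not the paper's exact loss; in the paper the dispersions are predicted by a lightweight head rather than fixed.

```python
import torch

def asymmetric_soft_labels(grade: torch.Tensor, sigma_down: torch.Tensor,
                           sigma_up: torch.Tensor, num_classes: int = 5) -> torch.Tensor:
    """Build an asymmetric soft-label distribution around the true grade.

    A smaller dispersion below the true grade (sigma_down < sigma_up) puts little
    probability mass on lower grades, so a model matching these targets is pushed
    away from under-grading. All parameter names are illustrative.
    """
    classes = torch.arange(num_classes, dtype=torch.float32)       # (C,)
    diff = classes.unsqueeze(0) - grade.float().unsqueeze(1)       # (B, C)
    sigma = torch.where(diff < 0, sigma_down.unsqueeze(1), sigma_up.unsqueeze(1))
    logits = -0.5 * (diff / sigma) ** 2
    return torch.softmax(logits, dim=1)

# Example: true grade 2, tighter dispersion on the lower-grade side.
labels = asymmetric_soft_labels(torch.tensor([2]),
                                sigma_down=torch.tensor([0.4]),
                                sigma_up=torch.tensor([1.0]))
print(labels)
```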
Submitted 30 September, 2025;
originally announced September 2025.
-
RSU-Assisted Resource Allocation for Collaborative Perception
Authors:
Guowei Liu,
Le Liang,
Chongtao Guo,
Hao Ye,
Shi Jin
Abstract:
As a pivotal technology for autonomous driving, collaborative perception enables vehicular agents to exchange perceptual data through vehicle-to-everything (V2X) communications, thereby enhancing perception accuracy of all collaborators. However, existing collaborative perception frameworks often assume ample communication resources, which is usually impractical in real-world vehicular networks. To address this challenge, this paper investigates the problem of communication resource allocation for collaborative perception and proposes RACooper, a novel RSU-assisted resource allocation framework that maximizes perception accuracy under constrained communication resources. RACooper leverages a hierarchical reinforcement learning model to dynamically allocate communication resources while accounting for real-time sensing data and channel dynamics induced by vehicular mobility. By jointly optimizing spatial confidence metrics and channel state information, our approach ensures efficient feature transmission, enhancing the effectiveness of collaborative perception. Simulation results demonstrate that compared to conventional baseline algorithms, RACooper achieves significant improvements in perception accuracy, especially under bandwidth-constrained scenarios.
Submitted 22 September, 2025;
originally announced September 2025.
-
NeuroKoop: Neural Koopman Fusion of Structural-Functional Connectomes for Identifying Prenatal Drug Exposure in Adolescents
Authors:
Badhan Mazumder,
Aline Kotoski,
Vince D. Calhoun,
Dong Hye Ye
Abstract:
Understanding how prenatal exposure to psychoactive substances such as cannabis shapes adolescent brain organization remains a critical challenge, complicated by the complexity of multimodal neuroimaging data and the limitations of conventional analytic methods. Existing approaches often fail to fully capture the complementary features embedded within structural and functional connectomes, constraining both biological insight and predictive performance. To address this, we introduced NeuroKoop, a novel graph neural network-based framework that integrates structural and functional brain networks utilizing neural Koopman operator-driven latent space fusion. By leveraging Koopman theory, NeuroKoop unifies node embeddings derived from source-based morphometry (SBM) and functional network connectivity (FNC) based brain graphs, resulting in enhanced representation learning and more robust classification of prenatal drug exposure (PDE) status. Applied to a large adolescent cohort from the ABCD dataset, NeuroKoop outperformed relevant baselines and revealed salient structural-functional connections, advancing our understanding of the neurodevelopmental impact of PDE.
Submitted 22 August, 2025;
originally announced August 2025.
-
MIMOSA: Multi-parametric Imaging using Multiple-echoes with Optimized Simultaneous Acquisition for highly-efficient quantitative MRI
Authors:
Yuting Chen,
Yohan Jun,
Amir Heydari,
Xingwang Yong,
Jiye Kim,
Jongho Lee,
Huafeng Liu,
Huihui Ye,
Borjan Gagoski,
Shohei Fujita,
Berkin Bilgic
Abstract:
Purpose: To develop a new sequence, MIMOSA, for highly-efficient T1, T2, T2*, proton density (PD), and source separation quantitative susceptibility mapping (QSM). Methods: MIMOSA was developed based on 3D-quantification using an interleaved Look-Locker acquisition sequence with T2 preparation pulse (3D-QALAS) by combining 3D turbo Fast Low Angle Shot (FLASH) and multi-echo gradient echo acquisition modules with a spiral-like Cartesian trajectory to facilitate highly-efficient acquisition. Simulations were performed to optimize the sequence. A multi-contrast/-slice zero-shot self-supervised learning algorithm was employed for reconstruction. The accuracy of quantitative mapping was assessed by comparing MIMOSA with 3D-QALAS and reference techniques in both ISMRM/NIST phantom and in-vivo experiments. MIMOSA's acceleration capability was assessed at R = 3.3, 6.5, and 11.8 in in-vivo experiments, with repeatability assessed through scan-rescan studies. Beyond the 3T experiments, mesoscale quantitative mapping was performed at 750 um isotropic resolution at 7T. Results: Simulations demonstrated that MIMOSA achieved improved parameter estimation accuracy compared to 3D-QALAS. Phantom experiments indicated that MIMOSA exhibited better agreement with the reference techniques than 3D-QALAS. In-vivo experiments demonstrated that an acceleration factor of up to R = 11.8-fold can be achieved while preserving parameter estimation accuracy, with intra-class correlation coefficients of 0.998 (T1), 0.973 (T2), 0.947 (T2*), 0.992 (QSM), 0.987 (paramagnetic susceptibility), and 0.977 (diamagnetic susceptibility) in scan-rescan studies. Whole-brain T1, T2, T2*, PD, and source separation QSM maps were obtained with 1 mm isotropic resolution in 3 min at 3T and 750 um isotropic resolution in 13 min at 7T. Conclusion: MIMOSA demonstrated potential for highly-efficient multi-parametric mapping.
Submitted 13 August, 2025;
originally announced August 2025.
-
U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV
Authors:
Hongbo Ye,
Fenghe Tang,
Peiang Zhao,
Zhen Huang,
Dexin Zhao,
Minghao Bian,
S. Kevin Zhou
Abstract:
Achieving equity in healthcare accessibility requires lightweight yet high-performance solutions for medical image segmentation, particularly in resource-limited settings. Existing methods like U-Net and its variants often suffer from limited global Effective Receptive Fields (ERFs), hindering their ability to capture long-range dependencies. To address this, we propose U-RWKV, a novel framework leveraging the Recurrent Weighted Key-Value (RWKV) architecture, which achieves efficient long-range modeling at O(N) computational cost. The framework introduces two key innovations: the Direction-Adaptive RWKV Module (DARM) and the Stage-Adaptive Squeeze-and-Excitation Module (SASE). DARM employs Dual-RWKV and QuadScan mechanisms to aggregate contextual cues across images, mitigating directional bias while preserving global context and maintaining high computational efficiency. SASE dynamically adapts its architecture to different feature extraction stages, balancing high-resolution detail preservation and semantic relationship capture. Experiments demonstrate that U-RWKV achieves state-of-the-art segmentation performance with high computational efficiency, offering a practical solution for democratizing advanced medical imaging technologies in resource-constrained environments. The code is available at https://github.com/hbyecoding/U-RWKV.
Submitted 15 July, 2025;
originally announced July 2025.
-
Hybrid-View Attention Network for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound
Authors:
Zetian Feng,
Juan Fu,
Xuebin Zou,
Hongsheng Ye,
Hong Wu,
Jianhua Zhou,
Yi Wang
Abstract:
Prostate cancer (PCa) is a leading cause of cancer-related mortality in men, and accurate identification of clinically significant PCa (csPCa) is critical for timely intervention. Transrectal ultrasound (TRUS) is widely used for prostate biopsy; however, its low contrast and anisotropic spatial resolution pose diagnostic challenges. To address these limitations, we propose a novel hybrid-view attention (HVA) network for csPCa classification in 3D TRUS that leverages complementary information from transverse and sagittal views. Our approach integrates a CNN-transformer hybrid architecture, where convolutional layers extract fine-grained local features and transformer-based HVA models global dependencies. Specifically, the HVA comprises intra-view attention to refine features within a single view and cross-view attention to incorporate complementary information across views. Furthermore, a hybrid-view adaptive fusion module dynamically aggregates features along both channel and spatial dimensions, enhancing the overall representation. Experiments are conducted on an in-house dataset containing 590 subjects who underwent prostate biopsy. Comparative and ablation results prove the efficacy of our method. The code is available at https://github.com/mock1ngbrd/HVAN.
Submitted 9 July, 2025; v1 submitted 4 July, 2025;
originally announced July 2025.
-
SComCP: Task-Oriented Semantic Communication for Collaborative Perception
Authors:
Jipeng Gan,
Yucheng Sheng,
Hua Zhang,
Le Liang,
Hao Ye,
Chongtao Guo,
Shi Jin
Abstract:
Reliable detection of surrounding objects is critical for the safe operation of connected automated vehicles (CAVs). However, inherent limitations such as the restricted perception range and occlusion effects compromise the reliability of single-vehicle perception systems in complex traffic environments. Collaborative perception has emerged as a promising approach by fusing sensor data from surrounding CAVs with diverse viewpoints, thereby improving environmental awareness. Although collaborative perception holds great promise, its performance is bottlenecked by wireless communication constraints, as unreliable and bandwidth-limited channels hinder the transmission of sensor data necessary for real-time perception. To address these challenges, this paper proposes SComCP, a novel task-oriented semantic communication framework for collaborative perception. Specifically, SComCP integrates an importance-aware feature selection network that selects and transmits semantic features most relevant to the perception task, significantly reducing communication overhead without sacrificing accuracy. Furthermore, we design a semantic codec network based on a joint source and channel coding (JSCC) architecture, which enables bidirectional transformation between semantic features and noise-tolerant channel symbols, thereby ensuring stable perception under adverse wireless conditions. Extensive experiments demonstrate the effectiveness of the proposed framework. In particular, compared to existing approaches, SComCP can maintain superior perception performance across various channel conditions, especially in low signal-to-noise ratio (SNR) scenarios. In addition, SComCP exhibits strong generalization capability, enabling the framework to maintain high performance across diverse channel conditions, even when trained with a specific channel model.
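A minimal sketch of importance-aware feature selection, assuming a bird's-eye-view feature map and a per-location confidence map from the detection head: keeping only the top-scoring locations is one simple instantiation of transmitting task-relevant features. The paper's selection network is learned and more elaborate; the names and shapes here are assumptions.

```python
import torch

def select_important_features(features: torch.Tensor, confidence: torch.Tensor,
                              keep_ratio: float = 0.1):
    """Keep only the most task-relevant spatial locations of a feature map.

    features:   (C, H, W) intermediate perception features
    confidence: (H, W) foreground/confidence scores from the detection head
    Returns the kept feature columns and their flat indices, i.e. what would be
    transmitted over the channel. Names and shapes are illustrative only.
    """
    c, h, w = features.shape
    scores = confidence.flatten()                      # (H*W,)
    k = max(1, int(keep_ratio * scores.numel()))
    idx = torch.topk(scores, k).indices                # indices of kept locations
    kept = features.view(c, -1)[:, idx]                # (C, k)
    return kept, idx

feat = torch.randn(64, 100, 100)
conf = torch.rand(100, 100)
kept, idx = select_important_features(feat, conf, keep_ratio=0.05)
print(kept.shape, idx.shape)  # torch.Size([64, 500]) torch.Size([500])
```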
Submitted 1 July, 2025;
originally announced July 2025.
-
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Authors:
Sheng Liu,
Tianlang Chen,
Pan Lu,
Haotian Ye,
Yizheng Chen,
Lei Xing,
James Zou
Abstract:
Test-time compute has emerged as a powerful paradigm for improving the performance of large language models (LLMs), where generating multiple outputs or refining individual chains can significantly boost answer accuracy. However, existing methods like Best-of-N, majority voting, and self-reflection typically apply reasoning in a uniform way across inputs, overlooking the fact that different problems may require different levels of reasoning depth. In this work, we propose Fractional Reasoning, a training-free and model-agnostic framework that enables continuous control over reasoning intensity at inference time, going beyond the limitations of fixed instructional prompts. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor, allowing the model to tailor its reasoning process to the complexity of each input. This supports two key modes of test-time scaling: (1) improving output quality in breadth-based strategies (e.g., Best-of-N, majority voting), and (2) enhancing the correctness of individual reasoning chains in depth-based strategies (e.g., self-reflection). Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
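The core arithmetic of latent steering can be sketched as follows: estimate a direction from the activation difference between prompts with and without a reasoning instruction, then add it to hidden states with a fractional coefficient. The tensors, dimensions, and the way the direction is estimated below are placeholders rather than the paper's exact recipe.

```python
import torch

def apply_steering(hidden: torch.Tensor, steer_vec: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add a reasoning steering vector to every token's hidden state.

    hidden:    (batch, seq_len, d_model) activations at some chosen layer
    steer_vec: (d_model,) direction, e.g. the mean activation difference between
               prompts with and without a reasoning instruction
    alpha:     fractional strength; 0 leaves the model unchanged, larger values
               push it toward deeper reasoning behavior.
    """
    return hidden + alpha * steer_vec

# Toy illustration of how the steering direction might be estimated.
d_model = 768
acts_with_reasoning = torch.randn(32, d_model)   # layer activations, reasoning prompt
acts_plain = torch.randn(32, d_model)            # layer activations, plain prompt
steer_vec = acts_with_reasoning.mean(0) - acts_plain.mean(0)

hidden = torch.randn(2, 16, d_model)
steered = apply_steering(hidden, steer_vec, alpha=0.6)
print(steered.shape)  # torch.Size([2, 16, 768])
```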
Submitted 25 September, 2025; v1 submitted 18 June, 2025;
originally announced June 2025.
-
Physics-Guided Multi-View Graph Neural Network for Schizophrenia Classification via Structural-Functional Coupling
Authors:
Badhan Mazumder,
Ayush Kanyal,
Lei Wu,
Vince D. Calhoun,
Dong Hye Ye
Abstract:
Clinical studies reveal disruptions in brain structural connectivity (SC) and functional connectivity (FC) in neuropsychiatric disorders such as schizophrenia (SZ). Traditional approaches might rely solely on SC due to limited functional data availability, hindering comprehension of cognitive and behavioral impairments in individuals with SZ by neglecting the intricate SC-FC interrelationship. To tackle the challenge, we propose a novel physics-guided deep learning framework that leverages a neural oscillation model to describe the dynamics of a collection of interconnected neural oscillators, which operate via nerve fibers dispersed across the brain's structure. Our proposed framework utilizes SC to simultaneously generate FC by learning SC-FC coupling from a system dynamics perspective. Additionally, it employs a novel multi-view graph neural network (GNN) with a joint loss to perform correlation-based SC-FC fusion and classification of individuals with SZ. Experiments conducted on a clinical dataset exhibited improved performance, demonstrating the robustness of our proposed approach.
Submitted 21 May, 2025;
originally announced May 2025.
-
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach
Authors:
Hao Fang,
Kai Huang,
Hao Ye,
Chongtao Guo,
Le Liang,
Xiao Li,
Shi Jin
Abstract:
The pursuit of rate maximization in wireless communication frequently encounters substantial challenges associated with user fairness. This paper addresses these challenges by exploring a novel power allocation approach for delay optimization, utilizing graph neural network (GNN)-based reinforcement learning (RL) in device-to-device (D2D) communication. The proposed approach incorporates not only channel state information but also factors such as packet delay, the number of backlogged packets, and the number of transmitted packets into the components of the state information. We adopt a centralized RL method, where a central controller collects and processes the state information. The central controller functions as an agent trained using the proximal policy optimization (PPO) algorithm. To better utilize topology information in the communication network and enhance the generalization of the proposed method, we embed GNN layers into both the actor and critic networks of the PPO algorithm. This integration allows for efficient parameter updates of GNNs and enables the state information to be parameterized as a low-dimensional embedding, which is leveraged by the agent to optimize power allocation strategies. Simulation results demonstrate that the proposed method effectively reduces average delay while ensuring user fairness, outperforms baseline methods, and exhibits scalability and generalization capability.
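A toy sketch of embedding graph layers into the actor and critic is given below, assuming per-link state vectors and a row-normalized interference-graph adjacency. The layer sizes, mean-aggregation rule, and output heads are illustrative assumptions, not the paper's exact networks, and the PPO training loop is omitted.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One round of mean-aggregation message passing over the interference graph."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) per-link state (CSI, queue length, packet delay, ...)
        # adj: (N, N) adjacency of interfering links, row-normalized
        return torch.relu(self.lin_self(x) + self.lin_neigh(adj @ x))

class GNNActorCritic(nn.Module):
    """Shared GNN trunk with actor (power levels) and critic (value) heads,
    as one possible way to embed graph layers into PPO."""
    def __init__(self, state_dim: int, hidden: int = 64, power_levels: int = 10):
        super().__init__()
        self.gnn = nn.ModuleList([GraphConv(state_dim, hidden), GraphConv(hidden, hidden)])
        self.actor = nn.Linear(hidden, power_levels)   # per-link categorical policy
        self.critic = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        for layer in self.gnn:
            x = layer(x, adj)
        return torch.softmax(self.actor(x), dim=-1), self.critic(x.mean(0))

n_links, state_dim = 8, 6
adj = torch.rand(n_links, n_links)
adj = adj / adj.sum(dim=1, keepdim=True)               # row-normalize
model = GNNActorCritic(state_dim)
policy, value = model(torch.randn(n_links, state_dim), adj)
print(policy.shape, value.shape)  # torch.Size([8, 10]) torch.Size([1])
```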
Submitted 19 May, 2025;
originally announced May 2025.
-
Breast Cancer Classification in Deep Ultraviolet Fluorescence Images Using a Patch-Level Vision Transformer Framework
Authors:
Pouya Afshin,
David Helminiak,
Tongtong Lu,
Tina Yen,
Julie M. Jorns,
Mollie Patton,
Bing Yu,
Dong Hye Ye
Abstract:
Breast-conserving surgery (BCS) aims to completely remove malignant lesions while maximizing healthy tissue preservation. Intraoperative margin assessment is essential to achieve a balance between thorough cancer resection and tissue conservation. A deep ultraviolet fluorescence scanning microscope (DUV-FSM) enables rapid acquisition of whole surface images (WSIs) for excised tissue, providing contrast between malignant and normal tissues. However, breast cancer classification with DUV WSIs is challenged by high resolutions and complex histopathological features. This study introduces a DUV WSI classification framework using a patch-level vision transformer (ViT) model, capturing local and global features. Grad-CAM++ saliency weighting highlights relevant spatial regions, enhances result interpretability, and improves diagnostic accuracy for benign and malignant tissue classification. A comprehensive 5-fold cross-validation demonstrates the proposed approach significantly outperforms conventional deep learning methods, achieving a classification accuracy of 98.33%.
Submitted 12 May, 2025;
originally announced May 2025.
-
HippoMM: Hippocampal-inspired Multimodal Memory for Long Audiovisual Event Understanding
Authors:
Yueqian Lin,
Qinsi Wang,
Hancheng Ye,
Yuzhe Fu,
Hai "Helen" Li,
Yiran Chen
Abstract:
Comprehending extended audiovisual experiences remains a fundamental challenge for computational systems. Current approaches struggle with temporal integration and cross-modal associations that humans accomplish effortlessly through hippocampal-cortical networks. We introduce HippoMM, a biologically-inspired architecture that transforms hippocampal mechanisms into computational advantages for multimodal understanding. HippoMM implements three key innovations: (i) hippocampus-inspired pattern separation and completion specifically designed for continuous audiovisual streams, (ii) short-to-long term memory consolidation that transforms perceptual details into semantic abstractions, and (iii) cross-modal associative retrieval pathways enabling modality-crossing queries. Unlike existing retrieval systems with static indexing schemes, HippoMM dynamically forms integrated episodic representations through adaptive temporal segmentation and dual-process memory encoding. Evaluations on our challenging HippoVlog benchmark demonstrate that HippoMM significantly outperforms state-of-the-art approaches (78.2% vs. 64.2% accuracy) while providing substantially faster response times (20.4s vs. 112.5s). Our results demonstrate that translating neuroscientific memory principles into computational architectures provides a promising foundation for next-generation multimodal understanding systems. The code and benchmark dataset are publicly available at https://github.com/linyueqian/HippoMM.
Submitted 14 April, 2025;
originally announced April 2025.
-
Adaptive Wavelet Filters as Practical Texture Feature Amplifiers for Parkinson's Disease Screening in OCT
Authors:
Xiaoqing Zhang,
Hanfeng Shi,
Xiangyu Li,
Haili Ye,
Tao Xu,
Na Li,
Yan Hu,
Fan Lv,
Jiangfan Chen,
Jiang Liu
Abstract:
Parkinson's disease (PD) is a prevalent neurodegenerative disorder globally. The eye's retina is an extension of the brain and has great potential in PD screening. Recent studies have suggested that texture features extracted from retinal layers can be adopted as biomarkers for PD diagnosis under optical coherence tomography (OCT) images. Frequency domain learning techniques can enhance the feature representations of deep neural networks (DNNs) by decomposing frequency components involving rich texture features. Additionally, previous works have not exploited texture features for automated PD screening in OCT. Motivated by the above analysis, we propose a novel Adaptive Wavelet Filter (AWF) that serves as the Practical Texture Feature Amplifier to fully leverage the merits of texture features to boost the PD screening performance of DNNs with the aid of frequency domain learning. Specifically, AWF first enhances texture feature representation diversities via a channel mixer, then emphasizes informative texture feature representations with the well-designed adaptive wavelet filtering token mixer. By combining the AWFs with the DNN stem, AWFNet is constructed for automated PD screening. Additionally, we introduce a novel Balanced Confidence (BC) Loss by mining the potential of sample-wise predicted probabilities of all classes and class frequency prior, to further boost the PD screening performance and trustworthiness of AWFNet. Extensive experiments demonstrate the superiority of our AWFNet and BC loss over state-of-the-art methods in terms of PD screening performance and trustworthiness.
Submitted 24 March, 2025;
originally announced March 2025.
-
arXiv:2501.11553 (cs.RO, cond-mat.mtrl-sci, eess.SY, physics.app-ph, physics.bio-ph, physics.med-ph)
Clinically Ready Magnetic Microrobots for Targeted Therapies
Authors:
Fabian C. Landers,
Lukas Hertle,
Vitaly Pustovalov,
Derick Sivakumaran,
Oliver Brinkmann,
Kirstin Meiners,
Pascal Theiler,
Valentin Gantenbein,
Andrea Veciana,
Michael Mattmann,
Silas Riss,
Simone Gervasoni,
Christophe Chautems,
Hao Ye,
Semih Sevim,
Andreas D. Flouris,
Josep Puigmartí-Luis,
Tiago Sotto Mayor,
Pedro Alves,
Tessa Lühmann,
Xiangzhong Chen,
Nicole Ochsenbein,
Ueli Moehrlen,
Philipp Gruber,
Miriam Weisskopf
, et al. (3 additional authors not shown)
Abstract:
Systemic drug administration often causes off-target effects limiting the efficacy of advanced therapies. Targeted drug delivery approaches increase local drug concentrations at the diseased site while minimizing systemic drug exposure. We present a magnetically guided microrobotic drug delivery system capable of precise navigation under physiological conditions. This platform integrates a clinical electromagnetic navigation system, a custom-designed release catheter, and a dissolvable capsule for accurate therapeutic delivery. In vitro tests showed precise navigation in human vasculature models, and in vivo experiments confirmed tracking under fluoroscopy and successful navigation in large animal models. The microrobot balances magnetic material concentration, contrast agent loading, and therapeutic drug capacity, enabling effective hosting of therapeutics despite the integration complexity of its components, offering a promising solution for precise targeted drug delivery.
Submitted 20 January, 2025;
originally announced January 2025.
-
EEG-GMACN: Interpretable EEG Graph Mutual Attention Convolutional Network
Authors:
Haili Ye,
Stephan Goerttler,
Fei He
Abstract:
Electroencephalogram (EEG) is a valuable technique to record brain electrical activity through electrodes placed on the scalp. Analyzing EEG signals contributes to the understanding of neurological conditions and the development of brain-computer interfaces. Graph Signal Processing (GSP) has emerged as a promising method for EEG spatial-temporal analysis, by further considering the topological relationships between electrodes. However, existing GSP studies lack interpretability of electrode importance and the credibility of prediction confidence. This work proposes an EEG Graph Mutual Attention Convolutional Network (EEG-GMACN), by introducing an 'Inverse Graph Weight Module' to output interpretable electrode graph weights, enhancing the clinical credibility and interpretability of EEG classification results. Additionally, we incorporate a mutual attention mechanism module into the model to improve its capability to distinguish critical electrodes and introduce credibility calibration to assess the uncertainty of prediction results. This study enhances the transparency and effectiveness of EEG analysis, paving the way for its widespread use in clinical and neuroscience research.
Submitted 15 December, 2024;
originally announced December 2024.
-
GCS-M3VLT: Guided Context Self-Attention based Multi-modal Medical Vision Language Transformer for Retinal Image Captioning
Authors:
Teja Krishna Cherukuri,
Nagur Shareef Shaik,
Jyostna Devi Bodapati,
Dong Hye Ye
Abstract:
Retinal image analysis is crucial for diagnosing and treating eye diseases, yet generating accurate medical reports from images remains challenging due to variability in image quality and pathology, especially with limited labeled data. Previous Transformer-based models struggled to integrate visual and textual information under limited supervision. In response, we propose a novel vision-language model for retinal image captioning that combines visual and textual features through a guided context self-attention mechanism. This approach captures both intricate details and the global clinical context, even in data-scarce scenarios. Extensive experiments on the DeepEyeNet dataset demonstrate a 0.023 BLEU@4 improvement, along with significant qualitative advancements, highlighting the effectiveness of our model in generating comprehensive medical captions.
Submitted 22 December, 2024;
originally announced December 2024.
-
Meta-Learning Empowered Graph Neural Networks for Radio Resource Management
Authors:
Kai Huang,
Le Liang,
Xinping Yi,
Hao Ye,
Shi Jin,
Geoffrey Ye Li
Abstract:
In this paper, we consider a radio resource management (RRM) problem in dynamic wireless networks, comprising multiple communication links that share the same spectrum resource. To achieve high network throughput while ensuring fairness across all links, we formulate a resilient power optimization problem with per-user minimum-rate constraints. We obtain the corresponding Lagrangian dual problem and parameterize all variables with neural networks, which can be trained in an unsupervised manner due to the provably acceptable duality gap. We develop a meta-learning approach with graph neural networks (GNNs) as parameterization that exhibits fast adaptation and scalability to varying network configurations. We formulate the objective of meta-learning by amalgamating the Lagrangian functions of different network configurations and utilize a first-order meta-learning algorithm, called Reptile, to obtain the meta-parameters. Numerical results verify that our method can efficiently improve the overall throughput and ensure the minimum rate performance. We further demonstrate that using the meta-parameters as initialization, our method can achieve fast adaptation to new wireless network configurations and reduce the number of required training data samples.
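For context, the Reptile outer loop named in the abstract can be sketched as follows, with tiny synthetic regression tasks standing in for the per-configuration Lagrangian objectives; the task construction, model, and hyperparameters are placeholders, not the paper's setup.

```python
import copy
import torch
import torch.nn as nn

def reptile_step(meta_model: nn.Module, task_losses, inner_steps=5,
                 inner_lr=1e-2, meta_lr=0.1):
    """One Reptile outer update: adapt a copy on each task (network configuration),
    then move the meta-parameters toward the average of the adapted parameters.

    task_losses: list of callables, each mapping a model to a scalar loss for one
    sampled configuration (stand-ins for the Lagrangian objectives).
    """
    meta_params = [p.detach().clone() for p in meta_model.parameters()]
    adapted = []
    for loss_fn in task_losses:
        model = copy.deepcopy(meta_model)
        opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            loss_fn(model).backward()
            opt.step()
        adapted.append([p.detach() for p in model.parameters()])
    # Reptile update: theta <- theta + meta_lr * mean(theta_task - theta)
    with torch.no_grad():
        for i, p in enumerate(meta_model.parameters()):
            direction = torch.stack([a[i] - meta_params[i] for a in adapted]).mean(0)
            p.add_(meta_lr * direction)

# Toy usage: a tiny policy network and two synthetic "configurations".
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
tasks = [lambda m, x=torch.randn(32, 4), y=torch.randn(32, 1):
         nn.functional.mse_loss(m(x), y) for _ in range(2)]
reptile_step(model, tasks)
print("meta-update applied")
```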
Submitted 28 August, 2024;
originally announced August 2024.
-
Less is More: Skim Transformer for Light Field Image Super-resolution
Authors:
Zeke Zexi Hu,
Haodong Chen,
Hui Ye,
Xiaoming Chen,
Vera Yuk Ying Chung,
Yiran Shen,
Weidong Cai
Abstract:
A light field image captures scenes through an array of micro-lenses, providing a rich representation that encompasses spatial and angular information. While this richness comes at the cost of significant data redundancy, most existing light field methods still tend to indiscriminately utilize all the information from sub-aperture images (SAIs) in an attempt to harness every visual cue regardless of their disparity significance. However, this paradigm inevitably leads to disparity entanglement, a fundamental cause of inefficiency in light field image processing. To address this limitation, we introduce the Skim Transformer, a novel architecture inspired by the "less is more" philosophy. Unlike conventional light field Transformers, our Skim Transformer features a multi-branch structure where each branch is dedicated to a specific disparity range by constructing its attention score matrix over a skimmed subset of SAIs, rather than all of them. Building upon this core component, we present SkimLFSR, an efficient yet powerful network for light field super-resolution (LFSR). Requiring only 67% of parameters, SkimLFSR achieves state-of-the-art results surpassing the best existing method by an average of 0.59 dB and 0.35 dB in PSNR at the 2x and 4x tasks, respectively. Through in-depth analyses, we reveal that SkimLFSR, guided by the predefined skimmed SAI sets as prior knowledge, demonstrates distinct disparity-aware behaviors in attending to visual cues. These findings highlight its effectiveness and adaptability as a promising paradigm for light field image processing.
Submitted 9 August, 2025; v1 submitted 21 July, 2024;
originally announced July 2024.
-
Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound
Authors:
Hong Wu,
Juan Fu,
Hongsheng Ye,
Yuming Zhong,
Xuebin Zou,
Jianhua Zhou,
Yi Wang
Abstract:
Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classification using multi-modality TRUS. The proposed framework employs two separate 3D ResNet-50 models to extract distinctive features from B-mode and shear wave elastography (SWE). Additionally, an attention module is incorporated to effectively refine B-mode features and aggregate the extracted features from both modalities. Furthermore, we utilize a few-shot segmentation task to enhance the capacity of the classification encoder. Due to the limited availability of csPCa masks, a prototype correction module is employed to extract representative prototypes of csPCa. The performance of the framework is assessed on a large-scale dataset consisting of 512 TRUS videos with biopsy-proved prostate cancer. The results demonstrate the strong capability in accurately identifying csPCa, achieving an area under the curve (AUC) of 0.86. Moreover, the framework generates visual class activation mapping (CAM), which can serve as valuable assistance for localizing csPCa. These CAM images may offer valuable guidance during TRUS-guided targeted biopsies, enhancing the efficacy of the biopsy procedure. The code is available at https://github.com/2313595986/SmileCode.
Submitted 20 June, 2024;
originally announced June 2024.
-
Artificial Intelligence for Neuro MRI Acquisition: A Review
Authors:
Hongjia Yang,
Guanhua Wang,
Ziyu Li,
Haoxiang Li,
Jialan Zheng,
Yuxin Hu,
Xiaozhi Cao,
Congyu Liao,
Huihui Ye,
Qiyuan Tian
Abstract:
Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks.
Submitted 9 June, 2024;
originally announced June 2024.
-
Distributed Matrix Pencil Formulations for Prescribed-Time Leader-Following Consensus of MASs with Unknown Sensor Sensitivity
Authors:
Hefu Ye,
Changyun Wen,
Yongduan Song
Abstract:
In this paper, we address the problem of prescribed-time leader-following consensus of heterogeneous multi-agent systems (MASs) in the presence of unknown sensor sensitivity. Under a connected undirected topology, we propose a time-varying dual observer/controller design framework that makes use of regular local and inaccurate feedback to achieve consensus tracking within a prescribed time. In particular, the developed analysis framework is applicable to MASs equipped with sensors of different sensitivities. One of the design innovations involves constructing a distributed matrix pencil formulation based on worst-case sensors, yielding control parameters with sufficient robustness yet relatively low conservatism. Another novelty is the construction of the control gains, which consists of the product of a proportional coefficient obtained from the matrix pencil formulation and a classic time-varying function that grows to infinity or a novel bounded time-varying function. Furthermore, it is possible to extend the prescribed-time distributed protocol to infinite time domain by introducing the bounded time-varying gain technique without sacrificing the ultimate control accuracy, and the corresponding technical proof is comprehensive. The effectiveness of the method is demonstrated through a group of 5 single-link robot manipulators.
Submitted 25 April, 2024;
originally announced April 2024.
-
Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer
Authors:
Hong Wu,
Juan Fu,
Hongsheng Ye,
Yuming Zhong,
Xuebin Zhou,
Jianhua Zhou,
Yi Wang
Abstract:
Prostate cancer is the most common noncutaneous cancer in the world. Recently, multi-modality transrectal ultrasound (TRUS) has increasingly become an effective tool for the guidance of prostate biopsies. With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos. The framework utilizes two 3D ResNet-50 models to extract features from B-mode images and shear wave elastography images, respectively. An adaptive spatial fusion module is introduced to aggregate two modalities' features. An orthogonal regularized loss is further used to mitigate feature redundancy. The proposed framework is evaluated on an in-house dataset containing 512 TRUS videos, and achieves favorable performance in identifying csPCa with an area under curve (AUC) of 0.84. Furthermore, the visualized class activation mapping (CAM) images generated from the proposed framework may provide valuable guidance for the localization of csPCa, thus facilitating the TRUS-guided targeted biopsy. Our code is publicly available at https://github.com/2313595986/ProstateTRUS.
Submitted 17 February, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
On Purely Data-Driven Massive MIMO Detectors
Authors:
Hao Ye,
Le Liang
Abstract:
The development of learning-based detectors for massive multi-input multi-output (MIMO) systems has been hindered by the inherent complexities arising from the problem's high dimensionality. To enhance scalability, most previous studies have adopted model-driven methodologies that integrate deep neural networks (DNNs) within existing iterative detection frameworks. However, these methods often lack flexibility and involve substantial computational complexity. In this paper, we introduce ChannelNet, a purely data-driven learning-based massive MIMO detector that overcomes these limitations. ChannelNet exploits the inherent symmetry of MIMO systems by incorporating channel-embedded layers and antenna-wise shared feature processors. These modules maintain equivariance to antenna permutations and enable ChannelNet to scale efficiently to large numbers of antennas and high modulation orders with low computational complexity, specifically $\mathcal{O}(N_t N_r)$, where $N_t$ and $N_r$ denote the numbers of transmit and receive antennas, respectively. Theoretically, ChannelNet can approximate any continuous permutation-symmetric function and the optimal maximum likelihood detection (ML) function with arbitrary precision under any continuous channel distribution. Empirical evaluations demonstrate that ChannelNet consistently outperforms or matches state-of-the-art detectors across different numbers of antennas, modulation schemes, and channel distributions, all while significantly reducing computational overhead. This study highlights the potential of purely data-driven designs in advancing efficient and scalable detectors for massive MIMO systems.
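Antenna-permutation equivariance of the kind described can be illustrated with a Deep-Sets-style layer that applies a shared transform per antenna plus a pooled summary of all antennas. This is a generic sketch, not ChannelNet's channel-embedded layers; the dimensions and the mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class EquivariantAntennaLayer(nn.Module):
    """Permutation-equivariant layer over the antenna dimension (Deep-Sets style):
    each antenna's features pass through a shared linear map together with a pooled
    summary of all antennas, so reordering antennas reorders the output identically."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.local = nn.Linear(in_dim, out_dim)
        self.global_ = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, antennas, in_dim)
        pooled = x.mean(dim=1, keepdim=True)              # permutation-invariant summary
        return torch.relu(self.local(x) + self.global_(pooled))

layer = EquivariantAntennaLayer(8, 16)
x = torch.randn(2, 32, 8)                                 # e.g., 32 receive antennas
perm = torch.randperm(32)
out, out_perm = layer(x), layer(x[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True: equivariant
```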
Submitted 29 July, 2025; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Semantic Communication for Cooperative Perception based on Importance Map
Authors:
Yucheng Sheng,
Hao Ye,
Le Liang,
Shi Jin,
Geoffrey Ye Li
Abstract:
Cooperative perception, which has a broader perception field than single-vehicle perception, has played an increasingly important role in autonomous driving to conduct 3D object detection. Through vehicle-to-vehicle (V2V) communication technology, various connected automated vehicles (CAVs) can share their sensory information (LiDAR point clouds) for cooperative perception. We employ an importance map to extract significant semantic information and propose a novel cooperative perception semantic communication scheme with intermediate fusion. Meanwhile, our proposed architecture can be extended to the challenging time-varying multipath fading channel. To alleviate the distortion caused by the time-varying multipath fading, we adopt explicit orthogonal frequency-division multiplexing (OFDM) blocks combined with channel estimation and channel equalization. Simulation results demonstrate that our proposed model outperforms the traditional separate source-channel coding over various channel models. Moreover, a robustness study indicates that only part of semantic information is key to cooperative perception. Although our proposed model has only been trained over one specific channel, it has the ability to learn robust coded representations of semantic information that remain resilient to various channel models, demonstrating its generality and robustness.
Submitted 11 November, 2023;
originally announced November 2023.
-
Aperture Diffraction for Compact Snapshot Spectral Imaging
Authors:
Tao Lv,
Hao Ye,
Quan Yuan,
Zhan Shi,
Yibo Wang,
Shuming Wang,
Xun Cao
Abstract:
We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. We then introduce a new optical design in which each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor by diffraction-based spatial-spectral projection engineering generated from the orthogonal mask. The orthogonal projection is uniformly accepted to obtain a weakly calibration-dependent data form to enhance modulation robustness. Meanwhile, the Cascade Shift-Shuffle Spectral Transformer (CSST), with strong perception of the diffraction degeneration, is designed to solve a sparsity-constrained inverse problem, realizing volume reconstruction from 2D measurements with a large amount of aliasing. We evaluate our system by elaborating the imaging optical theory and reconstruction algorithm and by demonstrating experimental imaging under a single exposure. Ultimately, we achieve sub-super-pixel spatial resolution and high spectral resolution imaging. The code will be available at: https://github.com/Krito-ex/CSST.
Submitted 27 September, 2023;
originally announced September 2023.
-
Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold
Authors:
Jun Chen,
Haishan Ye,
Mengmeng Wang,
Tianxin Huang,
Guang Dai,
Ivor W. Tsang,
Yong Liu
Abstract:
The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, they have received little attention in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and the communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, a global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.
△ Less
Submitted 12 March, 2024; v1 submitted 21 August, 2023;
originally announced August 2023.
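For readers unfamiliar with optimization on the Stiefel manifold St(n, p) = {X : X^T X = I_p}, the sketch below shows the basic building block such methods share: projecting a Euclidean gradient onto the tangent space at X. The decentralized conjugate-gradient machinery and the retraction-free update of DRCGD are not reproduced here; this is only a minimal single-agent illustration with an assumed quadratic cost and a QR retraction used purely for the toy loop.

```python
import numpy as np

def sym(M):
    return 0.5 * (M + M.T)

def riemannian_grad(X, G):
    """Project the Euclidean gradient G onto the tangent space of the
    Stiefel manifold {X : X^T X = I} at X."""
    return G - X @ sym(X.T @ G)

def qr_retraction(Y):
    """Map a full-rank matrix back onto the Stiefel manifold via QR.
    (DRCGD itself avoids retractions; this is only for the toy example.)"""
    Q, _ = np.linalg.qr(Y)
    return Q

# Toy problem: min_X -trace(X^T A X) over St(n, p), i.e. a leading-subspace problem.
rng = np.random.default_rng(0)
n, p = 20, 3
A = rng.standard_normal((n, n))
A = A + A.T                               # symmetric cost matrix
X = qr_retraction(rng.standard_normal((n, p)))
for _ in range(500):
    G = -2.0 * A @ X                      # Euclidean gradient of -trace(X^T A X)
    xi = riemannian_grad(X, G)            # Riemannian gradient
    X = qr_retraction(X - 0.01 * xi)      # gradient step, then back onto the manifold
print("feasibility ||X^T X - I|| =", np.linalg.norm(X.T @ X - np.eye(p)))
```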
-
Accelerated Nonconvex ADMM with Self-Adaptive Penalty for Rank-Constrained Model Identification
Authors:
Qingyuan Liu,
Zhengchao Huang,
Hao Ye,
Dexian Huang,
Chao Shang
Abstract:
The alternating direction method of multipliers (ADMM) has been widely adopted in low-rank approximation and low-order model identification tasks; however, the performance of nonconvex ADMM is highly reliant on the choice of penalty parameter. To accelerate ADMM for solving rank-constrained identification problems, this paper proposes a new self-adaptive strategy for automatic penalty update. Guid…
▽ More
The alternating direction method of multipliers (ADMM) has been widely adopted in low-rank approximation and low-order model identification tasks; however, the performance of nonconvex ADMM is highly reliant on the choice of penalty parameter. To accelerate ADMM for solving rank-constrained identification problems, this paper proposes a new self-adaptive strategy for automatic penalty update. Guided by a first-order analysis of the increment of the augmented Lagrangian, the self-adaptive penalty update enables effective and balanced minimization of both primal and dual residuals and thus ensures stable convergence. Moreover, further efficiency gains are obtained within the Anderson acceleration scheme. Numerical examples show that the proposed strategy significantly accelerates the convergence of nonconvex ADMM while alleviating the critical reliance on tedious tuning of penalty parameters.
△ Less
Submitted 8 September, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
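The penalty parameter controls how hard ADMM pushes on the primal versus the dual residual, which is why a fixed value can stall convergence. The paper derives its update from a first-order analysis of the augmented Lagrangian; as a point of reference, the sketch below shows the classical residual-balancing heuristic that self-adaptive schemes are typically compared against. The thresholds and factors are the usual textbook defaults, not the paper's rule.

```python
def update_penalty(rho, primal_res, dual_res, mu=10.0, tau_inc=2.0, tau_dec=2.0):
    """Classical residual-balancing penalty update for ADMM.

    Increase rho when the primal residual dominates (pushes iterates toward
    feasibility); decrease it when the dual residual dominates. Returns the
    new rho and the factor by which scaled dual variables must be rescaled.
    """
    if primal_res > mu * dual_res:
        return rho * tau_inc, 1.0 / tau_inc   # u <- u / tau_inc in scaled form
    if dual_res > mu * primal_res:
        return rho / tau_dec, tau_dec         # u <- u * tau_dec
    return rho, 1.0

# Usage inside an ADMM loop (x-, z-, u-updates omitted):
# r_k = norm(A @ x + B @ z - c)            # primal residual
# s_k = norm(rho * A.T @ B @ (z - z_old))  # dual residual
# rho, scale = update_penalty(rho, r_k, s_k)
# u = u * scale
```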
-
Accelerated MR Fingerprinting with Low-Rank and Generative Subspace Modeling
Authors:
Hengfa Lu,
Huihui Ye,
Lawrence L. Wald,
Bo Zhao
Abstract:
Magnetic Resonance (MR) Fingerprinting is an emerging multi-parametric quantitative MR imaging technique, for which image reconstruction methods utilizing low-rank and subspace constraints have achieved state-of-the-art performance. However, this class of methods often suffers from an ill-conditioned model-fitting issue, which degrades the performance as the data acquisition lengths become short a…
▽ More
Magnetic Resonance (MR) Fingerprinting is an emerging multi-parametric quantitative MR imaging technique, for which image reconstruction methods utilizing low-rank and subspace constraints have achieved state-of-the-art performance. However, this class of methods often suffers from an ill-conditioned model-fitting issue, which degrades the performance as the data acquisition lengths become short and/or the signal-to-noise ratio becomes low. To address this problem, we present a new image reconstruction method for MR Fingerprinting, integrating low-rank and subspace modeling with a deep generative prior. Specifically, the proposed method captures the strong spatiotemporal correlation of contrast-weighted time-series images in MR Fingerprinting via a low-rank factorization. Further, it utilizes an untrained convolutional generative neural network to represent the spatial subspace of the low-rank model, while estimating the temporal subspace of the model from simulated magnetization evolutions generated based on spin physics. Here the architecture of the generative neural network serves as an effective regularizer for the ill-conditioned inverse problem without additional spatial training data that are often expensive to acquire. The proposed formulation results in a non-convex optimization problem, for which we develop an algorithm based on variable splitting and alternating direction method of multipliers. We evaluate the performance of the proposed method with numerical simulations and in vivo experiments and demonstrate that the proposed method outperforms the state-of-the-art low-rank and subspace reconstruction.
△ Less
Submitted 24 May, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
Mpox-AISM: AI-Mediated Super Monitoring for Mpox and Like-Mpox
Authors:
Yubiao Yue,
Minghua Jiang,
Xinyue Zhang,
Jialong Xu,
Huacong Ye,
Fan Zhang,
Zhenzhang Li,
Yang Li
Abstract:
Swift and accurate diagnosis for earlier-stage monkeypox (mpox) patients is crucial to avoiding its spread. However, the similarities between common skin disorders and mpox and the need for professional diagnosis unavoidably impaired the diagnosis of earlier-stage mpox patients and contributed to mpox outbreak. To address the challenge, we proposed "Super Monitoring", a real-time visualization tec…
▽ More
Swift and accurate diagnosis of earlier-stage monkeypox (mpox) patients is crucial to avoiding its spread. However, the similarities between common skin disorders and mpox, together with the need for professional diagnosis, unavoidably impair the diagnosis of earlier-stage mpox patients and have contributed to the mpox outbreak. To address this challenge, we propose "Super Monitoring", a real-time visualization technique employing artificial intelligence (AI) and Internet technology to diagnose earlier-stage mpox cheaply, conveniently, and quickly. Concretely, AI-mediated "Super Monitoring" (mpox-AISM) integrates deep learning models, data augmentation, self-supervised learning, and cloud services. According to publicly accessible datasets, mpox-AISM's Precision, Recall, Specificity, and F1-score in diagnosing mpox reach 99.3%, 94.1%, 99.9%, and 96.6%, respectively, and it achieves 94.51% accuracy in diagnosing mpox, six like-mpox skin disorders, and normal skin. With the Internet and communication terminals, mpox-AISM has the potential to perform real-time and accurate diagnosis of earlier-stage mpox in real-world scenarios, thereby helping to prevent mpox outbreaks.
△ Less
Submitted 15 June, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Securing Biomedical Images from Unauthorized Training with Anti-Learning Perturbation
Authors:
Yixin Liu,
Haohui Ye,
Kai Zhang,
Lichao Sun
Abstract:
The volume of open-source biomedical data has been essential to the development of various spheres of the healthcare community since more `free' data can provide individual researchers more chances to contribute. However, institutions often hesitate to share their data with the public due to the risk of data exploitation by unauthorized third parties for another commercial usage (e.g., training AI…
▽ More
The volume of open-source biomedical data has been essential to the development of various spheres of the healthcare community, since more `free' data provide individual researchers with more chances to contribute. However, institutions often hesitate to share their data with the public due to the risk of data exploitation by unauthorized third parties for other commercial uses (e.g., training AI models). This phenomenon might hinder the development of the whole healthcare research community. To address this concern, we propose a novel approach termed `unlearnable biomedical image' for protecting biomedical data by injecting imperceptible but delusive noise into the data, making them unexploitable for AI models. We formulate the problem as a bi-level optimization and propose three kinds of anti-learning perturbation generation approaches to solve it. Our method is an important step toward encouraging more institutions to contribute their data for the long-term development of the research community.
△ Less
Submitted 4 March, 2023;
originally announced March 2023.
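The bi-level formulation alternates between updating a surrogate model and updating per-sample perturbations so that the perturbed data look "already learned" to the model. A minimal PyTorch-style sketch of one such anti-learning (error-minimizing) perturbation loop is given below; the surrogate network, epsilon budget, and step counts are illustrative assumptions, and the paper's three specific generation approaches are not reproduced.

```python
import torch
import torch.nn as nn

def make_unlearnable(model, x, y, eps=8 / 255, pgd_steps=10, alpha=2 / 255):
    """Inner problem of error-MINIMIZING perturbation generation:
    find delta (||delta||_inf <= eps) that minimizes the training loss,
    so a model trained on x + delta learns the noise instead of the content."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(pgd_steps):
        loss = loss_fn(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()          # descend: minimize the loss
            delta.clamp_(-eps, eps)               # keep the noise imperceptible
    return delta.detach()

# Outer loop (simplified): alternate surrogate training and noise refinement.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.rand(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
for _ in range(5):
    delta = make_unlearnable(model, x, y)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    opt.zero_grad(); loss.backward(); opt.step()
```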
-
Deep Learning Approach for Dynamic Sampling for Multichannel Mass Spectrometry Imaging
Authors:
David Helminiak,
Hang Hu,
Julia Laskin,
Dong Hye Ye
Abstract:
Mass Spectrometry Imaging (MSI), using traditional rectilinear scanning, takes hours to days for high spatial resolution acquisitions. Given that most pixels within a sample's field of view are often neither relevant to underlying biological structures nor chemically informative, MSI presents as a prime candidate for integration with sparse and dynamic sampling algorithms. During a scan, stochasti…
▽ More
Mass Spectrometry Imaging (MSI), using traditional rectilinear scanning, takes hours to days for high spatial resolution acquisitions. Given that most pixels within a sample's field of view are often neither relevant to underlying biological structures nor chemically informative, MSI presents as a prime candidate for integration with sparse and dynamic sampling algorithms. During a scan, stochastic models determine which locations probabilistically contain information critical to the generation of low-error reconstructions. Decreasing the number of required physical measurements thereby minimizes overall acquisition times. A Deep Learning Approach for Dynamic Sampling (DLADS), utilizing a Convolutional Neural Network (CNN) and encapsulating molecular mass intensity distributions within a third dimension, demonstrates a simulated 70% throughput improvement for Nanospray Desorption Electrospray Ionization (nano-DESI) MSI of tissue samples. Evaluations are conducted between DLADS and a Supervised Learning Approach for Dynamic Sampling, with Least-Squares regression (SLADS-LS) and a Multi-Layer Perceptron (MLP) network (SLADS-Net). When compared with SLADS-LS, limited to a single m/z channel, as well as multichannel SLADS-LS and SLADS-Net, DLADS respectively improves regression performance by 36.7%, 7.0%, and 6.2%, resulting in gains in reconstruction quality of 6.0%, 2.1%, and 3.4% for the acquisition of targeted m/z.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Adaptive Control with Global Exponential Stability for Parameter-Varying Nonlinear Systems under Unknown Control Gains
Authors:
Hefu Ye,
Haijia Wu,
Kai Zhao,
Yongduan Song
Abstract:
It is nontrivial to achieve exponential stability even for time-invariant nonlinear systems with matched uncertainties and persistent excitation (PE) condition. In this paper, without the need for PE condition, we address the problem of global exponential stabilization of strict-feedback systems with mismatched uncertainties and unknown yet time-varying control gains. The resultant control, embedd…
▽ More
It is nontrivial to achieve exponential stability even for time-invariant nonlinear systems with matched uncertainties under the persistent excitation (PE) condition. In this paper, without the need for the PE condition, we address the problem of global exponential stabilization of strict-feedback systems with mismatched uncertainties and unknown yet time-varying control gains. The resultant control, embedded with time-varying feedback gains, is capable of ensuring global exponential stability of parametric-strict-feedback systems in the absence of persistence of excitation. By using the enhanced Nussbaum function, the previous results are extended to more general nonlinear systems where the sign and magnitude of the time-varying control gain are unknown. In particular, the argument of the Nussbaum function is guaranteed to be always positive with the aid of a nonlinear damping design, which is critical to performing a straightforward technical analysis of the boundedness of the Nussbaum function. Finally, the global exponential stability of parameter-varying strict-feedback systems, the boundedness of the control input and the update rate, and the asymptotic constancy of the parameter estimate are established. Numerical simulations are carried out to verify the effectiveness and benefits of the proposed methods.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
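For readers unfamiliar with the Nussbaum-gain technique mentioned above, a Nussbaum function is a continuous function whose running average swings unboundedly in both directions, which is what lets a controller cope with a control gain of unknown sign. A standard textbook example and its defining property are shown below; this is not necessarily the enhanced Nussbaum function constructed in the paper.

```latex
% A classical Nussbaum function and the Nussbaum property (textbook example):
\[
  N(k) = k^{2}\cos(k), \qquad
  \limsup_{k \to \infty} \frac{1}{k}\int_{0}^{k} N(s)\,\mathrm{d}s = +\infty, \qquad
  \liminf_{k \to \infty} \frac{1}{k}\int_{0}^{k} N(s)\,\mathrm{d}s = -\infty .
\]
```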
-
Prescribed-Time Control and Its Latest Developments
Authors:
Hefu Ye,
Yongduan Song,
Frank L. Lewis
Abstract:
Prescribed-time (PT) control, originated from \textit{Song et al.}, has gained increasing attention among control community. The salient feature of PT control lies in its ability to achieve system stability within a finite settling time user-assignable in advance irrespective of initial conditions. It is such a unique feature that has enticed many follow-up studies on this technically important ar…
▽ More
Prescribed-time (PT) control, originating from \textit{Song et al.}, has gained increasing attention in the control community. The salient feature of PT control lies in its ability to achieve system stability within a finite settling time that is user-assignable in advance, irrespective of initial conditions. This unique feature has enticed many follow-up studies in this technically important area, motivating numerous research advancements. In this article, we provide a comprehensive survey of the recent developments in PT control. Through a concise introduction to the concept of PT control, and a unique taxonomy covering: 1) from robust PT control to adaptive PT control; 2) from PT control for single-input-single-output (SISO) systems to multi-input-multi-output (MIMO) systems; and 3) from PT control for single systems to multi-agent systems, we present an accessible review of this interesting topic. We highlight key techniques, fundamental assumptions adopted in various developments, as well as some new design ideas. We also discuss several possible future research directions for PT control.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
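The mechanism behind most PT designs surveyed here is a time-varying gain that grows unbounded as the clock approaches the user-chosen settling time, forcing convergence regardless of initial conditions. A scaling function commonly used in the PT literature is shown below; the exponent and the exact form vary across the works discussed in the survey.

```latex
% A typical prescribed-time scaling (gain) function on t in [t_0, t_0 + T):
% it equals 1 at t = t_0 and blows up as t approaches the user-assigned settling time t_0 + T.
\[
  \mu(t) = \left(\frac{T}{T + t_0 - t}\right)^{m}, \qquad m \ge 1, \qquad
  \mu(t_0) = 1, \qquad \lim_{t \to (t_0 + T)^{-}} \mu(t) = +\infty .
\]
```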
-
Robust Adaptive Prescribed-Time Control for Parameter-Varying Nonlinear Systems
Authors:
Hefu Ye,
Yongduan Song
Abstract:
It is an interesting open problem to achieve adaptive prescribed-time control for strict-feedback systems with unknown and fast or even abrupt time-varying parameters. In this paper we present a solution with the aid of several design and analysis innovations. First, by using a spatiotemporal transformation, we convert the original system operational over finite time interval into one operational…
▽ More
It is an interesting open problem to achieve adaptive prescribed-time control for strict-feedback systems with unknown and fast or even abruptly time-varying parameters. In this paper we present a solution with the aid of several design and analysis innovations. First, by using a spatiotemporal transformation, we convert the original system, operating over a finite time interval, into one operating over an infinite time interval, allowing for Lyapunov asymptotic design and recasting prescribed-time stabilization on a finite time domain as asymptotic stabilization on an infinite time domain. Second, to deal with time-varying parameters with unknown variation boundaries, we use the congelation of variables method and establish three separate adaptive laws for parameter estimation (two for the unknown parameters in the feedback path and one for the unknown parameter in the input path); in doing so, we utilize two tuning functions to eliminate over-parametrization. Third, to achieve asymptotic convergence for the transformed system, we make use of nonlinear damping design and non-regressor-based design to cope with time-varying perturbations, and finally, we derive the prescribed-time control scheme from the asymptotic controller via the inverse temporal-scale transformation. The boundedness of all closed-loop signals and the control input is proved rigorously through Lyapunov analysis, the squeeze theorem, and two novel lemmas built upon the method of variation of constants. Numerical simulation verifies the effectiveness of the proposed method.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
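The first innovation above hinges on a change of time variable that stretches the finite design horizon [t_0, t_0 + T) onto [0, infinity), so that prescribed-time convergence in t becomes ordinary asymptotic convergence in the new variable. One simple transformation with this property is sketched below; the exact mapping used in the paper may differ.

```latex
% A time-scale transformation mapping the finite interval [t_0, t_0 + T) onto [0, infinity):
\[
  \tau = \ln\!\left(\frac{T}{T + t_0 - t}\right), \qquad
  t = t_0 + T\left(1 - e^{-\tau}\right), \qquad
  \frac{\mathrm{d}\tau}{\mathrm{d}t} = \frac{1}{T + t_0 - t},
\]
% so t -> t_0 + T corresponds to tau -> infinity, and asymptotic stability in tau
% implies prescribed-time convergence in t.
```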
-
Stain-Adaptive Self-Supervised Learning for Histopathology Image Analysis
Authors:
Hai-Li Ye,
Da-Han Wang
Abstract:
It is commonly recognized that color variations caused by differences in stains is a critical issue for histopathology image analysis. Existing methods adopt color matching, stain separation, stain transfer or the combination of them to alleviate the stain variation problem. In this paper, we propose a novel Stain-Adaptive Self-Supervised Learning(SASSL) method for histopathology image analysis. O…
▽ More
It is commonly recognized that color variations caused by differences in stains are a critical issue for histopathology image analysis. Existing methods adopt color matching, stain separation, stain transfer, or a combination of them to alleviate the stain variation problem. In this paper, we propose a novel Stain-Adaptive Self-Supervised Learning (SASSL) method for histopathology image analysis. Our SASSL integrates a domain-adversarial training module into the SSL framework to learn distinctive features that are robust to both various transformations and stain variations. The proposed SASSL is regarded as a general method for domain-invariant feature extraction that can be flexibly combined with arbitrary downstream histopathology image analysis modules (e.g., nuclei/tissue segmentation) by fine-tuning the features for specific downstream tasks. We conducted experiments on publicly available pathological image analysis datasets, including the PANDA, BreastPathQ, and CAMELYON16 datasets, achieving state-of-the-art performance. Experimental results demonstrate that the proposed method can robustly improve the feature extraction ability of the model and achieve stable performance improvements in downstream tasks.
△ Less
Submitted 8 August, 2022;
originally announced August 2022.
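The domain-adversarial module mentioned above is typically implemented with a gradient reversal layer: features flow forward unchanged to a stain/domain classifier, while the gradient flowing back into the encoder is negated, pushing the encoder toward stain-invariant features. A minimal PyTorch sketch of such a layer is given below; it illustrates the standard DANN-style mechanism rather than the exact SASSL architecture, and the encoder, head, and domain count are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Sketch: encoder features go to the SSL head normally and to the domain (stain)
# classifier through the reversal layer, so the encoder learns stain-invariant features.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU())
domain_head = nn.Linear(128, 4)          # e.g. 4 stain/scanner domains (assumed)
x = torch.rand(8, 3, 64, 64)
domain_labels = torch.randint(0, 4, (8,))
feats = encoder(x)
domain_loss = nn.CrossEntropyLoss()(domain_head(grad_reverse(feats)), domain_labels)
domain_loss.backward()                   # the encoder receives the *reversed* gradient
```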
-
Low-dose CT reconstruction by self-supervised learning in the projection domain
Authors:
Long Zhou,
Xiaozhuang Wang,
Min Hou,
Ping Li,
Chunlong Fu,
Yanjun Ren,
Tingting Shao,
Xi Hu,
Jihong Sun,
Hongwei Ye
Abstract:
In the intention of minimizing excessive X-ray radiation administration to patients, low-dose computed tomography (LDCT) has become a distinct trend in radiology. However, while lowering the radiation dose reduces the risk to the patient, it also increases noise and artifacts, compromising image quality and clinical diagnosis. In most supervised learning methods, paired CT images are required, but…
▽ More
With the aim of minimizing excessive X-ray radiation administered to patients, low-dose computed tomography (LDCT) has become a distinct trend in radiology. However, while lowering the radiation dose reduces the risk to the patient, it also increases noise and artifacts, compromising image quality and clinical diagnosis. Most supervised learning methods require paired CT images, but such images are unlikely to be available in the clinic. We present a self-supervised learning model (Noise2Projection) that fully exploits the raw projection images to reduce noise and improve the quality of reconstructed LDCT images. Unlike existing self-supervised algorithms, the proposed method only requires noisy CT projection images and reduces noise by exploiting the correlation between nearby projection images. We trained and tested the model using clinical data, and the quantitative and qualitative results suggest that our model can effectively reduce LDCT image noise while also drastically removing artifacts in LDCT images.
△ Less
Submitted 13 March, 2022;
originally announced March 2022.
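The key idea above is that two projections acquired at nearby angles see almost the same anatomy but carry independent noise, so one can serve as a training target for the other, in the spirit of Noise2Noise. The sketch below shows that pairing and loss; the network, geometry, and data are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Placeholder denoising network (the paper uses its own architecture).
net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# sinogram: (n_views, 1, H, W) stack of noisy projections ordered by angle (toy data here).
sinogram = torch.rand(360, 1, 64, 64)

for step in range(100):
    i = torch.randint(0, sinogram.shape[0] - 1, (1,)).item()
    noisy_input = sinogram[i:i + 1]        # projection at angle i
    noisy_target = sinogram[i + 1:i + 2]   # neighboring projection used as the target
    loss = nn.functional.mse_loss(net(noisy_input), noisy_target)
    opt.zero_grad(); loss.backward(); opt.step()
# The denoised projections would then be fed to a standard reconstruction (e.g. FBP).
```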
-
Adaptive Control with Guaranteed Transient Behavior and Zero Steady-State Error for Systems with Time-Varying Parameters
Authors:
Hefu Ye,
Yongduan Song
Abstract:
It is nontrivial to achieve global zero-error regulation for uncertain nonlinear systems. The underlying problem becomes even more challenging if mismatched uncertainties and unknown time-varying control gain are involved, yet certain performance specifications are also pursued. In this work, we present an adaptive control method, which, without the persistent excitation (PE) condition, is able to…
▽ More
It is nontrivial to achieve global zero-error regulation for uncertain nonlinear systems. The underlying problem becomes even more challenging if mismatched uncertainties and an unknown time-varying control gain are involved, yet certain performance specifications are also pursued. In this work, we present an adaptive control method which, without the persistent excitation (PE) condition, is able to ensure global zero-error regulation with guaranteed output performance for parametric strict-feedback systems involving fast time-varying parameters in the feedback path and input path. The development of our control scheme benefits from generalized t-dependent and x-dependent functions, a novel coordinate transformation, and the "congelation of variables" method. Both theoretical analysis and numerical simulation verify the effectiveness and benefits of the proposed method.
△ Less
Submitted 13 February, 2022;
originally announced February 2022.
-
Backstepping Design Embedded With Time-Varying Command Filters
Authors:
Hefu Ye,
Yongduan Song
Abstract:
If embedded with command filter properly, the implementation of backstepping design could be dramatically simplified. In this paper, we introduce a command filter with time-varying gain and integrate it with backstepping design, resulting in a new set of backstepping control algorithms with low complexity even for high-order strict-feedback systems. Furthermore, with the aid of "softening" sign fu…
▽ More
If a command filter is embedded properly, the implementation of backstepping design can be dramatically simplified. In this paper, we introduce a command filter with time-varying gain and integrate it into the backstepping design, resulting in a new set of backstepping control algorithms with low complexity even for high-order strict-feedback systems. Furthermore, with the aid of a "softening" sign-function-based compensator, zero-error output tracking is ensured while prescribed transient performance is maintained. Numerical simulation is carried out to verify the effectiveness and benefits of the proposed method.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Prescribed-time Control for Linear Systems in Canonical Form Via Nonlinear Feedback
Authors:
Hefu Ye,
Yongduan Song
Abstract:
For systems in canonical form with nonvanishing uncertainties/disturbances, this work presents an approach to full state regulation within prescribed time irrespective of initial conditions. By introducing the smooth hyperbolic-tangent-like function, a nonlinear and time-varying state feedback control scheme is constructed, which is further extended to address output feedback based prescribed-time…
▽ More
For systems in canonical form with nonvanishing uncertainties/disturbances, this work presents an approach to full state regulation within prescribed time, irrespective of initial conditions. By introducing a smooth hyperbolic-tangent-like function, a nonlinear and time-varying state feedback control scheme is constructed, which is further extended to address output-feedback-based prescribed-time regulation by invoking a prescribed-time observer; both schemes are applicable over the entire operational time interval. As an alternative to full state regulation within a user-assignable time interval, the proposed method analytically bridges the divide between linear- and nonlinear-feedback-based prescribed-time control, and is able to achieve asymptotic stability, exponential stability, and prescribed-time stability with a unified control structure.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Accurate parameter estimation using scan-specific unsupervised deep learning for relaxometry and MR fingerprinting
Authors:
Mengze Gao,
Huihui Ye,
Tae Hyung Kim,
Zijing Zhang,
Seohee So,
Berkin Bilgic
Abstract:
We propose an unsupervised convolutional neural network (CNN) for relaxation parameter estimation. This network incorporates signal relaxation and Bloch simulations while taking advantage of residual learning and spatial relations across neighboring voxels. Quantification accuracy and robustness to noise is shown to be significantly improved compared to standard parameter estimation methods in num…
▽ More
We propose an unsupervised convolutional neural network (CNN) for relaxation parameter estimation. This network incorporates signal relaxation and Bloch simulations while taking advantage of residual learning and spatial relations across neighboring voxels. Quantification accuracy and robustness to noise are shown to be significantly improved compared to standard parameter estimation methods in numerical simulations and in vivo data for multi-echo T2 and T2* mapping. The combination of the proposed network with subspace modeling and MR fingerprinting (MRF) from highly undersampled data permits high-quality T1 and T2 mapping.
△ Less
Submitted 12 December, 2021; v1 submitted 7 December, 2021;
originally announced December 2021.
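The "incorporates signal relaxation" part can be made concrete with the multi-echo signal model: the network outputs parameter estimates, the known model re-synthesizes the echoes, and the loss compares them with the measured data, so no ground-truth parameter maps are needed. The sketch below shows that self-supervised loss with a mono-exponential T2 model and a toy network; the echo times, architecture, and the residual/spatial components of the paper's method are simplified assumptions.

```python
import torch
import torch.nn as nn

TE = torch.tensor([10., 20., 40., 80., 160.]) * 1e-3   # echo times in seconds (assumed)

class ParamNet(nn.Module):
    """Maps a multi-echo voxel signal to (M0, T2); tiny stand-in network."""
    def __init__(self, n_echoes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_echoes, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, signals):
        out = self.mlp(signals)
        m0 = nn.functional.softplus(out[:, 0])           # M0 > 0
        t2 = nn.functional.softplus(out[:, 1]) + 1e-3    # T2 > 0 (seconds)
        return m0, t2

def synthesize(m0, t2):
    """Mono-exponential relaxation model S(TE) = M0 * exp(-TE / T2)."""
    return m0[:, None] * torch.exp(-TE[None, :] / t2[:, None])

net = ParamNet(len(TE))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
measured = synthesize(torch.full((128,), 1.0), torch.full((128,), 0.08))
measured = measured + 0.01 * torch.randn_like(measured)  # noisy echoes (toy data)

for _ in range(200):
    m0, t2 = net(measured)
    loss = nn.functional.mse_loss(synthesize(m0, t2), measured)  # physics-based loss
    opt.zero_grad(); loss.backward(); opt.step()
```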
-
Unsupervised PET Reconstruction from a Bayesian Perspective
Authors:
Chenyu Shen,
Wenjun Xia,
Hongwei Ye,
Mingzheng Hou,
Hu Chen,
Yan Liu,
Jiliu Zhou,
Yi Zhang
Abstract:
Positron emission tomography (PET) reconstruction has become an ill-posed inverse problem due to low-count projection data, and a robust algorithm is urgently required to improve imaging quality. Recently, the deep image prior (DIP) has drawn much attention and has been successfully applied in several image restoration tasks, such as denoising and inpainting, since it does not need any labels (ref…
▽ More
Positron emission tomography (PET) reconstruction has become an ill-posed inverse problem due to low-count projection data, and a robust algorithm is urgently required to improve imaging quality. Recently, the deep image prior (DIP) has drawn much attention and has been successfully applied in several image restoration tasks, such as denoising and inpainting, since it does not need any labels (reference images). However, overfitting is a critical defect of this framework. Hence, many methods have been proposed to mitigate this problem, and DeepRED, which combines DIP with regularization by denoising (RED), is a typical representative. In this article, we leverage DeepRED from a Bayesian perspective to reconstruct PET images from a single corrupted sinogram without any supervised or auxiliary information. In contrast to the conventional denoisers customarily used in RED, a DnCNN-like denoiser, which can add an adaptive constraint to DIP and facilitate the computation of derivatives, is employed. Moreover, to further enhance the regularization, Gaussian noise is injected into the gradient updates, yielding a Markov chain Monte Carlo (MCMC) sampler. Experimental studies on brain and whole-body datasets demonstrate that our proposed method can achieve better performance in terms of qualitative and quantitative results compared to several classic and state-of-the-art methods.
△ Less
Submitted 29 October, 2021;
originally announced October 2021.
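Two pieces of the method can be illustrated compactly: the RED regularizer, whose gradient reduces to lambda * (x - D(x)) for a denoiser D under RED's assumptions, and the Gaussian noise injected into the gradient updates, which turns the descent into an SGLD-style sampler. The sketch below shows those two ingredients on a generic image variable; the DIP generator, the DnCNN denoiser, and the PET forward model are replaced by simple placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoiser(x):
    """Placeholder denoiser D(x); a Gaussian filter stands in for DnCNN."""
    return gaussian_filter(x, sigma=1.0)

def red_grad(x, lam=0.5):
    """Gradient of the RED regularizer lam/2 * x^T (x - D(x)),
    which is lam * (x - D(x)) under RED's assumptions on D."""
    return lam * (x - denoiser(x))

def langevin_step(x, data_grad, step=1e-3, temperature=1e-4, rng=None):
    """One noise-injected gradient update (SGLD-style), combining the data-fidelity
    gradient with the RED gradient and injected Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    g = data_grad + red_grad(x)
    noise = rng.standard_normal(x.shape) * np.sqrt(2.0 * step * temperature)
    return x - step * g + noise

# Toy usage with a quadratic data term ||x - y||^2 / 2 standing in for the
# Poisson log-likelihood of the PET sinogram model:
rng = np.random.default_rng(0)
y = rng.random((64, 64))
x = y.copy()
for _ in range(100):
    x = langevin_step(x, data_grad=(x - y), rng=rng)
```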
-
2020 CATARACTS Semantic Segmentation Challenge
Authors:
Imanol Luengo,
Maria Grammatikopoulou,
Rahim Mohammadi,
Chris Walsh,
Chinedu Innocent Nwoye,
Deepak Alapatt,
Nicolas Padoy,
Zhen-Liang Ni,
Chen-Chen Fan,
Gui-Bin Bian,
Zeng-Guang Hou,
Heonjin Ha,
Jiacheng Wang,
Haojie Wang,
Dong Guo,
Lu Wang,
Guotai Wang,
Mobarakol Islam,
Bharat Giddwani,
Ren Hongliang,
Theodoros Pissas,
Claudio Ravasio,
Martin Huber,
Jeremy Birch,
Joan M. Nunez Do Rio
, et al. (15 additional authors not shown)
Abstract:
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc…
▽ More
Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presence information. In 2020, we released pixel-wise semantic annotations for anatomy and instruments for 4670 images sampled from 25 videos of the CATARACTS training set. The 2020 CATARACTS Semantic Segmentation Challenge, which was a sub-challenge of the 2020 MICCAI Endoscopic Vision (EndoVis) Challenge, presented three sub-tasks to assess participating solutions on anatomical structure and instrument segmentation. Their performance was assessed on a hidden test set of 531 images from 10 videos of the CATARACTS test set.
△ Less
Submitted 24 February, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
BUDA-SAGE with self-supervised denoising enables fast, distortion-free, high-resolution T2, T2*, para- and dia-magnetic susceptibility mapping
Authors:
Zijing Zhang,
Long Wang,
Jaejin Cho,
Congyu Liao,
Hyeong-Geol Shin,
Xiaozhi Cao,
Jongho Lee,
Jinmin Xu,
Tao Zhang,
Huihui Ye,
Kawin Setsompop,
Huafeng Liu,
Berkin Bilgic
Abstract:
To rapidly obtain high resolution T2, T2* and quantitative susceptibility mapping (QSM) source separation maps with whole-brain coverage and high geometric fidelity. We propose Blip Up-Down Acquisition for Spin And Gradient Echo imaging (BUDA-SAGE), an efficient echo-planar imaging (EPI) sequence for quantitative mapping. The acquisition includes multiple T2*-, T2'- and T2-weighted contrasts. We a…
▽ More
This work aims to rapidly obtain high-resolution T2, T2*, and quantitative susceptibility mapping (QSM) source separation maps with whole-brain coverage and high geometric fidelity. We propose Blip Up-Down Acquisition for Spin And Gradient Echo imaging (BUDA-SAGE), an efficient echo-planar imaging (EPI) sequence for quantitative mapping. The acquisition includes multiple T2*-, T2'-, and T2-weighted contrasts. We alternate the phase-encoding polarities across the interleaved shots in this multi-shot navigator-free acquisition. A field map estimated from interim reconstructions was incorporated into the joint multi-shot EPI reconstruction with a structured low-rank constraint to eliminate geometric distortion. A self-supervised MR-Self2Self (MR-S2S) neural network (NN) was utilized to perform denoising after BUDA reconstruction to boost SNR. Employing Slider encoding allowed us to reach 1 mm isotropic resolution by performing super-resolution reconstruction on BUDA-SAGE volumes acquired with 2 mm slice thickness. Quantitative T2 and T2* maps were obtained using Bloch dictionary matching on the reconstructed echoes. QSM was estimated using nonlinear dipole inversion (NDI) on the gradient echoes. Starting from the estimated R2 and R2* maps, R2' information was derived and used in source separation QSM reconstruction, which provided additional para- and dia-magnetic susceptibility maps. In vivo results demonstrate the ability of BUDA-SAGE to provide whole-brain, distortion-free, high-resolution multi-contrast images and quantitative T2 and T2* maps, as well as para- and dia-magnetic susceptibility maps. Derived quantitative maps showed values comparable to those of conventional mapping methods in phantom and in vivo measurements. BUDA-SAGE acquisition with self-supervised denoising and Slider encoding enabled rapid, distortion-free, whole-brain T2 and T2* mapping at 1 mm3 isotropic resolution in 90 seconds.
△ Less
Submitted 9 September, 2021; v1 submitted 28 August, 2021;
originally announced August 2021.
-
Accelerated MRI Reconstruction with Separable and Enhanced Low-Rank Hankel Regularization
Authors:
Xinlin Zhang,
Hengfa Lu,
Di Guo,
Zongying Lai,
Huihui Ye,
Xi Peng,
Bo Zhao,
Xiaobo Qu
Abstract:
The combination of the sparse sampling and the low-rank structured matrix reconstruction has shown promising performance, enabling a significant reduction of the magnetic resonance imaging data acquisition time. However, the low-rank structured approaches demand considerable memory consumption and are time-consuming due to a noticeable number of matrix operations performed on the huge-size block H…
▽ More
The combination of sparse sampling and low-rank structured matrix reconstruction has shown promising performance, enabling a significant reduction of the magnetic resonance imaging data acquisition time. However, low-rank structured approaches demand considerable memory and are time-consuming due to the large number of matrix operations performed on the huge block Hankel-like matrix. In this work, we propose a novel framework that utilizes the low-rank property while achieving faster reconstructions and promising results. The framework allows us to enforce the low-rankness of Hankel matrices constructed from 1D vectors instead of from 2D matrices, and thus avoids the construction of a huge block Hankel matrix for 2D k-space matrices. Moreover, under this framework, we can easily incorporate other information, such as the smooth phase of the image and the low-rankness in the parameter dimension, to further improve the image quality. We built and validated two models for parallel and parameter magnetic resonance imaging experiments, respectively. Our retrospective in-vivo results indicate that the proposed approaches enable faster reconstructions than state-of-the-art approaches, e.g., about 8x faster than STDLRSPIRiT, and faithful removal of undersampling artifacts.
△ Less
Submitted 24 July, 2021;
originally announced July 2021.
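The computational savings come from replacing one enormous block-Hankel matrix built from the 2D k-space with many small Hankel matrices built from its 1D rows or columns. The sketch below shows the basic operations a solver built on this idea needs: forming a Hankel matrix from a 1D vector, applying singular value thresholding to enforce low-rankness, and averaging the anti-diagonals back into a vector. The window size, threshold, and toy signal are arbitrary demo values, not the paper's settings.

```python
import numpy as np

def hankel_from_vector(v, window):
    """Build a (window) x (len(v) - window + 1) Hankel matrix, H[i, j] = v[i + j]."""
    n = len(v)
    return np.array([v[i:i + n - window + 1] for i in range(window)])

def vector_from_hankel(H):
    """Invert hankel_from_vector by averaging along anti-diagonals."""
    window, cols = H.shape
    n = window + cols - 1
    v = np.zeros(n, dtype=H.dtype)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(cols):
            v[i + j] += H[i, j]
            counts[i + j] += 1
    return v / counts

def svt(H, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

# One low-rank-enforcing pass over a noisy 1D k-space line (demo values):
rng = np.random.default_rng(0)
t = np.arange(128)
line = np.exp(2j * np.pi * 0.05 * t) + np.exp(2j * np.pi * 0.12 * t)   # two exponentials
noisy = line + 0.2 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
H = hankel_from_vector(noisy, window=32)
denoised = vector_from_hankel(svt(H, tau=5.0))
print("error before:", np.linalg.norm(noisy - line), "after:", np.linalg.norm(denoised - line))
```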
-
CoDR: Computation and Data Reuse Aware CNN Accelerator
Authors:
Alireza Khadem,
Haojie Ye,
Trevor Mudge
Abstract:
Computation and Data Reuse is critical for the resource-limited Convolutional Neural Network (CNN) accelerators. This paper presents Universal Computation Reuse to exploit weight sparsity, repetition, and similarity simultaneously in a convolutional layer. Moreover, CoDR decreases the cost of weight memory access by proposing a customized Run-Length Encoding scheme and the number of memory accesse…
▽ More
Computation and Data Reuse is critical for resource-limited Convolutional Neural Network (CNN) accelerators. This paper presents Universal Computation Reuse to exploit weight sparsity, repetition, and similarity simultaneously in a convolutional layer. Moreover, CoDR decreases the cost of weight memory access by proposing a customized Run-Length Encoding scheme, and reduces the number of memory accesses to intermediate results by introducing an input- and output-stationary dataflow. Compared to two recent compressed CNN accelerators with the same area of 2.85 mm^2, CoDR decreases SRAM access by 5.08x and 7.99x, and consumes 3.76x and 6.84x less energy.
△ Less
Submitted 20 April, 2021;
originally announced April 2021.
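Weight run-length encoding exploits the fact that pruned and quantized CNN weights contain long runs of zeros and repeated values, so storing (value, run length) pairs cuts memory traffic. The sketch below is a generic RLE encoder/decoder pair for a 1D weight stream, not CoDR's customized scheme or its hardware dataflow.

```python
from itertools import groupby

def rle_encode(weights):
    """Encode a flat weight stream as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(weights)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the flat weight stream."""
    out = []
    for value, run_length in pairs:
        out.extend([value] * run_length)
    return out

# Example: a pruned, quantized weight row with long zero runs.
row = [0, 0, 0, 3, 3, 0, 0, 0, 0, -2, -2, -2, 0, 0]
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)   # [(0, 3), (3, 2), (0, 4), (-2, 3), (0, 2)]
```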
-
AIM 2020 Challenge on Learned Image Signal Processing Pipeline
Authors:
Andrey Ignatov,
Radu Timofte,
Zhilu Zhang,
Ming Liu,
Haolin Wang,
Wangmeng Zuo,
Jiawei Zhang,
Ruimao Zhang,
Zhanglin Peng,
Sijie Ren,
Linhui Dai,
Xiaohong Liu,
Chengqi Li,
Jun Chen,
Yuichi Ito,
Bhavya Vasudeva,
Puneesh Deora,
Umapada Pal,
Zhenyu Guo,
Yu Zhu,
Tian Liang,
Chenghua Li,
Cong Leng,
Zhihong Pan,
Baopu Li
, et al. (14 additional authors not shown)
Abstract:
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of com…
▽ More
This paper reviews the second AIM learned ISP challenge and provides a description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where the goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of complex computer vision subtasks, such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The target metric used in this challenge combined fidelity scores (PSNR and SSIM) with the solutions' perceptual results measured in a user study. The proposed solutions significantly improved the baseline results, defining the state of the art for practical image signal processing pipeline modeling.
△ Less
Submitted 10 November, 2020;
originally announced November 2020.
-
A Multi-resolution Model for Histopathology Image Classification and Localization with Multiple Instance Learning
Authors:
Jiayun Li,
Wenyuan Li,
Anthony Sisk,
Huihui Ye,
W. Dean Wallace,
William Speier,
Corey W. Arnold
Abstract:
Histopathological images provide rich information for disease diagnosis. Large numbers of histopathological images have been digitized into high resolution whole slide images, opening opportunities in developing computational image analysis tools to reduce pathologists' workload and potentially improve inter- and intra- observer agreement. Most previous work on whole slide image analysis has focus…
▽ More
Histopathological images provide rich information for disease diagnosis. Large numbers of histopathological images have been digitized into high-resolution whole slide images, opening opportunities for developing computational image analysis tools to reduce pathologists' workload and potentially improve inter- and intra-observer agreement. Most previous work on whole slide image analysis has focused on classification or segmentation of small pre-selected regions of interest, which requires fine-grained annotation and is non-trivial to extend to large-scale whole slide analysis. In this paper, we propose a multi-resolution multiple instance learning model that leverages saliency maps to detect suspicious regions for fine-grained grade prediction. Instead of relying on expensive region- or pixel-level annotations, our model can be trained end-to-end with only slide-level labels. The model is developed on a large-scale prostate biopsy dataset containing 20,229 slides from 830 patients. The model achieved 92.7% accuracy and a Cohen's Kappa of 81.8% for benign, low grade (i.e., Grade group 1), and high grade (i.e., Grade group >= 2) prediction, an area under the receiver operating characteristic curve (AUROC) of 98.2%, and an average precision (AP) of 97.4% for differentiating malignant and benign slides. The model obtained an AUROC of 99.4% and an AP of 99.8% for cancer detection on an external dataset.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
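Training from slide-level labels only is what multiple instance learning provides: a slide is a bag of patch instances, instance features are pooled into a bag representation, and the loss is applied at the bag level. The sketch below shows a standard attention-based MIL pooling head as a generic illustration; the paper's multi-resolution design and saliency-guided region selection are not reproduced, and the feature dimension, class count, and precomputed patch features are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention-based MIL pooling: per-patch features are weighted and summed
    into a single bag (slide) representation, then classified with the bag label."""
    def __init__(self, feat_dim=512, hidden=128, n_classes=3):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, instance_feats):                              # (n_patches, feat_dim)
        a = torch.softmax(self.attention(instance_feats), dim=0)    # (n_patches, 1)
        bag = (a * instance_feats).sum(dim=0, keepdim=True)         # (1, feat_dim)
        return self.classifier(bag), a.squeeze(-1)

# One slide = a bag of patch features (e.g. from a CNN backbone, assumed precomputed).
patch_feats = torch.rand(200, 512)
slide_label = torch.tensor([2])                  # slide-level grade group label
model = AttentionMIL()
logits, attn = model(patch_feats)
loss = nn.CrossEntropyLoss()(logits, slide_label)
loss.backward()
# `attn` can be mapped back to patch locations as a saliency map over the slide.
```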