Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention

Authors: Zhengkang Fan, Chengkun Sun, Russell Terry, Jie Xu, Longin Jan Latecki

Abstract: Accurate prediction of malignancy in renal tumors is crucial for informing clinical decisions and optimizing treatment strategies. However, existing imaging modalities lack the necessary accuracy to reliably predict malignancy before surgical intervention. While deep learning has shown promise in malignancy prediction using 3D CT images, traditional approaches often rely on manual segmentation to isolate the tumor region and reduce noise, which enhances predictive performance. Manual segmentation, however, is labor-intensive, costly, and dependent on expert knowledge. In this study, a deep learning framework was developed utilizing an Organ Focused Attention (OFA) loss function to modify the attention of image patches so that organ patches attend only to other organ patches. Hence, no segmentation of 3D renal CT images is required at deployment time for malignancy prediction. The proposed framework achieved an AUC of 0.685 and an F1-score of 0.872 on a private dataset from the UF Integrated Data Repository (IDR), and an AUC of 0.760 and an F1-score of 0.852 on the publicly available KiTS21 dataset. These results surpass the performance of conventional models that rely on segmentation-based cropping for noise reduction, demonstrating the framework's ability to enhance predictive accuracy without explicit segmentation input. The findings suggest that this approach offers a more efficient and reliable method for malignancy prediction, thereby enhancing clinical decision-making in renal cancer diagnosis.
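
The abstract above describes the OFA loss only at a high level. As a minimal sketch of the stated idea (organ patches attending only to other organ patches), one could penalize the attention mass that organ patches place on non-organ patches. The function name `organ_focused_penalty`, the row-normalized attention matrix, and the binary patch mask are all assumptions made for illustration; the paper's actual loss may be defined differently.

```python
def organ_focused_penalty(attention, organ_mask):
    """Average attention mass that organ patches place on non-organ patches.

    attention  : N x N list of rows, each row a distribution over patches
    organ_mask : length-N list of 0/1 flags (1 = patch overlaps the organ)
    Hypothetical illustration only; not the authors' formulation.
    """
    organ_rows = [i for i, flag in enumerate(organ_mask) if flag]
    if not organ_rows:
        return 0.0  # no organ patches, nothing to penalize
    leaked = 0.0
    for i in organ_rows:
        # Sum the attention this organ patch gives to background patches.
        leaked += sum(a for a, flag in zip(attention[i], organ_mask) if not flag)
    return leaked / len(organ_rows)


# Three patches: 0 and 1 are organ patches, 2 is background.
attention = [
    [0.5, 0.5, 0.0],  # organ patch attending only to organ patches
    [0.2, 0.3, 0.5],  # organ patch leaking half its attention to background
    [0.1, 0.1, 0.8],  # background row, ignored by the penalty
]
penalty = organ_focused_penalty(attention, [1, 1, 0])
```

In this sketch the penalty is zero exactly when every organ patch distributes all of its attention over organ patches, which is the behavior the abstract describes.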

Submitted 25 February, 2026; originally announced February 2026.

Comments: 5 pages, 2 figures, Accepted at IEEE ISBI 2026

arXiv:2506.23584 [pdf, ps, other]

A Clinically-Grounded Two-Stage Framework for Renal CT Report Generation

Authors: Renjie Liang, Zhengkang Fan, Jinqian Pan, Chenkun Sun, Bruce Daniel Steinberg, Russell Terry, Jie Xu

Abstract: Objective: Renal cancer is a common malignancy and a major cause of cancer-related deaths. Computed tomography (CT) is central to early detection, staging, and treatment planning. However, the growing CT workload increases radiologists' burden and risks incomplete documentation. Automatically generating accurate reports remains challenging because it requires integrating visual interpretation with clinical reasoning. Advances in artificial intelligence (AI), especially large language and vision-language models, offer potential to reduce workload and enhance diagnostic quality. Methods: We propose a clinically informed, two-stage framework for automatic renal CT report generation. In Stage 1, a multi-task learning model detects structured clinical features from each 2D image. In Stage 2, a vision-language model generates free-text reports conditioned on the image and the detected features. To evaluate clinical fidelity, generated clinical features are extracted from the reports and compared with expert-annotated ground truth. Results: Experiments on an expert-labeled dataset show that incorporating detected features improves both report quality and clinical accuracy. The model achieved an average AUC of 0.75 for key imaging features and a METEOR score of 0.33, demonstrating higher clinical consistency and fewer template-driven errors. Conclusion: Linking structured feature detection with conditioned report generation provides a clinically grounded approach to integrate structured prediction and narrative drafting for renal CT reporting. This method enhances interpretability and clinical faithfulness, underscoring the value of domain-relevant evaluation metrics for medical AI development.

Submitted 16 October, 2025; v1 submitted 30 June, 2025; originally announced June 2025.

arXiv:2409.13154 [pdf, other]

Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities

Authors: Chengkun Sun, Jinqian Pan, Zhuoli Jin, Russell Stevens Terry, Jiang Bian, Jie Xu

Abstract: Training deep Convolutional Neural Networks (CNNs) presents unique challenges, including the pervasive issue of elimination singularities, that is, the consistent deactivation of nodes leading to degenerate manifolds within the loss landscape. These singularities impede efficient learning by disrupting feature propagation. To mitigate this, we introduce Pool Skip, an architectural enhancement that strategically combines a Max Pooling, a Max Unpooling, a 3×3 convolution, and a skip connection. This configuration helps stabilize the training process and maintain feature integrity across layers. We also propose the Weight Inertia hypothesis, which underpins the development of Pool Skip, providing theoretical insights into mitigating degradation caused by elimination singularities through dimensional and affine compensation. We evaluate our method on a variety of benchmarks, focusing on both 2D natural and 3D medical imaging applications, including tasks such as classification and segmentation. Our findings highlight Pool Skip's effectiveness in facilitating more robust CNN training and improving model performance.
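
The abstract names the four ingredients of Pool Skip but not how they are wired together. The following is a rough single-channel sketch in plain Python, assuming the order max pooling, then convolution, then max unpooling, with the result added back to the input through the skip connection. That ordering, the 2×2 pooling window, and every function name here are assumptions for illustration, not the authors' implementation.

```python
def max_pool_2x2(x):
    """2x2 max pooling; also returns the (row, col) position of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(x), 2):
        row_vals, row_idx = [], []
        for c in range(0, len(x[0]), 2):
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])
            row_vals.append(value)
            row_idx.append(pos)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def conv3x3_same(x, kernel):
    """3x3 cross-correlation with zero padding, so the output keeps x's size."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[r][c] += kernel[dr + 1][dc + 1] * x[rr][cc]
    return out


def max_unpool_2x2(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for vals, idxs in zip(pooled, indices):
        for v, (r, c) in zip(vals, idxs):
            out[r][c] = v
    return out


def pool_skip_sketch(x, kernel):
    """Assumed wiring: pool -> 3x3 conv -> unpool, then add the input (skip)."""
    pooled, indices = max_pool_2x2(x)
    mixed = conv3x3_same(pooled, kernel)
    restored = max_unpool_2x2(mixed, indices, len(x), len(x[0]))
    return [[a + b for a, b in zip(row_x, row_r)]
            for row_x, row_r in zip(x, restored)]


x = [[1, 2, 0, 0],
     [3, 4, 0, 5],
     [0, 0, 7, 0],
     [6, 0, 0, 0]]
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # leaves pooled values unchanged
out = pool_skip_sketch(x, identity_kernel)
```

With the identity kernel, only the positions that held each window's maximum change, since the unpooled branch is zero everywhere else and the skip connection passes the input through.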

Submitted 10 December, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.13146 [pdf, other]

GASA-UNet: Global Axial Self-Attention U-Net for 3D Medical Image Segmentation

Authors: Chengkun Sun, Russell Stevens Terry, Jiang Bian, Jie Xu

Abstract: Accurate segmentation of multiple organs and the differentiation of pathological tissues in medical imaging are crucial but challenging, especially for nuanced classifications and ambiguous organ boundaries. To tackle these challenges, we introduce GASA-UNet, a refined U-Net-like model featuring a novel Global Axial Self-Attention (GASA) block. This block processes image data as a 3D entity, with each 2D plane representing a different anatomical cross-section. Voxel features are defined within this spatial context, and a Multi-Head Self-Attention (MHSA) mechanism is utilized on extracted 1D patches to facilitate connections across these planes. Positional embeddings (PE) are incorporated into our attention framework, enriching voxel features with spatial context and enhancing tissue classification and organ edge delineation. Our model has demonstrated promising improvements in segmentation performance, particularly for smaller anatomical structures, as evidenced by enhanced Dice scores and Normalized Surface Dice (NSD) on three benchmark datasets, i.e., BTCV, AMOS, and KiTS23.
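
For readers unfamiliar with the Dice score reported above, a minimal sketch of the standard definition on binary masks follows. It illustrates the metric only, not the GASA block; the function name and the convention of returning 1.0 when both masks are empty are choices made here.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flat 0/1 masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; returning 1.0 is a convention assumed here.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 2 shared voxels, 3 + 3 labeled voxels
```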

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.13116 [pdf, other]

BGDB: Bernoulli-Gaussian Decision Block with Improved Denoising Diffusion Probabilistic Models

Authors: Chengkun Sun, Jinqian Pan, Russell Stevens Terry, Jiang Bian, Jie Xu

Abstract: Generative models can enhance discriminative classifiers by constructing complex feature spaces, thereby improving performance on intricate datasets. Conventional methods typically augment datasets with more detailed feature representations or increase dimensionality to make nonlinear data linearly separable. Utilizing a generative model solely for feature space processing falls short of unlocking its full potential within a classifier and typically lacks a solid theoretical foundation. We base our approach on a novel hypothesis: the probability information (logit) derived from a single model training can be used to generate the equivalent of multiple training sessions. Leveraging the central limit theorem, this synthesized probability information is anticipated to converge toward the true probability more accurately. To achieve this goal, we propose the Bernoulli-Gaussian Decision Block (BGDB), a novel module inspired by the Central Limit Theorem and the concept that the mean of multiple Bernoulli trials approximates the probability of success in a single trial. Specifically, we utilize Improved Denoising Diffusion Probabilistic Models (IDDPM) to model the probability of Bernoulli Trials. Our approach shifts the focus from reconstructing features to reconstructing logits, transforming the logit from a single iteration into logits analogous to those from multiple experiments. We provide the theoretical foundations of our approach through mathematical analysis and validate its effectiveness through experimental evaluation using various datasets for multiple imaging tasks, including both classification and segmentation.
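
The statistical fact this abstract builds on, that the mean of many Bernoulli trials approximates the success probability of a single trial, can be illustrated with a small simulation. The sketch below shows only that fact, not BGDB or IDDPM; the function name, the seed, and the probability 0.3 are illustrative choices.

```python
import random


def bernoulli_mean(p, n_trials, rng):
    """Return the fraction of successes in n_trials Bernoulli(p) draws."""
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials


rng = random.Random(0)  # fixed seed so the run is repeatable
p_true = 0.3
estimate_small = bernoulli_mean(p_true, 10, rng)
estimate_large = bernoulli_mean(p_true, 100_000, rng)
# The standard error of the mean is sqrt(p * (1 - p) / n), so the estimate
# from 100,000 trials is expected to sit much closer to p_true than the
# estimate from 10 trials.
```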

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2312.03738 [pdf, ps, other]

Syntactic Fusion: Enhancing Aspect-Level Sentiment Analysis Through Multi-Tree Graph Integration

Authors: Jane Sunny, Tom Padraig, Roggie Terry, Woods Ali

Abstract: Recent progress in aspect-level sentiment classification has been propelled by the incorporation of graph neural networks (GNNs) leveraging syntactic structures, particularly dependency trees. Nevertheless, the performance of these models is often hampered by the innate inaccuracies of parsing algorithms. To mitigate this challenge, we introduce SynthFusion, an innovative graph ensemble method that amalgamates predictions from multiple parsers. This strategy blends diverse dependency relations prior to the application of GNNs, enhancing robustness against parsing errors while avoiding extra computational burdens. SynthFusion circumvents the pitfalls of overparameterization and diminishes the risk of overfitting, prevalent in models with stacked GNN layers, by optimizing graph connectivity. Our empirical evaluations on the SemEval14 and Twitter14 datasets affirm that SynthFusion not only outshines models reliant on single dependency trees but also eclipses alternative ensemble techniques, achieving this without an escalation in model complexity.
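
As a rough sketch of what blending dependency relations from several parsers before applying a GNN could look like, the edges proposed by each parser can be merged into a single adjacency matrix. The abstract does not say how the relations are combined, so the union rule, the symmetric matrix with self-loops, the function name `fuse_dependency_graphs`, and the toy parses are all assumptions.

```python
def fuse_dependency_graphs(num_tokens, parses):
    """Merge (head, dependent) edges from several parsers into one adjacency matrix.

    parses : list of edge lists, one per parser
    The union rule and the symmetric, self-looped matrix are assumptions
    made for illustration.
    """
    adjacency = [[0] * num_tokens for _ in range(num_tokens)]
    for i in range(num_tokens):
        adjacency[i][i] = 1  # self-loops, a common choice for GNN inputs
    for edges in parses:
        for head, dep in edges:
            adjacency[head][dep] = 1
            adjacency[dep][head] = 1
    return adjacency


# Tokens: 0 "The", 1 "food", 2 "was", 3 "great".
# Two hypothetical parsers that disagree on where "food" attaches.
parser_a = [(1, 0), (3, 1), (3, 2)]
parser_b = [(1, 0), (2, 1), (3, 2)]
fused = fuse_dependency_graphs(4, [parser_a, parser_b])
```

Because the fused graph keeps both candidate attachments for "food", an error by one parser does not remove the correct edge, and the matrix stays the same size regardless of how many parsers contribute.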

Submitted 28 November, 2023; originally announced December 2023.

arXiv:2305.19956 [pdf, other]

doi 10.1016/j.compmedimag.2024.102326

MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images

Authors: Hongxu Jiang, Muhammad Imran, Preethika Muralidharan, Anjali Patel, Jake Pensa, Muxuan Liang, Tarik Benidir, Joseph R. Grajo, Jason P. Joseph, Russell Terry, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao

Abstract: Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During the training process, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a lower weight to prediction errors in easy regions. The AG-BCE loss was seamlessly integrated into the training process through the utilization of multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. Our code is publicly available at https://github.com/mirthAI/MicroSegNet and our dataset is publicly available at https://zenodo.org/records/10475293.
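
The weighting idea behind the AG-BCE loss can be sketched as a per-pixel weighted binary cross entropy. This is a minimal illustration, assuming that hard pixels are those where the expert and non-expert labels disagree and that the expert labels serve as the target. The function name `ag_bce_sketch` and the weights 4.0 and 1.0 are placeholders, not values from the paper; the linked repository holds the authors' implementation.

```python
import math


def ag_bce_sketch(pred, expert, non_expert, hard_weight=4.0, easy_weight=1.0):
    """Weighted binary cross entropy over flat lists of pixels.

    pred       : predicted foreground probabilities in (0, 1)
    expert     : 0/1 expert labels, used as the target
    non_expert : 0/1 non-expert labels, used only to locate hard pixels
    Weights are placeholder values, not taken from the paper.
    """
    eps = 1e-7
    total = 0.0
    for p, y, y_ne in zip(pred, expert, non_expert):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight = hard_weight if y != y_ne else easy_weight
        total += weight * bce
    return total / len(pred)


pred = [0.9, 0.6, 0.2]
expert = [1, 1, 0]
non_expert = [1, 0, 0]  # the annotators disagree on the second pixel
loss = ag_bce_sketch(pred, expert, non_expert)
plain = ag_bce_sketch(pred, expert, non_expert, hard_weight=1.0)
```

Setting `hard_weight` equal to `easy_weight` recovers ordinary binary cross entropy, so the gap between `loss` and `plain` is the extra penalty placed on the pixel where the annotators disagree.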

Submitted 25 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Journal ref: Computerized Medical Imaging and Graphics (2024): 102326

Showing 1–7 of 7 results for author: Terry, R