arXiv:2603.25886 [pdf, ps, other]

Automated Quality Assessment of Blind Sweep Obstetric Ultrasound for Improved Diagnosis

Authors: Prasiddha Bhandari, Kanchan Poudel, Nishant Luitel, Bishram Acharya, Angelina Ghimire, Tyler Wellman, Kilian Koepsell, Pradeep Raj Regmi, Bishesh Khanal

Abstract: Blind Sweep Obstetric Ultrasound (BSOU) enables scalable fetal imaging in low-resource settings by allowing minimally trained operators to acquire standardized sweep videos for automated Artificial Intelligence (AI) interpretation. However, the reliability of such AI systems depends critically on the quality of the acquired sweeps, and little is known about how deviations from the intended protocol affect downstream predictions. In this work, we present a systematic evaluation of BSOU quality and its impact on three key AI tasks: sweep-tag classification, fetal presentation classification, and placenta-location classification. We simulate plausible acquisition deviations, including reversed sweep direction, probe inversion, and incomplete sweeps, to quantify model robustness, and we develop automated quality-assessment models capable of detecting these perturbations. To approximate real-world deployment, we simulate a feedback loop in which flagged sweeps are re-acquired, showing that such correction improves downstream task performance. Our findings highlight the sensitivity of BSOU-based AI models to acquisition variability and demonstrate that automated quality assessment can play a central role in building reliable, scalable AI-assisted prenatal ultrasound workflows, particularly in low-resource environments.

Submitted 26 March, 2026; originally announced March 2026.

arXiv:2603.22291 [pdf, ps, other]

Evaluating Large Language Models' Responses to Sexual and Reproductive Health Queries in Nepali

Authors: Medha Sharma, Supriya Khadka, Udit Chandra Aryal, Bishnu Hari Bhatta, Bijayan Bhattarai, Santosh Dahal, Kamal Gautam, Pushpa Joshi, Saugat Kafle, Shristi Khadka, Shushila Khadka, Binod Lamichhane, Shilpa Lamichhane, Anusha Parajuli, Sabina Pokharel, Suvekshya Sitaula, Neha Verma, Bishesh Khanal

Abstract: As Large Language Models (LLMs) become integrated into daily life, they are increasingly used for personal queries, including Sexual and Reproductive Health (SRH), allowing users to chat anonymously without fear of judgment. However, current evaluation methods primarily focus on accuracy, often for objective queries in high-resource languages, and lack criteria to assess usability and safety, especially for low-resource languages and culturally sensitive domains like SRH. This paper introduces the LLM Evaluation Framework (LEAF), which conducts assessments across multiple criteria: accuracy, language, usability gaps (including relevance, adequacy, and cultural appropriateness), and safety gaps (safety, sensitivity, and confidentiality). Using the LEAF framework, we assessed 14K SRH queries in Nepali from over 9K users. Responses were manually annotated by SRH experts according to the framework. Results revealed that only 35.1% of the responses were "proper", meaning they were accurate, adequate, and had no major usability or safety-related gaps. Insights include differences in performance between ChatGPT versions, such as similar accuracy but varying usability and safety aspects. This evaluation highlights significant limitations of current LLMs and underscores the need for improvement. The LEAF Framework is adaptable across domains and languages, particularly where usability and safety are critical, offering a pathway to better address sensitive topics.

Submitted 4 March, 2026; originally announced March 2026.

arXiv:2601.06500 [pdf]

The AI Pyramid A Conceptual Framework for Workforce Capability in the Age of AI

Authors: Alok Khatri, Bishesh Khanal

Abstract: Artificial intelligence (AI) represents a qualitative shift in technological change by extending cognitive labor itself rather than merely automating routine tasks. Recent evidence shows that generative AI disproportionately affects highly educated, white collar work, challenging existing assumptions about workforce vulnerability and rendering traditional approaches to digital or AI literacy insufficient. This paper introduces the concept of AI Nativity, the capacity to integrate AI fluidly into everyday reasoning, problem solving, and decision making, and proposes the AI Pyramid, a conceptual framework for organizing human capability in an AI mediated economy. The framework distinguishes three interdependent capability layers: AI Native capability as a universal baseline for participation in AI augmented environments; AI Foundation capability for building, integrating, and sustaining AI enabled systems; and AI Deep capability for advancing frontier AI knowledge and applications. Crucially, the pyramid is not a career ladder but a system level distribution of capabilities required at scale. Building on this structure, the paper argues that effective AI workforce development requires treating capability formation as infrastructure rather than episodic training, centered on problem based learning embedded in work contexts and supported by dynamic skill ontologies and competency based measurement. The framework has implications for organizations, education systems, and governments seeking to align learning, measurement, and policy with the evolving demands of AI mediated work, while addressing productivity, resilience, and inequality at societal scale.

Submitted 20 February, 2026; v1 submitted 10 January, 2026; originally announced January 2026.

Comments: 14 pages

arXiv:2512.08143 [pdf, ps, other]

PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection

Authors: Ali Lotfi Rezaabad, Bikram Khanal, Shashwat Chaurasia, Lu Zeng, Dezhi Hong, Hossein Bashashati, Thomas Butler, Megan Ganji

Abstract: Language identification is a crucial first step in multilingual systems such as chatbots and virtual assistants, enabling linguistically and culturally accurate user experiences. Errors at this stage can cascade into downstream failures, setting a high bar for accuracy. Yet, existing language identification tools struggle with key cases -- such as music requests where the song title and user language differ. Open-source tools like LangDetect and FastText are fast but less accurate, while large language models, though effective, are often too costly for low-latency or low-resource settings. We introduce PolyLingua, a lightweight Transformer-based model for in-domain language detection and fine-grained language classification. It employs a two-level contrastive learning framework combining instance-level separation and class-level alignment with adaptive margins, yielding compact and well-separated embeddings even for closely related languages. Evaluated on two challenging datasets -- Amazon Massive (multilingual digital assistant utterances) and a Song dataset (music requests with frequent code-switching) -- PolyLingua achieves 99.25% F1 and 98.15% F1, respectively, surpassing Sonnet 3.5 while using 10x fewer parameters, making it ideal for compute- and latency-constrained environments.

Submitted 10 December, 2025; v1 submitted 8 December, 2025; originally announced December 2025.

arXiv:2511.06169 [pdf, ps, other]

Local K-Similarity Constraint for Federated Learning with Label Noise

Authors: Sanskar Amgain, Prashant Shrestha, Bidur Khanal, Alina Devkota, Yash Raj Shrestha, Seungryul Baek, Prashnna Gyawali, Binod Bhattarai

Abstract: Federated learning on clients with noisy labels is a challenging problem, as such clients can infiltrate the global model, impacting the overall generalizability of the system. Existing methods proposed to handle noisy clients assume that a sufficient number of clients with clean labels are available, which can be leveraged to learn a robust global model while dampening the impact of noisy clients. This assumption fails when a high number of heterogeneous clients contain noisy labels, making the existing approaches ineffective. In such scenarios, it is important to locally regularize the clients before communication with the global model, to ensure the global model isn't corrupted by noisy clients. While pre-trained self-supervised models can be effective for local regularization, existing centralized approaches relying on pretrained initialization are impractical in a federated setting due to the potentially large size of these models, which increases communication costs. In that line, we propose a regularization objective for client models that decouples the pre-trained and classification models by enforcing similarity between close data points within the client. We leverage the representation space of a self-supervised pretrained model to evaluate the closeness among examples. This regularization, when applied with the standard objective function for the downstream task in standard noisy federated settings, significantly improves performance, outperforming existing state-of-the-art federated methods in multiple computer vision and medical image classification benchmarks. Unlike other techniques that rely on self-supervised pretrained initialization, our method does not require the pretrained model and classifier backbone to share the same architecture, making it architecture-agnostic.

Submitted 8 November, 2025; originally announced November 2025.

arXiv:2509.15558 [pdf, ps, other]

From Development to Deployment of AI-assisted Telehealth and Screening for Vision- and Hearing-threatening diseases in resource-constrained settings: Field Observations, Challenges and Way Forward

Authors: Mahesh Shakya, Bijay Adhikari, Nirsara Shrestha, Bipin Koirala, Arun Adhikari, Prasanta Poudyal, Luna Mathema, Sarbagya Buddhacharya, Bijay Khatri, Bishesh Khanal

Abstract: Vision- and hearing-threatening diseases cause preventable disability, especially in resource-constrained settings (RCS) with few specialists and limited screening setup. Large-scale AI-assisted screening and telehealth have the potential to expand early detection, but practical deployment is challenging in paper-based workflows, and limited documented field experience exists to build upon. We provide insights on challenges and ways forward from development to adoption of scalable AI-assisted telehealth and screening in such settings. Specifically, we find that iterative, interdisciplinary collaboration through early prototyping, shadow deployment and continuous feedback is important to build shared understanding as well as reduce usability hurdles when transitioning from paper-based to AI-ready workflows. We find public datasets and AI models highly useful despite poor performance due to domain shift. In addition, we find the need for automated AI-based image quality check to capture gradable images for robust screening in high-volume camps. Our field learnings stress the importance of treating AI development and workflow digitization as an end-to-end, iterative co-design process. By documenting these practical challenges and lessons learned, we aim to address the gap in contextual, actionable field knowledge for building real-world AI-assisted telehealth and mass-screening programs in RCS.

Submitted 18 September, 2025; originally announced September 2025.

Comments: Accepted to MIRASOL (Medical Image Computing in Resource Constrained Settings Workshop & KI) Workshop, 2025

arXiv:2505.07001 [pdf, ps, other]

Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models

Authors: Bidur Khanal, Sandesh Pokhrel, Sanjay Bhandari, Ramesh Rana, Nikesh Shrestha, Ram Bahadur Gurung, Cristian Linte, Angus Watson, Yash Raj Shrestha, Binod Bhattarai

Abstract: Vision-Language Models (VLMs) are becoming increasingly popular in the medical domain, bridging the gap between medical images and clinical language. Existing VLMs demonstrate an impressive ability to comprehend medical images and text queries to generate detailed, descriptive diagnostic medical reports. However, hallucination--the tendency to generate descriptions that are inconsistent with the visual content--remains a significant issue in VLMs, with particularly severe implications in the medical field. To facilitate VLM research on gastrointestinal (GI) image analysis and study hallucination, we curate a multimodal image-text GI dataset: Gut-VLM. This dataset is created using a two-stage pipeline: first, descriptive medical reports of Kvasir-v2 images are generated using ChatGPT, which introduces some hallucinated or incorrect texts. In the second stage, medical experts systematically review these reports, and identify and correct potential inaccuracies to ensure high-quality, clinically reliable annotations. Unlike traditional datasets that contain only descriptive texts, our dataset also features tags identifying hallucinated sentences and their corresponding corrections. A common approach to reducing hallucination in VLMs is to finetune the model on a small-scale, problem-specific dataset. However, we take a different strategy using our dataset. Instead of finetuning the VLM solely for generating textual reports, we finetune it to detect and correct hallucinations, an approach we call hallucination-aware finetuning. Our results show that this approach is better than simply finetuning for descriptive report generation. Additionally, we conduct an extensive evaluation of state-of-the-art VLMs across several metrics, establishing a benchmark. GitHub Repo: https://github.com/bhattarailab/Hallucination-Aware-VLM.

Submitted 22 June, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

Comments: Accepted at MICCAI 2025

arXiv:2503.13470 [pdf, ps, other]

Multimodal Latent Fusion of ECG Leads for Early Assessment of Pulmonary Hypertension

Authors: Mohammod N. I. Suvon, Shuo Zhou, Prasun C. Tripathi, Wenrui Fan, Samer Alabed, Bishesh Khanal, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu

Abstract: Recent advancements in early assessment of pulmonary hypertension (PH) primarily focus on applying machine learning methods to centralized diagnostic modalities, such as 12-lead electrocardiogram (12L-ECG). Despite their potential, these approaches fall short in decentralized clinical settings, e.g., point-of-care and general practice, where handheld 6-lead ECG (6L-ECG) can offer an alternative but is limited by the scarcity of labeled data for developing reliable models. To address this, we propose a lead-specific electrocardiogram multimodal variational autoencoder (LS-EMVAE), which incorporates a hierarchical modality expert (HiME) fusion mechanism and a latent representation alignment loss. HiME combines mixture-of-experts and product-of-experts to enable flexible, adaptive latent fusion, while the alignment loss improves coherence among lead-specific and shared representations. To alleviate data scarcity and enhance representation learning, we adopt a transfer learning strategy: the model is first pre-trained on a large unlabeled 12L-ECG dataset and then fine-tuned on smaller task-specific labeled 6L-ECG datasets. We validate LS-EMVAE across two retrospective cohorts in a 6L-ECG setting: 892 subjects from the ASPIRE registry for (1) PH detection and (2) phenotyping pre-/post-capillary PH, and 16,416 subjects from UK Biobank for (3) predicting elevated pulmonary atrial wedge pressure, where it consistently outperforms unimodal and multimodal baseline methods and demonstrates strong generalizability and interpretability. The code is available at https://github.com/Shef-AIRE/LS-EMVAE.

Submitted 8 September, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

arXiv:2412.14100 [pdf, other]

Parameter-efficient Fine-tuning for improved Convolutional Baseline for Brain Tumor Segmentation in Sub-Saharan Africa Adult Glioma Dataset

Authors: Bijay Adhikari, Pratibha Kulung, Jakesh Bohaju, Laxmi Kanta Poudel, Confidence Raymond, Dong Zhang, Udunna C Anazodo, Bishesh Khanal, Mahesh Shakya

Abstract: Automating brain tumor segmentation using deep learning methods is an ongoing challenge in medical imaging. Multiple lingering issues exist, including domain shift and applications in low-resource settings, which bring a unique set of challenges including scarcity of data. As a step towards solving these specific problems, we propose Convolutional adapter-inspired Parameter-efficient Fine-tuning (PEFT) of the MedNeXt architecture. To validate our idea, we show our method performs comparably to full fine-tuning with the added benefit of reduced training compute, using BraTS-2021 as the pre-training dataset and BraTS-Africa as the fine-tuning dataset. BraTS-Africa consists of a small dataset (60 train / 35 validation) from the Sub-Saharan African population with a marked shift in MRI quality compared to BraTS-2021 (1251 train samples). We first show that models trained on the BraTS-2021 dataset do not generalize well to BraTS-Africa, as shown by a 20% reduction in mean dice on BraTS-Africa validation samples. Then, we show that PEFT can leverage both the BraTS-2021 and BraTS-Africa datasets to obtain a mean dice of 0.8 compared to 0.72 when trained only on BraTS-Africa. Finally, we show that PEFT (0.80 mean dice) results in comparable performance to full fine-tuning (0.77 mean dice), which may show PEFT to be better on average, but the boxplots show that full fine-tuning results in much lower variance in performance. Nevertheless, on disaggregation of the dice metrics, we find that the model has a tendency to oversegment, as shown by high specificity (0.99) compared to relatively low sensitivity (0.75). The source code is available at https://github.com/CAMERA-MRI/SPARK2024/tree/main/PEFT_MedNeXt

Submitted 18 December, 2024; originally announced December 2024.

Comments: Accepted to "The International Brain Tumor Segmentation (BraTS) challenge organized at MICCAI 2024 conference"

arXiv:2412.11451 [pdf, other]

doi 10.1007/s11227-025-06966-9

Data-Dependent Generalization Bounds for Parameterized Quantum Models Under Noise

Authors: Bikram Khanal, Pablo Rivas

Abstract: Quantum machine learning offers a transformative approach to solving complex problems, but the inherent noise hinders its practical implementation in near-term quantum devices. This obstacle makes it difficult to understand the generalizability of quantum circuit models. Designing robust quantum machine learning models under noise requires a principled understanding of complexity and generalization, extending beyond classical capacity measures. This study investigates the generalization properties of parameterized quantum machine learning models under the influence of noise. We present a data-dependent generalization bound grounded in the quantum Fisher information matrix. We leverage statistical learning theory to relate the parameter space volumes and training sizes to estimate the generalization capability of the trained model. We provide a structured characterization of complexity in quantum models by integrating local parameter neighborhoods and effective dimensions defined through quantum Fisher information matrix eigenvalues. We also analyze the tightness of the bound and discuss the tradeoff between model expressiveness and generalization performance.

Submitted 3 February, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

Comments: The Journal of Supercomputing

arXiv:2412.08163 [pdf, other]

NLPineers@ NLU of Devanagari Script Languages 2025: Hate Speech Detection using Ensembling of BERT-based models

Authors: Anmol Guragain, Nadika Poudel, Rajesh Piryani, Bishesh Khanal

Abstract: This paper explores hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali, for Subtask B of the CHIPSAL@COLING 2025 Shared Task. Using a range of transformer-based models such as XLM-RoBERTa, MURIL, and IndicBERT, we examine their effectiveness in navigating the nuanced boundary between hate speech and free expression. Our best performing model, implemented as an ensemble of multilingual BERT models, achieves a recall of 0.7762 (Rank 3/31 in terms of recall) and an F1 score of 0.6914 (Rank 17/31). To address class imbalance, we used backtranslation for data augmentation, and cosine similarity to preserve label consistency after augmentation. This work emphasizes the need for hate speech detection in Devanagari-scripted languages and presents a foundation for further research.

Submitted 12 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

arXiv:2412.05996 [pdf]

Paddy Disease Detection and Classification Using Computer Vision Techniques: A Mobile Application to Detect Paddy Disease

Authors: Bimarsha Khanal, Paras Poudel, Anish Chapagai, Bijan Regmi, Sitaram Pokhrel, Salik Ram Khanal

Abstract: Plant diseases significantly impact our food supply, causing problems for farmers, economies reliant on agriculture, and global food security. Accurate and timely plant disease diagnosis is crucial for effective treatment and minimizing yield losses. Despite advancements in agricultural technology, a precise and early diagnosis remains a challenge, especially in underdeveloped regions where agriculture is crucial and agricultural experts are scarce. However, adopting Deep Learning applications can assist in accurately identifying diseases without needing plant pathologists. This study evaluates the effectiveness of various computer vision models for detecting paddy diseases and proposes the best deep learning-based disease detection system. Both classification and detection are tested and evaluated on the Paddy Doctor dataset, which contains over 20,000 annotated images of paddy leaves for disease diagnosis. A YOLOv8-based model was used for paddy disease detection, and CNN models and the Vision Transformer were used for disease classification. An average mAP50 of 69% was achieved for detection tasks, and the Vision Transformer classification accuracy was 99.38%. It was found that detection models are effective at identifying multiple diseases simultaneously with less computing power, whereas classification models, though computationally expensive, exhibit better performance for classifying single diseases. Additionally, a mobile application was developed to enable farmers to identify paddy diseases instantly. Experiments with the app showed encouraging results in utilizing the trained models for both disease classification and treatment guidance.

Submitted 8 December, 2024; originally announced December 2024.

Comments: 21 pages, 12 figures and 2 tables

arXiv:2411.09598 [pdf, other]

Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images

Authors: Bipasha Kundu, Bidur Khanal, Richard Simon, Cristian A. Linte

Abstract: Accurate left atrium (LA) segmentation from pre-operative scans is crucial for diagnosing atrial fibrillation, treatment planning, and supporting surgical interventions. While deep learning models are key in medical image segmentation, they often require extensive manually annotated data. Foundation models trained on larger datasets have reduced this dependency, enhancing generalizability and robustness through transfer learning. We explore DINOv2, a self-supervised learning vision transformer trained on natural images, for LA segmentation using MRI. The LA's complex anatomy, thin boundaries, and limited annotated data make accurate segmentation difficult before and during the image-guided intervention. We demonstrate DINOv2's ability to provide accurate and consistent segmentation, achieving a mean Dice score of 0.871 and a Jaccard Index of 0.792 for end-to-end fine-tuning. Through few-shot learning across various data sizes and patient counts, DINOv2 consistently outperforms baseline models. These results suggest that DINOv2 effectively adapts to MRI with limited data, highlighting its potential as a competitive tool for segmentation and encouraging broader use in medical imaging.

Submitted 14 November, 2024; originally announced November 2024.

Comments: 6 pages, 3 figures, SPIE Medical Imaging, 2025

arXiv:2410.08005 [pdf, other]

NLP-Guided Synthesis: Transitioning from Sequential Programs to Distributed Programs

Authors: Arun Sanjel, Bikram Khanal, Greg Speegle, Pablo Rivas

Abstract: As the need for large-scale data processing grows, distributed programming frameworks like PySpark have become increasingly popular. However, the task of converting traditional, sequential code to distributed code remains a significant hurdle, often requiring specialized knowledge and substantial time investment. While existing tools have made strides in automating this conversion, they often fall short in terms of speed, flexibility, and overall applicability. In this paper, we introduce ROOP, a groundbreaking tool designed to address these challenges. Utilizing a BERT-based Natural Language Processing (NLP) model, ROOP automates the translation of Python code to its PySpark equivalent, offering a streamlined solution for leveraging distributed computing resources. We evaluated ROOP using a diverse set of 14 Python programs comprising 26 loop fragments. Our results are promising: ROOP achieved a near-perfect translation accuracy rate, successfully converting 25 out of the 26 loop fragments. Notably, for simpler operations, ROOP demonstrated remarkable efficiency, completing translations in as little as 44 seconds. Moreover, ROOP incorporates a built-in testing mechanism to ensure the functional equivalence of the original and translated code, adding an extra layer of reliability. This research opens up new avenues for automating the transition from sequential to distributed programming, making the process more accessible and efficient for developers.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.05239 [pdf, other]

TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models

Authors: Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal

Abstract: Vision-Language Models (VLMs) have shown impressive performance in vision tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt tuning techniques, including textual, visual, and multimodal prompting, offer efficient alternatives by leveraging learnable prompts. However, their application to Vision-Language Segmentation Models (VLSMs) and evaluation under significant domain shifts remain unexplored. This work presents an open-source benchmarking framework, TuneVLSeg, to integrate various unimodal and multimodal prompt tuning techniques into VLSMs, making prompt tuning usable for downstream segmentation datasets with any number of classes. TuneVLSeg includes 6 prompt tuning strategies on various prompt depths used in 2 VLSMs, for a total of 8 different combinations. We test various prompt tuning on 8 diverse medical datasets, including 3 radiology datasets (breast tumor, echocardiograph, chest X-ray pathologies) and 5 non-radiology datasets (polyp, ulcer, skin cancer), and two natural domain segmentation datasets. Our study found that textual prompt tuning struggles under significant domain shifts, from natural-domain images to medical data. Furthermore, visual prompt tuning, with fewer hyperparameters than multimodal prompt tuning, often achieves performance competitive with multimodal approaches, making it a valuable first attempt. Our work advances the understanding and applicability of different prompt-tuning techniques for robust domain-specific segmentation.
The source code is available at https://github.com/naamiinepal/tunevlseg.

Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

Comments: Accepted at ACCV 2024 (oral presentation)

arXiv:2409.11233 [pdf, other]

Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Authors: Bishwash Khanal, Jeffery M. Capone

Abstract: Large language models (LLMs) offer powerful capabilities but incur substantial computational costs, driving the need for efficient compression techniques. This study evaluates the impact of popular compression methods - Magnitude Pruning, SparseGPT, and Wanda - on the LLaMA-2-7B model, focusing on the trade-offs between model size reduction, downstream task performance, and the role of calibration data. Our findings reveal that while SparseGPT and Wanda preserve perplexity even at 50% sparsity, they suffer significant degradation on downstream tasks, highlighting the inadequacy of perplexity as the sole evaluation metric. To address this, we introduce Jensen-Shannon (JS) Divergence as a more comprehensive metric that captures nuanced changes in model behavior post-compression. We further demonstrate that task-specific calibration data significantly enhances the downstream performance of compressed models compared to general calibration data. This research underscores the necessity for diverse evaluation metrics and careful calibration data selection to fully understand the complexities of LLM compression and its implications for practical applications.
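
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the standard Jensen-Shannon divergence between two discrete distributions. It is not the authors' evaluation code: the function names `kl_divergence` and `js_divergence` are hypothetical, and how the paper obtains the two distributions from the original and compressed models is not stated in the abstract, so plain probability lists are assumed here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Defined as the average KL divergence of p and q from their mixture m.
    With base-2 logarithms the value lies in [0, 1].
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Illustrative (made-up) next-token distributions over a 3-token vocabulary.
original = [0.7, 0.2, 0.1]
compressed = [0.5, 0.3, 0.2]
divergence = js_divergence(original, compressed)
```

Unlike the KL divergence, this quantity is symmetric in its arguments and stays finite when one distribution assigns zero probability to an outcome.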

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.07632 [pdf, other]

Learning Robust Observable to Address Noise in Quantum Machine Learning

Authors: Bikram Khanal, Pablo Rivas

Abstract: Quantum Machine Learning (QML) has emerged as a promising field that combines the power of quantum computing with the principles of machine learning. One of the significant challenges in QML is dealing with noise in quantum systems, especially in the Noisy Intermediate-Scale Quantum (NISQ) era. Noise in quantum systems can introduce errors in quantum computations and degrade the performance of quantum algorithms. In this paper, we propose a framework for learning observables that are robust against noisy channels in quantum systems. We demonstrate that it is possible to learn observables that remain invariant under the effects of noise and show that this can be achieved through a machine-learning approach. We present a toy example using a Bell state under a depolarization channel to illustrate the concept of robust observables. We then describe a machine-learning framework for learning such observables across six two-qubit quantum circuits and five noisy channels. Our results show that it is possible to learn observables that are more robust to noise than conventional observables. We discuss the implications of this finding for quantum machine learning, including potential applications in enhancing the stability of QML models in noisy environments. By developing techniques for learning robust observables, we can improve the performance and reliability of quantum machine learning models in the presence of noise, contributing to the advancement of practical QML applications in the NISQ era.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.07626 [pdf, other]

doi 10.1007/s42484-024-00204-w

Generalization Error Bound for Quantum Machine Learning in NISQ Era -- A Survey

Authors: Bikram Khanal, Pablo Rivas, Arun Sanjel, Korn Sooksatra, Ernesto Quevedo, Alejandro Rodriguez

Abstract: Despite the mounting anticipation for the quantum revolution, the success of Quantum Machine Learning (QML) in the Noisy Intermediate-Scale Quantum (NISQ) era hinges on a largely unexplored factor: the generalization error bound, a cornerstone of robust and reliable machine learning models. Current QML research, while exploring novel algorithms and applications extensively, is predominantly situated in the context of noise-free, ideal quantum computers. However, Quantum Circuit (QC) operations in NISQ-era devices are susceptible to various noise sources and errors. In this article, we conduct a Systematic Mapping Study (SMS) to explore the state-of-the-art generalization bound for supervised QML in NISQ-era and analyze the latest practices in the field. Our study systematically summarizes the existing computational platforms with quantum hardware, datasets, optimization techniques, and the common properties of the bounds found in the literature. We further present the performance accuracy of various approaches in classical benchmark datasets like the MNIST and IRIS datasets. The SMS also highlights the limitations and challenges in QML in the NISQ era and discusses future research directions to advance the field. Using a detailed Boolean operators query in five reliable indexers, we collected 544 papers and filtered them to a small set of 37 relevant articles. This filtration was done following the best practice of SMS with well-defined research questions and inclusion and exclusion criteria.

Submitted 3 February, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: Quantum Machine Intelligence

Journal ref: Quantum Mach. Intell. 6, 90 (2024)

arXiv:2408.06814 [pdf, other]

Structure-preserving Planar Simplification for Indoor Environments

Authors: Bishwash Khanal, Sanjay Rijal, Manish Awale, Vaghawan Ojha

Abstract: This paper presents a novel approach for structure-preserving planar simplification of indoor scene point clouds for both simulated and real-world environments. Initially, the scene point cloud undergoes preprocessing steps, including noise reduction and Manhattan world alignment, to ensure robustness and coherence in subsequent analyses. We segment each captured scene into structured (walls-ceiling-floor) and non-structured (indoor objects) scenes. Leveraging a RANSAC algorithm, we extract primitive planes from the input point cloud, facilitating the segmentation and simplification of the structured scene. The best-fitting wall meshes are then generated from the primitives, followed by adjacent mesh merging with the vertex-translation algorithm which preserves the mesh layout. To accurately represent ceilings and floors, we employ the mesh clipping algorithm which clips the ceiling and floor meshes with respect to wall normals. In the case of indoor scenes, we apply a surface reconstruction technique to enhance the fidelity. This paper focuses on the intricate steps of the proposed scene simplification methodology, addressing complex scenarios such as multi-story and slanted walls and ceilings. We also conduct qualitative and quantitative performance comparisons against popular surface reconstruction, shape approximation, and floorplan generation approaches.

Submitted 21 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

arXiv:2407.05973 [pdf, other]

Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise

Authors: Bidur Khanal, Tianhong Dai, Binod Bhattarai, Cristian Linte

Abstract: The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to enhance classification performance in the presence of noisy labels, they face some challenges: 1) a struggle with class-imbalanced datasets, leading to the frequent overlooking of minority classes as noisy samples; 2) a singular focus on maximizing performance using noisy datasets, without incorporating experts-in-the-loop for actively cleaning the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels, but also iteratively improves the quality of the dataset by relabeling the important incorrect labels, under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements the loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance by not misidentifying clean samples from minority classes as mostly noisy samples.

Submitted 24 October, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted at MICCAI 2024

arXiv:2406.11877 [pdf]

Solar Power Prediction Using Satellite Data in Different Parts of Nepal

Authors: Raj Krishna Nepal, Bibek Khanal, Vibek Ghimire, Kismat Neupane, Atul Pokharel, Kshitij Niraula, Baburam Tiwari, Nawaraj Bhattarai, Khem N. Poudyal, Nawaraj Karki, Mohan B Dangi, John Biden

Abstract: Due to the unavailability of solar irradiance data for many potential sites of Nepal, the paper proposes predicting solar irradiance based on alternative meteorological parameters. The study focuses on five distinct regions in Nepal and utilizes a dataset spanning almost ten years, obtained from CERES SYN1deg and MERRA-2. Machine learning models such as Random Forest, XGBoost, K-Nearest Neighbors, and deep learning models like LSTM and ANN-MLP are employed and evaluated for their performance. The results indicate high accuracy in predicting solar irradiance, with R-squared (R2) scores close to unity for both train and test datasets. The impact of parameter integration on model performance is analyzed, revealing the significance of various parameters in enhancing predictive accuracy. Each model demonstrates strong performance across all parameters, consistently achieving MAE values below 6, RMSE values under 10, MBE within |2|, and nearly unity R2 values. Upon removal of various solar parameters such as "Solar_Irradiance_Clear_Sky", "UVA", etc. from the datasets, the models' performance is significantly affected. This exclusion leads to considerable increases in MAE, reaching up to 82, RMSE up to 135, and MBE up to |7|. Among the models, KNN displays the weakest performance, with an R2 of 0.7582546. Conversely, ANN exhibits the strongest performance, boasting an R2 value of 0.9245877. Hence, the study concludes that Artificial Neural Network (ANN) performs exceptionally well, showcasing its versatility even under sparse data parameter conditions.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 20 pages, 12 figures, 5 tables

arXiv:2405.06196 [pdf, other]

VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks

Authors: Manish Dhakal, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

Abstract: Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must spend substantial time delineating the target structure of interest. VLSMs for medical images resort to fine-tuning base VLM or VLSM pretrained on open-domain natural image datasets due to fewer annotated medical image datasets; this fine-tuning is resource-consuming and expensive as it usually requires updating all or a significant fraction of the pretrained parameters. Recently, lightweight blocks called adapters have been proposed in VLMs that keep the pretrained model frozen and only train adapters during fine-tuning, substantially reducing the computing resources required. We introduce a novel adapter, VLSM-Adapter, that can fine-tune pretrained vision-language segmentation models using transformer encoders. Our experiments in widely used CLIP-based segmentation models show that with only 3 million trainable parameters, the VLSM-Adapter outperforms state-of-the-art and is comparable to the upper bound end-to-end fine-tuning. The source code is available at: https://github.com/naamiinepal/vlsm-adapter.

Submitted 27 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2405.03789 [pdf, other]

On Adversarial Examples for Text Classification by Perturbing Latent Representations

Authors: Korn Sooksatra, Bikram Khanal, Pablo Rivas

Abstract: Recently, with the advancement of deep learning, several applications in text classification have advanced significantly. However, this improvement comes with a cost because deep learning is vulnerable to adversarial examples. This weakness indicates that deep learning is not very robust. Fortunately, the input of a text classifier is discrete. Hence, it can prevent the classifier from state-of-the-art attacks. Nonetheless, previous works have generated black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, we transform the input into its embedding vector containing real values to perform the state-of-the-art white-box attacks. Then, we convert the perturbed embedding vector back into a text and name it an adversarial example. In summary, we create a framework that measures the robustness of a text classifier by using the gradients of the classifier.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 7 pages

MSC Class: 68T01; 68T50

ACM Class: I.2.7

arXiv:2404.07330 [pdf, other]

A Modified Depolarization Approach for Efficient Quantum Machine Learning

Authors: Bikram Khanal, Pablo Rivas

Abstract: Quantum Computing in the Noisy Intermediate-Scale Quantum (NISQ) era has shown promising applications in machine learning, optimization, and cryptography. Despite the progress, challenges persist due to system noise, errors, and decoherence that complicate the simulation of quantum systems. The depolarization channel is a standard tool for simulating a quantum system's noise. However, modeling such noise for practical applications is computationally expensive when we have limited hardware resources, as is the case in the NISQ era. We propose a modified representation for a single-qubit depolarization channel with two Kraus operators based only on X and Z Pauli matrices. Our approach reduces the computational complexity from six to four matrix multiplications per execution of a channel. Experiments on a Quantum Machine Learning (QML) model on the Iris dataset across various circuit depths and depolarization rates validate that our approach maintains the model's accuracy while improving efficiency. This simplified noise model enables more scalable simulations of quantum circuits under depolarization, advancing capabilities in the NISQ era.
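
As background for this abstract, here is a minimal sketch of the textbook single-qubit depolarization channel written with four Kraus operators (identity, X, Y, Z). It is not the paper's modified two-operator representation, which the abstract does not spell out. The parameterization rho -> (1 - p) rho + p I/2 and the helper names `matmul`, `dagger`, and `depolarize` are assumptions made for illustration; the paper's baseline may use a different but equivalent convention.

```python
import math

# Identity and Pauli matrices as nested lists (2x2).
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def depolarize(rho, p):
    """Apply the standard depolarizing channel with rate p in [0, 1].

    Kraus operators: sqrt(1 - 3p/4) I, sqrt(p/4) X, sqrt(p/4) Y, sqrt(p/4) Z,
    which gives rho -> (1 - p) rho + p I/2 for a unit-trace rho.
    """
    weights = [1 - 3 * p / 4, p / 4, p / 4, p / 4]
    out = [[0j, 0j], [0j, 0j]]
    for w, pauli in zip(weights, (I2, X, Y, Z)):
        k = [[math.sqrt(w) * e for e in row] for row in pauli]
        term = matmul(matmul(k, rho), dagger(k))  # K rho K^dagger
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


# The pure state |0><0| becomes maximally mixed at full depolarization (p = 1).
rho0 = [[1, 0], [0, 0]]
mixed = depolarize(rho0, 1.0)
```

Each Kraus term costs two matrix products, which is why the number of operators in a representation directly sets the per-application cost that the abstract is counting.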

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.11936 [pdf, other]

AI-Assisted Cervical Cancer Screening

Authors: Kanchan Poudel, Lisasha Poudel, Prabin Raj Shakya, Atit Poudel, Archana Shrestha, Bishesh Khanal

Abstract: Visual Inspection with Acetic Acid (VIA) remains the most feasible cervical cancer screening test in resource-constrained settings of low- and middle-income countries (LMICs), where it is often performed in screening camps or primary/community health centers by nurses instead of the preferred but unavailable expert gynecologist. To address the highly subjective nature of the test, various handheld devices integrating cameras or smartphones have been recently explored to capture cervical images during VIA and aid decision-making via telemedicine or AI models. Most studies proposing AI models retrospectively use a relatively small number of already collected images from specific devices, digital cameras, or smartphones; the challenges and protocol for quality image acquisition during VIA in resource-constrained camp settings, challenges in getting gold standard, data imbalance, etc. are often overlooked. We present a novel approach and describe the end-to-end design process to build a robust smartphone-based AI-assisted system that does not require buying a separate integrated device: the proposed protocol for quality image acquisition in resource-constrained settings, dataset collected from 1,430 women during VIA performed by nurses in screening camps, preprocessing pipeline, and training and evaluation of a deep-learning-based classification model aimed to identify (pre)cancerous lesions.
Our work shows that readily available smartphones and a suitable protocol can capture cervix images with the details required for the VIA test; that the deep-learning-based classification model provides promising results to assist nurses in VIA screening; and that this approach provides a direction for large-scale data collection and validation in resource-constrained settings.

Submitted 4 September, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.16734 [pdf, other]

Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Authors: Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

Abstract: Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets -- COVID-DU-Ex, and NCT-CRC-HE-100K -- both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2401.07990 [pdf, other]

How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?

Authors: Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian Linte

Abstract: Noisy labels can significantly impact medical image classification, particularly in deep learning, by corrupting learned features. Self-supervised pretraining, which doesn't rely on labeled data, can enhance robustness against noisy labels. However, this robustness varies based on factors like the number of classes, dataset complexity, and training size. In medical images, subtle inter-class differences and modality-specific characteristics add complexity. Previous research hasn't comprehensively explored the interplay between self-supervised learning and robustness against noisy labels in medical image classification, considering all these factors. In this study, we address three key questions: i) How does label noise impact various medical image classification datasets? ii) Which types of medical image datasets are more challenging to learn and more affected by label noise? iii) How do different self-supervised pretraining methods enhance robustness across various medical image datasets? Our results show that DermNet, among five datasets (Fetal plane, DermNet, COVID-DU-Ex, MURA, NCT-CRC-HE-100K), is the most challenging but exhibits greater robustness against noisy labels. Additionally, contrastive learning stands out among the eight self-supervised methods as the most effective approach to enhance robustness against noisy labels.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.06224 [pdf, other]

Medical Vision Language Pretraining: A survey

Authors: Prashant Shrestha, Sanskar Amgain, Bidur Khanal, Cristian A. Linte, Binod Bhattarai

Abstract: Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models can be trained to acquire vast knowledge and learn robust feature representations. Such pretrained models have the potential to enhance multiple downstream medical tasks simultaneously, reducing the dependency on labeled data. However, despite recent progress and its potential, no comprehensive survey has explored the various aspects and advancements in medical VLP. In this paper, we specifically review existing works through the lens of different pretraining objectives, architectures, downstream evaluation tasks, and datasets utilized for pretraining and downstream tasks. Subsequently, we delve into current challenges in medical VLP, discussing existing and potential solutions, and conclude by highlighting future directions. To the best of our knowledge, this is the first survey focused on medical VLP.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2309.13587 [pdf, other]

Benchmarking Encoder-Decoder Architectures for Biplanar X-ray to 3D Shape Reconstruction

Authors: Mahesh Shakya, Bishesh Khanal

Abstract: Various deep learning models have been proposed for 3D bone shape reconstruction from two orthogonal (biplanar) X-ray images. However, it is unclear how these models compare against each other since they are evaluated on different anatomy, cohort and (often privately held) datasets. Moreover, the impact of the commonly optimized image-based segmentation metrics such as dice score on the estimation of clinical parameters relevant in 2D-3D bone shape reconstruction is not well known. To move closer toward clinical translation, we propose a benchmarking framework that evaluates tasks relevant to real-world clinical scenarios, including reconstruction of fractured bones, bones with implants, robustness to population shift, and error in estimating clinical parameters. Our open-source platform provides reference implementations of 8 models (many of whose implementations were not publicly available), APIs to easily collect and preprocess 6 public datasets, and the implementation of automatic clinical parameter and landmark extraction methods. We present an extensive evaluation of 8 2D-3D models on equal footing using 6 public datasets comprising images for four different anatomies.
Our results show that attention-based methods that capture global spatial relationships tend to perform better across all anatomies and datasets; performance on clinically relevant subgroups may be overestimated without disaggregated reporting; ribs are substantially more difficult to reconstruct compared to femur, hip and spine; and the dice score improvement does not always bring a corresponding improvement in the automatic estimation of clinically relevant parameters.

Submitted 26 September, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: accepted to NeurIPS 2023

arXiv:2309.12829 [pdf, other]

doi 10.1007/978-3-031-44521-7_9

Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

Authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal

Abstract: Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images. The code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.

Submitted 22 September, 2023; originally announced September 2023.

Comments: Accepted at the 4th International Workshop of Advances in Simplifying Medical UltraSound (ASMUS)

arXiv:2309.12325 [pdf]

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Authors: Karim Lekadir, Aasa Feragen, Abdul Joseph Fofanah, Alejandro F Frangi, Alena Buyx, Anais Emelie, Andrea Lara, Antonio R Porras, An-Wen Chan, Arcadi Navarro, Ben Glocker, Benard O Botwe, Bishesh Khanal, Brigit Beger, Carol C Wu, Celia Cintas, Curtis P Langlotz, Daniel Rueckert, Deogratias Mzurikwao, Dimitrios I Fotiadis, Doszhan Zhussupov, Enzo Ferrante, Erik Meijering, Eva Weicken, Fabio A González, et al. (95 additional authors not shown)

Abstract: Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real-world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices was defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.

Submitted 8 July, 2024; v1 submitted 11 August, 2023; originally announced September 2023.

ACM Class: I.2.0; I.4.0; I.5.0

arXiv:2308.07706 [pdf, other]

Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

Authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal

Abstract: Medical image segmentation allows quantifying target structure size and shape, aiding in disease diagnosis, prognosis, surgery planning, and comprehension. Building upon recent advancements in foundation Vision-Language Models (VLMs) from natural image-text pairs, several studies have proposed adapting them to Vision-Language Segmentation Models (VLSMs) that allow using language text as an additional input to segmentation models. Introducing auxiliary information via text with human-in-the-loop prompting during inference opens up unique opportunities, such as open vocabulary segmentation and potentially more robust segmentation models against out-of-distribution data. Although transfer learning from natural to medical images has been explored for image-only segmentation models, the joint representation of vision-language in segmentation problems remains underexplored. This study introduces the first systematic study on transferring VLSMs to 2D medical images, using 11 carefully curated datasets encompassing diverse modalities and insightful language prompts and experiments. Our findings demonstrate that although VLSMs show competitive performance compared to image-only models for segmentation after finetuning in limited medical image datasets, not all VLSMs utilize the additional information from language prompts, with image features playing a dominant role. While VLSMs exhibit enhanced performance in handling pooled datasets with diverse modalities and show potential robustness to domain shifts compared to conventional segmentation models, our results suggest that novel approaches are required to enable VLSMs to leverage the various auxiliary information available through language prompts. The code and datasets are available at https://github.com/naamiinepal/medvlsm.

Submitted 20 June, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Medical Imaging with Deep Learning (MIDL) 2024 (Oral)

arXiv:2308.04551 [pdf, other]

Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining

Authors: Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Cristian A. Linte

Abstract: Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has been shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact learning with noisy labels, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter-class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted at MICCAI 2023 DEMI Workshop

arXiv:2306.12376 [pdf, other]

M-VAAL: Multimodal Variational Adversarial Active Learning for Downstream Medical Image Analysis Tasks

Authors: Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Danail Stoyanov, Cristian A. Linte

Abstract: Acquiring properly annotated data is expensive in the medical field as it requires experts, time-consuming protocols, and rigorous validation. Active learning attempts to minimize the need for large annotated samples by actively sampling the most informative examples for annotation. These examples contribute significantly to improving the performance of supervised machine learning models, and thus, active learning can play an essential role in selecting the most appropriate information in deep learning-based diagnosis, clinical assessments, and treatment planning. Although some existing works have proposed methods for sampling the best examples for annotation in medical image analysis, they are not task-agnostic and do not use multimodal auxiliary information in the sampler, which has the potential to increase robustness. Therefore, in this work, we propose a Multimodal Variational Adversarial Active Learning (M-VAAL) method that uses auxiliary information from additional modalities to enhance the active sampling. We applied our method to two datasets: i) brain tumor segmentation and multi-label classification using the BraTS2018 dataset, and ii) chest X-ray image classification using the COVID-QU-Ex dataset. Our results show a promising direction toward data-efficient learning under limited annotations.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2304.05339 [pdf, other]

doi 10.59275/j.melba.2024-a333

Deep-learning Assisted Detection and Quantification of (oo)cysts of Giardia and Cryptosporidium on Smartphone Microscopy Images

Authors: Suprim Nakarmi, Sanam Pudasaini, Safal Thapaliya, Pratima Upretee, Retina Shrestha, Basant Giri, Bhanu Bhakta Neupane, Bishesh Khanal

Abstract: The consumption of microbial-contaminated food and water is responsible for the deaths of millions of people annually. Smartphone-based microscopy systems are portable, low-cost, and more accessible alternatives for the detection of Giardia and Cryptosporidium than traditional brightfield microscopes. However, the images from smartphone microscopes are noisier and require manual cyst identification by trained technicians, usually unavailable in resource-limited settings. Automatic detection of (oo)cysts using deep-learning-based object detection could offer a solution for this limitation. We evaluate the performance of four state-of-the-art object detectors to detect (oo)cysts of Giardia and Cryptosporidium on a custom dataset that includes both smartphone and brightfield microscopic images from vegetable samples. Faster RCNN, RetinaNet, You Only Look Once (YOLOv8s), and Deformable Detection Transformer (Deformable DETR) deep-learning models were employed to explore their efficacy and limitations. Our results show that while the deep-learning models perform better with the brightfield microscopy image dataset than the smartphone microscopy image dataset, the smartphone microscopy predictions are still comparable to the prediction performance of non-experts. Also, we publicly release brightfield and smartphone microscopy datasets with the benchmark results for the detection of Giardia and Cryptosporidium, independently captured on reference (or standard lab setting) and vegetable samples. Our code and dataset are available at https://github.com/naamiinepal/smartphone_microscopy and https://doi.org/10.5281/zenodo.7813183, respectively.

Submitted 6 August, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 21 pages (including supplementary information), 5 figures, 7 tables, Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:014

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2210.05425 [pdf]

COVID-19-related Nepali Tweets Classification in a Low Resource Setting

Authors: Rabin Adhikari, Safal Thapaliya, Nirajan Basnet, Samip Poudel, Aman Shakya, Bishesh Khanal

Abstract: Billions of people across the globe have been using social media platforms in their local languages to voice their opinions about the various topics related to the COVID-19 pandemic. Several organizations, including the World Health Organization, have developed automated social media analysis tools that classify COVID-19-related tweets into various topics. However, these tools that help combat the pandemic are limited to very few languages, leaving several countries unable to benefit from them. While multi-lingual or low-resource language-specific tools are being developed, they still need to expand their coverage, such as for the Nepali language. In this paper, we identify the eight most common COVID-19 discussion topics among the Twitter community using the Nepali language, set up an online platform to automatically gather Nepali tweets containing the COVID-19-related keywords, classify the tweets into the eight topics, and visualize the results across the period in a web-based dashboard. We compare the performance of two state-of-the-art multi-lingual language models for Nepali tweet classification, one generic (mBERT) and the other Nepali language family-specific model (MuRIL). Our results show that the models' relative performance depends on the data size, with MuRIL doing better for a larger dataset. The annotated data, models, and the web-based dashboard are open-sourced at https://github.com/naamiinepal/covid-tweet-classification.

Submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted at the 7th Social Media Mining for Health (#SMM4H) Workshop, co-located at Coling 2022

arXiv:2208.00400 [pdf, other]

FixMatchSeg: Fixing FixMatch for Semi-Supervised Semantic Segmentation

Authors: Pratima Upretee, Bishesh Khanal

Abstract: Supervised deep learning methods for semantic medical image segmentation have become increasingly popular in the past few years. However, in resource-constrained settings, obtaining a large number of annotated images is very difficult as it mostly requires experts, is expensive and time-consuming. Semi-supervised segmentation can be an attractive solution where very few labeled images are used along with a large number of unlabeled ones. While the gap between supervised and semi-supervised methods has been dramatically reduced for classification problems in the past couple of years, there still remains a larger gap in segmentation methods. In this work, we adapt a state-of-the-art semi-supervised classification method, FixMatch, to the semantic segmentation task, introducing FixMatchSeg. FixMatchSeg is evaluated on four different publicly available datasets of different anatomy and different modality: cardiac ultrasound, chest X-ray, retinal fundus image, and skin images. When there are few labels, we show that FixMatchSeg performs on par with strong supervised baselines.

Submitted 2 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

Comments: 2 figures, 4 tables, 9 pages + 2 pages references

arXiv:2106.15475 [pdf, other]

How Does Heterogeneous Label Noise Impact Generalization in Neural Nets?

Authors: Bidur Khanal, Christopher Kanan

Abstract: Incorrectly labeled examples, or label noise, are common in real-world computer vision datasets. While the impact of label noise on learning in deep neural networks has been studied in prior work, these studies have exclusively focused on homogeneous label noise, i.e., the degree of label noise is the same across all categories. However, in the real world, label noise is often heterogeneous, with some categories being affected to a greater extent than others. Here, we address this gap in the literature. We hypothesized that heterogeneous label noise would only affect the classes that had label noise unless there was transfer from those classes to the classes without label noise. To test this hypothesis, we designed a series of computer vision studies using MNIST, CIFAR-10, CIFAR-100, and MS-COCO where we imposed heterogeneous label noise during the training of multi-class, multi-task, and multi-label systems. Our results provide evidence in support of our hypothesis: label noise only affects the class affected by it unless there is transfer.

Submitted 26 September, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

arXiv:2105.05501 [pdf, other]

Label Geometry Aware Discriminator for Conditional Generative Networks

Authors: Suman Sapkota, Bidur Khanal, Binod Bhattarai, Bishesh Khanal, Tae-Kyun Kim

Abstract: Multi-domain image-to-image translation with conditional Generative Adversarial Networks (GANs) can generate highly photo-realistic images with desired target classes, yet these synthetic images have not always been helpful to improve downstream supervised tasks such as image classification. Improving downstream tasks with synthetic examples requires generating images with high fidelity to the unknown conditional distribution of the target class, which many labeled conditional GANs attempt to achieve by adding a soft-max cross-entropy loss based auxiliary classifier in the discriminator. As recent studies suggest that the soft-max loss in the Euclidean space of deep features does not leverage their intrinsic angular distribution, we propose to replace this loss in the auxiliary classifier with an additive angular margin (AAM) loss that takes advantage of the intrinsic angular distribution, and promotes intra-class compactness and inter-class separation to help the generator synthesize high-fidelity images. We validate our method on RaFD and CIFAR-100, two challenging facial expression and natural image classification datasets. Our method outperforms state-of-the-art methods on several different evaluation criteria, including the recently proposed GAN-train and GAN-test metrics designed to assess the impact of synthetic data on downstream classification tasks, assessing the usefulness in data augmentation for supervised tasks with prediction accuracy score and average confidence score, and the well-known FID metric.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2005.09349 [pdf, other]

Uncertainty Estimation in Deep 2D Echocardiography Segmentation

Authors: Lavsen Dahal, Aayush Kafle, Bishesh Khanal

Abstract: 2D echocardiography is the most common imaging modality for cardiovascular diseases. The portability and relatively low-cost nature of Ultrasound (US) enable the US devices needed for performing echocardiography to be made widely available. However, acquiring and interpreting cardiac US images is operator dependent, limiting its use to only places where experts are present. Recently, Deep Learning (DL) has been used in 2D echocardiography for automated view classification, and structure and function assessment. Although these recent works show promise in developing computer-guided acquisition and automated interpretation of echocardiograms, most of these methods do not model and estimate uncertainty, which can be important when testing on data coming from a distribution further away from that of the training data. Uncertainty estimates can be beneficial both during the image acquisition phase (by providing real-time feedback to the operator on the acquired image's quality), and during automated measurement and interpretation. The performance of uncertainty models and quantification metrics may depend on the prediction task and the models being compared. Hence, to gain insight into uncertainty modelling for left ventricular segmentation from US images, we compare three ensembling-based uncertainty models quantified using four different metrics (one newly proposed) on state-of-the-art baseline networks using two publicly available echocardiogram datasets. We further demonstrate how uncertainty estimation can be used to automatically reject poor quality images and improve state-of-the-art segmentation results.

Submitted 19 May, 2020; originally announced May 2020.

arXiv:1910.14202 [pdf, other]

Automatic Cobb Angle Detection using Vertebra Detector and Vertebra Corners Regression

Authors: Bidur Khanal, Lavsen Dahal, Prashant Adhikari, Bishesh Khanal

Abstract: Correct evaluation and treatment of Scoliosis require accurate estimation of spinal curvature. The current gold standard is to manually estimate Cobb Angles in spinal X-ray images, which is time-consuming and has high inter-rater variability. We propose an automatic method with a novel framework that first detects vertebrae as objects, followed by a landmark detector that estimates the 4 landmark corners of each vertebra separately. Cobb Angles are calculated using the slope of each vertebra obtained from the predicted landmarks. For inference on test data, we perform pre- and post-processing that includes cropping, outlier rejection and smoothing of the predicted landmarks. The results were assessed in the AASCE MICCAI challenge 2019 and showed promise, with a SMAPE score of 25.69 on the challenge test set.

Submitted 30 October, 2019; originally announced October 2019.

Comments: Accepted to MICCAI 2019 CSI Workshop & Challenge: Computational Methods and Clinical Applications for Spine Imaging

arXiv:1908.02582 [pdf, other]

Confident Head Circumference Measurement from Ultrasound with Real-time Feedback for Sonographers

Authors: Samuel Budd, Matthew Sinclair, Bishesh Khanal, Jacqueline Matthew, David Lloyd, Alberto Gomez, Nicolas Toussaint, Emma Robinson, Bernhard Kainz

Abstract: Manual estimation of fetal Head Circumference (HC) from Ultrasound (US) is a key biometric for monitoring the healthy development of fetuses. Unfortunately, such measurements are subject to large inter-observer variability, resulting in low early-detection rates of fetal abnormalities. To address this issue, we propose a novel probabilistic Deep Learning approach for real-time automated estimation of fetal HC. This system feeds back statistics on measurement robustness to inform users how confident a deep neural network is in evaluating suitable views acquired during free-hand ultrasound examination. In real-time scenarios, this approach may be exploited to guide operators to scan planes that are as close as possible to the underlying distribution of training images, for the purpose of improving inter-operator consistency. We train on free-hand ultrasound data from over 2000 subjects (2848 training/540 test) and show that our method is able to predict HC measurements within 1.81$\pm$1.65mm deviation from the ground truth, with 50% of the test images fully contained within the predicted confidence margins, and an average of 1.82$\pm$1.78mm deviation from the margin for the remaining cases that are not fully contained.

Submitted 7 August, 2019; originally announced August 2019.

Comments: Accepted at MICCAI 2019; Demo video available on Twitter (@sambuddinc)

arXiv:1903.02429 [pdf, other]

doi 10.1007/978-3-030-20351-1_17

Controlling Meshes via Curvature: Spin Transformations for Pose-Invariant Shape Processing

Authors: Loic Le Folgoc, Daniel C. Castro, Jeremy Tan, Bishesh Khanal, Konstantinos Kamnitsas, Ian Walker, Amir Alansary, Ben Glocker

Abstract: We investigate discrete spin transformations, a geometric framework to manipulate surface meshes by controlling mean curvature. Applications include surface fairing -- flowing a mesh onto, say, a reference sphere -- and mesh extrusion -- e.g., rebuilding a complex shape from a reference sphere and curvature specification. Because they operate in curvature space, these operations can be conducted very stably across large deformations with no need for remeshing. Spin transformations add to the algorithmic toolbox for pose-invariant shape analysis. Mathematically speaking, mean curvature is a shape invariant and in general fully characterizes closed shapes (together with the metric). Computationally speaking, spin transformations make that relationship explicit. Our work expands on a discrete formulation of spin transformations. Like their smooth counterpart, discrete spin transformations are naturally close to conformal (angle-preserving). This quasi-conformality can nevertheless be relaxed to satisfy the desired trade-off between area distortion and angle preservation. We derive such constraints and propose a formulation in which they can be efficiently incorporated. The approach is showcased on subcortical structures.

Submitted 6 March, 2019; originally announced March 2019.

Comments: Accepted for publication at the 26th international conference on Information Processing in Medical Imaging (IPMI 2019)

Journal ref: IPMI 2019. LNCS, vol 11492, pp 221-234. Springer, Cham

arXiv:1903.01905

FastReg: Fast Non-Rigid Registration via Accelerated Optimisation on the Manifold of Diffeomorphisms

Authors: Daniel Grzech, Loïc le Folgoc, Mattias P. Heinrich, Bishesh Khanal, Jakub Moll, Julia A. Schnabel, Ben Glocker, Bernhard Kainz

Abstract: We present an implementation of a new approach to diffeomorphic non-rigid registration of medical images. The method is based on optical flow and warps images via gradient flow with the standard $L^2$ inner product. To compute the transformation, we rely on accelerated optimisation on the manifold of diffeomorphisms. We achieve regularity properties of Sobolev gradient flows, which are expensive to compute, owing to a novel method of averaging the gradients in time rather than space. We successfully register brain MRI and challenging abdominal CT scans at speeds orders of magnitude faster than previous approaches. We make our code available in a public repository: https://github.com/dgrzech/fastreg

Submitted 24 April, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

Comments: There is an ongoing dispute about the presentation of this paper. It will be withdrawn until the dispute is resolved

arXiv:1808.00793 [pdf, other]

Weakly Supervised Localisation for Fetal Ultrasound Images

Authors: Nicolas Toussaint, Bishesh Khanal, Matthew Sinclair, Alberto Gomez, Emily Skelton, Jacqueline Matthew, Julia A. Schnabel

Abstract: This paper addresses the task of detecting and localising fetal anatomical regions in 2D ultrasound images, where only image-level labels are present at training, i.e. without any localisation or segmentation information. We examine the use of convolutional neural network architectures coupled with soft proposal layers. The resulting network simultaneously performs anatomical region detection (classification) and localisation tasks. We generate a proposal map describing the attention of the network for a particular class. The network is trained on 85,500 2D fetal Ultrasound images and their associated labels. Labels correspond to six anatomical regions: head, spine, thorax, abdomen, limbs, and placenta. Detection achieves an average accuracy of 90% on individual regions, and we show that the proposal maps correlate well with relevant anatomical structures. This work presents itself as a powerful and essential step towards subsequent tasks such as fetal position and pose estimation, organ-specific segmentation, or image-guided navigation. Code and additional material are available at https://ntoussaint.github.io/fetalnav

Submitted 2 August, 2018; originally announced August 2018.

Comments: 4th Workshop on Deep Learning for Medical Image Analysis, MICCAI 2018, Granada, Spain

arXiv:1807.10583 [pdf, other]

EchoFusion: Tracking and Reconstruction of Objects in 4D Freehand Ultrasound Imaging without External Trackers

Authors: Bishesh Khanal, Alberto Gomez, Nicolas Toussaint, Steven McDonagh, Veronika Zimmer, Emily Skelton, Jacqueline Matthew, Daniel Grzech, Robert Wright, Chandni Gupta, Benjamin Hou, Daniel Rueckert, Julia A. Schnabel, Bernhard Kainz

Abstract: Ultrasound (US) is the most widely used fetal imaging technique. However, US images have limited capture range, and suffer from view dependent artefacts such as acoustic shadows. Compounding of overlapping 3D US acquisitions into a high-resolution volume can extend the field of view and remove image artefacts, which is useful for retrospective analysis including population based studies. However, such volume reconstructions require information about relative transformations between probe positions from which the individual volumes were acquired. In prenatal US scans, the fetus can move independently from the mother, making external trackers such as electromagnetic or optical tracking unable to track the motion between probe position and the moving fetus. We provide a novel methodology for image-based tracking and volume reconstruction by combining recent advances in deep learning and simultaneous localisation and mapping (SLAM). Tracking semantics are established through the use of a Residual 3D U-Net and the output is fed to the SLAM algorithm. As a proof of concept, experiments are conducted on US volumes taken from a whole body fetal phantom, and from the heads of real fetuses. For the fetal head segmentation, we also introduce a novel weak annotation approach to minimise the required manual effort for ground truth annotation. We evaluate our method qualitatively, and quantitatively with respect to tissue discrimination accuracy and tracking robustness.

Submitted 19 July, 2018; originally announced July 2018.

Comments: MICCAI Workshop on Perinatal, Preterm and Paediatric Image analysis (PIPPI), 2018

arXiv:1806.07486 [pdf, other]

doi 10.1007/978-3-030-00928-1_45

Standard Plane Detection in 3D Fetal Ultrasound Using an Iterative Transformation Network

Authors: Yuanwei Li, Bishesh Khanal, Benjamin Hou, Amir Alansary, Juan J. Cerrolaza, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Abstract: Standard scan plane detection in fetal brain ultrasound (US) forms a crucial step in the assessment of fetal development. In clinical settings, this is done by manually manoeuvring a 2D probe to the desired scan plane. With the advent of 3D US, the entire fetal brain volume containing these standard planes can be easily acquired. However, manual standard plane identification in 3D volume is labour-intensive and requires expert knowledge of fetal anatomy. We propose a new Iterative Transformation Network (ITN) for the automatic detection of standard planes in 3D volumes. ITN uses a convolutional neural network to learn the relationship between a 2D plane image and the transformation parameters required to move that plane towards the location/orientation of the standard plane in the 3D volume. During inference, the current plane image is passed iteratively to the network until it converges to the standard plane location. We explore the effect of using different transformation representations as regression outputs of ITN. Under a multi-task learning framework, we introduce additional classification probability outputs to the network to act as confidence measures for the regressed transformation parameters in order to further improve the localisation accuracy. When evaluated on 72 US volumes of fetal brain, our method achieves an error of 3.83mm/12.7 degrees and 3.80mm/12.6 degrees for the transventricular and transcerebellar planes respectively and takes 0.46s per plane. Source code is publicly available at https://github.com/yuanwei1989/plane-detection.

Submitted 6 October, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: 8 pages, 2 figures, accepted for MICCAI 2018; Added link to source code

Journal ref: LNCS 11070 (2018) 392-400

arXiv:1806.06987 [pdf, other]

doi 10.1007/978-3-030-00928-1_64

Fast Multiple Landmark Localisation Using a Patch-based Iterative Network

Authors: Yuanwei Li, Amir Alansary, Juan J. Cerrolaza, Bishesh Khanal, Matthew Sinclair, Jacqueline Matthew, Chandni Gupta, Caroline Knight, Bernhard Kainz, Daniel Rueckert

Abstract: We propose a new Patch-based Iterative Network (PIN) for fast and accurate landmark localisation in 3D medical volumes. PIN utilises a Convolutional Neural Network (CNN) to learn the spatial relationship between an image patch and anatomical landmark positions. During inference, patches are repeatedly passed to the CNN until the estimated landmark position converges to the true landmark location. PIN is computationally efficient since the inference stage only selectively samples a small number of patches in an iterative fashion rather than a dense sampling at every location in the volume. Our approach adopts a multi-task learning framework that combines regression and classification to improve localisation accuracy. We extend PIN to localise multiple landmarks by using principal component analysis, which models the global anatomical relationships between landmarks. We have evaluated PIN using 72 3D ultrasound images from fetal screening examinations. PIN achieves quantitatively an average landmark localisation error of 5.59mm and a runtime of 0.44s to predict 10 landmarks per volume. Qualitatively, anatomical 2D standard scan planes derived from the predicted landmark locations are visually similar to the clinical ground truth. Source code is publicly available at https://github.com/yuanwei1989/landmark-detection.

Submitted 6 October, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

Comments: 8 pages, 4 figures, Accepted for MICCAI 2018

Journal ref: LNCS 11070 (2018) 563-571

arXiv:1806.00411 [pdf, other]

Adapted and Oversegmenting Graphs: Application to Geometric Deep Learning

Authors: Alberto Gomez, Veronika A. Zimmer, Bishesh Khanal, Nicolas Toussaint, Julia A. Schnabel

Abstract: We propose a novel iterative method to adapt a graph to d-dimensional image data. The method drives the nodes of the graph towards image features. The adaptation process naturally lends itself to a measure of feature saliency which can then be used to retain meaningful nodes and edges in the graph. From the adapted graph, we also propose the computation of a dual graph, which inherits the saliency measure from the adapted graph, and whose edges run along image features, hence producing an oversegmenting graph. The proposed method is computationally efficient and fully parallelisable. We propose two distance measures to find image saliency along graph edges, and evaluate the performance on synthetic images and on natural images from publicly available databases. In both cases, the most salient nodes of the graph achieve average boundary recall over 90%. We also apply our method to image classification on the MNIST hand-written digit dataset, using a recently proposed Deep Geometric Learning architecture, and achieving state-of-the-art classification accuracy, for a graph-based method, of 97.86%.

Submitted 5 September, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Submitted to CVIU

arXiv:1805.01026 [pdf, other]

Computing CNN Loss and Gradients for Pose Estimation with Riemannian Geometry

Authors: Benjamin Hou, Nina Miolane, Bishesh Khanal, Matthew C. H. Lee, Amir Alansary, Steven McDonagh, Jo V. Hajnal, Daniel Rueckert, Ben Glocker, Bernhard Kainz

Abstract: Pose estimation, i.e. predicting a 3D rigid transformation with respect to a fixed co-ordinate frame in SE(3), is an omnipresent problem in medical image analysis with applications such as image rigid registration, anatomical standard plane detection, tracking and device/camera pose estimation. Deep learning methods often parameterise a pose with a representation that separates rotation and translation. As commonly available frameworks do not provide means to calculate loss on a manifold, regression is usually performed using the L2-norm independently on the rotation's and the translation's parameterisations, which is a metric for linear spaces that does not take into account the Lie group structure of SE(3). In this paper, we propose a general Riemannian formulation of the pose estimation problem. We propose to train the CNN directly on SE(3) equipped with a left-invariant Riemannian metric, coupling the prediction of the translation and rotation defining the pose. At each training step, the ground truth and predicted pose are elements of the manifold, where the loss is calculated as the Riemannian geodesic distance. We then compute the optimisation direction by back-propagating the gradient with respect to the predicted pose on the tangent space of the manifold SE(3) and update the network weights. We thoroughly evaluate the effectiveness of our loss function by comparing its performance with popular and most commonly used existing methods, on tasks such as image-based localisation and intensity-based 2D/3D registration. We also show that hyper-parameters, used in our loss function to weight the contribution between rotations and translations, can be intrinsically calculated from the dataset to achieve greater performance margins.

Submitted 17 July, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

Showing 1–50 of 51 results for author: Khanal, B