-
Disentangling Prompt Element Level Risk Factors for Hallucinations and Omissions in Mental Health LLM Responses
Authors:
Congning Ni,
Sarvech Qadir,
Bryan Steitz,
Mihir Sachin Vaidya,
Qingyuan Song,
Lantian Xia,
Shelagh Mulvaney,
Siru Liu,
Hyeyoung Ryu,
Leah Hecht,
Amy Bucher,
Christopher Symons,
Laurie Novak,
Susannah L. Rose,
Murat Kantarcioglu,
Bradley Malin,
Zhijun Yin
Abstract:
Mental health concerns are often expressed outside clinical settings, including in high-distress help seeking, where safety-critical guidance may be needed. Consumer health informatics systems increasingly incorporate large language models (LLMs) for mental health question answering, yet many evaluations underrepresent narrative, high-distress inquiries. We introduce UTCO (User, Topic, Context, To…
▽ More
Mental health concerns are often expressed outside clinical settings, including in high-distress help seeking, where safety-critical guidance may be needed. Consumer health informatics systems increasingly incorporate large language models (LLMs) for mental health question answering, yet many evaluations underrepresent narrative, high-distress inquiries. We introduce UTCO (User, Topic, Context, Tone), a prompt construction framework that represents an inquiry as four controllable elements for systematic stress testing. Using 2,075 UTCO-generated prompts, we evaluated Llama 3.3 and annotated hallucinations (fabricated or incorrect clinical content) and omissions (missing clinically necessary or safety-critical guidance). Hallucinations occurred in 6.5% of responses and omissions in 13.2%, with omissions concentrated in crisis and suicidal ideation prompts. Across regression, element-specific matching, and similarity-matched comparisons, failures were most consistently associated with context and tone, while user-background indicators showed no systematic differences after balancing. These findings support evaluating omissions as a primary safety outcome and moving beyond static benchmark question sets.
△ Less
Submitted 10 March, 2026;
originally announced April 2026.
-
Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine
Authors:
Omid Rohanian,
Mohammadmahdi Nouriborji,
Olena Seminog,
Rodrigo Furst,
Thomas Mendy,
Shanthi Levanita,
Zaharat Kadri-Alabi,
Nusrat Jabin,
Daniela Toale,
Georgina Humphreys,
Emilia Antonio,
Adrian Bucher,
Alice Norton,
David A. Clifton
Abstract:
This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset. PPACE is a fine-tuned model developed to automatically classify research abstracts from funded biomedical projects according to WHO-aligned research priorities. This task is crucial for monitoring research trends and identifying gaps in global health preparedness and response. Our appr…
▽ More
This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset. PPACE is a fine-tuned model developed to automatically classify research abstracts from funded biomedical projects according to WHO-aligned research priorities. This task is crucial for monitoring research trends and identifying gaps in global health preparedness and response. Our approach builds on human-annotated projects, which are allocated one or more categories from a predefined list. A large language model is then used to generate `rationales' explaining the reasoning behind these annotations. This augmented data, comprising expert annotations and rationales, is subsequently used to fine-tune a smaller, more efficient model. Developed as part of the Pandemic PACT project, which aims to track and analyse research funding and clinical evidence for a wide range of diseases with outbreak potential, PPACE supports informed decision-making by research funders, policymakers, and independent researchers. We introduce and release both the trained model and the instruction-based dataset used for its training. Our evaluation shows that PPACE significantly outperforms its baselines. The release of PPACE and its associated dataset offers valuable resources for researchers in multilabel biomedical document classification and supports advancements in aligning biomedical research with key global health priorities.
△ Less
Submitted 19 July, 2024; v1 submitted 14 July, 2024;
originally announced July 2024.
-
When Generative AI Meets Workplace Learning: Creating A Realistic & Motivating Learning Experience With A Generative PCA
Authors:
Andreas Bucher,
Birgit Schenk,
Mateusz Dolata,
Gerhard Schwabe
Abstract:
Workplace learning is used to train employees systematically, e.g., via e-learning or in 1:1 training. However, this is often deemed ineffective and costly. Whereas pure e-learning lacks the possibility of conversational exercise and personal contact, 1:1 training with human instructors involves a high level of personnel and organizational costs. Hence, pedagogical conversational agents (PCAs), ba…
▽ More
Workplace learning is used to train employees systematically, e.g., via e-learning or in 1:1 training. However, this is often deemed ineffective and costly. Whereas pure e-learning lacks the possibility of conversational exercise and personal contact, 1:1 training with human instructors involves a high level of personnel and organizational costs. Hence, pedagogical conversational agents (PCAs), based on generative AI, seem to compensate for the disadvantages of both forms. Following Action Design Research, this paper describes an organizational communication training with a Generative PCA (GenPCA). The evaluation shows promising results: the agent was perceived positively among employees and contributed to an improvement in self-determined learning. However, the integration of such agent comes not without limitations. We conclude with suggestions concerning the didactical methods, which are supported by a GenPCA, and possible improvements of such an agent for workplace learning.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Real-World Federated Learning in Radiology: Hurdles to overcome and Benefits to gain
Authors:
Markus R. Bujotzek,
Ünal Akünal,
Stefan Denner,
Peter Neher,
Maximilian Zenk,
Eric Frodl,
Astha Jaiswal,
Moon Kim,
Nicolai R. Krekiehn,
Manuel Nickel,
Richard Ruppel,
Marcus Both,
Felix Döllinger,
Marcel Opitz,
Thorsten Persigehl,
Jens Kleesiek,
Tobias Penzkofer,
Klaus Maier-Hein,
Rickmer Braren,
Andreas Bucher
Abstract:
Objective: Federated Learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles, leaving behind a significant knowledge gap.…
▽ More
Objective: Federated Learning (FL) enables collaborative model training while keeping data locally. Currently, most FL studies in radiology are conducted in simulated environments due to numerous hurdles impeding its translation into practice. The few existing real-world FL initiatives rarely communicate specific measures taken to overcome these hurdles, leaving behind a significant knowledge gap. Minding efforts to implement real-world FL, there is a notable lack of comprehensive assessment comparing FL to less complex alternatives. Materials & Methods: We extensively reviewed FL literature, categorizing insights along with our findings according to their nature and phase while establishing a FL initiative, summarized to a comprehensive guide. We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. We extensively evaluated FL against less complex alternatives in three distinct evaluation scenarios. Results: The proposed guide outlines essential steps, identified hurdles, and proposed solutions for establishing successful FL initiatives conducting real-world experiments. Our experimental results show that FL outperforms less complex alternatives in all evaluation scenarios, justifying the effort required to translate FL into real-world applications. Discussion & Conclusion: Our proposed guide aims to aid future FL researchers in circumventing pitfalls and accelerating translation of FL into radiological applications. Our results underscore the value of efforts needed to translate FL into real-world applications by demonstrating advantageous performance over alternatives, and emphasize the importance of strategic organization, robust management of distributed data and infrastructure in real-world settings.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Continual atlas-based segmentation of prostate MRI
Authors:
Amin Ranem,
Camila González,
Daniel Pinto dos Santos,
Andreas M. Bucher,
Ahmed E. Othman,
Anirban Mukhopadhyay
Abstract:
Continual learning (CL) methods designed for natural image classification often fail to reach basic quality standards for medical image segmentation. Atlas-based segmentation, a well-established approach in medical imaging, incorporates domain knowledge on the region of interest, leading to semantically coherent predictions. This is especially promising for CL, as it allows us to leverage structur…
▽ More
Continual learning (CL) methods designed for natural image classification often fail to reach basic quality standards for medical image segmentation. Atlas-based segmentation, a well-established approach in medical imaging, incorporates domain knowledge on the region of interest, leading to semantically coherent predictions. This is especially promising for CL, as it allows us to leverage structural information and strike an optimal balance between model rigidity and plasticity over time. When combined with privacy-preserving prototypes, this process offers the advantages of rehearsal-based CL without compromising patient privacy. We propose Atlas Replay, an atlas-based segmentation approach that uses prototypes to generate high-quality segmentation masks through image registration that maintain consistency even as the training distribution changes. We explore how our proposed method performs compared to state-of-the-art CL methods in terms of knowledge transferability across seven publicly available prostate segmentation datasets. Prostate segmentation plays a vital role in diagnosing prostate cancer, however, it poses challenges due to substantial anatomical variations, benign structural differences in older age groups, and fluctuating acquisition parameters. Our results show that Atlas Replay is both robust and generalizes well to yet-unseen domains while being able to maintain knowledge, unlike end-to-end segmentation methods. Our code base is available under https://github.com/MECLabTUDA/Atlas-Replay.
△ Less
Submitted 6 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Efficient Large Scale Medical Image Dataset Preparation for Machine Learning Applications
Authors:
Stefan Denner,
Jonas Scherer,
Klaus Kades,
Dimitrios Bounias,
Philipp Schader,
Lisa Kausch,
Markus Bujotzek,
Andreas Michael Bucher,
Tobias Penzkofer,
Klaus Maier-Hein
Abstract:
In the rapidly evolving field of medical imaging, machine learning algorithms have become indispensable for enhancing diagnostic accuracy. However, the effectiveness of these algorithms is contingent upon the availability and organization of high-quality medical imaging datasets. Traditional Digital Imaging and Communications in Medicine (DICOM) data management systems are inadequate for handling…
▽ More
In the rapidly evolving field of medical imaging, machine learning algorithms have become indispensable for enhancing diagnostic accuracy. However, the effectiveness of these algorithms is contingent upon the availability and organization of high-quality medical imaging datasets. Traditional Digital Imaging and Communications in Medicine (DICOM) data management systems are inadequate for handling the scale and complexity of data required to be facilitated in machine learning algorithms. This paper introduces an innovative data curation tool, developed as part of the Kaapana open-source toolkit, aimed at streamlining the organization, management, and processing of large-scale medical imaging datasets. The tool is specifically tailored to meet the needs of radiologists and machine learning researchers. It incorporates advanced search, auto-annotation and efficient tagging functionalities for improved data curation. Additionally, the tool facilitates quality control and review, enabling researchers to validate image and segmentation quality in large datasets. It also plays a critical role in uncovering potential biases in datasets by aggregating and visualizing metadata, which is essential for developing robust machine learning models. Furthermore, Kaapana is integrated within the Radiological Cooperative Network (RACOON), a pioneering initiative aimed at creating a comprehensive national infrastructure for the aggregation, transmission, and consolidation of radiological data across all university clinics throughout Germany. A supplementary video showcasing the tool's functionalities can be accessed at https://bit.ly/MICCAI-DEMI2023.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Current State of Community-Driven Radiological AI Deployment in Medical Imaging
Authors:
Vikash Gupta,
Barbaros Selnur Erdal,
Carolina Ramirez,
Ralf Floca,
Laurence Jackson,
Brad Genereaux,
Sidney Bryson,
Christopher P Bridge,
Jens Kleesiek,
Felix Nensa,
Rickmer Braren,
Khaled Younis,
Tobias Penzkofer,
Andreas Michael Bucher,
Ming Melvin Qin,
Gigon Bae,
Hyeonhoon Lee,
M. Jorge Cardoso,
Sebastien Ourselin,
Eric Kerfoot,
Rahul Choudhury,
Richard D. White,
Tessa Cook,
David Bericat,
Matthew Lungren
, et al. (2 additional authors not shown)
Abstract:
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introd…
▽ More
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
△ Less
Submitted 8 May, 2023; v1 submitted 29 December, 2022;
originally announced December 2022.
-
Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation
Authors:
Camila Gonzalez,
Karol Gotkowski,
Moritz Fuchs,
Andreas Bucher,
Armin Dadras,
Ricarda Fischbach,
Isabel Kaltenborn,
Anirban Mukhopadhyay
Abstract:
Automatic segmentation of ground glass opacities and consolidations in chest computer tomography (CT) scans can potentially ease the burden of radiologists during times of high resource utilisation. However, deep learning models are not trusted in the clinical routine due to failing silently on out-of-distribution (OOD) data. We propose a lightweight OOD detection method that leverages the Mahalan…
▽ More
Automatic segmentation of ground glass opacities and consolidations in chest computer tomography (CT) scans can potentially ease the burden of radiologists during times of high resource utilisation. However, deep learning models are not trusted in the clinical routine due to failing silently on out-of-distribution (OOD) data. We propose a lightweight OOD detection method that leverages the Mahalanobis distance in the feature space and seamlessly integrates into state-of-the-art segmentation pipelines. The simple approach can even augment pre-trained models with clinically relevant uncertainty quantification. We validate our method across four chest CT distribution shifts and two magnetic resonance imaging applications, namely segmentation of the hippocampus and the prostate. Our results show that the proposed method effectively detects far- and near-OOD samples across all explored scenarios.
△ Less
Submitted 5 August, 2022;
originally announced August 2022.
-
Quality monitoring of federated Covid-19 lesion segmentation
Authors:
Camila Gonzalez,
Christian Harder,
Amin Ranem,
Ricarda Fischbach,
Isabel Kaltenborn,
Armin Dadras,
Andreas Bucher,
Anirban Mukhopadhyay
Abstract:
Federated Learning is the most promising way to train robust Deep Learning models for the segmentation of Covid-19-related findings in chest CTs. By learning in a decentralized fashion, heterogeneous data can be leveraged from a variety of sources and acquisition protocols whilst ensuring patient privacy. It is, however, crucial to continuously monitor the performance of the model. Yet when it com…
▽ More
Federated Learning is the most promising way to train robust Deep Learning models for the segmentation of Covid-19-related findings in chest CTs. By learning in a decentralized fashion, heterogeneous data can be leveraged from a variety of sources and acquisition protocols whilst ensuring patient privacy. It is, however, crucial to continuously monitor the performance of the model. Yet when it comes to the segmentation of diffuse lung lesions, a quick visual inspection is not enough to assess the quality, and thorough monitoring of all network outputs by expert radiologists is not feasible. In this work, we present an array of lightweight metrics that can be calculated locally in each hospital and then aggregated for central monitoring of a federated system. Our linear model detects over 70% of low-quality segmentations on an out-of-distribution dataset and thus reliably signals a decline in model performance.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Detecting when pre-trained nnU-Net models fail silently for Covid-19 lung lesion segmentation
Authors:
Camila Gonzalez,
Karol Gotkowski,
Andreas Bucher,
Ricarda Fischbach,
Isabel Kaltenborn,
Anirban Mukhopadhyay
Abstract:
Automatic segmentation of lung lesions in computer tomography has the potential to ease the burden of clinicians during the Covid-19 pandemic. Yet predictive deep learning models are not trusted in the clinical routine due to failing silently in out-of-distribution (OOD) data. We propose a lightweight OOD detection method that exploits the Mahalanobis distance in the feature space. The proposed ap…
▽ More
Automatic segmentation of lung lesions in computer tomography has the potential to ease the burden of clinicians during the Covid-19 pandemic. Yet predictive deep learning models are not trusted in the clinical routine due to failing silently in out-of-distribution (OOD) data. We propose a lightweight OOD detection method that exploits the Mahalanobis distance in the feature space. The proposed approach can be seamlessly integrated into state-of-the-art segmentation pipelines without requiring changes in model architecture or training procedure, and can therefore be used to assess the suitability of pre-trained models to new data. We validate our method with a patch-based nnU-Net architecture trained with a multi-institutional dataset and find that it effectively detects samples that the model segments incorrectly.
△ Less
Submitted 14 July, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
M3d-CAM: A PyTorch library to generate 3D data attention maps for medical deep learning
Authors:
Karol Gotkowski,
Camila Gonzalez,
Andreas Bucher,
Anirban Mukhopadhyay
Abstract:
M3d-CAM is an easy to use library for generating attention maps of CNN-based PyTorch models improving the interpretability of model predictions for humans. The attention maps can be generated with multiple methods like Guided Backpropagation, Grad-CAM, Guided Grad-CAM and Grad-CAM++. These attention maps visualize the regions in the input data that influenced the model prediction the most at a cer…
▽ More
M3d-CAM is an easy to use library for generating attention maps of CNN-based PyTorch models improving the interpretability of model predictions for humans. The attention maps can be generated with multiple methods like Guided Backpropagation, Grad-CAM, Guided Grad-CAM and Grad-CAM++. These attention maps visualize the regions in the input data that influenced the model prediction the most at a certain layer. Furthermore, M3d-CAM supports 2D and 3D data for the task of classification as well as for segmentation. A key feature is also that in most cases only a single line of code is required for generating attention maps for a model making M3d-CAM basically plug and play.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.