Showing 1–25 of 25 results for author: Dobson, R J

  1. Unlocking Electronic Health Records: A Hybrid Graph RAG Approach to Safe Clinical AI for Patient QA

    Authors: Samuel Thio, Matthew Lewis, Spiros Denaxas, Richard JB Dobson

    Abstract: Electronic health record (EHR) systems present clinicians with vast repositories of clinical information, creating a significant cognitive burden where critical details are easily overlooked. While Large Language Models (LLMs) offer transformative potential for data processing, they face significant limitations in clinical settings, particularly regarding context grounding and hallucinations. Curr…

    Submitted 27 November, 2025; originally announced February 2026.

    Comments: 26 pages, 5 figures, 2 tables

    Journal ref: Frontiers in Digital Health, vol. 8, 2026

  2. arXiv:2511.07011  [pdf]

    cs.CL cs.LG

    Multilingual Lexical Feature Analysis of Spoken Language for Predicting Major Depression Symptom Severity

    Authors: Anastasiia Tokareva, Judith Dineley, Zoe Firth, Pauline Conde, Faith Matcham, Sara Siddi, Femke Lamers, Ewan Carr, Carolin Oetzmann, Daniel Leightley, Yuezhou Zhang, Amos A. Folarin, Josep Maria Haro, Brenda W. J. H. Penninx, Raquel Bailon, Srinivasan Vairavan, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Nicholas Cummins, The RADAR-CNS Consortium

    Abstract: Background: Captured between clinical appointments using mobile devices, spoken language has potential for objective, more regular assessment of symptom severity and earlier detection of relapse in major depressive disorder. However, research to date has largely been in non-clinical cross-sectional samples of written language using complex machine learning (ML) approaches with limited interpretabi…

    Submitted 10 November, 2025; originally announced November 2025.

  3. arXiv:2510.02967  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines

    Authors: Matthew Lewis, Samuel Thio, Amy Roberts, Catherine Siju, Whoasif Mukit, Rebecca Kuruvilla, Zhangshu Joshua Jiang, Niko Möller-Grell, Aditya Borakati, Richard JB Dobson, Spiros Denaxas

    Abstract: This paper presents the development and evaluation of a Retrieval-Augmented Generation (RAG) system for querying the United Kingdom's National Institute for Health and Care Excellence (NICE) clinical guidelines using Large Language Models (LLMs). The extensive length and volume of these guidelines can impede their utilisation within a time-constrained healthcare system, a challenge this project ad…

    Submitted 14 December, 2025; v1 submitted 3 October, 2025; originally announced October 2025.

  4. arXiv:2509.10970  [pdf, ps, other]

    cs.LG cs.AI

    The Psychogenic Machine: Simulating AI Psychosis, Delusion Reinforcement and Harm Enablement in Large Language Models

    Authors: Joshua Au Yeung, Jacopo Dalmasso, Luca Foschini, Richard JB Dobson, Zeljko Kraljevic

    Abstract: Background: Emerging reports of "AI psychosis" are on the rise, where user-LLM interactions may exacerbate or induce psychosis or adverse psychological symptoms. Whilst the sycophantic and agreeable nature of LLMs can be beneficial, it becomes a vector for harm by reinforcing delusional beliefs in vulnerable users. Methods: Psychosis-bench is a novel benchmark designed to systematically evaluate…

    Submitted 16 September, 2025; v1 submitted 13 September, 2025; originally announced September 2025.

  5. arXiv:2505.03039  [pdf]

    cs.CV stat.AP

    An Explainable Anomaly Detection Framework for Monitoring Depression and Anxiety Using Consumer Wearable Devices

    Authors: Yuezhou Zhang, Amos A. Folarin, Callum Stewart, Heet Sankesara, Yatharth Ranjan, Pauline Conde, Akash Roy Choudhury, Shaoxiong Sun, Zulqarnain Rashid, Richard J. B. Dobson

    Abstract: Continuous monitoring of behavior and physiology via wearable devices offers a novel, objective method for the early detection of worsening depression and anxiety. In this study, we present an explainable anomaly detection framework that identifies clinically meaningful increases in symptom severity using consumer-grade wearable data. Leveraging data from 2,023 participants with defined healthy ba…

    Submitted 5 May, 2025; originally announced May 2025.

  6. arXiv:2503.09927  [pdf]

    cs.CL cs.AI

    Developing and Evaluating an AI-Assisted Prediction Model for Unplanned Intensive Care Admissions following Elective Neurosurgery using Natural Language Processing within an Electronic Healthcare Record System

    Authors: Julia Ive, Olatomiwa Olukoya, Jonathan P. Funnell, James Booker, Sze H M Lam, Ugan Reddy, Kawsar Noor, Richard JB Dobson, Astri M. V. Luoma, Hani J Marcus

    Abstract: Introduction: Timely care in a specialised neuro-intensive therapy unit (ITU) reduces mortality and hospital stays, with planned admissions being safer than unplanned ones. However, post-operative care decisions remain subjective. This study used artificial intelligence (AI), specifically natural language processing (NLP) to analyse electronic health records (EHRs) and predict ITU admissions for e…

    Submitted 12 March, 2025; originally announced March 2025.

  7. arXiv:2412.10848  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models for Medical Forecasting -- Foresight 2

    Authors: Zeljko Kraljevic, Joshua Au Yeung, Daniel Bean, James Teo, Richard J. Dobson

    Abstract: Foresight 2 (FS2) is a large language model fine-tuned on hospital data for modelling patient timelines (GitHub 'removed for anon'). It can understand patients' clinical notes and predict SNOMED codes for a wide range of biomedical use cases, including diagnosis suggestions, risk forecasting, and procedure and medication recommendations. FS2 is trained on the free text portion of the MIMIC-III dat…

    Submitted 14 December, 2024; originally announced December 2024.

  8. arXiv:2409.16339  [pdf]

    q-bio.QM cs.LG

    Large-scale digital phenotyping: identifying depression and anxiety indicators in a general UK population with over 10,000 participants

    Authors: Yuezhou Zhang, Callum Stewart, Yatharth Ranjan, Pauline Conde, Heet Sankesara, Zulqarnain Rashid, Shaoxiong Sun, Richard J B Dobson, Amos A Folarin

    Abstract: Digital phenotyping offers a novel and cost-efficient approach for managing depression and anxiety. Previous studies, often limited to small-to-medium or specific populations, may lack generalizability. We conducted a cross-sectional analysis of data from 10,129 participants recruited from a UK-based general population between June 2020 and August 2022. Participants shared wearable (Fitbit) data a…

    Submitted 24 September, 2024; originally announced September 2024.

  9. arXiv:2406.07497  [pdf]

    cs.SD eess.AS

    A methodological framework and exemplar protocol for the collection and analysis of repeated speech samples

    Authors: Nicholas Cummins, Lauren L. White, Zahia Rahman, Catriona Lucas, Tian Pan, Ewan Carr, Faith Matcham, Johnny Downs, Richard J. Dobson, Thomas F. Quatieri, Judith Dineley

    Abstract: Speech and language biomarkers have the potential to be regular, objective assessments of symptom severity in several health conditions, both in-clinic and remotely using mobile devices. However, the complex nature of speech and often subtle changes associated with health mean that findings are highly dependent on methodological and cohort choices. These are often not reported adequately in studie…

    Submitted 8 December, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Main manuscript: 37 pages. 3 figures, 8 tables, 1 textbox. Submitted to JMIR Research Methods. Replacement with format changes and copyediting

    ACM Class: J.3

    Journal ref: JMIR Res Protoc 2025;14:e69431

  10. arXiv:2312.16713  [pdf, ps, other]

    cs.LG cs.AI

    CSAI: Conditional Self-Attention Imputation for Healthcare Time-series

    Authors: Linglong Qian, Joseph Arul Raj, Hugh Logan Ellis, Ao Zhang, Yuezhou Zhang, Tao Wang, Richard JB Dobson, Zina Ibrahim

    Abstract: We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in multivariate time series derived from hospital electronic health records (EHRs). CSAI extends state-of-the-art neural network-based imputation by introducing key modifications specific to EHR data: a) attention-bas…

    Submitted 6 January, 2026; v1 submitted 27 December, 2023; originally announced December 2023.

  11. arXiv:2308.11773  [pdf]

    cs.CL cs.CY cs.SD eess.AS q-bio.QM

    Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model

    Authors: Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, et al. (3 additional authors not shown)

    Abstract: Language use has been shown to correlate with depression, but large-scale validation is needed. Traditional methods like clinic studies are expensive. So, natural language processing has been employed on social media to predict depression, but limitations remain: lack of validated labels, biased user samples, and no context. Our study identified 29 topics in 3919 smartphone-collected speech recordi…

    Submitted 5 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  12. arXiv:2308.02043  [pdf]

    cs.CY cs.AI

    Disease Insight through Digital Biomarkers Developed by Remotely Collected Wearables and Smartphone Data

    Authors: Zulqarnain Rashid, Amos A Folarin, Yatharth Ranjan, Pauline Conde, Heet Sankesara, Yuezhou Zhang, Shaoxiong Sun, Callum Stewart, Petroula Laiou, Richard JB Dobson

    Abstract: Digital Biomarkers and remote patient monitoring can provide valuable and timely insights into how a patient is coping with their condition (disease progression, treatment response, etc.), complementing treatment in traditional healthcare settings. Smartphones with embedded and connected sensors have immense potential for improving healthcare through various apps and mHealth (mobile health) platfor…

    Submitted 3 August, 2023; originally announced August 2023.

  13. arXiv:2212.08072  [pdf]

    cs.CL cs.AI cs.LG

    Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs

    Authors: Zeljko Kraljevic, Dan Bean, Anthony Shek, Rebecca Bendayan, Harry Hemingway, Joshua Au Yeung, Alexander Deng, Alfie Baston, Jack Ross, Esther Idowu, James T Teo, Richard J Dobson

    Abstract: Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generati…

    Submitted 24 January, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  14. arXiv:2204.09594  [pdf]

    cs.CL cs.LG

    Predicting Clinical Intent from Free Text Electronic Health Records

    Authors: Kawsar Noor, Katherine Smith, Julia Bennett, Jade OConnell, Jessica Fisk, Monika Hunt, Gary Philippo, Teresa Xu, Simon Knight, Luis Romao, Richard JB Dobson, Wai Keong Wong

    Abstract: After a patient consultation, a clinician determines the steps in the management of the patient. A clinician may for example request to see the patient again or refer them to a specialist. Whilst most clinicians will record their intent as "next steps" in the patient's clinical notes, in some cases the clinician may forget to indicate their intent as an order or request, e.g. failure to place the…

    Submitted 25 March, 2022; originally announced April 2022.

  15. arXiv:2108.06835  [pdf]

    cs.IR

    Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals

    Authors: Kawsar Noor, Lukasz Roguski, Alex Handy, Roman Klapaukh, Amos Folarin, Luis Romao, Joshua Matteson, Nathan Lea, Leilei Zhu, Wai Keong Wong, Anoop Shah, Richard J Dobson

    Abstract: As more healthcare organisations transition to using electronic health record (EHR) systems it is important for these organisations to maximise the secondary use of their data to support service improvement and clinical research. These organisations will find it challenging to have systems which can mine information from the unstructured data fields in the record (clinical notes, letters etc) and…

    Submitted 15 August, 2021; originally announced August 2021.

  16. Estimating Redundancy in Clinical Text

    Authors: Thomas Searle, Zina Ibrahim, James Teo, Richard JB Dobson

    Abstract: The current mode of use of Electronic Health Record (EHR) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes, then updating accordingly. Data duplication can lead to a propagation of errors, inconsistencies and misreporting of care. Therefore, quantifying information redundancy can play an essential role in evaluating innovations that operate on clinical…

    Submitted 26 October, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

    Journal ref: JBI v124 (2021)

  17. arXiv:2104.12407  [pdf]

    stat.ML cs.LG

    Predicting Depressive Symptom Severity through Individuals' Nearby Bluetooth Devices Count Data Collected by Mobile Phones: A Preliminary Longitudinal Study

    Authors: Yuezhou Zhang, Amos A Folarin, Shaoxiong Sun, Nicholas Cummins, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Callum Stewart, Petroula Laiou, Faith Matcham, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Aki Rintala, David C Mohr, Inez Myin-Germeys, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Peter Annas, Matthew Hotopf, Richard JB Dobson

    Abstract: The Bluetooth sensor embedded in mobile phones provides an unobtrusive, continuous, and cost-efficient means to capture individuals' proximity information, such as the nearby Bluetooth devices count (NBDC). The continuous NBDC data can partially reflect individuals' behaviors and status, such as social connections and interactions, working status, mobility, and social isolation and loneliness, whi…

    Submitted 26 April, 2021; originally announced April 2021.

  18. arXiv:2104.09263  [pdf, other]

    eess.SP cs.HC cs.LG

    Fitbeat: COVID-19 Estimation based on Wristband Heart Rate

    Authors: Shuo Liu, Jing Han, Estela Laporta Puyal, Spyridon Kontaxis, Shaoxiong Sun, Patrick Locatelli, Judith Dineley, Florian B. Pokorny, Gloria Dalla Costa, Letizia Leocani, Ana Isabel Guerrero, Carlos Nos, Ana Zabalza, Per Soelberg Sørensen, Mathias Buron, Melinda Magyari, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Callum Stewart, Amos A Folarin, Richard JB Dobson, Raquel Bailón, Srinivasan Vairavan, Nicholas Cummins, et al. (4 additional authors not shown)

    Abstract: This study investigates the potential of deep learning methods to identify individuals with suspected COVID-19 infection using remotely collected heart-rate data. The study utilises data from the ongoing EU IMI RADAR-CNS research project that is investigating the feasibility of wearable devices and smart phones to monitor individuals with multiple sclerosis (MS), depression or epilepsy. As part of…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 34 pages, 4 figures

  19. Remote smartphone-based speech collection: acceptance and barriers in individuals with major depressive disorder

    Authors: Judith Dineley, Grace Lavelle, Daniel Leightley, Faith Matcham, Sara Siddi, Maria Teresa Peñarrubia-María, Katie M. White, Alina Ivan, Carolin Oetzmann, Sara Simblett, Erin Dawe-Lane, Stuart Bruce, Daniel Stahl, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Amos A. Folarin, Josep Maria Haro, Til Wykes, Richard J. B. Dobson, Vaibhav A. Narayan, Matthew Hotopf, Björn W. Schuller, Nicholas Cummins, The RADAR-CNS Consortium

    Abstract: The ease of in-the-wild speech recording using smartphones has sparked considerable interest in the combined application of speech, remote measurement technology (RMT) and advanced analytics as a research and healthcare tool. For this to be realised, the acceptability of remote speech collection to the user must be established, in addition to feasibility from an analytical perspective. To understa…

    Submitted 30 August, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021. Formatting changes + minor language edits

    ACM Class: H.1.2

    Journal ref: Proc. Interspeech 2021, pp. 631-635

  20. arXiv:2011.09361  [pdf, other]

    cs.LG cs.CY

    A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data

    Authors: Zina M Ibrahim, Daniel Bean, Thomas Searle, Honghan Wu, Anthony Shek, Zeljko Kraljevic, James Galloway, Sam Norton, James T Teo, Richard JB Dobson

    Abstract: The ability to perform accurate prognosis of patients is crucial for proactive clinical decision making, informed resource management and personalised care. Existing outcome prediction models suffer from a low recall of infrequent positive outcomes. We present a highly-scalable and robust machine learning framework to automatically predict adversity represented by mortality and ICU admission from…

    Submitted 11 June, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: 14 pages

  21. arXiv:2010.01165  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit

    Authors: Zeljko Kraljevic, Thomas Searle, Anthony Shek, Lukasz Roguski, Kawsar Noor, Daniel Bean, Aurelie Mascio, Leilei Zhu, Amos A Folarin, Angus Roberts, Rebecca Bendayan, Mark P Richardson, Robert Stewart, Anoop D Shah, Wai Keong Wong, Zina Ibrahim, James T Teo, Richard JB Dobson

    Abstract: Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a f…

    Submitted 25 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Preprint: 27 Pages, 3 Figures

  22. arXiv:2009.09648  [pdf]

    physics.soc-ph cs.SI q-bio.QM

    Measuring the effect of Non-Pharmaceutical Interventions (NPIs) on mobility during the COVID-19 pandemic using global mobility data

    Authors: Berber T Snoeijer, Mariska Burger, Shaoxiong Sun, Richard JB Dobson, Amos A Folarin

    Abstract: The implementation of governmental Non-Pharmaceutical Interventions (NPIs) has been the primary means of controlling the spread of the COVID-19 disease. The intended effect of these NPIs has been to reduce mobility. A strong reduction in mobility is believed to have a positive effect on the reduction of COVID-19 transmission by limiting the opportunity for the virus to spread in the population. Du…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: 16 pages, 6 figures

  23. Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset

    Authors: Thomas Searle, Zina Ibrahim, Richard JB Dobson

    Abstract: Clinical coding is currently a labour-intensive, error-prone, but critical administrative process whereby hospital patient episodes are manually assigned codes by qualified staff from large, standardised taxonomic hierarchies of codes. Automating clinical coding has a long history in NLP research and has recently seen novel developments setting new state of the art results. A popular dataset used…

    Submitted 12 June, 2020; originally announced June 2020.

    Journal ref: ACL 2020

  24. arXiv:2004.14331  [pdf]

    q-bio.QM cs.HC

    Using smartphones and wearable devices to monitor behavioural changes during COVID-19

    Authors: Shaoxiong Sun, Amos Folarin, Yatharth Ranjan, Zulqarnain Rashid, Pauline Conde, Callum Stewart, Nicholas Cummins, Faith Matcham, Gloria Dalla Costa, Sara Simblett, Letizia Leocani, Per Soelberg Sørensen, Mathias Buron, Ana Isabel Guerrero, Ana Zabalza, Brenda WJH Penninx, Femke Lamers, Sara Siddi, Josep Maria Haro, Inez Myin-Germeys, Aki Rintala, Til Wykes, Vaibhav A. Narayan, Giancarlo Comi, Matthew Hotopf, et al. (1 additional author not shown)

    Abstract: We aimed to explore the utility of the recently developed open-source mobile health platform RADAR-base as a toolbox to rapidly test the effect and response to NPIs aimed at limiting the spread of COVID-19. We analysed data extracted from smartphone and wearable devices and managed by the RADAR-base from 1062 participants recruited in Italy, Spain, Denmark, the UK, and the Netherlands. We derived…

    Submitted 22 July, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  25. arXiv:1903.03995  [pdf]

    cs.CL cs.AI

    Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study

    Authors: Honghan Wu, Karen Hodgson, Sue Dyson, Katherine I. Morley, Zina M. Ibrahim, Ehtesham Iqbal, Robert Stewart, Richard JB Dobson, Cathie Sudlow

    Abstract: Background: Many efforts have been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records to construct comprehensive patient profiles for delivering better health-care. Reusing NLP models in new settings, however, remains cumbersome - requiring validation and/or retraining on new data iteratively to achieve conver…

    Submitted 23 October, 2019; v1 submitted 10 March, 2019; originally announced March 2019.