Showing 1–50 of 52 results for author: Bhatt, U

Searching in archive cs.
  1. arXiv:2603.11001  [pdf, ps, other]

    cs.CY cs.AI

    RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

    Authors: Patricia Paskov, Kevin Wei, Shen Zhou Hong, Dan Bateyko, Xavier Roberts-Gaal, Carson Ezell, Gailius Praninskas, Valerie Chen, Umang Bhatt, Ella Guest

    Abstract: Human uplift studies - or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology - are increasingly used to inform deployment, governance, and safety decisions for frontier AI systems. While the methods underlying these studies are well-established, their interaction with the distinctive properties of frontier AI…

    Submitted 11 March, 2026; originally announced March 2026.

  2. arXiv:2602.08754  [pdf, ps, other]

    cs.AI cs.CY cs.HC

    Belief Offloading in Human-AI Interaction

    Authors: Rose E. Guingrich, Dvija Mehta, Umang Bhatt

    Abstract: What happens when people's beliefs are derived from information provided by an LLM? People's use of LLM chatbots as thought partners can contribute to cognitive offloading, which can have adverse effects on cognitive skills in cases of over-reliance. This paper defines and investigates a particular kind of cognitive offloading in human-AI interaction, "belief offloading," in which people's process…

    Submitted 9 February, 2026; originally announced February 2026.

  3. arXiv:2509.08010  [pdf, ps, other]

    cs.CY cs.AI cs.CL cs.HC

    Measuring and mitigating overreliance is necessary for building human-compatible AI

    Authors: Lujain Ibrahim, Katherine M. Collins, Sunnie S. Y. Kim, Anka Reuel, Max Lamparth, Kevin Feng, Lama Ahmad, Prajna Soni, Alia El Kattan, Merlin Stein, Siddharth Swaroop, Ilia Sucholutsky, Andrew Strait, Q. Vera Liao, Umang Bhatt

    Abstract: Large language models (LLMs) distinguish themselves from previous technologies by functioning as collaborative "thought partners," capable of engaging more fluidly in natural language. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance - relying on LLMs beyond their capabilities - grows. This position paper arg…

    Submitted 8 September, 2025; originally announced September 2025.

  4. Beyond Quantification: Navigating Uncertainty in Professional AI Systems

    Authors: Sylvie Delacroix, Diana Robinson, Umang Bhatt, Jacopo Domenicucci, Jessica Montgomery, Gael Varoquaux, Carl Henrik Ek, Vincent Fortuin, Yulan He, Tom Diethe, Neill Campbell, Mennatallah El-Assady, Soren Hauberg, Ivana Dusparic, Neil Lawrence

    Abstract: The growing integration of large language models across professional domains transforms how experts make critical decisions in healthcare, education, and law. While significant research effort focuses on getting these systems to communicate their outputs with probabilistic measures of reliability, many consequential forms of uncertainty in professional contexts resist such quantification. A physic…

    Submitted 3 September, 2025; originally announced September 2025.

    Journal ref: RSS Data Science (2025)

  5. arXiv:2508.14119  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    Documenting Deployment with Fabric: A Repository of Real-World AI Governance

    Authors: Mackenzie Jorgensen, Kendall Brogle, Katherine M. Collins, Lujain Ibrahim, Arina Shah, Petra Ivanovic, Noah Broestl, Gabriel Piles, Paul Dongha, Hatim Abdulhussein, Adrian Weller, Jillian Powers, Umang Bhatt

    Abstract: Artificial intelligence (AI) is increasingly integrated into society, from financial services and traffic management to creative writing. Academic literature on the deployment of AI has mostly focused on the risks and harms that result from the use of AI. We introduce Fabric, a publicly available repository of deployed AI use cases to outline their governance mechanisms. Through semi-structured in…

    Submitted 29 August, 2025; v1 submitted 18 August, 2025; originally announced August 2025.

    Comments: AIES 2025

  6. arXiv:2508.07872  [pdf, ps, other]

    cs.CY cs.HC cs.LG

    Unequal Uncertainty: Rethinking Algorithmic Interventions for Mitigating Discrimination from AI

    Authors: Holli Sargeant, Mackenzie Jorgensen, Arina Shah, Adrian Weller, Umang Bhatt

    Abstract: Uncertainty in artificial intelligence (AI) predictions poses urgent legal and ethical challenges for AI-assisted decision-making. We examine two algorithmic interventions that act as guardrails for human-AI collaboration: selective abstention, which withholds high-uncertainty predictions from human decision-makers, and selective friction, which delivers those predictions together with salient war…

    Submitted 11 August, 2025; originally announced August 2025.

  7. arXiv:2506.13901  [pdf, ps, other]

    cs.CL cs.AI

    Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer-wise Pooled Representations

    Authors: Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das

    Abstract: Alignment is no longer a luxury; it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spo…

    Submitted 16 June, 2025; originally announced June 2025.

  8. arXiv:2503.13577  [pdf, other]

    cs.MA cs.CY cs.LG

    When Should We Orchestrate Multiple Agents?

    Authors: Umang Bhatt, Sanyam Kapoor, Mihir Upadhyay, Ilia Sucholutsky, Francesco Quinzan, Katherine M. Collins, Adrian Weller, Andrew Gordon Wilson, Muhammad Bilal Zafar

    Abstract: Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost diff…

    Submitted 17 March, 2025; originally announced March 2025.
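
    A rough illustration of the cost-performance trade-off this abstract describes: the sketch below routes a query to whichever available agent maximises an accuracy-minus-cost utility. The agent pool, utility form, and cost weighting are hypothetical stand-ins, not the paper's framework.

```python
# Hypothetical sketch: cost-aware routing between agents (not the paper's API).
def orchestrate(query, agents, cost_weight=0.5):
    """agents: dicts with estimated accuracy, per-query cost, and availability."""
    available = [a for a in agents if a["available"]]
    if not available:
        raise RuntimeError("no agent available for this query")
    # Orchestration only pays off when agents differ in accuracy or cost;
    # otherwise every choice scores the same.
    return max(available, key=lambda a: a["accuracy"] - cost_weight * a["cost"])

agents = [
    {"name": "human expert", "accuracy": 0.95, "cost": 1.00, "available": True},
    {"name": "llm agent", "accuracy": 0.85, "cost": 0.05, "available": True},
]
print(orchestrate("review this case", agents)["name"])  # -> "llm agent" here
```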

  9. arXiv:2502.18635  [pdf, other]

    cs.LG cs.AI cs.CL

    Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems

    Authors: Matthew Barker, Andrew Bell, Evan Thomas, James Carr, Thomas Andrews, Umang Bhatt

    Abstract: While Retrieval Augmented Generation (RAG) has emerged as a popular technique for improving Large Language Model (LLM) systems, it introduces a large number of choices, parameters, and hyperparameters that must be made or tuned. This includes the LLM, embedding, and ranker models themselves, as well as hyperparameters governing individual RAG components. Yet, collectively optimizing the entire conf…

    Submitted 8 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    MSC Class: 68T20; 68Q32; 90C29; 62P30 ACM Class: I.2.6; I.2.7; G.1.6; G.3
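
    To make the configuration-space problem concrete, here is a minimal sketch of multi-objective search over a toy RAG configuration space, keeping the Pareto front over (quality, cost). The search space and evaluate() stub are hypothetical stand-ins; the paper's optimizer is not shown.

```python
# Hypothetical sketch: exhaustive multi-objective search over a toy RAG config space.
import itertools
import random

SEARCH_SPACE = {
    "embedding_model": ["small-embed", "large-embed"],
    "chunk_size": [256, 512, 1024],
    "top_k": [2, 5, 10],
    "llm": ["fast-llm", "strong-llm"],
}

def evaluate(config):
    """Stand-in for running the RAG pipeline on a benchmark; returns (quality, cost)."""
    random.seed(hash(tuple(sorted(config.items()))) % (2**32))
    return random.uniform(0.5, 1.0), random.uniform(0.1, 1.0)

def pareto_front(results):
    """Keep configs that no other config beats on both quality and cost."""
    return [(cfg, (q, c)) for cfg, (q, c) in results
            if not any(q2 >= q and c2 <= c and (q2, c2) != (q, c)
                       for _, (q2, c2) in results)]

configs = [dict(zip(SEARCH_SPACE, vals))
           for vals in itertools.product(*SEARCH_SPACE.values())]
results = [(cfg, evaluate(cfg)) for cfg in configs]
for cfg, (q, c) in pareto_front(results):
    print(f"quality={q:.2f} cost={c:.2f} {cfg}")
```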

  10. arXiv:2501.10476  [pdf, other]

    cs.AI cs.LG

    Revisiting Rogers' Paradox in the Context of Human-AI Interaction

    Authors: Katherine M. Collins, Umang Bhatt, Ilia Sucholutsky

    Abstract: Humans learn about the world, and how to act in the world, in many ways: from individually conducting experiments to observing and reproducing others' behavior. Different learning strategies come with different costs and likelihoods of successfully learning more about the world. The choice that any one individual makes of how to learn can have an impact on the collective understanding of a whole p…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: Pre-print

  11. arXiv:2408.03943  [pdf, other]

    cs.HC cs.AI cs.LG

    Building Machines that Learn and Think with People

    Authors: Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, Mina Lee, Cedegao E. Zhang, Tan Zhi-Xuan, Mark Ho, Vikash Mansinghka, Adrian Weller, Joshua B. Tenenbaum, Thomas L. Griffiths

    Abstract: What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to…

    Submitted 21 July, 2024; originally announced August 2024.

  12. arXiv:2407.12804  [pdf, other]

    cs.HC cs.AI cs.LG

    Modulating Language Model Experiences through Frictions

    Authors: Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt

    Abstract: Language models are transforming the ways that their users engage with the world. Despite impressive capabilities, over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term. How can we develop scaffolding around language models to curate more appropriate use? We propose selective frictions…

    Submitted 18 November, 2024; v1 submitted 24 June, 2024; originally announced July 2024.

    Comments: NeurIPS Workshop on Behavioral ML; non-archival

  13. arXiv:2406.08391  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 17 August, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 Camera Ready

  14. arXiv:2406.04302  [pdf, other]

    cs.LG

    Representational Alignment Supports Effective Machine Teaching

    Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

    Abstract: A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we introduce a new controlled experimental setting, GRADE, to study pedagogy and representational alignment. We use GRADE through a series of machine-machine and machine-human teaching experiments to chara…

    Submitted 4 February, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint

  15. arXiv:2405.10632  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    Towards interactive evaluations for interaction harms in human-AI systems

    Authors: Lujain Ibrahim, Saffron Huang, Umang Bhatt, Lama Ahmad, Markus Anderljung

    Abstract: Current AI evaluation methods, which rely on static, model-only tests, fail to account for harms that emerge through sustained human-AI interaction. As AI systems proliferate and are increasingly integrated into real-world applications, this disconnect between evaluation approaches and actual usage becomes more significant. In this paper, we propose a shift towards evaluation based on inte…

    Submitted 30 July, 2025; v1 submitted 17 May, 2024; originally announced May 2024.

  16. arXiv:2403.17885  [pdf, other]

    cs.DB

    Empirical Analysis of EIP-3675: Miner Dynamics, Transaction Fees, and Transaction Time

    Authors: Umesh Bhatt, Sarvesh Pandey

    Abstract: The Ethereum Improvement Proposal 3675 (EIP-3675) marks a significant shift, transitioning from a Proof of Work (PoW) to a Proof of Stake (PoS) consensus mechanism. This transition resulted in a staggering 99.95% decrease in energy consumption. However, the transition prompts two critical questions: (1) How does EIP-3675 affect miners' dynamics? and (2) How do users determine priority fees, cons…

    Submitted 2 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Revised version

  17. arXiv:2402.18326  [pdf, other]

    cs.CY cs.AI

    When Should Algorithms Resign? A Proposal for AI Governance

    Authors: Umang Bhatt, Holli Sargeant

    Abstract: Algorithmic resignation is a strategic approach for managing the use of artificial intelligence (AI) by embedding governance directly into AI systems. It involves deliberate and informed disengagement from AI, such as restricting access to AI outputs or displaying performance disclaimers, in specific scenarios to aid the appropriate and effective use of AI. By integrating algorithmic resignation as a…

    Submitted 16 July, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  18. arXiv:2402.03618  [pdf, other]

    cs.AI cs.CL q-bio.NC

    Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction

    Authors: Sreejan Kumar, Raja Marjieh, Byron Zhang, Declan Campbell, Michael Y. Hu, Umang Bhatt, Brenden Lake, Thomas L. Griffiths

    Abstract: Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often commu…

    Submitted 5 February, 2024; originally announced February 2024.

  19. arXiv:2307.15475  [pdf, other]

    cs.HC cs.AI cs.LG

    FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines

    Authors: Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt

    Abstract: Even though machine learning (ML) pipelines affect an increasing array of stakeholders, there is little work on how input from stakeholders is recorded and incorporated. We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders. Each log records important details about the feedback collection process, the feedback itself, and how the fe…

    Submitted 28 July, 2023; originally announced July 2023.

  20. arXiv:2306.08424  [pdf, other]

    cs.HC cs.AI cs.LG

    Selective Concept Models: Permitting Stakeholder Customisation at Test-Time

    Authors: Matthew Barker, Katherine M. Collins, Krishnamurthy Dvijotham, Adrian Weller, Umang Bhatt

    Abstract: Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We propose Selective COncept Models (SCOMs) which make predictions using only a subset of concepts and can be customised by stakeholders at test-time according to t…

    Submitted 14 June, 2023; originally announced June 2023.

  21. arXiv:2306.01694  [pdf, other]

    cs.LG cs.HC

    Evaluating Language Models for Mathematics through Interactions

    Authors: Katherine M. Collins, Albert Q. Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B. Tenenbaum, William Hart, Timothy Gowers, Wenda Li, Adrian Weller, Mateja Jamnik

    Abstract: There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs, and is insufficient for making an informed decision about which LLMs can be sensibly used, and in which assistive settings. Static assessment fails to a…

    Submitted 5 November, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  22. arXiv:2304.06701  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Learning Personalized Decision Support Policies

    Authors: Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar

    Abstract: Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but when will each form of support yield better outcomes? In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. Specifically, we propose the general problem of learning a decision suppor…

    Submitted 23 January, 2025; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: AAAI 2025

  23. arXiv:2303.12872  [pdf, other]

    cs.HC cs.AI cs.LG

    Human Uncertainty in Concept-Based AI Systems

    Authors: Katherine M. Collins, Matthew Barker, Mateo Espinosa Zarlenga, Naveen Raman, Umang Bhatt, Mateja Jamnik, Ilia Sucholutsky, Adrian Weller, Krishnamurthy Dvijotham

    Abstract: Placing a human in the loop may abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such human-AI interactions is an important and understudied issue. In this work, we study human uncertainty in the context of concept-based models, a family of AI systems t…

    Submitted 22 March, 2023; originally announced March 2023.

  24. Harms from Increasingly Agentic Algorithmic Systems

    Authors: Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

    Abstract: Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems…

    Submitted 11 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at FAccT 2023

  25. Towards Robust Metrics for Concept Representation Evaluation

    Authors: Mateo Espinosa Zarlenga, Pietro Barbiero, Zohreh Shams, Dmitry Kazhdan, Umang Bhatt, Adrian Weller, Mateja Jamnik

    Abstract: Recent work on interpretability has focused on concept-based explanations, where deep learning models are explained in terms of high-level units of information, referred to as concepts. Concept learning models, however, have been shown to be prone to encoding impurities in their representations, failing to fully capture meaningful features of their inputs. While concept learning lacks metrics to m…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: To appear at AAAI 2023

    MSC Class: 68T07 ACM Class: I.2.6

  26. arXiv:2211.01407  [pdf, other]

    cs.LG cs.AI

    On the Informativeness of Supervision Signals

    Authors: Ilia Sucholutsky, Ruairidh M. Battleday, Katherine M. Collins, Raja Marjieh, Joshua C. Peterson, Pulkit Singh, Umang Bhatt, Nori Jacoby, Adrian Weller, Thomas L. Griffiths

    Abstract: Supervised learning typically focuses on learning transferable representations from training examples annotated by humans. While rich annotations (like soft labels) carry more information than sparse annotations (like hard labels), they are also more expensive to collect. For example, while hard labels only provide information about the closest class an object belongs to (e.g., "this is a dog"), s…

    Submitted 4 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Proceedings of UAI 2023

  27. arXiv:2211.01202  [pdf, other]

    cs.LG cs.CV cs.HC

    Human-in-the-Loop Mixup

    Authors: Katherine M. Collins, Umang Bhatt, Weiyang Liu, Vihari Piratla, Ilia Sucholutsky, Bradley Love, Adrian Weller

    Abstract: Aligning model representations to humans has been found to improve robustness and generalization. However, such methods often focus on standard observational data. Synthetic data is proliferating and powering many advances in machine learning; yet, it is not always clear whether synthetic labels are perceptually aligned to humans -- rendering it likely model representations are not human aligned.…

    Submitted 30 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.
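
    For reference, the sketch below shows standard mixup (Zhang et al.), the synthetic-label scheme this abstract asks about: the mixed label is a soft blend of two one-hot labels, and whether humans perceive that blend as correct is the paper's question. The alpha value is an illustrative default.

```python
# Sketch of standard mixup; the human-alignment question is the paper's, not shown here.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta-sampled weight."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_mix, y_mix = mixup(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                     np.array([0.0, 1.0]), np.array([0.0, 1.0]))
print(x_mix, y_mix)  # the synthetic label is a soft mixture of both classes
```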

  28. arXiv:2210.17467  [pdf, other]

    cs.LG cs.AI cs.CV

    Iterative Teaching by Data Hallucination

    Authors: Zeju Qiu, Weiyang Liu, Tim Z. Xiao, Zhen Liu, Umang Bhatt, Yucen Luo, Adrian Weller, Bernhard Schölkopf

    Abstract: We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving…

    Submitted 12 April, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: AISTATS 2023 (v2: 22 pages, 24 figures)

  29. arXiv:2210.04714  [pdf, other]

    cs.CL cs.LG stat.ML

    Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

    Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when w…

    Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 (Findings)
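
    A minimal sketch of expected calibration error (ECE), the standard notion behind "minimize the calibration error": predictions are binned by confidence, and the gap between each bin's confidence and accuracy is averaged. The equal-width binning scheme here is illustrative.

```python
# Sketch of expected calibration error (ECE) with equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of points.
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy usage: per-prediction confidence vs. whether each prediction was right.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```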

  30. arXiv:2207.02726  [pdf, ps, other]

    cs.LG cs.AI cs.HC eess.SP

    Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

    Authors: Ana Lucic, Sheeraz Ahmad, Amanda Furtado Brinhosa, Vera Liao, Himani Agrawal, Umang Bhatt, Krishnaram Kenthapadi, Alice Xiang, Maarten de Rijke, Nicholas Drabowski

    Abstract: When using medical images for diagnosis, either by clinicians or artificial intelligence (AI) systems, it is important that the images are of high quality. When an image is of low quality, the medical exam that produced the image often needs to be redone. In telemedicine, a common problem is that the quality issue is only flagged once the patient has left the clinic, meaning they must return in or…

    Submitted 20 August, 2025; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML 2022 Workshop on Interpretable ML in Healthcare

  31. arXiv:2207.00810  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Eliciting and Learning with Soft Labels from Every Annotator

    Authors: Katherine M. Collins, Umang Bhatt, Adrian Weller

    Abstract: The labels used to train machine learning (ML) models are of paramount importance. Typically for ML classification tasks, datasets contain hard labels, yet learning using soft labels has been shown to yield benefits for model generalization, robustness, and calibration. Earlier work found success in forming soft labels from multiple annotators' hard labels; however, this approach may not converge…

    Submitted 29 August, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

    Comments: Accepted as a Full Paper at the 2022 AAAI Conference on Human Computation and Crowdsourcing

    Journal ref: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing. Vol. 10. 2022
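
    As a toy illustration of learning from per-annotator soft labels, the sketch below averages annotators' probability vectors and scores a prediction with soft-target cross-entropy. Simple averaging is an illustrative aggregation choice, not necessarily the paper's elicitation method.

```python
# Hypothetical sketch: average per-annotator soft labels, score with soft targets.
import numpy as np

def aggregate_soft_labels(annotator_probs):
    """annotator_probs: (n_annotators, n_classes), one probability row per annotator."""
    return np.mean(annotator_probs, axis=0)

def soft_cross_entropy(pred_probs, soft_targets, eps=1e-12):
    """Cross-entropy against a soft (non-one-hot) target distribution."""
    return float(-np.sum(soft_targets * np.log(pred_probs + eps)))

soft = aggregate_soft_labels(np.array([[0.7, 0.2, 0.1],
                                       [0.5, 0.4, 0.1]]))
print(soft, soft_cross_entropy(np.array([0.6, 0.3, 0.1]), soft))
```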

  32. arXiv:2205.06905  [pdf, other]

    cs.LG

    Perspectives on Incorporating Expert Feedback into Model Updates

    Authors: Valerie Chen, Umang Bhatt, Hoda Heidari, Adrian Weller, Ameet Talwalkar

    Abstract: Machine learning (ML) practitioners are increasingly tasked with developing models that are aligned with non-technical experts' values and goals. However, there has been insufficient consideration of how practitioners should translate domain expertise into ML updates. In this paper, we consider how to capture interactions between practitioners and experts systematically. We devise a taxonomy to ma…

    Submitted 16 July, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

  33. arXiv:2205.01411  [pdf, other]

    cs.AI cs.HC

    On the Utility of Prediction Sets in Human-AI Teams

    Authors: Varun Babbar, Umang Bhatt, Adrian Weller

    Abstract: Research on human-AI teams usually provides experts with a single label, which ignores the uncertainty in a model's recommendation. Conformal prediction (CP) is a well-established line of research that focuses on building a theoretically grounded, calibrated prediction set, which may contain multiple labels. We explore how such prediction sets impact expert decision-making in human-AI teams. Our e…

    Submitted 26 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted at IJCAI 2022

  34. arXiv:2204.09718  [pdf]

    cs.CL cs.AI cs.LG

    Matching Writers to Content Writing Tasks

    Authors: Narayana Darapaneni, Chandrashekhar Bhakuni, Ujjval Bhatt, Khamir Purohit, Vikas Sardna, Prabir Chakraborty, Anwesh Reddy Paduri

    Abstract: Businesses need content, in various forms and formats and for varied purposes. In fact, the content marketing industry is set to be worth $412.88 billion by the end of 2021. However, according to the Content Marketing Institute, creating engaging content is the #1 challenge that marketers face today. We understand that producing great content requires great writers who understand the business and…

    Submitted 7 April, 2022; originally announced April 2022.

  35. arXiv:2202.01315  [pdf, other]

    cs.LG stat.AP

    Approximating Full Conformal Prediction at Scale via Influence Functions

    Authors: Javier Abad, Umang Bhatt, Adrian Weller, Giovanni Cherubin

    Abstract: Conformal prediction (CP) is a wrapper around traditional machine learning models, giving coverage guarantees under the sole assumption of exchangeability; in classification problems, for a chosen significance level $\varepsilon$, CP guarantees that the error rate is at most $\varepsilon$, irrespective of whether the underlying model is misspecified. However, the prohibitive computational costs of…

    Submitted 22 February, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 18 pages, 13 figures
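
    For context, here is a minimal sketch of the split-conformal variant of the guarantee this abstract states: at significance level $\varepsilon$, prediction sets cover the true label with error at most $\varepsilon$ under exchangeability. The paper approximates full CP, which retrains per candidate label; that machinery is not shown.

```python
# Sketch of split conformal prediction; the paper approximates *full* CP instead.
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, epsilon=0.1):
    """cal_probs/test_probs: (n, n_classes) arrays of predicted class probabilities."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(np.ceil((n + 1) * (1 - epsilon)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    # A label enters the set when its score is within the calibrated threshold;
    # sets then cover the true label with probability at least 1 - epsilon.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```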

  36. arXiv:2112.02646  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Diverse, Global and Amortised Counterfactual Explanations for Uncertainty Estimates

    Authors: Dan Ley, Umang Bhatt, Adrian Weller

    Abstract: To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating a single Counterfactual Latent Uncertainty Explanation (CLUE) for a given data point where the model is uncertain, identifying a single, on-manifold change to the input such that the model becomes more certain in its prediction. We broaden the exploration to examine $\delta$-CLUE, the set of…

    Submitted 8 December, 2021; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: Accepted as a conference paper to AAAI 2022

  37. arXiv:2107.05978  [pdf, other]

    cs.LG cs.AI cs.CY

    DIVINE: Diverse Influential Training Points for Data Visualization and Model Refinement

    Authors: Umang Bhatt, Isabel Chien, Muhammad Bilal Zafar, Adrian Weller

    Abstract: As the complexity of machine learning (ML) models increases, resulting in a lack of prediction explainability, several methods have been developed to explain a model's behavior in terms of the training data points that most influence the model. However, these methods tend to mark outliers as highly influential points, limiting the insights that practitioners can draw from points that are not repre…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: 30 pages, 32 figures

  38. arXiv:2105.04289  [pdf, other]

    cs.LG cs.AI

    Do Concept Bottleneck Models Learn as Intended?

    Authors: Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller

    Abstract: Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretabil…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted at ICLR 2021 Workshop on Responsible AI

  39. arXiv:2104.06323  [pdf, other]

    cs.LG cs.AI stat.ML

    δ-CLUE: Diverse Sets of Explanations for Uncertainty Estimates

    Authors: Dan Ley, Umang Bhatt, Adrian Weller

    Abstract: To interpret uncertainty estimates from differentiable probabilistic models, recent work has proposed generating Counterfactual Latent Uncertainty Explanations (CLUEs). However, for a single input, such approaches could output a variety of explanations due to the lack of constraints placed on the explanation. Here we augment the original CLUE approach, to provide what we call $\delta$-CLUE. CLUE indica…

    Submitted 3 December, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Appeared as a workshop paper at ICLR 2021 (Responsible AI | Secure ML | Robust ML)

  40. arXiv:2103.14976  [pdf, ps, other]

    cs.HC

    A Multistakeholder Approach Towards Evaluating AI Transparency Mechanisms

    Authors: Ana Lucic, Madhulika Srikumar, Umang Bhatt, Alice Xiang, Ankur Taly, Q. Vera Liao, Maarten de Rijke

    Abstract: Given that there are a variety of stakeholders involved in, and affected by, decisions from machine learning (ML) models, it is important to consider that different stakeholders have different transparency needs. Previous work found that the majority of deployed transparency mechanisms primarily serve technical stakeholders. In our work, we want to investigate how well transparency mechanisms migh…

    Submitted 1 June, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: Accepted to CHI 2021 Workshop on Operationalizing Human-Centered Perspectives in Explainable AI

  41. arXiv:2011.07586  [pdf, other]

    cs.CY cs.HC cs.LG

    Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty

    Authors: Umang Bhatt, Javier Antorán, Yunfeng Zhang, Q. Vera Liao, Prasanna Sattigeri, Riccardo Fogliato, Gabrielle Gauthier Melançon, Ranganath Krishnan, Jason Stanley, Omesh Tickoo, Lama Nachman, Rumi Chunara, Madhulika Srikumar, Adrian Weller, Alice Xiang

    Abstract: Algorithmic transparency entails exposing system properties to various stakeholders for purposes that include understanding, improving, and contesting predictions. Until now, most research into algorithmic transparency has predominantly focused on explainability. Explainability attempts to provide reasons for a machine learning model's behavior to stakeholders. However, understanding a model's spe…

    Submitted 4 May, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

    Comments: AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) 2021

  42. arXiv:2010.06529  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Fairness of Causal Algorithmic Recourse

    Authors: Julius von Kügelgen, Amir-Hossein Karimi, Umang Bhatt, Isabel Valera, Adrian Weller, Bernhard Schölkopf

    Abstract: Algorithmic fairness is typically studied from the perspective of predictions. Instead, here we investigate fairness from the perspective of recourse actions suggested to individuals to remedy an unfavourable classification. We propose two new fairness criteria at the group and individual level, which -- unlike prior work on equalising the average group-wise distance from the decision boundary --…

    Submitted 6 March, 2022; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: AAAI 2022 extended camera-ready version with technical appendices. (9 pages main paper + references + appendices)

  43. arXiv:2007.05408  [pdf, ps, other]

    cs.CY cs.AI

    Machine Learning Explainability for External Stakeholders

    Authors: Umang Bhatt, McKane Andrus, Adrian Weller, Alice Xiang

    Abstract: As machine learning is increasingly deployed in high-stakes contexts affecting people's livelihoods, there have been growing calls to open the black box and to make machine learning algorithms more explainable. Providing useful explanations requires careful consideration of the needs of stakeholders, including end-users, regulators, and domain experts. Despite this need, little work has been done…

    Submitted 10 July, 2020; originally announced July 2020.

  44. arXiv:2006.06848  [pdf, other]

    stat.ML cs.LG

    Getting a CLUE: A Method for Explaining Uncertainty Estimates

    Authors: Javier Antorán, Umang Bhatt, Tameem Adel, Adrian Weller, José Miguel Hernández-Lobato

    Abstract: Both uncertainty estimation and interpretability are important factors for trustworthy machine learning systems. However, there is little work at the intersection of these two areas. We address this gap by proposing a novel method for interpreting uncertainty estimates from differentiable probabilistic models, like Bayesian Neural Networks (BNNs). Our method, Counterfactual Latent Uncertainty Expl…

    Submitted 18 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted as an oral presentation at ICLR 2021
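
    A minimal sketch of the CLUE idea under stated assumptions: search the latent space of a generative model for a point near the original that decodes to an input the model is more certain about. The decode and uncertainty callables are hypothetical, and finite differences stand in for the autodiff used in practice.

```python
# Hypothetical sketch of a CLUE-style objective: uncertainty(decode(z)) plus a
# distance penalty keeping z near the original latent code z0.
import numpy as np

def clue(z0, decode, uncertainty, dist_weight=0.5, lr=0.1, steps=200, eps=1e-4):
    z = z0.astype(float).copy()
    objective = lambda v: uncertainty(decode(v)) + dist_weight * np.sum((v - z0) ** 2)
    for _ in range(steps):
        # Finite-difference gradient keeps this self-contained; in practice the
        # objective is differentiated through a BNN and a deep generative model.
        grad = np.array([(objective(z + eps * e) - objective(z - eps * e)) / (2 * eps)
                         for e in np.eye(z.size)])
        z -= lr * grad
    return z, decode(z)

# Toy usage with an identity "decoder" and a model that is certain near the origin.
z_clue, x_clue = clue(np.array([2.0, -1.0]),
                      decode=lambda z: z,
                      uncertainty=lambda x: float(np.sum(x ** 2)))
```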

  45. arXiv:2005.00631  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Evaluating and Aggregating Feature-based Model Explanations

    Authors: Umang Bhatt, Adrian Weller, José M. F. Moura

    Abstract: A feature-based model explanation denotes how much each input feature contributes to a model's output for a given data point. As the number of proposed explanation functions grows, we lack quantitative evaluation criteria to help practitioners know when to use which explanation function. This paper proposes quantitative evaluation criteria for feature-based explanations: low sensitivity, high fait…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted at IJCAI 2020
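
    As a toy illustration of a faithfulness-style criterion, the sketch below ablates each feature and checks that attribution scores track the resulting output changes. The correlation form is illustrative of the idea, not the paper's exact estimator.

```python
# Sketch of a faithfulness check: do attributions track ablation-induced output drops?
import numpy as np

def faithfulness(model, x, attributions, baseline=0.0):
    drops = []
    for i in range(len(x)):
        x_ablated = x.copy()
        x_ablated[i] = baseline
        drops.append(model(x) - model(x_ablated))
    # Faithful attributions correlate highly with the true output changes.
    return np.corrcoef(attributions, drops)[0, 1]

model = lambda x: 3 * x[0] + 1 * x[1]  # toy linear model
x = np.array([1.0, 1.0])
print(faithfulness(model, x, np.array([3.0, 1.0])))  # ~1.0 for exact attributions
```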

  46. arXiv:1910.11459  [pdf, other]

    cs.HC cs.RO

    A Robot's Expressive Language Affects Human Strategy and Perceptions in a Competitive Game

    Authors: Aaron M. Roth, Samantha Reig, Umang Bhatt, Jonathan Shulgach, Tamara Amin, Afsaneh Doryab, Fei Fang, Manuela Veloso

    Abstract: As robots are increasingly endowed with social and communicative capabilities, they will interact with humans in more settings, both collaborative and competitive. We explore human-robot relationships in the context of a competitive Stackelberg Security Game. We vary humanoid robot expressive language (in the form of "encouraging" or "discouraging" verbal commentary) and measure the impact on part…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: RO-MAN 2019; 8 pages, 4 figures, 1 table

    Journal ref: Proceedings of the 28th IEEE International Conference on Robot Human Interactive Communication, New Delhi, India, October 2019

  47. arXiv:1909.06342  [pdf, ps, other]

    cs.LG cs.AI cs.CY cs.HC stat.ML

    Explainable Machine Learning in Deployment

    Authors: Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, Peter Eckersley

    Abstract: Explainable machine learning offers the potential to provide stakeholders with insights into model behavior by using various methods such as feature importance scores, counterfactual explanations, or influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consu…

    Submitted 10 July, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: ACM Conference on Fairness, Accountability, and Transparency 2020

  48. arXiv:1901.10040  [pdf]

    cs.LG cs.AI stat.ML

    Towards Aggregating Weighted Feature Attributions

    Authors: Umang Bhatt, Pradeep Ravikumar, Jose M. F. Moura

    Abstract: Current approaches for explaining machine learning models fall into two distinct classes: antecedent event influence and value attribution. The former leverages training instances to describe how much influence a training point exerts on a test point, while the latter attempts to attribute value to the features most pertinent to a given prediction. In this work, we discuss an algorithm, AVA: Aggre…

    Submitted 20 January, 2019; originally announced January 2019.

    Comments: In AAAI-19 Workshop on Network Interpretability for Deep Learning

  49. On Network Science and Mutual Information for Explaining Deep Neural Networks

    Authors: Brian Davis, Umang Bhatt, Kartikeya Bhardwaj, Radu Marculescu, José M. F. Moura

    Abstract: In this paper, we present a new approach to interpret deep learning models. By coupling mutual information with network science, we explore how information flows through feedforward networks. We show that efficiently approximating mutual information allows us to create an information measure that quantifies how much information flows between any two neurons of a deep learning model. To that end, w…

    Submitted 3 May, 2020; v1 submitted 20 January, 2019; originally announced January 2019.

    Comments: ICASSP 2020 (shorter version appeared at AAAI-19 Workshop on Network Interpretability for Deep Learning)
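
    A minimal sketch of a binned mutual-information estimate between two neurons' activations, in the spirit of the information measure described above; the histogram estimator is a common approximation, not necessarily the paper's.

```python
# Sketch of a histogram-based mutual information estimate between two neurons.
import numpy as np

def mutual_information(a, b, bins=16):
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
print(mutual_information(a, a + 0.1 * rng.normal(size=5000)))  # strongly dependent
print(mutual_information(a, rng.normal(size=5000)))            # near zero
```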

  50. arXiv:1806.03671  [pdf, other]

    cs.HC cs.AI cs.RO

    The Impact of Humanoid Affect Expression on Human Behavior in a Game-Theoretic Setting

    Authors: Aaron M. Roth, Umang Bhatt, Tamara Amin, Afsaneh Doryab, Fei Fang, Manuela Veloso

    Abstract: With the rapid development of robots and other intelligent and autonomous agents, how a human could be influenced by a robot's expressed mood when making decisions becomes a crucial question in human-robot interaction. In this pilot study, we investigate (1) in what way a robot can express a certain mood to influence a human's decision making behavioral model; (2) how and to what extent the human w…

    Submitted 10 June, 2018; originally announced June 2018.

    Comments: presented at 1st Workshop on Humanizing AI (HAI) at IJCAI'18 in Stockholm, Sweden