
Showing 1–30 of 30 results for author: Mishra, B D

Searching in archive cs.
  1. arXiv:2604.02600  [pdf, ps, other]

    cs.HC cs.AI

    LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape

    Authors: Hita Kambhamettu, Bhavana Dalvi Mishra, Andrew Head, Jonathan Bragg, Aakanksha Naik, Joseph Chee Chang, Pao Siangliulue

    Abstract: Developing a novel research idea is hard. It must be distinct enough from prior work to claim a contribution while also building on it. This requires iteratively reviewing literature and refining an idea based on what a researcher reads; yet when an idea changes, the literature that matters often changes with it. Most tools offer limited support for this interplay: literature tools help researcher…

    Submitted 13 April, 2026; v1 submitted 2 April, 2026; originally announced April 2026.

  2. arXiv:2602.02660  [pdf, ps, other]

    cs.AI

    MARS: Modular Agent with Reflective Search for Automated AI Research

    Authors: Jiefeng Chen, Bhavana Dalvi Mishra, Jaehyun Nam, Rui Meng, Tomas Pfister, Jinsung Yoon

    Abstract: Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research.…

    Submitted 16 February, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

  3. arXiv:2510.00620  [pdf, ps, other]

    cs.AI cs.CL

    HARPA: A Testability-Driven, Literature-Grounded Framework for Research Ideation

    Authors: Rosni Vasu, Peter Jansen, Pao Siangliulue, Cristina Sarasua, Abraham Bernstein, Peter Clark, Bhavana Dalvi Mishra

    Abstract: While there has been a surge of interest in automated scientific discovery (ASD), especially with the emergence of LLMs, it remains challenging for tools to generate hypotheses that are both testable and grounded in the scientific literature. Additionally, existing ideation tools are not adaptive to prior experimental outcomes. We developed HARPA to address these challenges by incorporating the id…

    Submitted 1 October, 2025; originally announced October 2025.

    Comments: 10 pages (main), 65 pages total

  4. arXiv:2507.00310  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

    Authors: Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson, Megha Chakravorty, Satvika Reddy Gavireddy, Aditya Parashar, Harshit Surana, Bhavana Dalvi Mishra, Andrew McCallum, Ashish Sabharwal, Peter Clark

    Abstract: The promise of autonomous scientific discovery (ASD) hinges not only on answering questions, but also on knowing which questions to ask. Most recent works in ASD explore the use of large language models (LLMs) in goal-driven settings, relying on human-specified research questions to guide hypothesis generation. However, scientific discovery may be accelerated further by allowing the AI system to d…

    Submitted 12 February, 2026; v1 submitted 30 June, 2025; originally announced July 2025.

    Comments: Accepted to NeurIPS 2025: https://neurips.cc/virtual/2025/loc/san-diego/poster/116398

  5. arXiv:2506.12937  [pdf, ps, other]

    cs.AI cs.CL

    HypER: Literature-grounded Hypothesis Generation and Distillation with Provenance

    Authors: Rosni Vasu, Chandrayee Basu, Bhavana Dalvi Mishra, Cristina Sarasua, Peter Clark, Abraham Bernstein

    Abstract: Large language models have demonstrated promising performance in research ideation across scientific domains. Hypothesis development, the process of generating a highly specific declarative statement connecting a research idea with empirical validation, has received relatively less attention. Existing approaches trivially deploy retrieval augmentation and focus only on the quality of the final out…

    Submitted 21 August, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

    Comments: EMNLP 2025, 26 pages (9 pages: main paper body)

  6. arXiv:2503.22708  [pdf, other]

    cs.AI cs.CL

    CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation

    Authors: Peter Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S. Weld, Peter Clark

    Abstract: Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluat…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: 98 Pages (13 pages: main paper body; 85 pages: appendix)

    Journal ref: ACL 2025 (Findings)

  7. arXiv:2502.15147  [pdf, other]

    cs.CL

    Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision

    Authors: Zhouhang Xie, Tushar Khot, Bhavana Dalvi Mishra, Harshit Surana, Julian McAuley, Peter Clark, Bodhisattwa Prasad Majumder

    Abstract: Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on the LLM's reasoning ability and drops when the data is noisy or beyond the LLM's knowledge. We present Inst…

    Submitted 27 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: NAACL 2025

  8. arXiv:2412.17701  [pdf, other]

    cs.CL

    From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question Answering

    Authors: Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller, Oyvind Tafjord, Sam Hornstein, Alexander Sabol, Peter Jansen, Benjamin Van Durme, Peter Clark

    Abstract: Recent reasoning methods (e.g., chain-of-thought, entailment reasoning) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM's overall understanding, or "theory," about the question's topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheor…

    Submitted 23 December, 2024; v1 submitted 23 December, 2024; originally announced December 2024.

  9. IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback

    Authors: Kevin Pu, K. J. Kevin Feng, Tovi Grossman, Tom Hope, Bhavana Dalvi Mishra, Matt Latzke, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue

    Abstract: Research ideation involves broadly exploring and deeply refining ideas. Both require deep engagement with the literature. Existing tools focus primarily on broad idea generation, yet offer little support for the iterative specification, refinement, and evaluation needed to further develop initial ideas. To bridge this gap, we introduce IdeaSynth, a research idea development system that uses LLMs to provide li…

    Submitted 15 July, 2025; v1 submitted 5 October, 2024; originally announced October 2024.

  10. arXiv:2407.01725  [pdf, other]

    cs.CL cs.AI cs.LG

    DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

    Authors: Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

    Abstract: Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systemat…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://github.com/allenai/discoverybench

  11. arXiv:2406.06769  [pdf, other]

    cs.AI cs.CL

    DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

    Authors: Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark

    Abstract: Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's abil…

    Submitted 7 October, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to NeurIPS 2024 (Benchmark Track, Spotlight)

  12. arXiv:2402.14798  [pdf, other]

    cs.CL cs.AI

    Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

    Authors: Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

    Abstract: Recent language models enable new opportunities for structured reasoning with text, such as the construction of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy datasets and limited…

    Submitted 12 August, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  13. arXiv:2402.03244  [pdf, other]

    cs.LG cs.CL

    Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

    Authors: Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, Roy Fox

    Abstract: Large language models (LLMs) have recently been used for sequential decision making in interactive environments. However, leveraging environment reward signals for continual LLM actor improvement is not straightforward. We propose Skill Set Optimization (SSO) for improving LLM actor performance through constructing and refining sets of transferable skills. SSO constructs skills by extracting commo…

    Submitted 22 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  14. arXiv:2312.07527  [pdf, other]

    cs.CL cs.AI

    BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability

    Authors: Peter Clark, Bhavana Dalvi Mishra, Oyvind Tafjord

    Abstract: While there are numerous benchmarks comparing the performance of modern language models (LMs), end-task evaluations often conflate notions of *factual accuracy* ("truth") and *reasoning ability* ("rationality", or "honesty" in the sense of correctly reporting implications of beliefs). Our goal is a dataset that clearly distinguishes these two notions. Our approach is to leverage and extend a colle…

    Submitted 23 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Added note about how dataset sampling was performed

  15. arXiv:2310.10134  [pdf, other]

    cs.CL cs.AI cs.LG

    CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization

    Authors: Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark

    Abstract: Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, these agents to date do not continually improve over time beyond performance refinement on a specific task. Here we present C…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Project page: https://allenai.github.io/clin/

  16. arXiv:2212.10029  [pdf, other]

    cs.CL cs.AI

    Do language models have coherent mental models of everyday things?

    Authors: Yuling Gu, Bhavana Dalvi Mishra, Peter Clark

    Abstract: When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that "the yolk surrounds the shell" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the r…

    Submitted 8 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  17. arXiv:2210.16407  [pdf, other]

    cs.CL

    Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

    Authors: Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Figurative language (e.g., "he flew like the wind") is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language. We present DREAM-FLUTE, a figurative language understanding sys…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted at The Third Workshop on Figurative Language Processing @ EMNLP 2022

  18. arXiv:2210.12217  [pdf, other]

    cs.AI cs.CL

    Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Authors: Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach is to recursively combine a trained backward-chaining model, capable of generating a set of premises entailing an answer hypothesis, with a v…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: accepted at EMNLP 2022. arXiv admin note: substantial text overlap with arXiv:2204.13074

  19. arXiv:2204.13074  [pdf, other]

    cs.CL cs.AI

    Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement

    Authors: Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark

    Abstract: Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time. Our approach is to augment a QA model with a dynamic memory of user feedback, containing user-supplied corrections to erroneous model beliefs that users identify during interaction. Retrievals from memory ar…

    Submitted 21 October, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

    Comments: accepted at EMNLP 2022

  20. arXiv:2112.08656  [pdf, other]

    cs.CL cs.AI

    DREAM: Improving Situational QA by First Elaborating the Situation

    Authors: Yuling Gu, Bhavana Dalvi Mishra, Peter Clark

    Abstract: When people answer questions about a specific situation, e.g., "I cheated on my mid-term exam last week. Was that wrong?", cognitive science suggests that they form a mental picture of that situation before answering. While we do not know how language models (LMs) answer such questions, we conjecture that they may answer more accurately if they are also provided with additional details about the q…

    Submitted 5 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: to be published in NAACL 2022

  21. arXiv:2102.03315  [pdf, other]

    cs.CL cs.AI

    Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Authors: Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

    Abstract: We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting…

    Submitted 5 February, 2021; originally announced February 2021.

  22. arXiv:2012.13048  [pdf, other]

    cs.CL cs.AI

    ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language

    Authors: Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Transformers have been shown to emulate logical deduction over natural language theories (logical rules expressed in natural language), reliably assigning true/false labels to candidate implications. However, their ability to generate implications of a theory has not yet been demonstrated, and methods for reconstructing proofs of answers are imperfect. In this work we show that a generative model,…

    Submitted 3 June, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Findings of ACL 2021

  23. arXiv:2011.08092  [pdf, other]

    cs.CL

    A Dataset for Tracking Entities in Open Domain Procedural Text

    Authors: Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

    Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a sm…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: To appear in EMNLP 2020

  24. arXiv:2003.13878  [pdf, other]

    cs.CL

    Procedural Reading Comprehension with Attribute-Aware Context Flow

    Authors: Aida Amini, Antoine Bosselut, Bhavana Dalvi Mishra, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Procedural texts often describe processes (e.g., photosynthesis and cooking) that happen over entities (e.g., light, food). In this paper, we introduce an algorithm for procedural reading comprehension by translating the text into a general formalism that represents processes as a sequence of transitions over entity attributes (e.g., location, temperature). Leveraging pre-trained language models,…

    Submitted 30 March, 2020; originally announced March 2020.

  25. arXiv:1909.04745  [pdf, other]

    cs.CL cs.AI

    Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text

    Authors: Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark

    Abstract: Our goal is to better comprehend procedural text, e.g., a paragraph about photosynthesis, by not only predicting what happens, but why some actions need to happen before others. Our approach builds on a prior process comprehension framework for predicting actions' effects, to also identify subsequent steps that those effects enable. We present our new model (XPAD) that biases effect predictions to…

    Submitted 18 September, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019 as a long paper. This revision fixed a typo in an author name in references

  26. arXiv:1909.04739  [pdf, other]

    cs.CL cs.AI

    WIQA: A dataset for "What if..." reasoning over procedural text

    Authors: Niket Tandon, Bhavana Dalvi Mishra, Keisuke Sakaguchi, Antoine Bosselut, Peter Clark

    Abstract: We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains three parts: a collection of paragraphs each describing a process, e.g., beach erosion; a set of crowdsourced influence graphs for each paragraph, describing how one change affects another; and a large (40k) collection of "What if...?" multiple-choice questions derived from the graphs. Fo…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted at EMNLP 2019

  27. arXiv:1909.01958  [pdf, other]

    cs.CL cs.AI

    From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

    Authors: Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

    Abstract: AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more…

    Submitted 1 February, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: AI Magazine 41 (4) Winter 2020. New analysis sections added

  28. arXiv:1906.08942  [pdf, other]

    cs.CL cs.LG

    Be Consistent! Improving Procedural Text Comprehension using Label Consistency

    Authors: Xinya Du, Bhavana Dalvi Mishra, Niket Tandon, Antoine Bosselut, Wen-tau Yih, Peter Clark, Claire Cardie

    Abstract: Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe). This task is challenging as the world is changing throughout the text, and despite recent advances, current systems still struggle with this task. Our approach is to leverage the fact that, for…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: NAACL 2019

  29. arXiv:1808.10012  [pdf, other]

    cs.AI

    Reasoning about Actions and State Changes by Injecting Commonsense Knowledge

    Authors: Niket Tandon, Bhavana Dalvi Mishra, Joel Grus, Wen-tau Yih, Antoine Bosselut, Peter Clark

    Abstract: Comprehending procedural text, e.g., a paragraph describing photosynthesis, requires modeling actions and the state changes they produce, so that questions about entities at different timepoints can be answered. Although several recent systems have shown impressive progress in this task, their predictions can be globally inconsistent or highly improbable. In this paper, we show how the predicted e…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: Accepted at EMNLP 2018. Niket Tandon and Bhavana Dalvi Mishra contributed equally to this work

  30. arXiv:1805.06975  [pdf, other]

    cs.CL

    Tracking State Changes in Procedural Text: A Challenge Dataset and Models for Process Paragraph Comprehension

    Authors: Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter Clark

    Abstract: We present a new dataset and models for comprehending paragraphs about processes (e.g., photosynthesis), an important genre of text describing a dynamic world. The new dataset, ProPara, is the first to contain natural (rather than machine-generated) text about a changing world along with a full annotation of entity states (location and existence) during those changes (81k datapoints). The end-task…

    Submitted 17 May, 2018; originally announced May 2018.

    Comments: In Proc. NAACL'2018