
Showing 1–50 of 70 results for author: Althoff, T

Searching in archive cs.
  1. arXiv:2512.08296  [pdf, ps, other]

    cs.AI

    Towards a Science of Scaling Agent Systems

    Authors: Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A. Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Mark Malhotra, Paul Pu Liang, Hae Won Park, Yuzhe Yang, Xuhai Xu, Yilun Du, Shwetak Patel, Tim Althoff, Daniel McDuff, Xin Liu

    Abstract: Agents, language model-based systems capable of reasoning, planning, and acting, are becoming the dominant paradigm for real-world AI applications. Despite this widespread adoption, the principles that determine their performance remain underexplored. We address this by deriving quantitative scaling principles for agent systems. We first formalize a definition for agentic evaluation and ch…

    Submitted 16 December, 2025; v1 submitted 9 December, 2025; originally announced December 2025.

  2. arXiv:2512.05145  [pdf, ps, other]

    cs.CV

    Self-Improving VLM Judges Without Human Annotations

    Authors: Inna Wanyin Lin, Yushi Hu, Shuyue Stella Li, Scott Geng, Pang Wei Koh, Luke Zettlemoyer, Tim Althoff, Marjan Ghazvininejad

    Abstract: Effective judges of Vision-Language Models (VLMs) are crucial for model development. Current methods for training VLM judges mainly rely on large-scale human preference annotations. However, such an approach is costly, and the annotations easily become obsolete as models rapidly improve. In this work, we present a framework to self-train a VLM judge model without any human preference annotations,…

    Submitted 2 December, 2025; originally announced December 2025.

  3. arXiv:2511.00222  [pdf, ps, other]

    cs.CL cs.AI

    Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

    Authors: Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, Natasha Jaques

    Abstract: Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evalua…

    Submitted 31 October, 2025; originally announced November 2025.

  4. arXiv:2510.24427  [pdf, ps, other]

    cs.CL

    SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models

    Authors: Ken Gu, Advait Bhat, Mike A Merrill, Robert West, Xin Liu, Daniel McDuff, Tim Althoff

    Abstract: Evaluating the reasoning ability of language models (LMs) is complicated by their extensive parametric world knowledge, where benchmark performance often reflects factual recall rather than genuine reasoning. Existing datasets and approaches (e.g., temporal filtering, paraphrasing, adversarial substitution) cannot cleanly separate the two. We present SynthWorlds, a framework that disentangles task…

    Submitted 30 October, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

  5. arXiv:2508.20148  [pdf]

    cs.AI cs.HC cs.MA

    The Anatomy of a Personal Health Agent

    Authors: A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, Javier Perez , et al. (13 additional authors not shown)

    Abstract: Health is a fundamental pillar of human wellness, and the rapid advancements in large language models (LLMs) have driven the development of a new generation of health agents. However, the application of health agents to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health agent that is able to reason…

    Submitted 18 September, 2025; v1 submitted 27 August, 2025; originally announced August 2025.

    Comments: Minor updates to the manuscript (V2)

  6. arXiv:2508.08596  [pdf, ps, other]

    cs.SI cs.HC

    How Conversational Structure and Style Shape Online Community Experiences

    Authors: Galen Weld, Carl Pearson, Bradley Spahn, Tim Althoff, Amy X. Zhang, Sanjay Kairam

    Abstract: Sense of Community (SOC) is vital to individual and collective well-being. Although social interactions have moved increasingly online, little is known about the specific relationships between the nature of these interactions and Sense of Virtual Community (SOVC). This study addresses this gap by exploring how conversational structure and linguistic style predict SOVC in online communities,…

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: to appear at ICWSM 2026

  7. arXiv:2506.09108  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    SensorLM: Learning the Language of Wearable Sensors

    Authors: Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, Girish Narayanswamy, Maxwell A. Xu, Ahmed A. Metwally, Shawn Xu, Jake Garrison, Xuhai Xu, Tim Althoff, Yun Liu, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Cecilia Mascolo, Xin Liu, Daniel McDuff, Yuzhe Yang

    Abstract: We present SensorLM, a family of sensor-language foundation models that enable wearable sensor data understanding with natural language. Despite its pervasive nature, aligning and interpreting sensor data with language remains challenging due to the lack of paired, richly annotated sensor-text descriptions in uncurated, real-world wearable data. We introduce a hierarchical caption generation pipel…

    Submitted 10 June, 2025; originally announced June 2025.

  8. arXiv:2506.08249  [pdf, ps, other]

    cs.DB cs.CL

    RADAR: Benchmarking Language Models on Imperfect Tabular Data

    Authors: Ken Gu, Zhihan Zhang, Kate Lin, Yuwei Zhang, Akshay Paruchuri, Hong Yu, Mehran Kazemi, Kumar Ayush, A. Ali Heydari, Maxwell A. Xu, Girish Narayanswamy, Yun Liu, Ming-Zher Poh, Yuzhe Yang, Mark Malhotra, Shwetak Patel, Hamid Palangi, Xuhai Xu, Daniel McDuff, Tim Althoff, Xin Liu

    Abstract: Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness -- the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies -- remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compro…

    Submitted 30 October, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: NeurIPS 2025 Dataset and Benchmark Track

  9. arXiv:2506.07468  [pdf, ps, other]

    cs.LG cs.CL cs.MA

    Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

    Authors: Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques

    Abstract: Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch -- attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play…

    Submitted 5 October, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  10. arXiv:2506.05321  [pdf, other]

    cs.LG

    LSM-2: Learning from Incomplete Wearable Sensor Data

    Authors: Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, Shun Liao, Shyam A. Tailor, Ahmed Metwally, A. Ali Heydari, Yuwei Zhang, Jake Garrison, Samy Abdel-Ghaffar, Xuhai Xu, Ken Gu, Jacob Sunshine, Ming-Zher Poh, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Mark Malhotra, Shwetak Patel, Yuzhe Yang, James M. Rehg, Xin Liu, Daniel McDuff

    Abstract: Foundation models, a cornerstone of recent advancements in machine learning, have predominantly thrived on complete and well-structured data. Wearable sensor data frequently suffers from significant missingness, posing a substantial challenge for self-supervised learning (SSL) models that typically assume complete data inputs. This paper introduces the second generation of Large Sensor Model (LSM-…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Xu and Narayanswamy are co-first authors. McDuff and Liu are co-last authors

  11. arXiv:2503.19328  [pdf, ps, other]

    cs.CL cs.AI

    Substance over Style: Evaluating Proactive Conversational Coaching Agents

    Authors: Vidya Srinivas, Xuhai Xu, Xin Liu, Kumar Ayush, Isaac Galatzer-Levy, Shwetak Patel, Daniel McDuff, Tim Althoff

    Abstract: While NLP research has made strides in conversational tasks, many approaches focus on single-turn responses with well-defined objectives or evaluation criteria. In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, and mixed-initiative dialogue. In this work, we describe and implement five multi-turn…

    Submitted 8 July, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted to ACL 2025

  12. arXiv:2503.14190  [pdf, other]

    cs.AI

    Inferring Events from Time Series using Language Models

    Authors: Mingtian Tan, Mike A. Merrill, Zack Gottesman, Tim Althoff, David Evans, Tom Hartvigsen

    Abstract: Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. A common goal in analyzing time series data is to understand the underlying events that cause the observed variations. We conduct the first study of whether Large Language Models (LLMs) can infer events described with natural language from time series data. We evalu…

    Submitted 22 May, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 21 pages, 15 Figures

    MSC Class: 62M10; 68T07; ACM Class: I.2.6; I.2.7

  13. arXiv:2502.07663  [pdf, ps, other]

    cs.AI cs.CL cs.CY cs.HC

    Human Decision-making is Susceptible to AI-driven Manipulation

    Authors: Sahand Sabour, June M. Liu, Siyang Liu, Chris Z. Yao, Shiyao Cui, Xuanming Zhang, Wen Zhang, Yaru Cao, Advait Bhat, Jian Guan, Wei Wu, Rada Mihalcea, Hongning Wang, Tim Althoff, Tatia M. C. Lee, Minlie Huang

    Abstract: AI systems are increasingly intertwined with daily life, assisting users with various tasks and guiding decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized between-subjects experiment with 233 participants, we examined human susc…

    Submitted 1 December, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Work in progress

  14. arXiv:2501.14163  [pdf, other]

    cs.SI cs.CY cs.HC

    Reddit Rules and Rulers: Quantifying the Link Between Rules and Perceptions of Governance across Thousands of Communities

    Authors: Leon Leibmann, Galen Weld, Amy X. Zhang, Tim Althoff

    Abstract: Rules are a critical component of the functioning of nearly every online community, yet it is challenging for community moderators to make data-driven decisions about what rules to set for their communities. The connection between a community's rules and how its membership feels about its governance is not well understood. In this work, we conduct the largest-to-date analysis of rules on Reddit, c…

    Submitted 14 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: to appear at ICWSM 2025

  15. arXiv:2410.13638  [pdf, other]

    cs.LG cs.AI cs.HC

    Scaling Wearable Foundation Models

    Authors: Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff

    Abstract: Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful repre…

    Submitted 17 October, 2024; originally announced October 2024.

  16. arXiv:2408.09667  [pdf, ps, other]

    cs.CL

    BLADE: Benchmarking Language Model Agents for Data-Driven Science

    Authors: Ken Gu, Ruoxi Shang, Ruien Jiang, Keying Kuang, Richard-John Lin, Donghe Lyu, Yue Mao, Youran Pan, Teng Wu, Jiaqian Yu, Yikun Zhang, Tianmai M. Zhang, Lanyi Zhu, Mike A. Merrill, Jeffrey Heer, Tim Althoff

    Abstract: Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-dri…

    Submitted 10 November, 2025; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: EMNLP 2024

  17. arXiv:2406.16964  [pdf, other]

    cs.LG cs.AI

    Are Language Models Actually Useful for Time Series Forecasting?

    Authors: Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, Thomas Hartvigsen

    Abstract: Large language models (LLMs) are being applied to time series forecasting. But are language models actually useful for time series? In a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting performance -- in most cases, the results even impr…

    Submitted 25 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted to NeurIPS 2024 (Spotlight)

  18. arXiv:2406.12830  [pdf, other]

    cs.CL

    What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

    Authors: Akshay Paruchuri, Jake Garrison, Shun Liao, John Hernandez, Jacob Sunshine, Tim Althoff, Xin Liu, Daniel McDuff

    Abstract: Language models (LM) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability distributions. In this paper, we focus on evaluating the probabilistic reasoning capabilities of LMs using idealized and real-world statistical distributions. We perform a…

    Submitted 30 September, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024 (Main), 21 pages, 9 figures, 2 tables

  19. arXiv:2406.06474  [pdf, other]

    cs.AI cs.CL

    Towards a Personal Health Large Language Model

    Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra , et al. (9 additional authors not shown)

    Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 72 pages

  20. arXiv:2406.06464  [pdf, ps, other]

    cs.AI cs.CL

    Transforming Wearable Data into Personal Health Insights using Large Language Model Agents

    Authors: Mike A. Merrill, Akshay Paruchuri, Naghmeh Rezaei, Geza Kovacs, Javier Perez, Yun Liu, Erik Schenck, Nova Hammerquist, Jake Sunshine, Shyam Tailor, Kumar Ayush, Hao-Wei Su, Qian He, Cory Y. McLean, Mark Malhotra, Shwetak Patel, Jiening Zhan, Tim Althoff, Daniel McDuff, Xin Liu

    Abstract: Deriving personalized insights from popular wearable trackers requires complex numerical reasoning that challenges standard LLMs, necessitating tool-based approaches like code generation. Large language model (LLM) agents present a promising yet largely untapped solution for this analysis at scale. We introduce the Personal Health Insights Agent (PHIA), a system leveraging multistep reasoning with…

    Submitted 8 September, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 53 pages, 7 main figures, 2 main tables, accepted to Nature Communications

  21. arXiv:2406.04557  [pdf, other]

    cs.CY

    Countrywide natural experiment reveals impact of built environment on physical activity

    Authors: Tim Althoff, Boris Ivanovic, Jennifer L. Hicks, Scott L. Delp, Abby C. King, Jure Leskovec

    Abstract: While physical activity is critical to human health, most people do not meet recommended guidelines. More walkable built environments have the potential to increase activity across the population. However, previous studies on the built environment and physical activity have led to mixed findings, possibly due to methodological limitations such as small cohorts, few or single locations, over-relian…

    Submitted 6 June, 2024; originally announced June 2024.

  22. arXiv:2404.11757  [pdf, other]

    cs.CL

    Language Models Still Struggle to Zero-shot Reason about Time Series

    Authors: Mike A. Merrill, Mingtian Tan, Vinayak Gupta, Tom Hartvigsen, Tim Althoff

    Abstract: Time series are critical for decision-making in fields like finance and healthcare. Their importance has driven a recent influx of works passing time series into language models, leading to non-trivial forecasting on some datasets. But it remains unknown whether non-trivial forecasting implies that language models can reason about time series. To address this gap, we generate a first-of-its-kind e…

    Submitted 17 April, 2024; originally announced April 2024.

  23. arXiv:2403.11169  [pdf, other]

    cs.CL cs.AI

    Correcting misinformation on social media with a large language model

    Authors: Xinyi Zhou, Ashish Sharma, Amy X. Zhang, Tim Althoff

    Abstract: Real-world misinformation, often multimodal, can be partially or fully factual but misleading using diverse tactics like conflating correlation with causation. Such misinformation is severely understudied, challenging to address, and harms various social domains, particularly on social media, where it can spread rapidly. High-quality and timely correction of misinformation that identifies and expl…

    Submitted 3 September, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: 50 pages

  24. arXiv:2403.09810  [pdf, other]

    cs.HC cs.AI cs.LG

    LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems

    Authors: Chu Li, Zhihan Zhang, Michael Saugstad, Esteban Safranchik, Minchu Kulkarni, Xiaoyu Huang, Shwetak Patel, Vikram Iyer, Tim Althoff, Jon E. Froehlich

    Abstract: Crowdsourcing platforms have transformed distributed problem-solving, yet quality control remains a persistent challenge. Traditional quality control measures, such as prescreening workers and refining instructions, often focus solely on optimizing economic output. This paper explores just-in-time AI interventions to enhance both labeling quality and domain-specific knowledge among crowdworkers. W…

    Submitted 14 March, 2024; originally announced March 2024.

  25. arXiv:2402.12556  [pdf, other]

    cs.HC cs.CL

    IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction

    Authors: Inna Wanyin Lin, Ashish Sharma, Christopher Michael Rytting, Adam S. Miner, Jina Suh, Tim Althoff

    Abstract: Navigating certain communication situations can be challenging due to individuals' lack of skills and the interference of strong emotions. However, effective learning opportunities are rarely accessible. In this work, we conduct a human-centered study that uses language models to simulate bespoke communication training and provide just-in-time feedback to support the practice and learning of inter…

    Submitted 19 February, 2024; originally announced February 2024.

  26. arXiv:2402.05070  [pdf, other]

    cs.AI cs.CL cs.IR

    A Roadmap to Pluralistic Alignment

    Authors: Taylor Sorensen, Jared Moore, Jillian Fisher, Mitchell Gordon, Niloofar Mireshghallah, Christopher Michael Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi

    Abstract: With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formaliz…

    Submitted 20 August, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  27. arXiv:2401.16610  [pdf, other]

    cs.SI cs.CY cs.HC

    Perceptions of Moderators as a Large-Scale Measure of Online Community Governance

    Authors: Galen Weld, Leon Leibmann, Amy X. Zhang, Tim Althoff

    Abstract: Millions of online communities are governed by volunteer moderators, who shape their communities by setting and enforcing rules, recruiting additional moderators, and participating in the community themselves. These moderators must regularly make decisions about how to govern, yet measuring the 'success' of governance is complex and nuanced, making it challenging to determine what governance strat…

    Submitted 23 January, 2025; v1 submitted 29 January, 2024; originally announced January 2024.

  28. arXiv:2401.00820  [pdf, other]

    cs.CL cs.HC

    A Computational Framework for Behavioral Assessment of LLM Therapists

    Authors: Yu Ying Chiu, Ashish Sharma, Inna Wanyin Lin, Tim Althoff

    Abstract: The emergence of large language models (LLMs) like ChatGPT has increased interest in their use as therapists to address mental health challenges and the widespread lack of access to care. However, experts have emphasized the critical need for systematic evaluation of LLM-based mental health interventions to accurately assess their capabilities and limitations. Here, we propose BOLT, a proof-of-con…

    Submitted 28 November, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

  29. arXiv:2310.15461  [pdf, other]

    cs.HC cs.CL

    Facilitating Self-Guided Mental Health Interventions Through Human-Language Model Interaction: A Case Study of Cognitive Restructuring

    Authors: Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, Tim Althoff

    Abstract: Self-guided mental health interventions, such as "do-it-yourself" tools to learn and practice coping strategies, show great promise to improve access to mental health care. However, these interventions are often cognitively demanding and emotionally triggering, creating accessibility barriers that limit their wide-scale implementation and adoption. In this paper, we study how human-language model…

    Submitted 10 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: CHI 2024 Camera Ready

  30. arXiv:2309.10947  [pdf, other]

    cs.HC

    How Do Analysts Understand and Verify AI-Assisted Data Analyses?

    Authors: Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, Steven M. Drucker

    Abstract: Data analysis is challenging as it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can assist analysts by translating natural language instructions into code. However, AI-assistant responses and analysis code can be misaligned with the analyst's intent or be seemingly correct but lead to inc…

    Submitted 4 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted to CHI 2024

  31. arXiv:2309.10108  [pdf, other]

    cs.HC

    How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study

    Authors: Ken Gu, Madeleine Grunde-McLaughlin, Andrew M. McNutt, Jeffrey Heer, Tim Althoff

    Abstract: Data analysis is challenging as analysts must navigate nuanced decisions that may yield divergent conclusions. AI assistants have the potential to support analysts in planning their analyses, enabling more robust decision making. Though AI-based assistants that target code execution (e.g., Github Copilot) have received significant attention, limited research addresses assistance for both analysis…

    Submitted 4 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to CHI 2024

  32. arXiv:2305.08323  [pdf, other]

    cs.HC

    Approximation and Progressive Display of Multiverse Analyses

    Authors: Yang Liu, Tim Althoff, Jeffrey Heer

    Abstract: A multiverse analysis evaluates all combinations of "reasonable" analytic decisions to promote robustness and transparency, but can lead to a combinatorial explosion of analyses to compute. Long delays before assessing results prevent users from diagnosing errors and iterating early. We contribute (1) approximation algorithms for estimating multiverse sensitivity and (2) monitoring visualizations…

    Submitted 14 May, 2023; originally announced May 2023.

  33. arXiv:2305.02466  [pdf, other]

    cs.CL cs.HC cs.SI

    Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction

    Authors: Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, David Wadden, Khendra G. Lucas, Adam S. Miner, Theresa Nguyen, Tim Althoff

    Abstract: A proven therapeutic technique to overcome negative thoughts is to replace them with a more hopeful "reframed thought." Although therapy can help people practice and learn this Cognitive Reframing of Negative Thoughts, clinician shortages and mental health stigma commonly limit people's access to therapy. In this paper, we conduct a human-centered study of how language models may assist people in…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at ACL 2023

  34. arXiv:2303.14177  [pdf, other]

    cs.CL cs.AI

    Scaling Expert Language Models with Unsupervised Domain Discovery

    Authors: Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

    Abstract: Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introduce a simple but effective method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert…

    Submitted 24 March, 2023; originally announced March 2023.

  35. arXiv:2211.02733  [pdf, other]

    cs.LG cs.AI cs.HC

    GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization

    Authors: Xuhai Xu, Han Zhang, Yasaman Sefidgar, Yiyi Ren, Xin Liu, Woosuk Seo, Jennifer Brown, Kevin Kuehn, Mike Merrill, Paula Nurius, Shwetak Patel, Tim Althoff, Margaret E. Morris, Eve Riskin, Jennifer Mankoff, Anind K. Dey

    Abstract: Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring th…

    Submitted 4 March, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

    Comments: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

    MSC Class: 68T09 ACM Class: I.2.1; E.m

  36. arXiv:2210.15144  [pdf, other]

    cs.CL cs.CY

    Gendered Mental Health Stigma in Masked Language Models

    Authors: Inna Wanyin Lin, Lucille Njoo, Anjalie Field, Ashish Sharma, Katharina Reinecke, Tim Althoff, Yulia Tsvetkov

    Abstract: Mental health stigma prevents many individuals from receiving the appropriate care, and social psychology studies have shown that mental health tends to be overlooked in men. In this work, we investigate gendered mental health stigma in masked language models. In doing so, we operationalize mental health stigma by developing a framework grounded in psychology research: we use clinical psychology l…

    Submitted 11 April, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  37. arXiv:2210.03804  [pdf, other]

    cs.HC cs.SE

    Understanding and Supporting Debugging Workflows in Multiverse Analysis

    Authors: Ken Gu, Eunice Jun, Tim Althoff

    Abstract: Multiverse analysis, a paradigm for statistical analysis that considers all combinations of reasonable analysis choices in parallel, promises to improve transparency and reproducibility. Although recent tools help analysts specify multiverse analyses, they remain difficult to use in practice. In this work, we identify debugging as a key barrier due to the latency from running analyses to detecting…

    Submitted 4 June, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: CHI 2023

    Journal ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023, Hamburg, Germany. ACM, New York, NY, USA

  38. arXiv:2208.03306  [pdf, other]

    cs.CL

    Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

    Authors: Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

    Abstract: We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train subparts of a new class of LLMs on different subsets of the data, eliminating the massive multi-node synchronization currently required to train LLMs. BTM learns a set of independent expert LMs (ELMs), each spec…

    Submitted 5 August, 2022; originally announced August 2022.

  39. arXiv:2205.13607  [pdf, other]

    cs.LG cs.HC

    Self-supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets

    Authors: Mike A. Merrill, Tim Althoff

    Abstract: Detailed mobile sensing data from phones, watches, and fitness trackers offer an unparalleled opportunity to quantify and act upon previously unmeasurable behavioral changes in order to improve individual health and accelerate responses to emerging diseases. Unlike in natural language processing and computer vision, deep representation learning has yet to broadly impact this domain, in which the v… ▽ More

    Submitted 2 June, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  40. arXiv:2203.15144  [pdf, other]

    cs.CL cs.HC cs.SI

    Human-AI Collaboration Enables More Empathic Conversations in Text-based Peer-to-Peer Mental Health Support

    Authors: Ashish Sharma, Inna W. Lin, Adam S. Miner, David C. Atkins, Tim Althoff

    Abstract: Advances in artificial intelligence (AI) are enabling systems that augment and collaborate with humans to perform simple, mechanistic tasks like scheduling meetings and grammar-checking text. However, such Human-AI collaboration poses challenges for more complex, creative tasks, such as carrying out empathic conversations, due to difficulties of AI systems in understanding complex human emotions a… ▽ More

    Submitted 28 March, 2022; originally announced March 2022.

  41. arXiv:2111.05835  [pdf, other]

    cs.SI cs.CY cs.HC

    What Makes Online Communities 'Better'? Measuring Values, Consensus, and Conflict across Thousands of Subreddits

    Authors: Galen Weld, Amy X. Zhang, Tim Althoff

    Abstract: Making online social communities 'better' is a challenging undertaking, as online communities are extraordinarily varied in their size, topical focus, and governance. As such, what is valued by one community may not be valued by another. However, community values are challenging to measure as they are rarely explicitly stated. In this work, we measure community values through the first large-scale… ▽ More

    Submitted 9 May, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

    Comments: 12 pages, 8 figures, 4 tables; to appear at ICWSM 2022

  42. arXiv:2109.05152  [pdf, other]

    cs.HC cs.CY cs.SI

    Making Online Communities 'Better': A Taxonomy of Community Values on Reddit

    Authors: Galen Weld, Amy X. Zhang, Tim Althoff

    Abstract: Many researchers studying online communities seek to make them better. However, beyond a small set of widely-held values, such as combating misinformation and abuse, determining what 'better' means can be challenging, as community members may disagree, values may be in conflict, and different communities may have differing preferences as a whole. In this work, we present the first study that elici… ▽ More

    Submitted 20 September, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: to appear at ICWSM 2024

  43. arXiv:2107.06097  [pdf, other]

    cs.LG cs.HC

    Transformer-Based Behavioral Representation Learning Enables Transfer Learning for Mobile Sensing in Small Datasets

    Authors: Mike A. Merrill, Tim Althoff

    Abstract: While deep learning has revolutionized research and applications in NLP and computer vision, this has not yet been the case for behavioral modeling and behavioral health applications. This is because the domain's datasets are smaller, have heterogeneous datatypes, and typically exhibit a large degree of missingness. Therefore, off-the-shelf deep learning models require significant, often prohibiti… ▽ More

    Submitted 9 July, 2021; originally announced July 2021.

  44. arXiv:2104.13490  [pdf, other]

    cs.CY

    Leveraging Community and Author Context to Explain the Performance and Bias of Text-Based Deception Detection Models

    Authors: Galen Weld, Ellyn Ayton, Tim Althoff, Maria Glenski

    Abstract: Deceptive news posts shared in online communities can be detected with NLP models, and much recent research has focused on the development of such models. In this work, we use characteristics of online communities and authors -- the context of how and where content is posted -- to explain the performance of a neural network deception detection model and identify sub-populations who are disproporti… ▽ More

    Submitted 27 April, 2021; originally announced April 2021.

  45. arXiv:2102.12523  [pdf, other]

    cs.HC cs.CY q-bio.NC

    Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance

    Authors: Chunjong Park, Morelle Arian, Xin Liu, Leon Sasson, Jeffrey Kahn, Shwetak Patel, Alex Mariakakis, Tim Althoff

    Abstract: Sleep is critical to human function, mediating factors like memory, mood, energy, and alertness; therefore, it is commonly conjectured that a good night's sleep is important for job performance. However, both real-world sleep behavior and job performance are hard to measure at scale. In this work, we show that people's everyday interactions with online mobile apps can reveal insights into their jo… ▽ More

    Submitted 24 February, 2021; originally announced February 2021.

  46. arXiv:2102.08537  [pdf, other]

    cs.CY

    Political Bias and Factualness in News Sharing across more than 100,000 Online Communities

    Authors: Galen Weld, Maria Glenski, Tim Althoff

    Abstract: As civil discourse increasingly takes place online, misinformation and the polarization of news shared in online communities have become ever more relevant concerns with real world harms across our society. Studying online news sharing at scale is challenging due to the massive volume of content which is shared by millions of users across thousands of communities. Therefore, existing research has… ▽ More

    Submitted 9 May, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 12 pages, 7 figures. Published at ICWSM 2021

  47. arXiv:2101.07714  [pdf, other]

    cs.CL cs.SI

    Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach

    Authors: Ashish Sharma, Inna W. Lin, Adam S. Miner, David C. Atkins, Tim Althoff

    Abstract: Online peer-to-peer support platforms enable conversations between millions of people who seek and provide mental health support. If successful, web-based mental health conversations could improve access to treatment and reduce the global disease burden. Psychologists have repeatedly demonstrated that empathy, the ability to understand and feel the emotions and experiences of others, is a key comp… ▽ More

    Submitted 16 May, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: Published at WWW 2021

  48. arXiv:2009.09961  [pdf, other]

    cs.CL

    Adjusting for Confounders with Text: Challenges and an Empirical Evaluation Framework for Causal Inference

    Authors: Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, Tim Althoff

    Abstract: Causal inference studies using textual social media data can provide actionable insights on human behavior. Making accurate causal inferences with text requires controlling for confounding which could otherwise impart bias. Recently, many different methods for adjusting for confounders have been proposed, and we show that these existing methods disagree with one another on two datasets inspired by… ▽ More

    Submitted 6 May, 2022; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: to appear at ICWSM 2022

  49. arXiv:2009.08441  [pdf, other]

    cs.CL cs.SI

    A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support

    Authors: Ashish Sharma, Adam S. Miner, David C. Atkins, Tim Althoff

    Abstract: Empathy is critical to successful mental health support. Empathy measurement has predominantly occurred in synchronous, face-to-face settings, and may not translate to asynchronous, text-based contexts. Because millions of people use text-based platforms for mental health support, understanding empathy in these contexts is crucial. In this work, we present a computational approach to understanding… ▽ More

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted for publication at EMNLP 2020

  50. arXiv:2008.12828  [pdf, other]

    cs.LG cs.DL stat.ML

    CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis

    Authors: Ge Zhang, Mike A. Merrill, Yang Liu, Jeffrey Heer, Tim Althoff

    Abstract: Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits. However, large corpora have remained unanalyzed in depth, as descriptive labels are absent and require expert domain knowledge to generate. We propose… ▽ More

    Submitted 28 August, 2020; originally announced August 2020.