
Showing 1–45 of 45 results for author: Pang, R Y

Searching in archive cs.
  1. arXiv:2604.08516  [pdf, ps, other]

    cs.CV

    MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

    Authors: Tanmay Gupta, Piper Wolters, Zixian Ma, Peter Sushko, Rock Yuren Pang, Diego Llanes, Yue Yang, Taira Anderson, Boyuan Zheng, Zhongzheng Ren, Harsh Trivedi, Taylor Blanton, Caleb Ouellette, Winson Han, Ali Farhadi, Ranjay Krishna

    Abstract: Web agents--autonomous systems that navigate and execute tasks on the web on behalf of users--have the potential to transform how people interact with the digital world. However, the most capable web agents today rely on proprietary models with undisclosed training data and recipes, limiting scientific understanding, reproducibility, and community-driven progress. We believe agents for the open…

    Submitted 9 April, 2026; originally announced April 2026.

    Comments: https://allenai.org/blog/molmoweb

  2. arXiv:2603.13036  [pdf, ps, other]

    cs.HC cs.AI cs.CY

    Interrogating Design Homogenization in Web Vibe Coding

    Authors: Donghoon Shin, Alice Gao, Rock Yuren Pang, Jaewook Lee, Katharina Reinecke, Emily Tseng

    Abstract: Generative AI is known for its tendency to homogenize, often reproducing dominant style conventions found in training data. However, it remains unclear how these homogenizing effects extend to complex structural tasks like web design. As lay creators increasingly turn to LLMs to 'vibe-code' websites -- prompting for aesthetic and functional goals rather than writing code -- they may inadvertently…

    Submitted 13 March, 2026; originally announced March 2026.

    ACM Class: H.5.2; I.2.7

  3. arXiv:2603.07446  [pdf, ps, other]

    cs.HC

    GeoVisA11y: An AI-based Geovisualization Question-Answering System for Screen-Reader Users

    Authors: Chu Li, Rock Yuren Pang, Arnavi Chheda-Kothary, Ather Sharif, Henok Assalif, Jeffrey Heer, Jon E. Froehlich

    Abstract: Geovisualizations are powerful tools for communicating spatial information, but are inaccessible to screen-reader users. To address this limitation, we present GeoVisA11y, an LLM-based question-answering system that makes geovisualizations accessible through natural language interaction. The system supports map reading, analysis, interpretation and navigation by handling analytical, geospatial, vi…

    Submitted 7 March, 2026; originally announced March 2026.

    Comments: This manuscript has been accepted at CHI'26

  4. arXiv:2512.00742  [pdf, ps, other]

    cs.CY cs.AI

    On the Regulatory Potential of User Interfaces for AI Agent Governance

    Authors: K. J. Kevin Feng, Tae Soo Kim, Rock Yuren Pang, Faria Huq, Tal August, Amy X. Zhang

    Abstract: AI agents that take actions in their environment autonomously over extended time horizons require robust governance interventions to curb their potentially consequential risks. Prior proposals for governing AI agents primarily target system-level safeguards (e.g., prompt injection monitors) or agent infrastructure (e.g., agent IDs). In this work, we explore a complementary approach: regulating use…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: RegML workshop at NeurIPS 2025 (oral)

  5. arXiv:2511.10507  [pdf, ps, other]

    cs.CL

    AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

    Authors: Yun He, Wenzhe Li, Hejia Zhang, Songlin Li, Karishma Mandyam, Sopan Khosla, Yuanhao Xiong, Nanshu Wang, Xiaoliang Peng, Beibin Li, Shengjie Bi, Shishir G. Patil, Qi Qi, Shengyu Feng, Julian Katz-Samuels, Richard Yuanzhe Pang, Sujan Gonugondla, Hunter Lang, Yue Yu, Yundi Qian, Maryam Fazel-Zarandi, Licheng Yu, Amine Benhalloum, Hany Awadalla, Manaal Faruqui

    Abstract: Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF)-especially for complex, multi-turn, and system-prompted instructions-remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpr…

    Submitted 26 November, 2025; v1 submitted 13 November, 2025; originally announced November 2025.

  6. arXiv:2510.01135  [pdf, ps, other]

    cs.LG cs.CL

    Prompt Curriculum Learning for Efficient LLM Post-Training

    Authors: Zhaolin Gao, Joongwon Kim, Wen Sun, Thorsten Joachims, Sid Wang, Richard Yuanzhe Pang, Liang Tan

    Abstract: We introduce Prompt Curriculum Learning (PCL), a lightweight reinforcement learning (RL) algorithm that selects intermediate-difficulty prompts using a learned value model to post-train language models. Since post-training LLMs via RL remains sensitive to batching and prompt selection strategies, we first conduct a series of systematic experiments where we (1) determine the optimal training batch…

    Submitted 1 October, 2025; originally announced October 2025.

  7. arXiv:2508.10071  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    Advancing Data Equity: Practitioner Responsibility and Accountability in NLP Data Practices

    Authors: Jay L. Cunningham, Kevin Zhongyang Shao, Rock Yuren Pang, Nathaniel Mengist

    Abstract: While research has focused on surfacing and auditing algorithmic bias to ensure equitable AI development, less is known about how NLP practitioners - those directly involved in dataset development, annotation, and deployment - perceive and navigate issues of NLP data equity. This study is among the first to center practitioners' perspectives, linking their experiences to a multi-scalar AI governan…

    Submitted 16 August, 2025; v1 submitted 13 August, 2025; originally announced August 2025.

    Comments: 10 pages, plus 6 pages of references and appendices. The archival version has been accepted to AAAI (AIES 2025) without the extended appendices; this extended version includes them

  8. arXiv:2506.23678  [pdf, ps, other]

    cs.HC cs.AI

    Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models

    Authors: Rock Yuren Pang, K. J. Kevin Feng, Shangbin Feng, Chu Li, Weijia Shi, Yulia Tsvetkov, Jeffrey Heer, Katharina Reinecke

    Abstract: The output quality of large language models (LLMs) can be improved via "reasoning": generating segments of chain-of-thought (CoT) content to further condition the model prior to producing user-facing output. While these chains contain valuable information, they are verbose and lack explicit organization, making them tedious to review. Moreover, they lack opportunities for user feedback, such as to…

    Submitted 30 June, 2025; originally announced June 2025.

  9. arXiv:2505.01537  [pdf, ps, other]

    cs.HC

    Passing the Buck to AI: How Individuals' Decision-Making Patterns Affect Reliance on AI

    Authors: Katelyn Xiaoying Mei, Rock Yuren Pang, Alex Lyford, Lucy Lu Wang, Katharina Reinecke

    Abstract: Psychological research has identified different patterns individuals have while making decisions, such as vigilance (making decisions after thorough information gathering), hypervigilance (rushed and anxious decision-making), and buckpassing (deferring decisions to others). We examine whether these decision-making patterns shape people's likelihood of seeking out or relying on AI. In an online exp…

    Submitted 22 January, 2026; v1 submitted 2 May, 2025; originally announced May 2025.

  10. arXiv:2504.07096  [pdf, ps, other]

    cs.CL

    OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

    Authors: Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, et al. (6 additional authors not shown)

    Abstract: We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches between segments of language model output and documents in the training text corpora. Powered by an extended version of infini-gram (Liu et al., 2024), our system returns tracing results within a few second…

    Submitted 7 July, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: ACL 2025 demo track

  11. Accessibility for Whom? Perceptions of Sidewalk Barriers Across Disability Groups and Implications for Designing Personalized Maps

    Authors: Chu Li, Rock Yuren Pang, Delphine Labbé, Yochai Eisenberg, Maryam Hosseini, Jon E. Froehlich

    Abstract: Despite diverse mobility needs worldwide, existing mapping tools fail to address the varied experiences of different mobility device users. This paper presents a large-scale online survey exploring how five mobility groups -- users of canes, walkers, mobility scooters, manual wheelchairs, and motorized wheelchairs -- perceive sidewalk barriers. Using 52 sidewalk barrier images, respondents evaluat…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Manuscript accepted at CHI'25

  12. arXiv:2501.12557  [pdf, other]

    cs.HC cs.AI cs.CL cs.CY

    Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review

    Authors: Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, Danielle Bragg

    Abstract: Large language models (LLMs) have been positioned to revolutionize HCI, by reshaping not only the interfaces, design patterns, and sociotechnical systems that we study, but also the research practices we use. To date, however, there has been little understanding of LLMs' uptake in HCI. We address this gap via a systematic literature review of 153 CHI papers from 2020-24 that engage with LLMs. We t…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: This is a preprint version of the paper conditionally accepted to CHI'25

  13. arXiv:2412.04703  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformers Struggle to Learn to Search

    Authors: Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim, He He

    Abstract: Search is an ability foundational in many important tasks, and recent studies have shown that large language models (LLMs) struggle to perform search robustly. It is unknown whether this inability is due to a lack of data, insufficient model parameters, or fundamental limitations of the transformer architecture. In this work, we use the foundational graph connectivity problem as a testbed to gener…

    Submitted 16 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Published as a conference paper at ICLR 2025

  14. arXiv:2411.16646  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Generated Critiques Boost Reward Modeling for Language Models

    Authors: Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou

    Abstract: Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF). However, current reward models mainly produce scalar scores and struggle to incorporate critiques in a natural language format. We hypothesize that predicting both critiques and the scalar reward would improve reward modeling ability. Motivat…

    Submitted 9 February, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Accepted to NAACL 2025 (Main Conference)

    Journal ref: NAACL 2025

  15. arXiv:2411.04109  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Self-Consistency Preference Optimization

    Authors: Archiki Prasad, Weizhe Yuan, Richard Yuanzhe Pang, Jing Xu, Maryam Fazel-Zarandi, Mohit Bansal, Sainbayar Sukhbaatar, Jason Weston, Jane Yu

    Abstract: Self-alignment, whereby models learn to improve themselves without human annotation, is a rapidly growing research area. However, existing techniques often fail to improve complex reasoning tasks due to the difficulty of assigning correct rewards. An orthogonal approach that is known to improve correctness is self-consistency, a method applied at inference time based on multiple sampling in order…

    Submitted 6 July, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: ICML 2025 (camera-ready)

  16. arXiv:2408.02666  [pdf, other]

    cs.CL cs.AI

    Self-Taught Evaluators

    Authors: Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li

    Abstract: Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation. To train such evaluators, the standard approach is to collect a large amount of human preference judgments over model responses, which is costly and the data becomes stale as models improve. In this work, we present an approach that aims to improve e…

    Submitted 8 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  17. AltGeoViz: Facilitating Accessible Geovisualization

    Authors: Chu Li, Rock Yuren Pang, Ather Sharif, Arnavi Chheda-Kothary, Jeffrey Heer, Jon E. Froehlich

    Abstract: Geovisualizations are powerful tools for exploratory spatial analysis, enabling sighted users to discern patterns, trends, and relationships within geographic data. However, these visual tools have remained largely inaccessible to screen-reader users. We present AltGeoViz, a new system we designed to facilitate geovisualization exploration for these users. AltGeoViz dynamically generates alt-text…

    Submitted 9 December, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: IEEE VIS 2024

  18. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.06783  [pdf, other]

    cs.HC cs.AI cs.CY

    BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies

    Authors: Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke

    Abstract: Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undes…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  20. arXiv:2404.19733  [pdf, other]

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni…

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  21. arXiv:2401.10020  [pdf, other]

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi…

    Submitted 27 March, 2025; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  22. arXiv:2311.12022  [pdf, other]

    cs.AI cs.CL

    GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    Authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

    Abstract: We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert v…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 28 pages, 5 figures, 7 tables

  23. arXiv:2309.04456  [pdf, other]

    cs.CY cs.HC

    The Case for Anticipating Undesirable Consequences of Computing Innovations Early, Often, and Across Computer Science

    Authors: Rock Yuren Pang, Dan Grossman, Tadayoshi Kohno, Katharina Reinecke

    Abstract: From smart sensors that infringe on our privacy to neural nets that portray realistic imposter deepfakes, our society increasingly bears the burden of negative, if unintended, consequences of computing innovations. As the experts in the technology we create, Computer Science (CS) researchers must do better at anticipating and addressing these undesirable consequences proactively. Our prior work sh…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: More details at NSF #2315937: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2315937&HistoricalAwards=false

  24. arXiv:2307.14117  [pdf, other]

    cs.CL

    Leveraging Implicit Feedback from Deployment Data in Dialogue

    Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

    Abstract: We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployme…

    Submitted 31 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: EACL 2024

  25. arXiv:2305.15269  [pdf, other]

    cs.CL cs.AI

    Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

    Authors: Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He

    Abstract: Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, an…

    Submitted 3 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published as a conference paper at NeurIPS 2023

  26. Auditing Cross-Cultural Consistency of Human-Annotated Labels for Recommendation Systems

    Authors: Rock Yuren Pang, Jack Cenatempo, Franklyn Graham, Bridgette Kuehn, Maddy Whisenant, Portia Botchway, Katie Stone Perez, Allison Koenecke

    Abstract: Recommendation systems increasingly depend on massive human-labeled datasets; however, the human annotators hired to generate these labels increasingly come from homogeneous backgrounds. This poses an issue when downstream predictive models -- based on these labels -- are applied globally to a heterogeneous set of users. We study this disconnect with respect to the labels themselves, asking whethe…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted at FAccT 2023

  27. arXiv:2304.05687  [pdf, ps, other]

    cs.HC

    Anticipating Unintended Consequences of Technology Using Insights from Creativity Support Tools

    Authors: Rock Yuren Pang, Katharina Reinecke

    Abstract: Our society has been increasingly witnessing a number of negative, unintended consequences of digital technologies. While post-hoc policy regulation is crucial in addressing these issues, reasonably anticipating the consequences before deploying technology can help mitigate potential harm to society in the first place. Yet, the quest to anticipate potential harms can be difficult without seeing di…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: In CHI '23 Workshop on Designing Technology and Policy Simultaneously: Towards A Research Agenda and New Practice, April 23, 2023

  28. "That's important, but...": How Computer Science Researchers Anticipate Unintended Consequences of Their Research Innovations

    Authors: Kimberly Do, Rock Yuren Pang, Jiachen Jiang, Katharina Reinecke

    Abstract: Computer science research has led to many breakthrough innovations but has also been scrutinized for enabling technology that has negative, unintended consequences for society. Given the increasing discussions of ethics in the news and among researchers, we interviewed 20 researchers in various CS sub-disciplines to identify whether and how they consider potential unintended consequences of their…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Corresponding author: Rock Yuren Pang, email provided below. Kimberly Do and Rock Yuren Pang contributed equally to this research. The author order is listed alphabetically. To appear in CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-April 28, 2023, Hamburg, Germany. ACM, New York, NY, USA, 16 pages

  29. arXiv:2303.04562  [pdf, other]

    cs.LG cs.CL q-bio.QM

    Extrapolative Controlled Sequence Generation via Iterative Refinement

    Authors: Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh

    Abstract: We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their att…

    Submitted 7 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: ICML 2023 - Camera Ready Version

  30. arXiv:2211.08714  [pdf, other]

    cs.CL cs.AI cs.LG

    Reward Gaming in Conditional Text Generation

    Authors: Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

    Abstract: To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring sp…

    Submitted 1 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  31. arXiv:2210.03305  [pdf, other]

    cs.HC

    How Do Data Science Workers Communicate Intermediate Results?

    Authors: Rock Yuren Pang, Ruotong Wang, Joely Nelson, Leilani Battle

    Abstract: Data science workers increasingly collaborate on large-scale projects before communicating insights to a broader audience in the form of visualization. While prior work has modeled how data science teams, oftentimes with distinct roles and work processes, communicate knowledge to outside stakeholders, we have little knowledge of how data science workers communicate intermediately before delivering…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: This paper was accepted for presentation as part of the eighth Symposium on Visualization in Data Science (VDS) at ACM KDD 2022 as well as IEEE VIS 2022. http://www.visualdatascience.org/2022/index.html

  32. arXiv:2208.12852  [pdf, other]

    cs.CL cs.AI

    What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

    Authors: Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

    Abstract: We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, w…

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: 31 pages, 19 figures, 3 tables; more information at https://nlpsurvey.net

    ACM Class: I.2.7

  33. arXiv:2205.11465  [pdf, ps, other]

    cs.CL cs.AI

    SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

    Authors: Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

    Abstract: Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries. In this work, we turn to a slower but more straightforward approach to developing summarization bench…

    Submitted 23 May, 2022; originally announced May 2022.

  34. arXiv:2203.13240  [pdf, other]

    cs.CL cs.LG

    Token Dropping for Efficient BERT Pretraining

    Authors: Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou

    Abstract: Transformer-based models generally allocate the same amount of computation for each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT, without degrading its performance on downstream tasks. In short, we drop unimportant tokens starting from an intermediate layer in the model to make the model focus…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  35. arXiv:2112.08670  [pdf, other]

    cs.CL cs.LG

    Amortized Noisy Channel Neural Machine Translation

    Authors: Richard Yuanzhe Pang, He He, Kyunghyun Cho

    Abstract: Noisy channel models have been especially effective in neural machine translation (NMT). However, recent approaches like "beam search and rerank" (BSR) incur significant computation overhead during inference, making real-world application infeasible. We aim to study if it is possible to build an amortized noisy channel NMT model such that when we do greedy decoding during inference, the translatio…

    Submitted 18 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: INLG 2022

  36. arXiv:2112.08608  [pdf, other]

    cs.CL

    QuALITY: Question Answering with Long Input Texts, Yes!

    Authors: Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

    Abstract: To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than rely…

    Submitted 11 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  37. arXiv:2106.02278  [pdf, other]

    cs.CL

    AgreeSum: Agreement-Oriented Multi-Document Summarization

    Authors: Richard Yuanzhe Pang, Adam D. Lelkes, Vinh Q. Tran, Cong Yu

    Abstract: We aim to renew interest in a particular multi-document summarization (MDS) task which we call AgreeSum: agreement-oriented multi-document summarization. Given a cluster of articles, the goal is to provide abstractive summaries that represent information common and faithful to all input articles. Given the lack of existing datasets, we create a dataset for AgreeSum, and provide annotations on arti…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021

  38. arXiv:2106.00840  [pdf, other]

    cs.CL

    Comparing Test Sets with Item Response Theory

    Authors: Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

    Abstract: Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks. Recent results from large pretrained models, though, show that many of these datasets are largely saturated and unlikely to be able to detect further progress. What kind of datasets are still effective at discriminating among strong models, and what kind…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  39. arXiv:2009.07839  [pdf, other]

    cs.CL cs.LG

    Text Generation by Learning from Demonstrations

    Authors: Richard Yuanzhe Pang, He He

    Abstract: Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation. This paradigm leads to (i) diverse but low-quality samples due to mismatched learning objective and evaluation metric (likelihood vs. quality) and (ii) exposure bias due to mismatched history distributions (gold vs. model-generated). To alleviate these problems, we frame text generation a…

    Submitted 2 March, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: ICLR 2021
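One way to read the "learning from demonstrations" framing in entry 39 is as a reweighted variant of MLE, where gold tokens the model already assigns high probability receive larger gradient weight. The sketch below is a simplification under that reading, not the paper's exact objective:

```python
import math

def mle_loss(log_probs):
    """Standard MLE: negative log-likelihood of the gold tokens."""
    return -sum(log_probs)

def weighted_demo_loss(log_probs):
    """Weight each gold token's NLL by the model's own probability of
    that token (treated as a constant), so confidently modeled
    demonstration tokens dominate the objective. Simplified sketch."""
    return -sum(math.exp(lp) * lp for lp in log_probs)

# Tokens the model finds likely (log-prob near 0) keep most of their
# weight; very unlikely tokens are strongly down-weighted.
lps = [-0.1, -0.2, -5.0]
print(mle_loss(lps), weighted_demo_loss(lps))
```

The down-weighting of low-probability gold tokens is what softens the quality/likelihood mismatch the abstract describes.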

  40. arXiv:2005.00850  [pdf, other]

    cs.CL cs.LG

    ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation

    Authors: Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel

    Abstract: We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model. In particular, we view our non-autoregressive translation system as an inference network (Tu and Gimpel, 2018) trained to minimize the autoregressive teacher energy. This contrasts with the popular approach of training a non-autoregressive model on a distilled cor…

    Submitted 12 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: ACL 2020 camera-ready version
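The "teacher energy" in entry 40 is the negative log-likelihood a pretrained autoregressive model assigns to a candidate output. A toy sketch with a hand-built bigram teacher (the vocabulary and probabilities below are invented for illustration):

```python
import math

# Toy autoregressive "teacher": a bigram LM over a tiny vocabulary.
BIGRAM = {("<s>", "a"): 0.9, ("<s>", "b"): 0.1,
          ("a", "b"): 0.8, ("a", "a"): 0.2,
          ("b", "a"): 0.5, ("b", "b"): 0.5}

def teacher_energy(tokens):
    """E(y) = -log p_AR(y): lower energy means the autoregressive
    teacher finds the sequence more likely."""
    nll, prev = 0.0, "<s>"
    for tok in tokens:
        nll -= math.log(BIGRAM[(prev, tok)])
        prev = tok
    return nll

# A non-autoregressive inference network would be trained so its one-shot
# output has low teacher energy; here we just compare two candidates.
print(teacher_energy(["a", "b"]) < teacher_energy(["b", "a"]))  # True
```

Training the inference network against this energy replaces distillation: the teacher scores full sequences rather than supplying a distilled corpus.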

  41. arXiv:2005.00628  [pdf, other]

    cs.CL

    Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?

    Authors: Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

    Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large…

    Submitted 9 May, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  42. arXiv:2002.02492  [pdf, other]

    cs.LG cs.CL stat.ML

    Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

    Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang, Kyunghyun Cho

    Abstract: Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm,…

    Submitted 2 October, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: EMNLP 2020
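The "incomplete decoding" problem in entry 42 is easy to see concretely: algorithms like top-k sampling restrict each step to the k most probable tokens, which can permanently exclude the end-of-sequence token even when the model gives it nonzero probability. A toy distribution (invented for illustration) shows this:

```python
# Toy per-step distribution, identical at every decoding step.
DIST = {"the": 0.4, "cat": 0.35, "<eos>": 0.25}

def top_k_support(dist, k):
    """Tokens that top-k decoding can ever emit at this step."""
    return set(sorted(dist, key=dist.get, reverse=True)[:k])

# With k=2, <eos> is never reachable: decoding can never terminate, even
# though the model itself puts 25% of its mass on stopping.
print("<eos>" in top_k_support(DIST, k=2))  # False -> inconsistent
print("<eos>" in top_k_support(DIST, k=3))  # True  -> can terminate
```

In the paper's terms, a decoding algorithm that can return such non-terminating sequences with nonzero probability is inconsistent with the model it decodes from.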

  43. arXiv:1911.02891  [pdf, other]

    cs.CL cs.LG

    Improving Joint Training of Inference Networks and Structured Prediction Energy Networks

    Authors: Lifu Tu, Richard Yuanzhe Pang, Kevin Gimpel

    Abstract: Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger and McCallum, 2016). Tu and Gimpel (2018) developed an efficient framework for energy-based models by training "inference networks" to approximate structured inference instead of using gradient descent. However, their alternating optimization approach suffers from instabilities during training, requirin…

    Submitted 10 October, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: EMNLP 2020 Workshop on Structured Prediction for NLP (SPNLP)

  44. arXiv:1910.03747  [pdf, ps, other]

    cs.CL

    The Daunting Task of Real-World Textual Style Transfer Auto-Evaluation

    Authors: Richard Yuanzhe Pang

    Abstract: The difficulty of textual style transfer lies in the lack of parallel corpora. Numerous advances have been proposed for the unsupervised generation. However, significant problems remain with the auto-evaluation of style transfer tasks. Based on the summary of Pang and Gimpel (2018) and Mir et al. (2019), style transfer evaluations rely on three criteria: style accuracy of transferred sentences, co…

    Submitted 9 October, 2019; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Extended abstract in EMNLP Workshop on Neural Generation and Translation (WNGT 2019)

  45. arXiv:1810.11878  [pdf, other]

    cs.CL cs.AI

    Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer

    Authors: Richard Yuanzhe Pang, Kevin Gimpel

    Abstract: We consider the problem of automatically generating textual paraphrases with modified attributes or properties, focusing on the setting without parallel data (Hu et al., 2017; Shen et al., 2017). This setting poses challenges for evaluation. We show that the metric of post-transfer classification accuracy is insufficient on its own, and propose additional metrics based on semantic preservation and…

    Submitted 30 September, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: EMNLP 2019 Workshop on Neural Generation and Translation (WNGT)
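Entries 44 and 45 both argue that style-transfer accuracy alone is gameable and must be combined with semantic preservation and fluency. One illustrative way to aggregate the three criteria (this specific geometric-mean combination is an assumption for the sketch, not the papers' proposal) penalizes systems that excel on one axis only:

```python
def combined_score(style_acc, semantic_sim, fluency):
    """Illustrative aggregation: a transfer system must do well on ALL of
    style accuracy, semantic preservation, and fluency, so use a
    geometric mean rather than style accuracy alone."""
    for v in (style_acc, semantic_sim, fluency):
        assert 0.0 <= v <= 1.0
    return (style_acc * semantic_sim * fluency) ** (1.0 / 3.0)

# A degenerate system that always emits one stock on-style sentence gets
# perfect style accuracy but near-zero semantic preservation, and loses:
degenerate = combined_score(1.0, 0.05, 0.9)
balanced = combined_score(0.8, 0.8, 0.8)
print(balanced > degenerate)  # True
```

The multiplicative form makes a near-zero score on any single criterion dominate, which is exactly why accuracy-only evaluation is insufficient.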