
Showing 1–50 of 146 results for author: Mihalcea, R

  1. arXiv:2603.21744  [pdf, ps, other]

    cond-mat.mes-hall

    A closed-loop platform for the design and nanoscale imaging of GHz acoustic metamaterials

    Authors: Federico Maccagno, Jasleen Kaur, Benjamin H. November, Layan Ansari, Daria-Teodora Harabor, Rares-Georgian Mihalcea, Harris Pirie, Jennifer E. Hoffman

    Abstract: Band structure engineering in surface acoustic wave (SAW) metamaterials could advance both classical telecommunications and quantum information processing. However, no imaging technique has demonstrated the necessary capability to resolve sub-μm traveling SAWs across wide GHz bandwidths. Existing methods capture only fragments of the dispersion at discrete frequencies, preventing systematic char…

    Submitted 23 March, 2026; originally announced March 2026.

  2. arXiv:2603.04217  [pdf, ps, other]

    cs.CL

    When Do Language Models Endorse Limitations on Human Rights Principles?

    Authors: Keenan Samway, Nicole Miu Takagi, Rada Mihalcea, Bernhard Schölkopf, Ilias Chalkidis, Daniel Hershcovich, Zhijing Jin

    Abstract: As Large Language Models (LLMs) increasingly mediate global information access with the potential to shape public discourse, their alignment with universal human rights principles becomes important to ensure that these rights are abided by in high-stakes AI-mediated interactions. In this paper, we evaluate how LLMs navigate trade-offs involving the Universal Declaration of Human Rights (UDHR), lev…

    Submitted 4 March, 2026; originally announced March 2026.

    Comments: EACL Findings 2026

  3. arXiv:2603.03585  [pdf, ps, other]

    cs.CL cs.AI

    Belief-Sim: Towards Belief-Driven Simulation of Demographic Misinformation Susceptibility

    Authors: Angana Borah, Zohaib Khan, Rada Mihalcea, Verónica Pérez-Rosas

    Abstract: Misinformation is a growing societal threat, and susceptibility to misinformative claims varies across demographic groups due to differences in underlying beliefs. As Large Language Models (LLMs) are increasingly used to simulate human behaviors, we investigate whether they can simulate demographic misinformation susceptibility, treating beliefs as a primary driving factor. We introduce BeliefSim,…

    Submitted 3 March, 2026; originally announced March 2026.

    Comments: Paper Under Review

  4. arXiv:2602.17433  [pdf, ps, other]

    cs.CY

    Preserving Historical Truth: Detecting Historical Revisionism in Large Language Models

    Authors: Francesco Ortu, Joeun Yook, Punya Syon Pandey, Keenan Samway, Bernhard Schölkopf, Alberto Cazzaniga, Rada Mihalcea, Zhijing Jin

    Abstract: Large language models (LLMs) are increasingly used as sources of historical information, motivating the need for scalable audits on contested events and politically charged narratives in settings that mirror real user interactions. We introduce \texttt{HistoricalMisinfo}, a curated dataset of $500$ contested events from $45$ countries, each paired with a factual reference narrative and a documented…

    Submitted 22 February, 2026; v1 submitted 19 February, 2026; originally announced February 2026.

    Comments: Accepted at IASEAI 2026, non-archival

  5. arXiv:2602.05252  [pdf, ps, other]

    cs.CL

    Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks

    Authors: Guangwei Zhang, Jianing Zhu, Cheng Qian, Neil Gong, Rada Mihalcea, Zhaozhuo Xu, Jingrui He, Jiaqi Ma, Yun Huang, Chaowei Xiao, Bo Li, Ahmed Abbasi, Dongwon Lee, Heng Ji, Denghui Zhang

    Abstract: We present Copyright Detective, the first interactive forensic system for detecting, analyzing, and visualizing potential copyright risks in LLM outputs. The system treats copyright infringement versus compliance as an evidence discovery process rather than a static classification task due to the complex nature of copyright law. It integrates multiple detection paradigms, including content recall…

    Submitted 10 February, 2026; v1 submitted 4 February, 2026; originally announced February 2026.

  6. arXiv:2512.03173  [pdf, ps, other]

    cs.CY cs.AI cs.CL cs.CV

    Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping

    Authors: Joan Nwatu, Longju Bai, Oana Ignat, Rada Mihalcea

    Abstract: Culture shapes the objects people use and for what purposes, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Western contexts. This imbalance reduces model generalizability and perpetuates performance disparities, especially impacting lower-income and non-Western communities. To address these disparities, we propose a nove…

    Submitted 2 December, 2025; originally announced December 2025.

    ACM Class: K.4; I.2.7

    Journal ref: AAAI 2026 Social Impact Track

  7. arXiv:2511.23174  [pdf, ps, other]

    cs.CL

    Are LLMs Good Safety Agents or a Propaganda Engine?

    Authors: Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin

    Abstract: Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior truly reflects their safety policies or indicates political censorship, as practiced globally by governments, are lacking. Differentiating between safety-influenced refusals and politically motivated censorship is difficult. For this purpose…

    Submitted 28 November, 2025; originally announced November 2025.

    Comments: 15 pages, 7 tables, 4 figures

  8. arXiv:2510.12943  [pdf, ps, other]

    cs.CL

    The Curious Case of Curiosity across Human Cultures and LLMs

    Authors: Angana Borah, Zhijing Jin, Rada Mihalcea

    Abstract: Recent advances in Large Language Models (LLMs) have expanded their role in human interaction, yet curiosity -- a central driver of inquiry -- remains underexplored in these systems, particularly across cultural contexts. In this work, we investigate cultural variation in curiosity using Yahoo! Answers, a real-world multi-country dataset spanning diverse topics. We introduce CUEST (CUriosity Evalu…

    Submitted 20 October, 2025; v1 submitted 14 October, 2025; originally announced October 2025.

    Comments: Preprint (Paper under review)

  9. arXiv:2510.04891  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

    Authors: Punya Syon Pandey, Hai Son Le, Devansh Bhardwaj, Rada Mihalcea, Zhijing Jin

    Abstract: Large language models (LLMs) are increasingly deployed in contexts where their failures can have direct sociopolitical consequences. Yet, existing safety benchmarks rarely test vulnerabilities in domains such as political manipulation, propaganda and disinformation generation, or surveillance and information control. We introduce SocialHarmBench, a dataset of 585 prompts spanning 7 sociopolitical…

    Submitted 22 February, 2026; v1 submitted 6 October, 2025; originally announced October 2025.

    Comments: ICLR 2026

  10. arXiv:2509.19358  [pdf, ps, other]

    cs.CL cs.AI

    Benchmarking and Improving LLM Robustness for Personalized Generation

    Authors: Chimaobi Okite, Naihao Deng, Kiran Bodipati, Huaidian Hou, Joyce Chai, Rada Mihalcea

    Abstract: Recent years have witnessed a growing interest in personalizing the responses of large language models (LLMs). While existing evaluations primarily focus on whether a response aligns with a user's preferences, we argue that factuality is an equally important yet often overlooked dimension. In the context of personalization, we define a model as robust if its responses are both factually accurate a…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: First draft. First camera-ready version

  11. arXiv:2508.14344  [pdf, ps, other]

    cs.CL

    ISCA: A Framework for Interview-Style Conversational Agents

    Authors: Charles Welch, Allison Lahnala, Vasudha Varadarajan, Lucie Flek, Rada Mihalcea, J. Lomax Boyd, João Sedoc

    Abstract: We present a low-compute non-generative system for implementing interview-style conversational agents which can be used to facilitate qualitative data collection through controlled interactions and quantitative analysis. Use cases include applications to tracking attitude formation or behavior change, where control or standardization over the conversational flow is desired. We show how our system…

    Submitted 19 August, 2025; originally announced August 2025.

  12. arXiv:2508.10972  [pdf, ps, other]

    cs.CV cs.AI cs.HC

    Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision

    Authors: Rosiana Natalie, Wenqian Xu, Ruei-Che Chang, Rada Mihalcea, Anhong Guo

    Abstract: Advances in vision language models (VLMs) have enabled the simulation of general human behavior through their reasoning and problem solving capabilities. However, prior research has not investigated such simulation capabilities in the accessibility domain. In this paper, we evaluate the extent to which VLMs can simulate the vision perception of low vision individuals when interpreting images. We f…

    Submitted 14 August, 2025; originally announced August 2025.

  13. arXiv:2507.13490  [pdf, ps, other]

    cs.CL

    Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?

    Authors: Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea

    Abstract: There has been extensive research on assessing the value orientation of Large Language Models (LLMs) as it can shape user experiences across demographic groups. However, several challenges remain. First, while the Multiple Choice Question (MCQ) setting has been shown to be vulnerable to perturbations, there is no systematic comparison of probing methods for value probing. Second, it is unclear to…

    Submitted 17 July, 2025; originally announced July 2025.

  14. arXiv:2507.04415  [pdf, ps, other]

    cs.CL

    MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind

    Authors: Emilio Villa-Cueva, S M Masrur Ahmed, Rendi Chevi, Jan Christian Blaise Cruz, Kareem Elzeky, Fermin Cristobal, Alham Fikri Aji, Skyler Wang, Rada Mihalcea, Thamar Solorio

    Abstract: Understanding Theory of Mind is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MoMentS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (LLMs) through realistic, narrative-rich scenarios presented in short films. MoMentS includes over 2…

    Submitted 21 September, 2025; v1 submitted 6 July, 2025; originally announced July 2025.

  15. arXiv:2507.04026  [pdf, ps, other]

    cs.CL

    Patient-Centered RAG for Oncology Visit Aid Following the Ottawa Decision Guide

    Authors: Siyang Liu, Lawrence Chin-I An, Rada Mihalcea

    Abstract: Effective communication is essential in cancer care, yet patients often face challenges in preparing for complex medical visits. We present an interactive, Retrieval-augmented Generation-assisted system that helps patients progress from uninformed to visit-ready. Our system adapts the Ottawa Personal Decision Guide into a dynamic retrieval-augmented generation workflow, helping users bridge knowle…

    Submitted 5 July, 2025; originally announced July 2025.

  16. arXiv:2506.14680  [pdf]

    cs.CY

    Which Humans? Inclusivity and Representation in Human-Centered AI

    Authors: Rada Mihalcea, Nazanin Andalibi, David Jensen, Matthew Turk, Pamela Wisniewski, Holly Yanco

    Abstract: As AI systems continue to spread and become integrated into many aspects of society, the concept of "human-centered AI" has gained increasing prominence, raising the critical question of which humans are the AI systems to be centered around.

    Submitted 17 June, 2025; originally announced June 2025.

  17. arXiv:2506.14679  [pdf]

    cs.CY

    Now More Than Ever, Foundational AI Research and Infrastructure Depends on the Federal Government

    Authors: Michela Taufer, Rada Mihalcea, Matthew Turk, Dan Lopresti, Adam Wierman, Kevin Butler, Sven Koenig, David Danks, William Gropp, Manish Parashar, Yolanda Gil, Bill Regli, Rajmohan Rajaraman, David Jensen, Nadya Bliss, Mary Lou Maher

    Abstract: Leadership in the field of AI is vital for our nation's economy and security. Maintaining this leadership requires investments by the federal government. The federal investment in foundational AI research is essential for U.S. leadership in the field. Providing accessible AI infrastructure will benefit everyone. Now is the time to increase the federal support, which will be complementary to, and hel…

    Submitted 17 June, 2025; originally announced June 2025.

  18. arXiv:2506.12936  [pdf, ps, other]

    cs.CL

    CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation

    Authors: Naihao Deng, Kapotaksha Das, Rada Mihalcea, Vitaliy Popov, Mohamed Abouelenien

    Abstract: In clinical operations, teamwork can be the crucial factor that determines the final outcome; prior studies have shown that sufficient collaboration is key to an operation's success. To understand how the team practices teamwork during the operation, we collected CliniDial from simulations of medical operations. CliniDial includes the audio data and its transcriptions,…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 Findings

  19. arXiv:2506.12758  [pdf, ps, other]

    cs.CL

    Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

    Authors: David Guzman Piedrahita, Irene Strauss, Bernhard Schölkopf, Rada Mihalcea, Zhijing Jin

    Abstract: As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism s…

    Submitted 6 December, 2025; v1 submitted 15 June, 2025; originally announced June 2025.

  20. arXiv:2505.22981  [pdf, ps, other]

    cs.HC

    Free Lunch for User Experience: Crowdsourcing Agents for Scalable User Studies

    Authors: Siyang Liu, Sahand Sabour, Xiaoyang Wang, Rada Mihalcea

    Abstract: User studies are central to user experience research, yet recruiting participants is expensive, slow, and limited in diversity. Recent work has explored using Large Language Models as simulated users, but doubts about fidelity have hindered practical adoption. We deepen this line of research by asking whether scale itself can enable useful simulation, even if not perfectly accurate. We introduce Cr…

    Submitted 16 October, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

  21. arXiv:2505.22327  [pdf, ps, other]

    cs.CL cs.CY

    NLP for Social Good: A Survey and Outlook of Challenges, Opportunities, and Responsible Deployment

    Authors: Antonia Karamolegkou, Angana Borah, Eunjung Cho, Sagnik Ray Choudhury, Martina Galletti, Pranav Gupta, Oana Ignat, Priyanka Kargupta, Neema Kotonya, Hemank Lamba, Sun-Joo Lee, Arushi Mangla, Ishani Mondal, Fatima Zahra Moudakir, Deniz Nazarova, Poli Nemkova, Dina Pisarevskaya, Naquee Rizwan, Nazanin Sabri, Keenan Samway, Dominik Stammbach, Anna Steinberg, David Tomás, Steven R Wilson, Bowen Yi , et al. (8 additional authors not shown)

    Abstract: Natural language processing (NLP) now shapes many aspects of our world, yet its potential for positive social impact is underexplored. This paper surveys work in "NLP for Social Good" (NLP4SG) across nine domains relevant to global development and risk agendas, summarizing principal tasks and challenges. We analyze ACL Anthology trends, finding that inclusion and AI harms attract the most researc…

    Submitted 21 January, 2026; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted to EACL 2026

  22. arXiv:2505.21479  [pdf, ps, other]

    cs.CL

    Are Language Models Consequentialist or Deontological Moral Reasoners?

    Authors: Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin

    Abstract: As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Fu…

    Submitted 12 October, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: EMNLP 2025

  23. arXiv:2505.19212  [pdf, other]

    cs.CL cs.AI cs.CY

    When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas

    Authors: Steffen Backmann, David Guzman Piedrahita, Emanuel Tewolde, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin

    Abstract: Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents, making ethical alignment a key AI safety concern. While prior work has examined both LLMs' moral judgment and strategic behavior in social dilemmas, there is limited understanding of how they act when moral imperatives directly conflict with reward…

    Submitted 25 May, 2025; originally announced May 2025.

  24. arXiv:2504.16778  [pdf]

    cs.CL cs.AI cs.CY

    Evaluation Framework for AI Systems in "the Wild"

    Authors: Sarah Jabbour, Trenton Chang, Anindya Das Antar, Joseph Peper, Insu Jang, Jiachen Liu, Jae-Won Chung, Shiqi He, Michael Wellman, Bryan Goodman, Elizabeth Bondi-Kelly, Kevin Samy, Rada Mihalcea, Mosharaf Chowdhury, David Jurgens, Lu Wang

    Abstract: Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use. Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect real-world performance, which creates a gap between lab-tested outcomes and practical applications. This white paper proposes a comprehensive framework for how we…

    Submitted 28 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

    Comments: 35 pages

  25. arXiv:2503.05280  [pdf, other]

    cs.CL

    Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing

    Authors: Neemesh Yadav, Jiarui Liu, Francesco Ortu, Roya Ensafi, Zhijing Jin, Rada Mihalcea

    Abstract: The ability of Natural Language Processing (NLP) methods to categorize text into multiple classes has motivated their use in online content moderation tasks, such as hate speech and fake news detection. However, there is limited understanding of how or why these methods make such decisions, or why certain content is moderated in the first place. To investigate the hidden mechanisms behind content…

    Submitted 10 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  26. arXiv:2503.02038  [pdf, ps, other]

    cs.CL

    Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions

    Authors: Angana Borah, Rada Mihalcea, Verónica Pérez-Rosas

    Abstract: Existing challenges in misinformation exposure and susceptibility vary across demographic groups, as some populations are more vulnerable to misinformation than others. Large language models (LLMs) introduce new dimensions to these challenges through their ability to generate persuasive content at scale and to reinforce existing biases. This study investigates the bidirectional persuasion dynamics…

    Submitted 14 October, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  27. arXiv:2503.02016  [pdf, ps, other]

    cs.CL cs.AI

    Mind the (Belief) Gap: Group Identity in the World of LLMs

    Authors: Angana Borah, Marwa Houalla, Rada Mihalcea

    Abstract: Social biases and belief-driven behaviors can significantly impact the decisions of Large Language Models (LLMs) on several tasks. As LLMs are increasingly used in multi-agent systems for societal simulations, their ability to model fundamental group psychological characteristics remains critical yet under-explored. In this study, we present a multi-agent framework that simulates belief congruence, a cla…

    Submitted 7 October, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted to ACL 2025 (Findings)

  28. arXiv:2503.00018  [pdf, other]

    cs.CL cs.AI

    Eeyore: Realistic Depression Simulation via Supervised and Preference Optimization

    Authors: Siyang Liu, Bianca Brie, Wenda Li, Laura Biester, Andrew Lee, James Pennebaker, Rada Mihalcea

    Abstract: Large Language Models (LLMs) have been previously explored for mental healthcare training and therapy client simulation, but they still fall short in authentically capturing diverse client traits and psychological conditions. We introduce \textbf{Eeyore}, an 8B model optimized for realistic depression simulation through a structured alignment framework, incorporating expert input at every stage. F…

    Submitted 21 February, 2025; originally announced March 2025.

    ACM Class: I.2.7

  29. arXiv:2502.08458  [pdf, other]

    cs.CL

    Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish

    Authors: Aylin Gunal, Bowen Yi, John Piette, Rada Mihalcea, Verónica Pérez-Rosas

    Abstract: Cultural and language factors significantly influence counseling, but Natural Language Processing research has not yet examined whether the findings of conversational analysis for counseling conducted in English apply to other languages. This paper presents a first step towards this direction. We introduce MIDAS (Motivational Interviewing Dataset in Spanish), a counseling dataset created from publ…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: To appear in NAACL 2025 Main Conference

  30. arXiv:2502.07663  [pdf, ps, other]

    cs.AI cs.CL cs.CY cs.HC

    Human Decision-making is Susceptible to AI-driven Manipulation

    Authors: Sahand Sabour, June M. Liu, Siyang Liu, Chris Z. Yao, Shiyao Cui, Xuanming Zhang, Wen Zhang, Yaru Cao, Advait Bhat, Jian Guan, Wei Wu, Rada Mihalcea, Hongning Wang, Tim Althoff, Tatia M. C. Lee, Minlie Huang

    Abstract: AI systems are increasingly intertwined with daily life, assisting users with various tasks and guiding decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized between-subjects experiment with 233 participants, we examined human susc…

    Submitted 1 December, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Work in progress

  31. arXiv:2501.15283  [pdf, other]

    cs.CL

    Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions

    Authors: Naihao Deng, Rada Mihalcea

    Abstract: As Large Language Models (LLMs) advance in their capabilities, researchers have increasingly employed them for social simulation. In this paper, we investigate whether interactions among LLM agents resemble those of humans. Specifically, we focus on the pronoun usage difference between leaders and non-leaders, examining whether the simulation would lead to human-like pronoun usage patterns during…

    Submitted 25 January, 2025; originally announced January 2025.

  32. arXiv:2501.14693  [pdf, ps, other]

    cs.CL cs.AI

    Rethinking Table Instruction Tuning

    Authors: Naihao Deng, Rada Mihalcea

    Abstract: Recent advances in table understanding have focused on instruction-tuning large language models (LLMs) for table-related tasks. However, existing research has overlooked the impact of hyperparameter choices, and also lacks a comprehensive evaluation of the out-of-domain table understanding ability and the general capabilities of these table LLMs. In this paper, we evaluate these abilities in exist…

    Submitted 1 August, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: Accepted to ACL 2025 Findings. Updates: 07/2025: We release the TAMA-QWen2.5 and TAMA-QWen3 models. 06/2025: We release our project page: https://lit.eecs.umich.edu/TAMA/, code: https://github.com/MichiganNLP/TAMA, huggingface models: https://huggingface.co/collections/MichiganNLP/tama-684eeb3e7f262362856eccd1, and data: https://huggingface.co/datasets/MichiganNLP/TAMA_Instruct

  33. arXiv:2412.17729  [pdf, other]

    cs.CL cs.AI

    Chumor 2.0: Towards Benchmarking Chinese Humor Understanding

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Rada Mihalcea, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, leaving limited resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, the first Chinese humor explanation dataset that exceeds the size of existing humor datasets. Chumor is sourced from Ruo Zhi Ba, a Chinese Reddit-like platform known for sharing intellectually…

    Submitted 23 December, 2024; originally announced December 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.12754

  34. arXiv:2411.11758  [pdf, other]

    cs.CV cs.AI cs.CL

    The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning

    Authors: Longju Bai, Angana Borah, Oana Ignat, Rada Mihalcea

    Abstract: Large Multimodal Models (LMMs) exhibit impressive performance across various multimodal tasks. However, their effectiveness in cross-cultural contexts remains limited due to the predominantly Western-centric nature of most data and models. Conversely, multi-agent models have shown significant capability in solving complex tasks. Our study evaluates the collective performance of LMMs in a multi-age…

    Submitted 18 November, 2024; originally announced November 2024.

  35. arXiv:2410.16315  [pdf, other]

    cs.CY

    Why AI Is WEIRD and Should Not Be This Way: Towards AI For Everyone, With Everyone, By Everyone

    Authors: Rada Mihalcea, Oana Ignat, Longju Bai, Angana Borah, Luis Chiruzzo, Zhijing Jin, Claude Kwizera, Joan Nwatu, Soujanya Poria, Thamar Solorio

    Abstract: This paper presents a vision for creating AI systems that are inclusive at every stage of development, from data collection to model design and evaluation. We address key limitations in the current AI pipeline and its WEIRD representation, such as lack of data diversity, biases in model performance, and narrow evaluation metrics. We also focus on the need for diverse representation among the devel…

    Submitted 9 October, 2024; originally announced October 2024.

  36. arXiv:2410.02584  [pdf, other]

    cs.CL cs.CY

    Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

    Authors: Angana Borah, Rada Mihalcea

    Abstract: As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our s…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP Findings 2024

  37. arXiv:2407.02623  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models

    Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea

    Abstract: Recent work has demonstrated that the unequal representation of cultures and socioeconomic groups in training data leads to biased Large Multi-modal (LMM) models. To improve LMM performance on underrepresented data, we propose and evaluate several prompting strategies using non-English, geographic, and socioeconomic attributes. We show that these geographic and socioeconomic integrated promp…

    Submitted 14 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    ACM Class: K.4; I.2.7; I.2.8

  38. arXiv:2407.02273  [pdf, other]

    cs.CL

    Language Model Alignment in Multilingual Trolley Problems

    Authors: Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, Sydney Levine, Jiarui Liu, Fernando Gonzalez, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf

    Abstract: We evaluate the moral alignment of LLMs with human preferences in multilingual trolley problems. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. This dataset enables the assessment of LLMs' decision-making processes in diverse linguistic…

    Submitted 27 May, 2025; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: ICLR 2025 Spotlight, Best Paper @ NeurIPS 2024 Workshop on Pluralistic Alignment

  39. arXiv:2406.16152  [pdf, ps, other]

    cs.CL

    Towards Region-aware Bias Evaluation Metrics

    Authors: Angana Borah, Aparna Garimella, Rada Mihalcea

    Abstract: When exposed to human-generated data, language models are known to learn and amplify societal biases. While previous works introduced benchmarks that can be used to assess the bias in these models, they rely on assumptions that may not be universally true. For instance, a gender bias dimension commonly used by these metrics is that of family--career, but this may not be the only common bias in cer…

    Submitted 14 October, 2025; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted to Cross-Cultural Considerations in NLP (C3NLP Workshop at NAACL 2025) -- Outstanding Paper Award

  40. arXiv:2406.09264  [pdf, ps, other]

    cs.HC cs.AI cs.CL

    Position: Towards Bidirectional Human-AI Alignment

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and cross-disciplinary collaboration. In this position paper, we argue that the research community should explicitly define and critically reflect on "alignment" to account for the…

    Submitted 29 September, 2025; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2025 Position Paper

  41. arXiv:2406.05967  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song , et al. (51 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 4 November, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  42. arXiv:2405.20318  [pdf, ps, other

    cs.CL cs.AI cs.LG stat.ML

    Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries

    Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Ahmad Khan, Amélie Reymond, Punya Syon Pandey, Rada Mihalcea, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin

    Abstract: Recent progress in Large Language Model (LLM) technology has changed our role in interacting with these models. Instead of primarily testing these models with questions we already know answers to, we are now using them for queries where the answers are unknown to us, driven by human curiosity. This shift highlights the growing need to understand curiosity-driven human questions - those that are mo…

    Submitted 9 November, 2025; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: IJCNLP-AACL 2025 Findings

  43. arXiv:2405.14808  [pdf, other

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Implicit Personalization in Language Models: A Systematic Study

    Authors: Zhijing Jin, Nils Heil, Jiarui Liu, Shehzaad Dhuliawala, Yahang Qi, Bernhard Schölkopf, Rada Mihalcea, Mrinmaya Sachan

    Abstract: Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work has touched upon various instances of this problem, there lacks a unified framework to study this behavior. This work systematically studies IP through a rigorous mathematical formulation,…

    Submitted 31 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: EMNLP 2024 Findings

  44. arXiv:2405.04655  [pdf, other

    cs.CL

    Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

    Authors: Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea

    Abstract: Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and…

    Submitted 7 May, 2024; originally announced May 2024.

  45. arXiv:2404.18739  [pdf, other

    cs.CL

    Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification

    Authors: Artem Abzaliev, Humberto Pérez Espinosa, Rada Mihalcea

    Abstract: Similar to humans, animals make extensive use of verbal and non-verbal forms of communication, including a large range of audio signals. In this paper, we address dog vocalizations and explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks that find parallels in human-centered tasks in speech recognition. We specifically…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: To be published in LREC-COLING 2024

  46. arXiv:2404.16698  [pdf, other

    cs.CL

    Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

    Authors: Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea

    Abstract: As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions remains a significant challenge. We introduce the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. In GovSim, a society of AI agents must collectively balance exploiting a common resource wi…

    Submitted 8 December, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: NeurIPS 2024

  47. arXiv:2404.12938  [pdf, other

    cs.CL cs.AI

    MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews

    Authors: Oana Ignat, Xiaomeng Xu, Rada Mihalcea

    Abstract: Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily o…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  48. arXiv:2404.12933  [pdf, other

    cs.CL cs.AI

    Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data

    Authors: Oana Ignat, Gayathri Ganesh Lakshmy, Rada Mihalcea

    Abstract: Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  49. arXiv:2404.11055  [pdf, other

    cs.CL

    Do LLMs Think Fast and Slow? A Causal Study on Sentiment Analysis

    Authors: Zhiheng Lyu, Zhijing Jin, Fernando Gonzalez, Rada Mihalcea, Bernhard Schölkopf, Mrinmaya Sachan

    Abstract: Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this work formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the tradit…

    Submitted 27 October, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: EMNLP 2024 Findings

  50. arXiv:2404.09956  [pdf, other

    cs.SD cs.AI cs.CL eess.AS

    Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

    Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

    Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models…

    Submitted 17 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at ACM MM 2024