
Showing 1–50 of 293 results for author: Zhang, E

Searching in archive cs.
  1. arXiv:2512.19443  [pdf, ps, other]

    cs.CV

    D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning

    Authors: Evelyn Zhang, Fufu Yu, Aoqi Wu, Zichen Wen, Ke Yan, Shouhong Ding, Biqing Qi, Linfeng Zhang

    Abstract: Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to acceleration, we find that current methods, while adequate for general understanding, catastrophically fail on fine-grained localization tasks. We attribute this failure to the inherent flaws of the two prevailing strategies: importance-b… ▽ More

    Submitted 22 December, 2025; originally announced December 2025.

  2. arXiv:2512.11067  [pdf, ps, other]

    cs.DB cs.AI

    KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration

    Authors: Guorui Xiao, Enhao Zhang, Nicole Sullivan, Will Hansen, Magdalena Balazinska

    Abstract: Traditional DBMSs execute user- or application-provided SQL queries over relational data with strong semantic guarantees and advanced query optimization, but writing complex SQL is hard and focuses only on structured tables. Contemporary multimodal systems (which operate over relations but also text, images, and even videos) either expose low-level controls that force users to use (and possibly cr… ▽ More

    Submitted 11 December, 2025; originally announced December 2025.

  3. arXiv:2512.09636  [pdf, ps, other]

    cs.CL

    MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

    Authors: Mengxi Xiao, Kailai Yang, Pengde Zhao, Enze Zhang, Ziyan Kuang, Zhiwei Liu, Weiguang Han, Shu Liao, Lianting Huang, Jinpeng Hu, Min Peng, Qianqian Xie, Sophia Ananiadou

    Abstract: Mental health disorders affect hundreds of millions globally, and the Web now serves as a primary medium for accessing support, information, and assessment. Large language models (LLMs) offer scalable and accessible assistance, yet their deployment in mental-health settings remains risky when their reasoning is incomplete, inconsistent, or ungrounded. Existing psychological LLMs emphasize emotiona… ▽ More

    Submitted 16 December, 2025; v1 submitted 10 December, 2025; originally announced December 2025.

  4. arXiv:2512.08754  [pdf, ps, other]

    cs.RO

    A Multi-Robot Platform for Robotic Triage Combining Onboard Sensing and Foundation Models

    Authors: Jason Hughes, Marcel Hussing, Edward Zhang, Shenbagaraj Kannapiran, Joshua Caswell, Kenneth Chaney, Ruichen Deng, Michaela Feehery, Agelos Kratimenos, Yi Fan Li, Britny Major, Ethan Sanchez, Sumukh Shrote, Youkang Wang, Jeremy Wang, Daudi Zein, Luying Zhang, Ruijun Zhang, Alex Zhou, Tenzi Zhouga, Jeremy Cannon, Zaffir Qasim, Jay Yelon, Fernando Cladera, Kostas Daniilidis , et al. (2 additional authors not shown)

    Abstract: This report presents a heterogeneous robotic system designed for remote primary triage in mass-casualty incidents (MCIs). The system employs a coordinated air-ground team of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to locate victims, assess their injuries, and prioritize medical assistance without risking the lives of first responders. The UAVs identify and provide overhe… ▽ More

    Submitted 9 December, 2025; originally announced December 2025.

    Comments: Technical Report for the DARPA Triage Challenge PRONTO team

  5. arXiv:2512.01104  [pdf, ps, other]

    cs.RO cs.CV

    Estimation of Kinematic Motion from Dashcam Footage

    Authors: Evelyn Zhang, Alex Richardson, Jonathan Sprinkle

    Abstract: The goal of this paper is to explore the accuracy of dashcam footage to predict the actual kinematic motion of a car-like vehicle. Our approach uses ground truth information from the vehicle's on-board data stream, through the controller area network, and a time-synchronized dashboard camera, mounted to a consumer-grade vehicle, for 18 hours of footage and driving. The contributions of the paper i… ▽ More

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: 8 pages, 10 figures

  6. arXiv:2511.19275  [pdf, ps, other]

    cs.SD cs.AI eess.AS eess.SP

    Dynamic Multi-Species Bird Soundscape Generation with Acoustic Patterning and 3D Spatialization

    Authors: Ellie L. Zhang, Duoduo Liao, Callie C. Liao

    Abstract: Generation of dynamic, scalable multi-species bird soundscapes remains a significant challenge in computer music and algorithmic sound design. Birdsongs involve rapid frequency-modulated chirps, complex amplitude envelopes, distinctive acoustic patterns, overlapping calls, and dynamic inter-bird interactions, all of which require precise temporal and spatial control in 3D environments. Existing ap… ▽ More

    Submitted 24 November, 2025; originally announced November 2025.

    Comments: Accepted by IEEE Big Data 2025

  7. arXiv:2511.17323  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.MM

    MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core

    Authors: Callie C. Liao, Duoduo Liao, Ellie L. Zhang

    Abstract: Recent advances in generative AI have made music generation a prominent research focus. However, many neural-based models rely on large datasets, raising concerns about copyright infringement and high-performance costs. In contrast, we propose MusicAIR, an innovative multimodal AI music generation framework powered by a novel algorithm-driven symbolic music core, effectively mitigating copyright i… ▽ More

    Submitted 21 November, 2025; originally announced November 2025.

    Comments: Accepted by IEEE Big Data 2025

  8. arXiv:2511.12878  [pdf, ps, other]

    cs.CV cs.RO

    Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

    Authors: Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Yu Zheng, Erhang Zhang, Xieyuanli Chen, Hesheng Wang

    Abstract: Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate future possible hand waypoints, which still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in down… ▽ More

    Submitted 4 December, 2025; v1 submitted 16 November, 2025; originally announced November 2025.

    Comments: Extended journal version of MMTwin (IROS'25). Code and data: https://github.com/IRMVLab/UniHand

  9. arXiv:2511.11884  [pdf]

    cs.CL

    Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support

    Authors: Eric Hua Qing Zhang, Julia Ive

    Abstract: Mental health illness represents a substantial global socioeconomic burden, with COVID-19 further exacerbating accessibility challenges and driving increased demand for telehealth mental health support. While large language models (LLMs) offer promising solutions through 24/7 availability and non-judgmental interactions, pre-trained models often lack the contextual and emotional awareness necessar… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  10. arXiv:2511.10138  [pdf, ps, other]

    cs.IR

    GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation

    Authors: Jun Zhang, Yi Li, Yue Liu, Changping Wang, Yuan Wang, Yuling Xiong, Xun Liu, Haiyang Wu, Qian Li, Enming Zhang, Jiawei Sun, Xin Xu, Zishuai Zhang, Ruoran Liu, Suyuan Huang, Zhaoxin Zhang, Zhengkai Guo, Shuojin Yang, Meng-Hao Guo, Huan Yu, Jie Jiang, Shi-Min Hu

    Abstract: As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. However, existing multi-stage advertising recommendation systems suffer from objective misalignment and error propagation, making it difficult to achieve global optimality, while unified generative recom… ▽ More

    Submitted 21 November, 2025; v1 submitted 13 November, 2025; originally announced November 2025.

    Comments: 12 pages, 5 figures

  11. arXiv:2511.08548  [pdf, ps, other]

    cs.AI

    A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

    Authors: Shubhra Mishra, Yuka Machino, Gabriel Poesia, Albert Jiang, Joy Hsu, Adrian Weller, Challenger Mishra, David Broman, Joshua B. Tenenbaum, Mateja Jamnik, Cedegao E. Zhang, Katherine M. Collins

    Abstract: The evolution of mathematics has been guided in part by interestingness. From researchers choosing which problems to tackle next, to students deciding which ones to engage with, people's choices are often guided by judgments about how interesting or challenging problems are likely to be. As AI systems, such as LLMs, increasingly participate in mathematics with people -- whether for advanced resear… ▽ More

    Submitted 11 November, 2025; originally announced November 2025.

    Comments: Published at the Math-AI Workshop, NeurIPS 2025

  12. arXiv:2511.06458  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response

    Authors: Chenpei Huang, Lingfeng Yao, Kyu In Lee, Lan Emily Zhang, Xun Chen, Miao Pan

    Abstract: Acoustic Environment Matching (AEM) is the task of transferring clean audio into a target acoustic environment, enabling engaging applications such as audio dubbing and auditory immersive virtual reality (VR). Recovering a similar room impulse response (RIR) directly from reverberant speech offers a more accessible and flexible AEM solution. However, this capability also introduces vulnerabilities of… ▽ More

    Submitted 9 November, 2025; originally announced November 2025.

  13. arXiv:2511.02022  [pdf, ps, other]

    cs.LG cs.AI

    Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior

    Authors: Daniel Aarao Reis Arturi, Eric Zhang, Andrew Ansah, Kevin Zhu, Ashwinee Panda, Aishwarya Balwani

    Abstract: Recent work has discovered that large language models can develop broadly misaligned behaviors after being fine-tuned on narrowly harmful datasets, a phenomenon known as emergent misalignment (EM). However, the fundamental mechanisms enabling such harmful generalization across disparate domains remain poorly understood. In this work, we adopt a geometric perspective to study EM and demonstrate tha… ▽ More

    Submitted 3 November, 2025; originally announced November 2025.

  14. arXiv:2511.01053  [pdf]

    cs.CL

    Building a Silver-Standard Dataset from NICE Guidelines for Clinical LLMs

    Authors: Qing Ding, Eric Hua Qing Zhang, Felix Jozsa, Julia Ive

    Abstract: Large language models (LLMs) are increasingly used in healthcare, yet standardised benchmarks for evaluating guideline-based clinical reasoning are missing. This study introduces a validated dataset derived from publicly available guidelines across multiple diagnoses. The dataset was created with the help of GPT and contains realistic patient scenarios, as well as clinical questions. We benchmark… ▽ More

    Submitted 2 November, 2025; originally announced November 2025.

    Comments: Submitted to EFMI Medical Informatics Europe 2026

  15. arXiv:2510.25015  [pdf, ps, other]

    cs.SE cs.AI

    VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus

    Authors: Chuyue Sun, Yican Sun, Daneshvar Amrollahi, Ethan Zhang, Shuvendu Lahiri, Shan Lu, David Dill, Clark Barrett

    Abstract: We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner module to orchestrate the systematic generation of abstractions, type invariants, specifications, and proof code. To address the challenge that LLMs often misunderstand Verus' annotation syntax and verificatio… ▽ More

    Submitted 16 November, 2025; v1 submitted 28 October, 2025; originally announced October 2025.

  16. arXiv:2510.20909  [pdf, ps, other]

    cs.CL cs.AI

    Code-enabled language models can outperform reasoning models on diverse tasks

    Authors: Cedegao E. Zhang, Cédric Colas, Gabriel Poesia, Joshua B. Tenenbaum, Jacob Andreas

    Abstract: Reasoning models (RMs), language models (LMs) trained with reinforcement learning to produce long-form natural language reasoning, have been remarkably successful, but they still require large amounts of computation and data to train, and can be slow and expensive to run. In this paper, we show that standard instruct LMs can already be elicited to be strong reasoners at a level comparable to or ev… ▽ More

    Submitted 23 October, 2025; originally announced October 2025.

  17. arXiv:2510.17950  [pdf, ps, other]

    cs.RO

    RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies

    Authors: Adina Yakefu, Bin Xie, Chongyang Xu, Enwen Zhang, Erjin Zhou, Fan Jia, Haitao Yang, Haoqiang Fan, Haowei Zhang, Hongyang Peng, Jing Tan, Junwen Huang, Kai Liu, Kaixin Liu, Kefan Gu, Qinglun Zhang, Ruitao Zhang, Saike Huang, Shen Cheng, Shuaicheng Liu, Tiancai Wang, Tiezhen Wang, Wei Sun, Wenbin Tang, Yajun Wei , et al. (12 additional authors not shown)

    Abstract: Testing on real machines is indispensable for robotic control algorithms. In the context of learning-based algorithms, especially VLA models, demand for large-scale evaluation, i.e., testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this right is highly non-trivial, especially when scalability and reproducibility are taken into account. In t… ▽ More

    Submitted 20 October, 2025; originally announced October 2025.

    Comments: Authors are listed in alphabetical order. The official website is located at https://robochallenge.ai

  18. arXiv:2510.16196  [pdf, ps, other]

    cs.CV cs.AI

    Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI

    Authors: Zheng Huang, Enpei Zhang, Yinghao Cai, Weikang Qiu, Carl Yang, Elynn Chen, Xiang Zhang, Rex Ying, Dawei Zhou, Yujun Yan

    Abstract: Understanding how the brain encodes visual information is a central challenge in neuroscience and machine learning. A promising approach is to reconstruct visual stimuli, essentially images, from functional Magnetic Resonance Imaging (fMRI) signals. This involves two stages: transforming fMRI signals into a latent space and then using a pretrained generative model to reconstruct images. The recons… ▽ More

    Submitted 17 October, 2025; originally announced October 2025.

  19. arXiv:2510.14763  [pdf, ps, other]

    cs.CL cs.AI

    COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

    Authors: Yunwen Li, Shuangshuang Ying, Xingwei Qu, Xin Li, Sheng Jin, Minghao Liu, Zhoufutu Wen, Tianyu Zheng, Xeron Du, Qiguang Chen, Jiajun Shi, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Libo Qin, Stephen Huang, Wanxiang Che, Chenghua Lin, Eli Zhang

    Abstract: Large language models exhibit systematic deficiencies in creative writing, particularly in non-English contexts where training data is scarce and lacks process-level supervision. We present COIG-Writer, a novel Chinese creative writing dataset that captures both diverse outputs and their underlying thought processes through systematic reverse-engineering of high-quality texts. Unlike existing data… ▽ More

    Submitted 16 October, 2025; originally announced October 2025.

  20. arXiv:2510.12847  [pdf, ps, other]

    cs.LG

    Lifting Manifolds to Mitigate Pseudo-Alignment in LLM4TS

    Authors: Liangwei Nathan Zheng, Wenhao Liang, Wei Emma Zhang, Miao Xu, Olaf Maennel, Weitong Chen

    Abstract: Pseudo-Alignment is a pervasive challenge in many large language models for time series (LLM4TS), often causing them to underperform compared to linear models or randomly initialised backbones. However, there is limited discussion in the community of why pseudo-alignment occurs. In this work, we conduct a thorough investigation into the root causes of pseudo-alignment in LLM4T… ▽ More

    Submitted 14 October, 2025; originally announced October 2025.

  21. arXiv:2510.11503  [pdf, ps, other]

    q-bio.NC cs.AI cs.GT

    People use fast, flat goal-directed simulation to reason about novel problems

    Authors: Katherine M. Collins, Cedegao E. Zhang, Lionel Wong, Mauricio Barba da Costa, Graham Todd, Adrian Weller, Samuel J. Cheyette, Thomas L. Griffiths, Joshua B. Tenenbaum

    Abstract: Games have long been a microcosm for studying planning and reasoning in both natural and artificial intelligence, especially with a focus on expert-level or even super-human play. But real life also pushes human intelligence along a different frontier, requiring people to flexibly navigate decision-making problems that they have never thought about before. Here, we use novice gameplay to study how… ▽ More

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: Pre-print

  22. arXiv:2510.10930  [pdf, ps, other]

    cs.CL cs.AI

    Evaluating Language Models' Evaluations of Games

    Authors: Katherine M. Collins, Cedegao E. Zhang, Graham Todd, Lance Ying, Mauricio Barba da Costa, Ryan Liu, Prafull Sharma, Adrian Weller, Ionatan Kuperwajs, Lionel Wong, Joshua B. Tenenbaum, Thomas L. Griffiths

    Abstract: Reasoning is not just about solving problems -- it is also about evaluating which problems are worth solving at all. Evaluations of artificial intelligence (AI) systems have primarily focused on problem solving, historically by studying how models play games such as chess and Go. In this paper, we advocate for a new paradigm that assesses AI systems' evaluation of games. First, we introduce a formalism… ▽ More

    Submitted 12 October, 2025; originally announced October 2025.

    Comments: Pre-print

  23. arXiv:2510.09116  [pdf, ps, other]

    cs.CL

    DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

    Authors: Enze Zhang, Jiaying Wang, Mengxi Xiao, Jifei Liu, Ziyan Kuang, Rui Dong, Eric Dong, Sophia Ananiadou, Min Peng, Qianqian Xie

    Abstract: Large language models (LLMs) have substantially advanced machine translation (MT), yet their effectiveness in translating web novels remains unclear. Existing benchmarks rely on surface-level metrics that fail to capture the distinctive traits of this genre. To address these gaps, we introduce DITING, the first comprehensive evaluation framework for web novel translation, assessing narrative and c… ▽ More

    Submitted 13 October, 2025; v1 submitted 10 October, 2025; originally announced October 2025.

  24. arXiv:2510.04465  [pdf, ps, other]

    cs.HC cs.AI cs.CR

    Autonomy Matters: A Study on Personalization-Privacy Dilemma in LLM Agents

    Authors: Zhiping Zhang, Yi Evie Zhang, Freda Shi, Tianshi Li

    Abstract: Large Language Model (LLM) agents require personal information for personalization in order to better act on users' behalf in daily tasks, but this raises privacy concerns and a personalization-privacy dilemma. Agent's autonomy introduces both risks and opportunities, yet its effects remain unclear. To better understand this, we conducted a 3$\times$3 between-subjects experiment ($N=450$) to study… ▽ More

    Submitted 5 October, 2025; originally announced October 2025.

  25. arXiv:2510.00351  [pdf, ps, other]

    cs.LG q-bio.BM

    Flow Autoencoders are Effective Protein Tokenizers

    Authors: Rohit Dilip, Evan Zhang, Ayush Varshney, David Van Valen

    Abstract: Protein structure tokenizers enable the creation of multimodal models of protein structure, sequence, and function. Current approaches to protein structure tokenization rely on bespoke components that are invariant to spatial symmetries, but that are challenging to optimize and scale. We present Kanzi, a flow-based tokenizer for tokenization and generation of protein structures. Kanzi consists of… ▽ More

    Submitted 30 September, 2025; originally announced October 2025.

  26. arXiv:2510.00222  [pdf, ps, other]

    cs.HC

    Data Melodification FM: Where Musical Rhetoric Meets Sonification

    Authors: Ke Er Amy Zhang, David Grellscheid, Laura Garrison

    Abstract: We propose a design space for data melodification, where standard visualization idioms and fundamental data characteristics map to rhetorical devices of music for a more affective experience of data. Traditional data sonification transforms data into sound by mapping it to different parameters such as pitch, volume, and duration. Often and regrettably, this mapping leaves behind melody, harmony, r… ▽ More

    Submitted 30 September, 2025; originally announced October 2025.

    Comments: 5 pages, 5 figures, accepted to alt.VIS 2025

  27. arXiv:2509.23585  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    EVO-LRP: Evolutionary Optimization of LRP for Interpretable Model Explanations

    Authors: Emerald Zhang, Julian Weaver, Samantha R Santacruz, Edward Castillo

    Abstract: Explainable AI (XAI) methods help identify which image regions influence a model's prediction, but often face a trade-off between detail and interpretability. Layer-wise Relevance Propagation (LRP) offers a model-aware alternative. However, LRP implementations commonly rely on heuristic rule sets that are not optimized for clarity or alignment with model behavior. We introduce EVO-LRP, a method th… ▽ More

    Submitted 30 September, 2025; v1 submitted 27 September, 2025; originally announced September 2025.

    Comments: 15 pages

  28. arXiv:2509.16717  [pdf, ps, other]

    cs.CL

    Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling

    Authors: Haoran Li, Zhiming Su, Junyan Yao, Enwei Zhang, Yang Ji, Yan Chen, Kan Zhou, Chao Feng, Jiao Ran

    Abstract: Synthetic data is widely adopted in embedding models to ensure diversity in training data distributions across dimensions such as difficulty, length, and language. However, existing prompt-based synthesis methods struggle to capture domain-specific data distributions, particularly in data-scarce domains, and often overlook fine-grained relevance diversity. In this paper, we present a Chinese short… ▽ More

    Submitted 4 December, 2025; v1 submitted 20 September, 2025; originally announced September 2025.

    Comments: Submitted to AAAI 2026

  29. arXiv:2509.14304  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework

    Authors: Eric Zhang, Li Wei, Sarah Chen, Michael Wang

    Abstract: Stuttered and dysfluent speech detection systems have traditionally suffered from the trade-off between accuracy and clinical interpretability. While end-to-end deep learning models achieve high performance, their black-box nature limits clinical adoption. This paper looks at the Unconstrained Dysfluency Modeling (UDM) series, the current state-of-the-art framework developed by Berkeley that combin… ▽ More

    Submitted 17 September, 2025; originally announced September 2025.

  30. arXiv:2509.14177  [pdf, ps, other]

    cs.GR

    Progressing Level-of-Detail Animation of Volumetric Elastodynamics

    Authors: Jiayi Eris Zhang, Doug L. James, Danny M. Kaufman

    Abstract: We extend the progressive dynamics model (Zhang et al., 2024) from cloth and shell simulation to volumetric finite elements, enabling an efficient level-of-detail (LOD) animation-design pipeline with predictive coarse-resolution previews facilitating rapid iterative design for a final, to-be-generated, high-resolution animation of volumetric elastodynamics. This extension to volumetric domains pos… ▽ More

    Submitted 16 September, 2025; originally announced September 2025.

  31. arXiv:2509.06984  [pdf, ps, other]

    cs.LG cs.AI

    FediLoRA: Heterogeneous LoRA for Federated Multimodal Fine-tuning under Missing Modalities

    Authors: Lishan Yang, Wei Emma Zhang, Nam Kha Nguygen, Po Hu, Yanjun Shu, Weitong Chen, Mong Yuan Sim

    Abstract: Foundation models have demonstrated remarkable performance across a wide range of tasks, yet their large parameter sizes pose challenges for practical deployment, especially in decentralized environments. Parameter-efficient fine-tuning (PEFT), such as Low-Rank Adaptation (LoRA), reduces local computing and memory overhead, making it attractive for federated learning. However, existing federated L… ▽ More

    Submitted 23 September, 2025; v1 submitted 1 September, 2025; originally announced September 2025.

    Comments: 8 pages, 7 figures

    ACM Class: I.2.7; I.2.11
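
    The FediLoRA entry above builds on Low-Rank Adaptation (LoRA). For orientation only, here is a minimal sketch of a LoRA-augmented linear layer in PyTorch; the rank, scaling, and initialization choices are illustrative assumptions and are not taken from the paper, which concerns the federated, heterogeneous setting rather than the layer itself.

        import torch
        import torch.nn as nn

        class LoRALinear(nn.Module):
            """Frozen dense layer plus a trainable low-rank update (W + B @ A)."""

            def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
                super().__init__()
                self.base = nn.Linear(in_features, out_features)
                self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
                self.base.bias.requires_grad_(False)
                self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
                self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
                self.scaling = alpha / rank

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # Only lora_A / lora_B receive gradients, so the trainable
                # parameter count is rank * (in_features + out_features).
                return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

        layer = LoRALinear(768, 768, rank=4)
        out = layer(torch.randn(2, 10, 768))
        print(out.shape)  # torch.Size([2, 10, 768])

    In a federated setting, only the two low-rank factors would need to be trained and exchanged, which is the efficiency argument the abstract alludes to.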

  32. arXiv:2509.06952  [pdf, ps, other]

    cs.CL

    On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts

    Authors: Linlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy

    Abstract: Language use is shaped by pragmatics -- i.e., reasoning about communicative goals and norms in context. As language models (LMs) are increasingly used as conversational agents, it becomes ever more important to understand their pragmatic reasoning abilities. We propose an evaluation framework derived from Wavelength, a popular communication game where a speaker and a listener communicate about a b… ▽ More

    Submitted 26 September, 2025; v1 submitted 8 September, 2025; originally announced September 2025.

    Comments: EMNLP 2025 (Main)

  33. arXiv:2509.04664  [pdf, ps, other]

    cs.CL

    Why Language Models Hallucinate

    Authors: Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, Edwin Zhang

    Abstract: Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such "hallucinations" persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and… ▽ More

    Submitted 4 September, 2025; originally announced September 2025.
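
    The abstract of entry 33 argues that standard training and evaluation reward guessing over admitting uncertainty. The toy calculation below makes that incentive concrete under an assumed binary grading scheme (1 point for a correct answer, 0 for a wrong answer or an abstention); this is an illustration of the argument, not the paper's formal analysis.

        # Toy model: a benchmark awards 1 point for a correct answer and 0 points
        # for either a wrong answer or an "I don't know" response.
        def expected_score(p_correct: float, abstain: bool) -> float:
            return 0.0 if abstain else p_correct

        for p in (0.1, 0.3, 0.5):
            guess = expected_score(p, abstain=False)
            idk = expected_score(p, abstain=True)
            print(f"p(correct)={p:.1f}  guess={guess:.2f}  abstain={idk:.2f}")

        # Under this scoring, guessing weakly dominates abstaining for any p > 0,
        # so a model tuned to maximize the score is pushed toward confident guesses.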

  34. arXiv:2509.03059  [pdf, ps, other]

    cs.LG cs.AI

    Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

    Authors: Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao, Ruohao Guo, Yuan He, Zhuangzhuang He, Xianglong Hu, Neil Johnson, Bowen Li, Fangru Lin, Siyu Lin, Tong Liu, Yunpu Ma, Hao Shen, Hao Sun, Beibei Wang , et al. (21 additional authors not shown)

    Abstract: Recent advances in Large Language Models (LLMs) have shown that their reasoning capabilities can be significantly improved through Reinforcement Learning with Verifiable Reward (RLVR), particularly in domains like mathematics and programming, where ground-truth correctness can be automatically evaluated. However, extending this success to other reasoning-intensive domains remains challenging due t… ▽ More

    Submitted 3 September, 2025; originally announced September 2025.

  35. arXiv:2509.00440  [pdf, ps, other]

    cs.HC

    Data Humanism Decoded: A Characterization of its Principles to Bridge Data Visualization Researchers and Practitioners

    Authors: Ibrahim Al-Hazwani, Ke Er Zhang, Laura Garrison, Jürgen Bernard

    Abstract: Data Humanism is a human-centered design approach that emphasizes the personal, contextual, and imperfect nature of data. Despite its growing influence among practitioners, the 13 principles outlined in Giorgia Lupi's visual manifesto remain loosely defined in research contexts, creating a gap between design practice and systematic application. Through a mixed-methods approach, including a systema… ▽ More

    Submitted 30 August, 2025; originally announced September 2025.

    Comments: 5 pages, to be featured in the proceedings of the IEEE VIS short paper track

  36. arXiv:2509.00058  [pdf, ps, other]

    cs.AI

    A Comparative Study of Controllability, Explainability, and Performance in Dysfluency Detection Models

    Authors: Eric Zhang, Li Wei, Sarah Chen, Michael Wang

    Abstract: Recent advances in dysfluency detection have introduced a variety of modeling paradigms, ranging from lightweight object-detection inspired networks (YOLOStutter) to modular interpretable frameworks (UDM). While performance on benchmark datasets continues to improve, clinical adoption requires more than accuracy: models must be controllable and explainable. In this paper, we present a systematic c… ▽ More

    Submitted 25 August, 2025; originally announced September 2025.

  37. arXiv:2508.17669  [pdf, ps, other]

    cs.AI

    A Taxonomy of Transcendence

    Authors: Natalie Abreu, Edwin Zhang, Eran Malach, Naomi Saphra

    Abstract: Although language models are trained to mimic humans, the resulting systems display capabilities beyond the scope of any one person. To understand this phenomenon, we use a controlled setting to identify properties of the training data that lead a model to transcend the performance of its data sources. We build on previous work to outline three modes of transcendence, which we call skill denoising… ▽ More

    Submitted 25 August, 2025; originally announced August 2025.

  38. arXiv:2508.16681  [pdf, ps, other]

    cs.AI cs.CL

    Revisiting Rule-Based Stuttering Detection: A Comprehensive Analysis of Interpretable Models for Clinical Applications

    Authors: Eric Zhang

    Abstract: Stuttering affects approximately 1% of the global population, impacting communication and quality of life. While recent advances in deep learning have pushed the boundaries of automatic speech dysfluency detection, rule-based approaches remain crucial for clinical applications where interpretability and transparency are paramount. This paper presents a comprehensive analysis of rule-based stutteri… ▽ More

    Submitted 21 August, 2025; originally announced August 2025.

  39. arXiv:2508.12349  [pdf, ps, other]

    cs.CV

    EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos

    Authors: Junyi Ma, Erhang Zhang, Yin-Dong Zheng, Yuchen Xie, Yixuan Zhou, Hesheng Wang

    Abstract: Analyzing hand-object interaction in egocentric vision facilitates VR/AR applications and human-robot policy transfer. Existing research has mostly focused on modeling the behavior paradigm of interactive actions (i.e., ``how to interact''). However, the more challenging and fine-grained problem of capturing the critical moments of contact and separation between the hand and the target object (i.e… ▽ More

    Submitted 17 August, 2025; originally announced August 2025.

    Comments: Extended journal version of arXiv:2506.03662

  40. arXiv:2508.10925  [pdf, ps, other]

    cs.CL cs.AI

    gpt-oss-120b & gpt-oss-20b Model Card

    Authors: OpenAI, :, Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook , et al. (102 additional authors not shown)

    Abstract: We present gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models that push the frontier of accuracy and inference cost. The models use an efficient mixture-of-expert transformer architecture and are trained using large-scale distillation and reinforcement learning. We optimize the models to have strong agentic capabilities (deep research browsing, python tool use, and support for develope… ▽ More

    Submitted 8 August, 2025; originally announced August 2025.

  41. arXiv:2508.10914  [pdf, ps, other]

    cs.HC

    Generation and Evaluation in the Human Invention Process through the Lens of Game Design

    Authors: Katherine M. Collins, Graham Todd, Cedegao E. Zhang, Adrian Weller, Julian Togelius, Junyi Chu, Lionel Wong, Thomas L. Griffiths, Joshua B. Tenenbaum

    Abstract: The human ability to learn rules and solve problems has been a central concern of cognitive science research since the field's earliest days. But we do not just follow rules and solve problems given to us by others: we modify those rules, create new problems, and set new goals and tasks for ourselves and others. Arguably, even more than rule following and problem solving, human intelligence is abo… ▽ More

    Submitted 31 July, 2025; originally announced August 2025.

    Comments: CogSci conference non-archival paper

  42. arXiv:2508.09958  [pdf, ps, other]

    cs.CL cs.LG

    Neural Bandit Based Optimal LLM Selection for a Pipeline of Tasks

    Authors: Baran Atalar, Eddie Zhang, Carlee Joe-Wong

    Abstract: With the increasing popularity of large language models (LLMs) for a variety of tasks, there has been a growing interest in strategies that can predict which out of a set of LLMs will yield a successful answer at low cost. This problem promises to become more and more relevant as providers like Microsoft allow users to easily create custom LLM "assistants" specialized to particular types of querie… ▽ More

    Submitted 17 August, 2025; v1 submitted 13 August, 2025; originally announced August 2025.

    Comments: Submitted to AAAI 2026
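
    Entry 42 casts LLM selection for a pipeline of tasks as a bandit problem. The sketch below is a deliberately simplified epsilon-greedy stand-in, not the paper's neural bandit: the model names, reward signal, and exploration rate are placeholder assumptions, shown only to illustrate the select-then-update loop.

        import random

        class EpsilonGreedyLLMRouter:
            """Pick an LLM per query, then update its estimated success rate."""

            def __init__(self, models, epsilon: float = 0.1):
                self.models = list(models)
                self.epsilon = epsilon
                self.counts = {m: 0 for m in self.models}
                self.values = {m: 0.0 for m in self.models}

            def select(self) -> str:
                if random.random() < self.epsilon:
                    return random.choice(self.models)          # explore
                return max(self.models, key=self.values.get)   # exploit

            def update(self, model: str, reward: float) -> None:
                self.counts[model] += 1
                n = self.counts[model]
                self.values[model] += (reward - self.values[model]) / n  # running mean

        router = EpsilonGreedyLLMRouter(["llm-small", "llm-medium", "llm-large"])
        for _ in range(100):
            m = router.select()
            reward = 1.0 if random.random() < 0.5 else 0.0  # placeholder success signal
            router.update(m, reward)
        print(router.values)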

  43. arXiv:2508.08547  [pdf, ps, other]

    cs.CV

    Calibration Attention: Instance-wise Temperature Scaling for Vision Transformers

    Authors: Wenhao Liang, Wei Emma Zhang, Lin Yue, Miao Xu, Olaf Maennel, Weitong Chen

    Abstract: Probability calibration is critical when Vision Transformers are deployed in risk-sensitive applications. The standard fix, post-hoc temperature scaling, uses a single global scalar and requires a held-out validation set. We introduce Calibration Attention (CalAttn), a drop-in module that learns an adaptive, per-instance temperature directly from the ViT's CLS token. Across CIFAR-10/100, MNIST, Ti… ▽ More

    Submitted 11 August, 2025; originally announced August 2025.

    Comments: Under review
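
    Entry 43 describes learning an adaptive, per-instance temperature from the ViT's CLS token. The fragment below is a minimal sketch of that idea, assuming a backbone that exposes its CLS embedding; the single linear head and softplus floor are my assumptions, not the paper's CalAttn module.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class InstanceTemperatureHead(nn.Module):
            """Predict a positive, per-example temperature from the CLS embedding."""

            def __init__(self, embed_dim: int, min_temp: float = 0.05):
                super().__init__()
                self.proj = nn.Linear(embed_dim, 1)
                self.min_temp = min_temp

            def forward(self, cls_embedding: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
                temp = F.softplus(self.proj(cls_embedding)) + self.min_temp  # shape (B, 1), > 0
                return logits / temp  # softmax over the rescaled logits gives calibrated probabilities

        head = InstanceTemperatureHead(embed_dim=768)
        cls = torch.randn(4, 768)     # CLS token from a ViT (assumed available)
        logits = torch.randn(4, 100)  # class logits for the same batch
        probs = torch.softmax(head(cls, logits), dim=-1)

    Unlike post-hoc temperature scaling with a single global scalar, each example here gets its own temperature, which is the contrast the abstract draws.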

  44. arXiv:2508.06806  [pdf, ps, other]

    cs.LG cs.AI

    Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation

    Authors: Xiao Huang, Xu Liu, Enze Zhang, Tong Yu, Shuai Li

    Abstract: Offline-to-online Reinforcement Learning (O2O RL) aims to perform online fine-tuning on an offline pre-trained policy to minimize costly online interactions. Existing work used offline datasets to generate data that conform to the online data distribution for data augmentation. However, generated data still exhibits a gap with the online data, limiting overall performance. To address this, we prop… ▽ More

    Submitted 8 August, 2025; originally announced August 2025.

    Comments: ICML2025

  45. arXiv:2508.01005  [pdf, ps, other]

    cs.CL cs.IR

    MAO-ARAG: Multi-Agent Orchestration for Adaptive Retrieval-Augmented Generation

    Authors: Yiqun Chen, Erhan Zhang, Lingyong Yan, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Jiaxin Mao

    Abstract: In question-answering (QA) systems, Retrieval-Augmented Generation (RAG) has become pivotal in enhancing response accuracy and reducing hallucination issues. The architecture of RAG systems varies significantly, encompassing single-round RAG, iterative RAG, and reasoning RAG, each tailored to address different types of queries. Due to the varying complexity of real-world queries, a fixed RAG pipel… ▽ More

    Submitted 1 August, 2025; originally announced August 2025.

  46. FedDPG: An Adaptive Yet Efficient Prompt-tuning Approach in Federated Learning Settings

    Authors: Ali Shakeri, Wei Emma Zhang, Amin Beheshti, Weitong Chen, Jian Yang, Lishan Yang

    Abstract: Pre-trained Language Models (PLMs) have demonstrated impressive performance in various NLP tasks. However, traditional fine-tuning methods for leveraging PLMs for downstream tasks entail significant computational overhead. Prompt-tuning has emerged as an efficient alternative that involves prepending a limited number of parameters to the input sequence and only updating them while the PLM's parame… ▽ More

    Submitted 21 July, 2025; originally announced July 2025.

    Comments: 12 pages; Published in PAKDD 2025

    ACM Class: I.2; I.7

    Journal ref: PAKDD 2025. Lecture Notes in Computer Science, vol 15874
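
    The FedDPG abstract describes prompt-tuning: prepending a small number of trainable vectors to the input while the PLM's parameters stay frozen. The sketch below shows that basic mechanism only; the prompt length and embedding size are illustrative, and this is not FedDPG's dynamic prompt generator.

        import torch
        import torch.nn as nn

        class SoftPromptEmbedding(nn.Module):
            """Prepend trainable prompt vectors to the (frozen) token embeddings."""

            def __init__(self, num_prompt_tokens: int, embed_dim: int):
                super().__init__()
                self.prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

            def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
                batch = token_embeddings.size(0)
                prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
                return torch.cat([prompt, token_embeddings], dim=1)

        # Usage: embed the input ids with the frozen PLM's embedding layer, prepend
        # the soft prompt, and optimize only `soft_prompt.prompt` during fine-tuning.
        soft_prompt = SoftPromptEmbedding(num_prompt_tokens=20, embed_dim=768)
        token_embeds = torch.randn(2, 16, 768)  # stand-in for frozen PLM embeddings
        extended = soft_prompt(token_embeds)    # shape (2, 36, 768)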

  47. arXiv:2507.16172  [pdf]

    cs.CV

    AtrousMamba: An Atrous-Window Scanning Visual State Space Model for Remote Sensing Change Detection

    Authors: Tao Wang, Tiecheng Bai, Chao Xu, Bin Liu, Erlei Zhang, Jiyun Huang, Hongming Zhang

    Abstract: Recently, a novel visual state space (VSS) model, referred to as Mamba, has demonstrated significant progress in modeling long sequences with linear complexity, comparable to Transformer models, thereby enhancing its adaptability for processing visual data. Although most methods aim to enhance the global receptive field by directly modifying Mamba's scanning mechanism, they tend to overlook the cr… ▽ More

    Submitted 21 July, 2025; originally announced July 2025.

  48. arXiv:2507.12547  [pdf, ps, other]

    cs.CL cs.AI cs.PL

    Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models

    Authors: Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy O'Donnell, Alexander K. Lew, Jacob D. Andreas, Joshua B. Tenenbaum, Tyler Brooke-Wilson

    Abstract: When faced with novel situations, people are able to marshal relevant considerations from a wide range of background knowledge and put these to use in inferences and predictions. What permits us to draw in globally relevant information and reason over it coherently? Here, we explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental… ▽ More

    Submitted 18 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: Presented at CogSci 2025

  49. arXiv:2507.12377  [pdf, ps, other]

    cs.HC

    Deconstructing Implicit Beliefs in Visual Data Journalism: Unstable Meanings Behind Data as Truth & Design for Insight

    Authors: Ke Er Amy Zhang, Jodie Jenkinson, Laura Garrison

    Abstract: We conduct a deconstructive reading of a qualitative interview study with 17 visual data journalists from newsrooms across the globe. We borrow a deconstruction approach from literary critique to explore the instability of meaning in language and reveal implicit beliefs in words and ideas. Through our analysis we surface two sets of opposing implicit beliefs in visual data journalism: objectivity/… ▽ More

    Submitted 18 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: 11 pages, 5 figures, accepted to IEEE VIS 2025 Conference

  50. arXiv:2507.08944  [pdf, ps, other]

    cs.MA cs.AI

    Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents

    Authors: Enhao Zhang, Erkang Zhu, Gagan Bansal, Adam Fourney, Hussein Mozannar, Jack Gerrits

    Abstract: Large language model (LLM)-based multi-agent systems have demonstrated remarkable promise for tackling complex tasks by breaking them down into subtasks that are iteratively planned, executed, observed, and refined. Despite their effectiveness, these systems often incur high latency because real-world problems frequently demand multiple iterative cycles of reasoning steps. To address this challeng… ▽ More

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: ICML 2025 Workshop on MAS