
Showing 1–50 of 131 results for author: Kan, M

Searching in archive cs.
  1. arXiv:2603.12598  [pdf, ps, other]

    cs.CV

    Neural Gate: Mitigating Privacy Risks in LVLMs via Neuron-Level Gradient Gating

    Authors: Xiangkui Cao, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen

    Abstract: Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide array of vision-language tasks, leading to their adoption in critical domains such as finance and healthcare. However, their growing deployment also introduces significant security and privacy risks. Malicious actors could potentially exploit these models to extract sensitive information, highlighting a critical vul…

    Submitted 12 March, 2026; originally announced March 2026.

  2. arXiv:2602.09494  [pdf, ps, other]

    cs.CV

    OSI: One-step Inversion Excels in Extracting Diffusion Watermarks

    Authors: Yuwei Chen, Zhenliang He, Jia Tang, Meina Kan, Shiguang Shan

    Abstract: Watermarking is an important mechanism for provenance and copyright protection of diffusion-generated images. Training-free methods, exemplified by Gaussian Shading, embed watermarks into the initial noise of diffusion models with negligible impact on the quality of generated images. However, extracting this type of watermark typically requires multi-step diffusion inversion to obtain precise init…

    Submitted 10 February, 2026; originally announced February 2026.

  3. arXiv:2602.08945  [pdf, ps, other]

    cs.CL cs.CY

    GitSearch: Enhancing Community Notes Generation with Gap-Informed Targeted Search

    Authors: Sahajpreet Singh, Kokil Jaidka, Min-Yen Kan

    Abstract: Community-based moderation offers a scalable alternative to centralized fact-checking, yet it faces significant structural challenges, and existing AI-based methods fail in "cold start" scenarios. To tackle these challenges, we introduce GitSearch (Gap-Informed Targeted Search), a framework that treats human-perceived quality gaps, such as missing context, as first-class signals. GitSearch h…

    Submitted 9 February, 2026; originally announced February 2026.

    Comments: 18 pages, 11 figures, 7 tables

  4. arXiv:2601.21742  [pdf, ps, other]

    cs.AI cs.CL cs.MA

    Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems

    Authors: Ruiwen Zhou, Maojia Song, Xiaobao Wu, Sitao Cheng, Xunjian Yin, Yuxi Xie, Zhuoqun Hao, Wenyue Hua, Liangming Pan, Soujanya Poria, Min-Yen Kan

    Abstract: Individual agents in multi-agent (MA) systems often lack robustness, tending to blindly conform to misleading peers. We show this weakness stems from both sycophancy and inadequate ability to evaluate peer reliability. To address this, we first formalize the learning problem of history-aware reference, introducing the historical interactions of peers as additional input, so that agents can estimat…

    Submitted 29 January, 2026; originally announced January 2026.

    Comments: Codes and data are available at https://github.com/skyriver-2000/epistemic-context-learning

  5. arXiv:2601.05563  [pdf, ps, other]

    cs.CV cs.SI

    What's Left Unsaid? Detecting and Correcting Misleading Omissions in Multimodal News Previews

    Authors: Fanxiao Li, Jiaying Wu, Tingchao Fu, Dayang Li, Herun Wan, Wei Zhou, Min-Yen Kan

    Abstract: Even when factually correct, social-media news previews (image-headline pairs) can induce interpretation drift: by selectively omitting crucial context, they lead readers to form judgments that diverge from what the full article conveys. This covert harm is harder to detect than explicit misinformation yet remains underexplored. To address this gap, we develop a multi-stage pipeline that disentang…

    Submitted 9 January, 2026; originally announced January 2026.

  6. arXiv:2601.05478  [pdf, ps, other]

    cs.CL

    The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence

    Authors: Herun Wan, Jiaying Wu, Minnan Luo, Fanxiao Li, Zhi Zeng, Min-Yen Kan

    Abstract: To reliably assist human decision-making, LLMs must maintain factual internal beliefs against misleading injections. While current models resist explicit misinformation, we uncover a fundamental vulnerability to sophisticated, hard-to-falsify evidence. To systematically probe this weakness, we introduce MisBelief, a framework that generates misleading evidence via collaborative, multi-round intera…

    Submitted 8 January, 2026; originally announced January 2026.

  7. arXiv:2512.06885  [pdf, ps, other]

    cs.CV cs.AI

    JoPano: Unified Panorama Generation via Joint Modeling

    Authors: Wancheng Feng, Chen An, Zhenliang He, Meina Kan, Shiguang Shan, Lukun Wang

    Abstract: Panorama generation has recently attracted growing interest in the research community, with two core tasks: text-to-panorama and view-to-panorama generation. However, existing methods still face two major challenges: their U-Net-based architectures constrain the visual quality of the generated panoramas, and they usually treat the two core tasks independently, which leads to modeling redundancy an…

    Submitted 7 December, 2025; originally announced December 2025.

    Comments: Code: https://github.com/VIPL-GENUN/JoPano

  8. arXiv:2510.24667  [pdf, ps, other]

    cs.CV

    SAGE: Structure-Aware Generative Video Transitions between Diverse Clips

    Authors: Mia Kan, Yilin Liu, Niloy Mitra

    Abstract: Video transitions aim to synthesize intermediate frames between two clips, but naive approaches such as linear blending introduce artifacts that limit professional use or break temporal coherence. Traditional techniques (cross-fades, morphing, frame interpolation) and recent generative inbetweening methods can produce high-quality plausible intermediates, but they struggle with bridging diverse cl…

    Submitted 6 March, 2026; v1 submitted 28 October, 2025; originally announced October 2025.

    Comments: Project Website: https://kan32501.github.io/sage.github.io/

  9. arXiv:2510.11423  [pdf, ps, other]

    cs.SI cs.CL

    Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation

    Authors: Jiaying Wu, Zihang Fu, Haonan Wang, Fanxiao Li, Jiafeng Guo, Preslav Nakov, Min-Yen Kan

    Abstract: Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), allows users to flag misleading posts, attach contextual notes, and rate the notes' helpfulness. However, our empirical analysis of 30.8K health-related notes reveals substantial latency, with a median delay of 17.6 hours before notes receive a helpfulness status. To improve responsiveness during real-worl…

    Submitted 7 January, 2026; v1 submitted 13 October, 2025; originally announced October 2025.

  10. arXiv:2510.11210  [pdf, ps, other]

    cs.CL cs.LG

    Discursive Circuits: How Do Language Models Understand Discourse Relations?

    Authors: Yisong Miao, Min-Yen Kan

    Abstract: Which components in transformer language models are responsible for discourse understanding? We hypothesize that sparse computational graphs, termed discursive circuits, control how models process discourse relations. Unlike simpler tasks, discourse relations involve longer spans and complex reasoning. To make circuit discovery feasible, we introduce a task called Completion under Discourse Rel…

    Submitted 13 October, 2025; originally announced October 2025.

    Comments: Accepted to EMNLP 2025 (Main Conference); 9 pages, 8 figures, 5 tables (20 pages, 12 figures, 14 tables including references and appendices)

  11. Forecasting the Buzz: Enriching Hashtag Popularity Prediction with LLM Reasoning

    Authors: Yifei Xu, Jiaying Wu, Herun Wan, Yang Li, Zhen Hou, Min-Yen Kan

    Abstract: Hashtag trends ignite campaigns, shift public opinion, and steer millions of dollars in advertising spend, yet forecasting which tag goes viral is elusive. Classical regressors digest surface features but ignore context, while large language models (LLMs) excel at contextual reasoning but misestimate numbers. We present BuzzProphet, a reasoning-augmented hashtag popularity prediction framework tha…

    Submitted 9 October, 2025; originally announced October 2025.

    Comments: Accepted to CIKM 2025

  12. arXiv:2510.06640  [pdf, ps, other]

    cs.CL cs.LG

    A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

    Authors: Nhat M. Hoang, Do Xuan Long, Cong-Duy Nguyen, Min-Yen Kan, Luu Anh Tuan

    Abstract: State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing with linear scaling, yet how contextual information flows across layers in these architectures remains understudied. We present the first unified, token- and layer-wise analysis of representation propagation in SSMs and TBMs. Using centered kernel alignment, var…

    Submitted 6 January, 2026; v1 submitted 8 October, 2025; originally announced October 2025.

  13. arXiv:2509.25851  [pdf, ps, other]

    cs.CV

    MuSLR: Multimodal Symbolic Logical Reasoning

    Authors: Jundong Xu, Hao Fei, Yuhui Zhang, Liangming Pan, Qijun Huang, Qian Liu, Preslav Nakov, Min-Yen Kan, William Yang Wang, Mong-Li Lee, Wynne Hsu

    Abstract: Multimodal symbolic logical reasoning, which aims to deduce new facts from multimodal input via formal logic, is critical in high-stakes applications such as autonomous driving and medical diagnosis, as its rigorous, deterministic reasoning helps prevent serious consequences. To evaluate such capabilities of current state-of-the-art vision language models (VLMs), we introduce the first benchmark M…

    Submitted 28 January, 2026; v1 submitted 30 September, 2025; originally announced September 2025.

    Comments: Accepted by NeurIPS 2025

  14. arXiv:2509.17037  [pdf, ps, other]

    cs.AI

    KAHAN: Knowledge-Augmented Hierarchical Analysis and Narration for Financial Data Narration

    Authors: Yajing Yang, Tony Deng, Min-Yen Kan

    Abstract: We propose KAHAN, a knowledge-augmented hierarchical framework that systematically extracts insights from raw tabular data at entity, pairwise, group, and system levels. KAHAN uniquely leverages LLMs as domain experts to drive the analysis. On the DataTales financial reporting benchmark, KAHAN outperforms existing approaches by over 20% on narrative quality (GPT-4o), maintains 98.2% factuality, and de…

    Submitted 21 September, 2025; originally announced September 2025.

    Comments: Accepted at EMNLP 2025 Findings

  15. Improving Conversational Recommendation with Contextual Adaptation of External Recommenders and LLM-based Reranking

    Authors: Chuang Li, Weida Liang, Hengchang Hu, See-Kiong Ng, Min-Yen Kan, Haizhou Li, Yang Deng

    Abstract: We tackle the challenge of integrating large language models (LLMs) with external recommender systems to enhance domain expertise in conversational recommendation (CRS). Current LLM-based CRS approaches primarily rely on zero/few-shot methods for generating item recommendations based on user queries, but this method faces two significant challenges: (1) without domain-specific adaptation, LLMs fre…

    Submitted 30 March, 2026; v1 submitted 19 August, 2025; originally announced August 2025.

    Comments: Accepted to ECIR 2026 (13 pages, 9 figures)

    Journal ref: Advances in Information Retrieval. ECIR 2026. Lecture Notes in Computer Science, vol 16484

  16. arXiv:2507.12838  [pdf, ps, other]

    cs.CL

    Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent?

    Authors: Xi Ai, Mahardika Krisna Ihsani, Min-Yen Kan

    Abstract: Cross-lingual consistency should be considered to assess cross-lingual transferability, maintain the factuality of the model knowledge across languages, and preserve the parity of language model performance. We are thus interested in analyzing, evaluating, and interpreting cross-lingual consistency for factual knowledge. To facilitate our study, we examine multiple pretrained models and tuned mode…

    Submitted 30 September, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

    Comments: EMNLP'25 Findings Camera Ready

  17. Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs

    Authors: Xiao Xu, Libo Qin, Wanxiang Che, Min-Yen Kan

    Abstract: Two-Tower Vision-Language Models (VLMs) have demonstrated strong performance across various downstream VL tasks. While BridgeTower further enhances performance by building bridges between encoders, it (i) suffers from ineffective layer-by-layer utilization of unimodal representations, (ii) restricts the flexible exploitation of different levels of unimodal semantic knowledge, an…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). June 2025. DOI: https://doi.org/10.1109/TCSVT.2025.3578266

  18. arXiv:2506.06950  [pdf, ps, other]

    cs.CL

    What Makes a Good Natural Language Prompt?

    Authors: Do Xuan Long, Duy Dinh, Ngoc-Hai Nguyen, Kenji Kawaguchi, Nancy F. Chen, Shafiq Joty, Min-Yen Kan

    Abstract: As large language models (LLMs) have progressed towards more human-like and human-AI communications have become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on what exactly quantifies natural language prompts. We attempt to address this question by conducting a meta-analysis surveying more than 150 prompting-related papers from leading N…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Main Conference

  19. arXiv:2506.01265  [pdf, ps, other]

    cs.CL

    Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines

    Authors: Do Xuan Long, Duong Ngoc Yen, Do Xuan Trong, Luu Anh Tuan, Kenji Kawaguchi, Shafiq Joty, Min-Yen Kan, Nancy F. Chen

    Abstract: In-context learning (ICL) is an important yet not fully understood ability of pre-trained large language models (LLMs). It can greatly enhance task performance using a few examples, termed demonstrations, without fine-tuning. Although effective in question answering, ICL often underperforms in long-form generation tasks such as summarization. Under appropriately realistic assumptions, we empirical…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  20. arXiv:2505.19084  [pdf, other]

    cs.CV cs.AI cs.LG

    Jodi: Unification of Visual Generation and Understanding via Joint Modeling

    Authors: Yifeng Xu, Zhenliang He, Meina Kan, Shiguang Shan, Xilin Chen

    Abstract: Visual generation and understanding are two deeply interconnected aspects of human intelligence, yet they have been traditionally treated as separate tasks in machine learning. In this paper, we propose Jodi, a diffusion framework that unifies visual generation and understanding by jointly modeling the image domain and multiple label domains. Specifically, Jodi is built upon a linear diffusion tra…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Code: https://github.com/VIPL-GENUN/Jodi

  21. arXiv:2505.17659  [pdf, ps, other]

    cs.RO cs.CV

    Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

    Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen

    Abstract: Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stag…

    Submitted 26 September, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  22. arXiv:2505.15489  [pdf, ps, other]

    cs.CV cs.CL cs.MM

    Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models

    Authors: Jiaying Wu, Fanxiao Li, Zihang Fu, Min-Yen Kan, Bryan Hooi

    Abstract: The impact of multimodal misinformation arises not only from factual inaccuracies but also from the misleading narratives that creators deliberately embed. Interpreting such creator intent is therefore essential for multimodal misinformation detection (MMD) and effective information governance. To this end, we introduce DeceptionDecoded, a large-scale benchmark of 12,000 image-caption pairs ground…

    Submitted 13 April, 2026; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: ICLR 2026

  23. arXiv:2504.18838  [pdf, other]

    cs.CL

    Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

    Authors: Yixin Cao, Shibo Hong, Xinze Li, Jiahao Ying, Yubo Ma, Haiyuan Liang, Yantao Liu, Zijun Yao, Xiaozhi Wang, Dan Huang, Wenxuan Zhang, Lifu Huang, Muhao Chen, Lei Hou, Qianru Sun, Xingjun Ma, Zuxuan Wu, Min-Yen Kan, David Lo, Qi Zhang, Heng Ji, Jing Jiang, Juanzi Li, Aixin Sun, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around…

    Submitted 26 April, 2025; originally announced April 2025.

  24. arXiv:2503.15450  [pdf, ps, other]

    cs.CL

    SkyLadder: Better and Faster Pretraining via Context Window Scheduling

    Authors: Tongyao Zhu, Qian Liu, Haonan Wang, Shiqi Chen, Xiangming Gu, Tianyu Pang, Min-Yen Kan

    Abstract: Recent advancements in LLM pretraining have featured ever-expanding context windows to process longer sequences. However, our pilot study reveals that models pretrained with shorter context windows consistently outperform their long-context counterparts under a fixed token budget. This finding motivates us to explore an optimal context window scheduling strategy to better balance long-context capa…

    Submitted 2 December, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Accepted to NeurIPS 2025. 10 pages

  25. arXiv:2502.14297  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Evaluating Sakana's AI Scientist: Bold Claims, Mixed Results, and a Promising Future?

    Authors: Joeran Beel, Min-Yen Kan, Moritz Baumgart

    Abstract: A major step toward Artificial General Intelligence (AGI) and Super Intelligence is AI's ability to autonomously conduct research - what we term Artificial Research Intelligence (ARI). If machines could generate hypotheses, conduct experiments, and write research papers without human intervention, it would transform science. Sakana recently introduced the 'AI Scientist', claiming to conduct resear…

    Submitted 15 October, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 20 pages

    Journal ref: SIGIR Forum 2025

  26. arXiv:2412.05939  [pdf, other]

    cs.CV cs.CL cs.LG

    Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

    Authors: Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan

    Abstract: Multimodal Large Language Models (MLLMs) excel in vision-language tasks by pre-training solely on coarse-grained concept annotations (e.g., image captions). We hypothesize that integrating fine-grained concept annotations (e.g., object labels and object regions) will further improve performance, as both data granularities complement each other in terms of breadth and depth in concept representati…

    Submitted 8 December, 2024; originally announced December 2024.

    Comments: A manuscript that should have been Arxived in May :)

  27. arXiv:2411.05345  [pdf, other]

    cs.CL cs.AI

    Reasoning Robustness of LLMs to Adversarial Typographical Errors

    Authors: Esther Gan, Yiran Zhao, Liying Cheng, Yancan Mao, Anirudh Goyal, Kenji Kawaguchi, Min-Yen Kan, Michael Shieh

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning using Chain-of-Thought (CoT) prompting. However, CoT can be biased by users' instructions. In this work, we study the reasoning robustness of LLMs to typographical errors, which can naturally occur in users' queries. We design an Adversarial Typo Attack (ATA) algorithm that iteratively samples typos for w…

    Submitted 8 November, 2024; originally announced November 2024.

  28. arXiv:2411.02712  [pdf, other]

    cs.CV cs.AI

    V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization

    Authors: Yuxi Xie, Guanzhen Li, Xiao Xu, Min-Yen Kan

    Abstract: Large vision-language models (LVLMs) suffer from hallucination, resulting in misalignment between the output textual response and the input visual content. Recent research indicates that over-reliance on the Large Language Model (LLM) backbone, as one cause of LVLM hallucination, inherently introduces bias from language priors, leading to insufficient context attention to the visual inputs…

    Submitted 4 November, 2024; originally announced November 2024.

    Comments: EMNLP 2024 Findings; 9 pages, 6 figures, 5 tables (16 pages, 8 figures, 8 tables including references and appendices)

  29. arXiv:2411.00492  [pdf, other]

    cs.CL

    Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models

    Authors: Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

    Abstract: We present Multi-expert Prompting, a novel enhancement of ExpertPrompting (Xu et al., 2023), designed to improve large language model (LLM) generation. Specifically, it guides an LLM to fulfill an input instruction by simulating multiple experts, aggregating their responses, and selecting the best among individual and aggregated responses. This process is performed in a single chain of thought…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: EMNLP 2024 Main Conference

  30. arXiv:2410.17859  [pdf, ps, other]

    cs.AI

    DataTales: A Benchmark for Real-World Intelligent Data Narration

    Authors: Yajing Yang, Qian Liu, Min-Yen Kan

    Abstract: We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding…

    Submitted 23 August, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted at EMNLP 2024 (main conference, long paper)

  31. arXiv:2410.12601  [pdf, ps, other]

    cs.CL

    CCSBench: Evaluating Compositional Controllability in LLMs for Scientific Document Summarization

    Authors: Yixi Ding, Jiaying Wu, Tongyao Zhu, Yanxia Qin, Qian Liu, Min-Yen Kan

    Abstract: To broaden the dissemination of scientific knowledge to diverse audiences, it is desirable for scientific document summarization systems to simultaneously control multiple attributes such as length and empirical focus. However, existing research typically focuses on controlling single attributes, leaving the compositional control of multiple attributes underexplored. To address this gap, we introd…

    Submitted 4 August, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

    Comments: Accepted to KDD 2025 SciSoc LLM Workshop: Large Language Models for Scientific and Societal Advances

  32. arXiv:2410.09675  [pdf, other]

    cs.CL

    COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement

    Authors: Yuxi Xie, Anirudh Goyal, Xiaobao Wu, Xunjian Yin, Xiao Xu, Min-Yen Kan, Liangming Pan, William Yang Wang

    Abstract: Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges,…

    Submitted 12 October, 2024; originally announced October 2024.

    Comments: 12 pages, 7 figures, 3 tables (23 pages, 9 figures, 4 tables including references and appendices)

  33. arXiv:2410.04345  [pdf, other]

    cs.CV cs.AI

    MVP-Bench: Can Large Vision--Language Models Conduct Multi-level Visual Perception Like Humans?

    Authors: Guanzhen Li, Yuxi Xie, Min-Yen Kan

    Abstract: Humans perform visual perception at multiple levels, including low-level object recognition and high-level semantic interpretation such as behavior understanding. Subtle differences in low-level details can lead to substantial changes in high-level perception. For example, substituting the shopping bag held by a person with a gun suggests violent behavior, implying criminal or violent activity. De…

    Submitted 5 October, 2024; originally announced October 2024.

  34. arXiv:2409.11724  [pdf, ps, other]

    cs.CL

    TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

    Authors: Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan

    Abstract: Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV). To address these challenges, we introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains…

    Submitted 10 July, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: NAACL 2025 (Findings)

  35. arXiv:2408.08656  [pdf, other]

    cs.CL

    LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

    Authors: Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan

    Abstract: We present the first systematic evaluation examining format bias in the performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We…

    Submitted 22 February, 2025; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: NAACL 2025 Main Conference

  36. arXiv:2406.10130  [pdf, other]

    cs.CL

    The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

    Authors: Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, Tsung-Yi Ho

    Abstract: Pre-trained Language Models (PLMs) have been acknowledged to contain harmful information, such as social biases, which may cause negative social impacts or even bring catastrophic results in application. Previous works on this problem mainly focused on using black-box methods such as probing to detect and quantify social biases in PLMs by observing model outputs. As a result, previous debiasing me…

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2405.15329  [pdf, other]

    cs.CL

    DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev…

    Submitted 8 December, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: COLING2025

  38. arXiv:2405.01868  [pdf, other]

    cs.CL

    Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

    Authors: Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

    Abstract: This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work,…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Main paper 8 pages; References and Appendix 9 pages; 7 figures and 14 tables

  39. arXiv:2405.00451  [pdf, other]

    cs.AI cs.LG

    Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

    Authors: Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

    Abstract: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level…

    Submitted 17 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 4 tables (24 pages, 9 figures, 9 tables including references and appendices)

  40. arXiv:2404.13246  [pdf, other]

    cs.CL

    ISQA: Informative Factuality Feedback for Scientific Summarization

    Authors: Zekai Li, Yanxia Qin, Qian Liu, Min-Yen Kan

    Abstract: We propose Iterative Factuality Refining on Informative Scientific Question-Answering (ISQA) feedback (code is available at https://github.com/lizekai-richard/isqa), a method following human learning theories that employs model-generated feedback consisting of both positive and negative information. Through iterative refining of summaries, it probes for the underlying rationale of sta…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 18 pages, 4 figures

  41. arXiv:2404.06351  [pdf, other]

    cs.CV

    HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

    Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Zhilong Ji, Jinfeng Bai, Xilin Chen

    Abstract: Predicting the trajectories of road agents is essential for autonomous driving systems. The recent mainstream methods follow a static paradigm, which predicts the future trajectory by using a fixed duration of historical frames. These methods make the predictions independently even at adjacent time steps, which leads to potential instability and temporal inconsistency. As successive time steps hav…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  42. arXiv:2403.08206  [pdf, other]

    cs.IR

    Discrete Semantic Tokenization for Deep CTR Prediction

    Authors: Qijiong Liu, Hengchang Hu, Jiahao Wu, Jieming Zhu, Min-Yen Kan, Xiao-Ming Wu

    Abstract: Incorporating item content information into click-through rate (CTR) prediction models remains a challenge, especially with the time and space constraints of industrial scenarios. The content-encoding paradigm, which integrates user and item encoders directly into CTR models, prioritizes space over time. In contrast, the embedding-based paradigm transforms item and user semantics into latent embed… ▽ More

    Submitted 21 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: TheWebConf 2024 accepted paper
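The tokenization paradigm sketched in the abstract above maps each item's dense embedding to a short sequence of discrete semantic tokens. A minimal residual-quantization sketch of that idea (the `tokenize` function, codebook shapes, and values are illustrative assumptions, not the paper's method):

```python
def tokenize(embedding, codebooks):
    """Encode a dense item embedding as discrete semantic tokens.

    codebooks: list of codebooks, each a list of code vectors.
    At each level, pick the nearest code (squared Euclidean distance),
    emit its index, and quantize the remaining residual.
    """
    residual = list(embedding)
    tokens = []
    for book in codebooks:
        dists = [sum((c - r) ** 2 for c, r in zip(code, residual))
                 for code in book]
        idx = dists.index(min(dists))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tokens
```

The resulting index sequence is compact enough to store per item, trading the latent-embedding tables of the embedding-based paradigm for a few small codebooks.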

  43. arXiv:2403.07805  [pdf, other]

    cs.CL cs.AI

    Beyond Memorization: The Challenge of Random Memory Access in Language Models

    Authors: Tongyao Zhu, Qian Liu, Liang Pang, Zhengbao Jiang, Min-Yen Kan, Min Lin

    Abstract: Recent developments in Language Models (LMs) have shown their effectiveness in NLP tasks, particularly in knowledge-intensive tasks. However, the mechanisms underlying knowledge storage and memory access within their parameters remain elusive. In this paper, we investigate whether a generative LM (e.g., GPT-2) is able to access its memory sequentially or randomly. Through carefully-designed synthe… ▽ More

    Submitted 22 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures; accepted by ACL 2024 (oral)

  44. arXiv:2401.17092  [pdf, other]

    cs.CL

    NNOSE: Nearest Neighbor Occupational Skill Extraction

    Authors: Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

    Abstract: The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity in occupational skill datasets tasks -- combining and leveraging multiple datasets for skill extraction, to identify rare… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted at EACL 2024 Main

  45. arXiv:2401.07257  [pdf, other]

    cs.IR

    Lightweight Modality Adaptation to Sequential Recommendation via Correlation Supervision

    Authors: Hengchang Hu, Qijiong Liu, Chuang Li, Min-Yen Kan

    Abstract: In Sequential Recommenders (SR), encoding and utilizing modalities in an end-to-end manner is costly in terms of modality encoder sizes. Two-stage approaches can mitigate such concerns, but they suffer from poor performance due to modality forgetting, where the sequential objective overshadows modality representation. We propose a lightweight knowledge distillation solution that preserves both mer… ▽ More

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Accepted by ECIR 2024

  46. arXiv:2311.08385  [pdf, other]

    cs.CL

    Aligning Large Language Models with Human Opinions through Persona Selection and Value--Belief--Norm Reasoning

    Authors: Do Xuan Long, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen

    Abstract: Reasoning and predicting human opinions with large language models (LLMs) is essential yet challenging. Current methods employ role-playing with personae but face two major issues: LLMs are sensitive to even a single irrelevant persona, skewing predictions by up to 30%, and LLMs fail to reason strategically over personae. We propose Chain-of-Opinion (COO), a simple four-step solution modeling whic… ▽ More

    Submitted 14 December, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: COLING 2025

  47. CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation

    Authors: Minzhi Li, Taiwei Shi, Caleb Ziems, Min-Yen Kan, Nancy F. Chen, Zhengyuan Liu, Diyi Yang

    Abstract: Annotated data plays a critical role in Natural Language Processing (NLP) in training models and evaluating their performance. Given recent developments in Large Language Models (LLMs), models such as ChatGPT demonstrate zero-shot capability on many text-annotation tasks, comparable with or even exceeding human annotators. Such LLMs can serve as alternatives for manual annotation, due to lower cos… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

  48. arXiv:2310.10492  [pdf, other]

    cs.CL

    UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking

    Authors: Chuang Li, Yan Zhang, Min-Yen Kan, Haizhou Li

    Abstract: Previous zero-shot dialogue state tracking (DST) methods only apply transfer learning, ignoring unlabelled data in the target domain. We transform zero-shot DST into few-shot DST by utilising such unlabelled data via joint and self-training methods. Our method incorporates auxiliary tasks that generate slot types as inverse prompts for main tasks, creating slot values during joint training. Cycle… ▽ More

    Submitted 3 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of NAACL 2024

  49. arXiv:2310.07609  [pdf, other]

    cs.CL

    QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking

    Authors: Liangming Pan, Xinyuan Lu, Min-Yen Kan, Preslav Nakov

    Abstract: Fact-checking real-world claims often requires complex, multi-step reasoning due to the absence of direct evidence to support or refute them. However, existing fact-checking systems often lack transparency in their decision-making, making it challenging for users to comprehend their reasoning process. To address this, we propose the Question-guided Multi-hop Fact-Checking (QACHECK) system, which g… ▽ More

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 System Demonstrations Track

  50. Automatic Feature Fairness in Recommendation via Adversaries

    Authors: Hengchang Hu, Yiming Cao, Zhankui He, Samson Tan, Min-Yen Kan

    Abstract: Fairness is a widely discussed topic in recommender systems, but its practical implementation faces challenges in defining sensitive features while maintaining recommendation accuracy. We propose feature fairness as the foundation to achieve equitable treatment across diverse groups defined by various feature combinations. This improves overall accuracy through balanced feature generalizability. W… ▽ More

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: SIGIR-AP'23