-
Towards Generalizable Deepfake Detection via Real Distribution Bias Correction
Authors:
Ming-Hui Liu,
Harry Cheng,
Xin Luo,
Xin-Shun Xu,
Mohan S. Kankanhalli
Abstract:
To generalize deepfake detectors to future unseen forgeries, most existing methods attempt to simulate the dynamically evolving forgery types using available source domain data. However, predicting an unbounded set of future manipulations from limited prior examples is infeasible. To overcome this limitation, we propose to exploit the invariance of \textbf{real data} from two complementary perspec…
▽ More
To generalize deepfake detectors to future unseen forgeries, most existing methods attempt to simulate the dynamically evolving forgery types using available source domain data. However, predicting an unbounded set of future manipulations from limited prior examples is infeasible. To overcome this limitation, we propose to exploit the invariance of \textbf{real data} from two complementary perspectives: the fixed population distribution of the entire real class and the inherent Gaussianity of individual real images. Building on these properties, we introduce the Real Distribution Bias Correction (RDBC) framework, which consists of two key components: the Real Population Distribution Estimation module and the Distribution-Sampled Feature Whitening module. The former utilizes the independent and identically distributed (\iid) property of real samples to derive the normal distribution form of their statistics, from which the distribution parameters can be estimated using limited source domain data. Based on the learned population distribution, the latter utilizes the inherent Gaussianity of real data as a discriminative prior and performs a sampling-based whitening operation to amplify the Gaussianity gap between real and fake samples. Through synergistic coupling of the two modules, our model captures the real-world properties of real samples, thereby enhancing its generalizability to unseen target domains. Extensive experiments demonstrate that RDBC achieves state-of-the-art performance in both in-domain and cross-domain deepfake detection.
△ Less
Submitted 14 March, 2026;
originally announced March 2026.
-
Word-Anchored Temporal Forgery Localization
Authors:
Tianyi Wang,
Xi Shao,
Harry Cheng,
Yinglong Wang,
Mohan Kankanhalli
Abstract:
Current temporal forgery localization (TFL) approaches typically rely on temporal boundary regression or continuous frame-level anomaly detection paradigms to derive candidate forgery proposals. However, they suffer not only from feature granularity misalignment but also from costly computation. To address these issues, we propose word-anchored temporal forgery localization (WAFL), a novel paradig…
▽ More
Current temporal forgery localization (TFL) approaches typically rely on temporal boundary regression or continuous frame-level anomaly detection paradigms to derive candidate forgery proposals. However, they suffer not only from feature granularity misalignment but also from costly computation. To address these issues, we propose word-anchored temporal forgery localization (WAFL), a novel paradigm that shifts the TFL task from temporal regression and continuous localization to discrete word-level binary classification. Specifically, we first analyze the essence of temporal forgeries and identify the minimum meaningful forgery units, word tokens, and then align data preprocessing with the natural linguistic boundaries of speech. To adapt powerful pre-trained foundation backbones for feature extraction, we introduce the forensic feature realignment (FFR) module, mapping representations from the pre-trained semantic space to a discriminative forensic manifold. This allows subsequent lightweight linear classifiers to efficiently perform binary classification and accomplish the TFL task. Furthermore, to overcome the extreme class imbalance inherent to forgery detection, we design the artifact-centric asymmetric (ACA) loss, which breaks the standard precision-recall trade-off by dynamically suppressing overwhelming authentic gradients while asymmetrically prioritizing subtle forensic artifacts. Extensive experiments demonstrate that WAFL significantly outperforms state-of-the-art approaches in localization performance under both in- and cross-dataset settings, while requiring substantially fewer learnable parameters and operating at high computational efficiency.
△ Less
Submitted 6 March, 2026;
originally announced March 2026.
-
Buy versus Build an LLM: A Decision Framework for Governments
Authors:
Jiahao Lu,
Ziwei Xu,
William Tjhi,
Junnan Li,
Antoine Bosselut,
Pang Wei Koh,
Mohan Kankanhalli
Abstract:
Large Language Models (LLMs) represent a new frontier of digital infrastructure that can support a wide range of public-sector applications, from general purpose citizen services to specialized and sensitive state functions. When expanding AI access, governments face a set of strategic choices over whether to buy existing services, build domestic capabilities, or adopt hybrid approaches across dif…
▽ More
Large Language Models (LLMs) represent a new frontier of digital infrastructure that can support a wide range of public-sector applications, from general purpose citizen services to specialized and sensitive state functions. When expanding AI access, governments face a set of strategic choices over whether to buy existing services, build domestic capabilities, or adopt hybrid approaches across different domains and use cases. These are critical decisions especially when leading model providers are often foreign corporations, and LLM outputs are increasingly treated as trusted inputs to public decision-making and public discourse. In practice, these decisions are not intended to mandate a single approach across all domains; instead, national AI strategies are typically pluralistic, with sovereign, commercial and open-source models coexisting to serve different purposes. Governments may rely on commercial models for non-sensitive or commodity tasks, while pursuing greater control for critical, high-risk or strategically important applications.
This paper provides a strategic framework for making this decision by evaluating these options across dimensions including sovereignty, safety, cost, resource capability, cultural fit, and sustainability. Importantly, "building" does not imply that governments must act alone: domestic capabilities may be developed through public research institutions, universities, state-owned enterprises, joint ventures, or broader national ecosystems. By detailing the technical requirements and practical challenges of each pathway, this work aims to serve as a reference for policy-makers to determine whether a buy or build approach best aligns with their specific national needs and societal goals.
△ Less
Submitted 23 February, 2026; v1 submitted 13 February, 2026;
originally announced February 2026.
-
Do Prompts Guarantee Safety? Mitigating Toxicity from LLM Generations through Subspace Intervention
Authors:
Himanshu Singh,
Ziwei Xu,
A. V. Subramanyam,
Mohan Kankanhalli
Abstract:
Large Language Models (LLMs) are powerful text generators, yet they can produce toxic or harmful content even when given seemingly harmless prompts. This presents a serious safety challenge and can cause real-world harm. Toxicity is often subtle and context-dependent, making it difficult to detect at the token level or through coarse sentence-level signals. Moreover, efforts to mitigate toxicity o…
▽ More
Large Language Models (LLMs) are powerful text generators, yet they can produce toxic or harmful content even when given seemingly harmless prompts. This presents a serious safety challenge and can cause real-world harm. Toxicity is often subtle and context-dependent, making it difficult to detect at the token level or through coarse sentence-level signals. Moreover, efforts to mitigate toxicity often face a trade-off between safety and the coherence, or fluency of the generated text. In this work, we present a targeted subspace intervention strategy for identifying and suppressing hidden toxic patterns from underlying model representations, while preserving overall ability to generate safe fluent content. On the RealToxicityPrompts, our method achieves strong mitigation performance compared to existing baselines, with minimal impact on inference complexity. Across multiple LLMs, our approach reduces toxicity of state-of-the-art detoxification systems by 8-20%, while maintaining comparable fluency. Through extensive quantitative and qualitative analyses, we show that our approach achieves effective toxicity reduction without impairing generative performance, consistently outperforming existing baselines.
△ Less
Submitted 6 February, 2026;
originally announced February 2026.
-
LLMs Can Unlearn Refusal with Only 1,000 Benign Samples
Authors:
Yangyang Guo,
Ziwei Xu,
Si Liu,
Zhiming Zheng,
Mohan Kankanhalli
Abstract:
This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly respond to unsafe queries with refusals, which often begin with a fixed set of prefixes (I'm sorry). We demonstrate that this rigid refusal pattern is a vulnerability and introduce a novel \textbf{refusal unlearning} technique that exploits it. Speci…
▽ More
This study reveals a previously unexplored vulnerability in the safety alignment of Large Language Models (LLMs). Existing aligned LLMs predominantly respond to unsafe queries with refusals, which often begin with a fixed set of prefixes (I'm sorry). We demonstrate that this rigid refusal pattern is a vulnerability and introduce a novel \textbf{refusal unlearning} technique that exploits it. Specifically, we fine-tune LLMs using merely 1,000 benign samples, where each response is prepended with a refusal prefix. The underlying intuition is to disrupt the refusal completion pathway, thereby driving the model to forget how to refuse while following harmful instructions. This intuition is further supported by theoretical proofs. We apply this approach to a total of 16 LLMs, including various open-source models from Llama, Qwen, and Gemma families, as well as closed-source models such as Gemini and GPT. Experimental results show that the safety scores of previously aligned LLMs degrade both consistently and substantially. Importantly, we verify that the observed gain cannot be attributed to plain fine-tuning or random prefix effects. Our findings suggest that current safety alignment may rely heavily on token sequence memorization rather than reasoning, motivating future work beyond simple refusal mechanisms. Code has been released: https://github.com/guoyang9/refusal-unlearning.
△ Less
Submitted 27 January, 2026;
originally announced January 2026.
-
Aggregating Diverse Cue Experts for AI-Generated Image Detection
Authors:
Lei Tan,
Shuwei Li,
Mohan Kankanhalli,
Robby T. Tan
Abstract:
The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors. However, existing methods often rely on model-specific features, leading to overfitting and poor generalization. In this paper, we introduce the Multi-Cue Aggregation Network (MCAN), a novel framework that integrates different yet complementary cues in a unified network. MCAN emplo…
▽ More
The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors. However, existing methods often rely on model-specific features, leading to overfitting and poor generalization. In this paper, we introduce the Multi-Cue Aggregation Network (MCAN), a novel framework that integrates different yet complementary cues in a unified network. MCAN employs a mixture-of-encoders adapter to dynamically process these cues, enabling more adaptive and robust feature representation. Our cues include the input image itself, which represents the overall content, and high-frequency components that emphasize edge details. Additionally, we introduce a Chromatic Inconsistency (CI) cue, which normalizes intensity values and captures noise information introduced during the image acquisition process in real images, making these noise patterns more distinguishable from those in AI-generated content. Unlike prior methods, MCAN's novelty lies in its unified multi-cue aggregation framework, which integrates spatial, frequency-domain, and chromaticity-based information for enhanced representation learning. These cues are intrinsically more indicative of real images, enhancing cross-model generalization. Extensive experiments on the GenImage, Chameleon, and UniversalFakeDetect benchmark validate the state-of-the-art performance of MCAN. In the GenImage dataset, MCAN outperforms the best state-of-the-art method by up to 7.4% in average ACC across eight different image generators.
△ Less
Submitted 13 January, 2026;
originally announced January 2026.
-
Multimodal Fact-Checking: An Agent-based Approach
Authors:
Danni Xu,
Shaojing Fan,
Harry Cheng,
Mohan Kankanhalli
Abstract:
The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision language models (LVLMs) and deep multimodal fusion methods, often fall short due to limited reasoning and shallow evidence utilization. A key bottleneck is the lack of dedicated datasets that provide complete real-world multimodal misinformation i…
▽ More
The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision language models (LVLMs) and deep multimodal fusion methods, often fall short due to limited reasoning and shallow evidence utilization. A key bottleneck is the lack of dedicated datasets that provide complete real-world multimodal misinformation instances accompanied by annotated reasoning processes and verifiable evidence. To address this limitation, we introduce RW-Post, a high-quality and explainable dataset for real-world multimodal fact-checking. RW-Post aligns real-world multimodal claims with their original social media posts, preserving the rich contextual information in which the claims are made. In addition, the dataset includes detailed reasoning and explicitly linked evidence, which are derived from human written fact-checking articles via a large language model assisted extraction pipeline, enabling comprehensive verification and explanation. Building upon RW-Post, we propose AgentFact, an agent-based multimodal fact-checking framework designed to emulate the human verification workflow. AgentFact consists of five specialized agents that collaboratively handle key fact-checking subtasks, including strategy planning, high-quality evidence retrieval, visual analysis, reasoning, and explanation generation. These agents are orchestrated through an iterative workflow that alternates between evidence searching and task-aware evidence filtering and reasoning, facilitating strategic decision-making and systematic evidence analysis. Extensive experimental results demonstrate that the synergy between RW-Post and AgentFact substantially improves both the accuracy and interpretability of multimodal fact-checking.
△ Less
Submitted 4 January, 2026; v1 submitted 28 December, 2025;
originally announced December 2025.
-
Object-Centric Framework for Video Moment Retrieval
Authors:
Zongyao Li,
Yongkang Wong,
Satoshi Yamazaki,
Jianquan Liu,
Mohan Kankanhalli
Abstract:
Most existing video moment retrieval methods rely on temporal sequences of frame- or clip-level features that primarily encode global visual and semantic information. However, such representations often fail to capture fine-grained object semantics and appearance, which are crucial for localizing moments described by object-oriented queries involving specific entities and their interactions. In pa…
▽ More
Most existing video moment retrieval methods rely on temporal sequences of frame- or clip-level features that primarily encode global visual and semantic information. However, such representations often fail to capture fine-grained object semantics and appearance, which are crucial for localizing moments described by object-oriented queries involving specific entities and their interactions. In particular, temporal dynamics at the object level have been largely overlooked, limiting the effectiveness of existing approaches in scenarios requiring detailed object-level reasoning. To address this limitation, we propose a novel object-centric framework for moment retrieval. Our method first extracts query-relevant objects using a scene graph parser and then generates scene graphs from video frames to represent these objects and their relationships. Based on the scene graphs, we construct object-level feature sequences that encode rich visual and semantic information. These sequences are processed by a relational tracklet transformer, which models spatio-temporal correlations among objects over time. By explicitly capturing object-level state changes, our framework enables more accurate localization of moments aligned with object-oriented queries. We evaluated our method on three benchmarks: Charades-STA, QVHighlights, and TACoS. Experimental results demonstrate that our method outperforms existing state-of-the-art methods across all benchmarks.
△ Less
Submitted 20 December, 2025;
originally announced December 2025.
-
MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer
Authors:
Zenghao Chai,
Chen Tang,
Yongkang Wong,
Xulei Yang,
Mohan Kankanhalli
Abstract:
3D pose transfer aims to transfer the pose-style of a source mesh to a target character while preserving both the target's geometry and the source's pose characteristic. Existing methods are largely restricted to characters with similar structures and fail to generalize to category-free settings (e.g., transferring a humanoid's pose to a quadruped). The key challenge lies in the structural and tra…
▽ More
3D pose transfer aims to transfer the pose-style of a source mesh to a target character while preserving both the target's geometry and the source's pose characteristic. Existing methods are largely restricted to characters with similar structures and fail to generalize to category-free settings (e.g., transferring a humanoid's pose to a quadruped). The key challenge lies in the structural and transformation diversity inherent in distinct character types, which often leads to mismatched regions and poor transfer quality. To address these issues, we first construct a million-scale pose dataset across hundreds of distinct characters. We further propose MimiCAT, a cascade-transformer model designed for category-free 3D pose transfer. Instead of relying on strict one-to-one correspondence mappings, MimiCAT leverages semantic keypoint labels to learn a novel soft correspondence that enables flexible many-to-many matching across characters. The pose transfer is then formulated as a conditional generation process, in which the source transformations are first projected onto the target through soft correspondence matching and subsequently refined using shape-conditioned representations. Extensive qualitative and quantitative experiments demonstrate that MimiCAT generalizes plausible poses across diverse character morphologies, surpassing prior approaches restricted to narrow-category transfer (e.g., humanoid-to-humanoid).
△ Less
Submitted 25 March, 2026; v1 submitted 23 November, 2025;
originally announced November 2025.
-
Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
Authors:
Hehe Fan,
Yi Yang,
Mohan Kankanhalli,
Fei Wu
Abstract:
When modeling a given type of data, we consider it to involve two key aspects: 1) identifying relevant elements (e.g., image pixels or textual words) to a central element, as in a convolutional receptive field, or to a query element, as in self-attention, and 2) encoding these tokens effectively. Self-attention can adaptively identify these elements but relies on absolute positional embedding for…
▽ More
When modeling a given type of data, we consider it to involve two key aspects: 1) identifying relevant elements (e.g., image pixels or textual words) to a central element, as in a convolutional receptive field, or to a query element, as in self-attention, and 2) encoding these tokens effectively. Self-attention can adaptively identify these elements but relies on absolute positional embedding for structural representation learning. In contrast, convolution encodes elements in a relative manner, yet their fixed kernel size limits their ability to adaptively select the relevant elements. In this paper, we introduce Translution, an operation that unifies the adaptive identification capability of self-attention and the relative encoding advantage of convolution. However, this integration leads to a substantial increase in the number of parameters, exceeding most currently available computational resources. Therefore, we propose a lightweight variant of Translution, named α-Translution. Experiments on computer vision and natural language processing tasks show that Translution (including α-Translution) achieves superior accuracy compared to self-attention. The code is available at https://github.com/hehefan/Translution.
△ Less
Submitted 11 October, 2025;
originally announced October 2025.
-
A New Dataset and Benchmark for Grounding Multimodal Misinformation
Authors:
Bingjian Yang,
Danni Xu,
Kaipeng Niu,
Wenxuan Liu,
Zheng Wang,
Mohan Kankanhalli
Abstract:
The proliferation of online misinformation videos poses serious societal risks. Current datasets and detection methods primarily target binary classification or single-modality localization based on post-processed data, lacking the interpretability needed to counter persuasive misinformation. In this paper, we introduce the task of Grounding Multimodal Misinformation (GroundMM), which verifies mul…
▽ More
The proliferation of online misinformation videos poses serious societal risks. Current datasets and detection methods primarily target binary classification or single-modality localization based on post-processed data, lacking the interpretability needed to counter persuasive misinformation. In this paper, we introduce the task of Grounding Multimodal Misinformation (GroundMM), which verifies multimodal content and localizes misleading segments across modalities. We present the first real-world dataset for this task, GroundLie360, featuring a taxonomy of misinformation types, fine-grained annotations across text, speech, and visuals, and validation with Snopes evidence and annotator reasoning. We also propose a VLM-based, QA-driven baseline, FakeMark, using single- and cross-modal cues for effective detection and grounding. Our experiments highlight the challenges of this task and lay a foundation for explainable multimodal misinformation detection.
△ Less
Submitted 8 September, 2025;
originally announced September 2025.
-
Nearest Neighbor Projection Removal Adversarial Training
Authors:
Himanshu Singh,
A. V. Subramanyam,
Shivank Rajput,
Mohan Kankanhalli
Abstract:
Deep neural networks have exhibited impressive performance in image classification tasks but remain vulnerable to adversarial examples. Standard adversarial training enhances robustness but typically fails to explicitly address inter-class feature overlap, a significant contributor to adversarial susceptibility. In this work, we introduce a novel adversarial training framework that actively mitiga…
▽ More
Deep neural networks have exhibited impressive performance in image classification tasks but remain vulnerable to adversarial examples. Standard adversarial training enhances robustness but typically fails to explicitly address inter-class feature overlap, a significant contributor to adversarial susceptibility. In this work, we introduce a novel adversarial training framework that actively mitigates inter-class proximity by projecting out inter-class dependencies from adversarial and clean samples in the feature space. Specifically, our approach first identifies the nearest inter-class neighbors for each adversarial sample and subsequently removes projections onto these neighbors to enforce stronger feature separability. Theoretically, we demonstrate that our proposed logits correction reduces the Lipschitz constant of neural networks, thereby lowering the Rademacher complexity, which directly contributes to improved generalization and robustness. Extensive experiments across standard benchmarks including CIFAR-10, CIFAR-100, SVHN, and TinyImagenet show that our method demonstrates strong performance that is competitive with leading adversarial training techniques, highlighting significant achievements in both robust and clean accuracy. Our findings reveal the importance of addressing inter-class feature proximity explicitly to bolster adversarial robustness in DNNs. The code is available in the supplementary material.
△ Less
Submitted 8 April, 2026; v1 submitted 9 September, 2025;
originally announced September 2025.
-
Involuntary Jailbreak: On Self-Prompting Attacks
Authors:
Yangyang Guo,
Yangyan Li,
Mohan Kankanhalli
Abstract:
In this study, we disclose a worrying new vulnerability in Large Language Models (LLMs), which we term \textbf{involuntary jailbreak}. Unlike existing jailbreak attacks, this weakness is distinct in that it does not involve a specific attack objective, such as generating instructions for \textit{building a bomb}. Prior attack methods predominantly target localized components of the LLM guardrail.…
▽ More
In this study, we disclose a worrying new vulnerability in Large Language Models (LLMs), which we term \textbf{involuntary jailbreak}. Unlike existing jailbreak attacks, this weakness is distinct in that it does not involve a specific attack objective, such as generating instructions for \textit{building a bomb}. Prior attack methods predominantly target localized components of the LLM guardrail. In contrast, involuntary jailbreaks may potentially compromise the entire guardrail structure, which our method reveals to be surprisingly fragile. We merely employ a single universal prompt to achieve this goal. In particular, we instruct LLMs to generate several questions that would typically be rejected, along with their corresponding in-depth responses (rather than a refusal). Remarkably, this simple prompt strategy consistently jailbreaks the majority of leading LLMs, including Claude Opus 4.1, Grok 4, Gemini 2.5 Pro, and GPT 4.1. We hope this problem can motivate researchers and practitioners to re-evaluate the robustness of LLM guardrails and contribute to stronger safety alignment in future.
△ Less
Submitted 27 December, 2025; v1 submitted 18 August, 2025;
originally announced August 2025.
-
Modeling Human Responses to Multimodal AI Content
Authors:
Zhiqi Shen,
Shaojing Fan,
Danni Xu,
Terence Sim,
Mohan Kankanhalli
Abstract:
As AI-generated content becomes widespread, so does the risk of misinformation. While prior research has primarily focused on identifying whether content is authentic, much less is known about how such content influences human perception and behavior. In domains like trading or the stock market, predicting how people react (e.g., whether a news post will go viral), can be more critical than verify…
▽ More
As AI-generated content becomes widespread, so does the risk of misinformation. While prior research has primarily focused on identifying whether content is authentic, much less is known about how such content influences human perception and behavior. In domains like trading or the stock market, predicting how people react (e.g., whether a news post will go viral), can be more critical than verifying its factual accuracy. To address this, we take a human-centered approach and introduce the MhAIM Dataset, which contains 154,552 online posts (111,153 of them AI-generated), enabling large-scale analysis of how people respond to AI-generated content. Our human study reveals that people are better at identifying AI content when posts include both text and visuals, particularly when inconsistencies exist between the two. We propose three new metrics: trustworthiness, impact, and openness, to quantify how users judge and engage with online content. We present T-Lens, an LLM-based agent system designed to answer user queries by incorporating predicted human responses to multimodal information. At its core is HR-MCP (Human Response Model Context Protocol), built on the standardized Model Context Protocol (MCP), enabling seamless integration with any LLM. This integration allows T-Lens to better align with human reactions, enhancing both interpretability and interaction capabilities. Our work provides empirical insights and practical tools to equip LLMs with human-awareness capabilities. By highlighting the complex interplay among AI, human cognition, and information reception, our findings suggest actionable strategies for mitigating the risks of AI-driven misinformation.
△ Less
Submitted 14 August, 2025;
originally announced August 2025.
-
Fair Deepfake Detectors Can Generalize
Authors:
Harry Cheng,
Ming-Hui Liu,
Yangyang Guo,
Tianyi Wang,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generali…
▽ More
Deepfake detection models face two critical challenges: generalization to unseen manipulations and demographic fairness among population groups. However, existing approaches often demonstrate that these two objectives are inherently conflicting, revealing a trade-off between them. In this paper, we, for the first time, uncover and formally define a causal relationship between fairness and generalization. Building on the back-door adjustment, we show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. Across three cross-domain benchmarks, DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art detectors, validating both its theoretical foundation and practical effectiveness.
△ Less
Submitted 3 July, 2025;
originally announced July 2025.
-
The Singapore Consensus on Global AI Safety Research Priorities
Authors:
Yoshua Bengio,
Tegan Maharaj,
Luke Ong,
Stuart Russell,
Dawn Song,
Max Tegmark,
Lan Xue,
Ya-Qin Zhang,
Stephen Casper,
Wan Sie Lee,
Sören Mindermann,
Vanessa Wilfred,
Vidhisha Balachandran,
Fazl Barez,
Michael Belinsky,
Imane Bello,
Malo Bourgon,
Mark Brakel,
Siméon Campos,
Duncan Cass-Beggs,
Jiahao Chen,
Rumman Chowdhury,
Kuan Chua Seah,
Jeff Clune,
Juntao Dai
, et al. (63 additional authors not shown)
Abstract:
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash.
The "2025 Singapore Conference on…
▽ More
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash.
The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. This resulting report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
△ Less
Submitted 30 June, 2025; v1 submitted 25 June, 2025;
originally announced June 2025.
-
Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models
Authors:
Yash Sinha,
Manit Baser,
Murari Mandal,
Dinil Mon Divakaran,
Mohan Kankanhalli
Abstract:
Knowledge erasure in large language models (LLMs) is important for ensuring compliance with data and AI regulations, safeguarding user privacy, mitigating bias, and misinformation. Existing unlearning methods aim to make the process of knowledge erasure more efficient and effective by removing specific knowledge while preserving overall model performance, especially for retained information. Howev…
▽ More
Knowledge erasure in large language models (LLMs) is important for ensuring compliance with data and AI regulations, safeguarding user privacy, mitigating bias, and misinformation. Existing unlearning methods aim to make the process of knowledge erasure more efficient and effective by removing specific knowledge while preserving overall model performance, especially for retained information. However, it has been observed that the unlearning techniques tend to suppress and leave the knowledge beneath the surface, thus making it retrievable with the right prompts. In this work, we demonstrate that \textit{step-by-step reasoning} can serve as a backdoor to recover this hidden information. We introduce a step-by-step reasoning-based black-box attack, Sleek, that systematically exposes unlearning failures. We employ a structured attack framework with three core components: (1) an adversarial prompt generation strategy leveraging step-by-step reasoning built from LLM-generated queries, (2) an attack mechanism that successfully recalls erased content, and exposes unfair suppression of knowledge intended for retention and (3) a categorization of prompts as direct, indirect, and implied, to identify which query types most effectively exploit unlearning weaknesses. Through extensive evaluations on four state-of-the-art unlearning techniques and two widely used LLMs, we show that existing approaches fail to ensure reliable knowledge removal. Of the generated adversarial prompts, 62.5% successfully retrieved forgotten Harry Potter facts from WHP-unlearned Llama, while 50% exposed unfair suppression of retained knowledge. Our work highlights the persistent risks of information leakage, emphasizing the need for more robust unlearning strategies for erasure.
△ Less
Submitted 14 June, 2025;
originally announced June 2025.
-
Nine Ways to Break Copyright Law and Why Our LLM Won't: A Fair Use Aligned Generation Framework
Authors:
Aakash Sen Sharma,
Debdeep Sanyal,
Priyansh Srivastava,
Sundar Atreya H.,
Shirish Karande,
Mohan Kankanhalli,
Murari Mandal
Abstract:
Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated…
▽ More
Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop FUA-LLM (Fair Use Aligned Language Models), a legally-grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant and practically useful alternatives rather than resorting to blunt refusal. Recognizing the shortcomings of traditional evaluation metrics, we propose new measures: Weighted Penalty Utility and Compliance Aware Harmonic Mean (CAH) to balance infringement risk against response utility. Extensive quantitative experiments coupled with expert evaluations confirm that FUA-LLM substantially reduces problematic outputs (up to 20\%) compared to state-of-the-art approaches, while preserving real-world usability.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
Reasoning LLMs are Wandering Solution Explorers
Authors:
Jiahao Lu,
Ziwei Xu,
Mohan Kankanhalli
Abstract:
Large Language Models (LLMs) have demonstrated impressive reasoning abilities through test-time computation (TTC) techniques such as chain-of-thought prompting and tree-based reasoning. However, we argue that current reasoning LLMs (RLLMs) lack the ability to systematically explore the solution space. This paper formalizes what constitutes systematic problem solving and identifies common failure m…
▽ More
Large Language Models (LLMs) have demonstrated impressive reasoning abilities through test-time computation (TTC) techniques such as chain-of-thought prompting and tree-based reasoning. However, we argue that current reasoning LLMs (RLLMs) lack the ability to systematically explore the solution space. This paper formalizes what constitutes systematic problem solving and identifies common failure modes that reveal reasoning LLMs to be wanderers rather than systematic explorers. Through qualitative and quantitative analysis across multiple state-of-the-art LLMs, we uncover persistent issues: invalid reasoning steps, redundant explorations, hallucinated or unfaithful conclusions, and so on. Our findings suggest that current models' performance can appear to be competent on simple tasks yet degrade sharply as complexity increases. Based on the findings, we advocate for new metrics and tools that evaluate not just final outputs but the structure of the reasoning process itself.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
OrgAccess: A Benchmark for Role Based Access Control in Organization Scale LLMs
Authors:
Debdeep Sanyal,
Umakanta Maharana,
Yash Sinha,
Hong Ming Tan,
Shirish Karande,
Mohan Kankanhalli,
Murari Mandal
Abstract:
Role-based access control (RBAC) and hierarchical structures are foundational to how information flows and decisions are made within virtually all organizations. As the potential of Large Language Models (LLMs) to serve as unified knowledge repositories and intelligent assistants in enterprise settings becomes increasingly apparent, a critical, yet under explored, challenge emerges: \textit{can th…
▽ More
Role-based access control (RBAC) and hierarchical structures are foundational to how information flows and decisions are made within virtually all organizations. As the potential of Large Language Models (LLMs) to serve as unified knowledge repositories and intelligent assistants in enterprise settings becomes increasingly apparent, a critical, yet under explored, challenge emerges: \textit{can these models reliably understand and operate within the complex, often nuanced, constraints imposed by organizational hierarchies and associated permissions?} Evaluating this crucial capability is inherently difficult due to the proprietary and sensitive nature of real-world corporate data and access control policies. We introduce a synthetic yet representative \textbf{OrgAccess} benchmark consisting of 40 distinct types of permissions commonly relevant across different organizational roles and levels. We further create three types of permissions: 40,000 easy (1 permission), 10,000 medium (3-permissions tuple), and 20,000 hard (5-permissions tuple) to test LLMs' ability to accurately assess these permissions and generate responses that strictly adhere to the specified hierarchical rules, particularly in scenarios involving users with overlapping or conflicting permissions. Our findings reveal that even state-of-the-art LLMs struggle significantly to maintain compliance with role-based structures, even with explicit instructions, with their performance degrades further when navigating interactions involving two or more conflicting permissions. Specifically, even \textbf{GPT-4.1 only achieves an F1-Score of 0.27 on our hardest benchmark}. This demonstrates a critical limitation in LLMs' complex rule following and compositional reasoning capabilities beyond standard factual or STEM-based benchmarks, opening up a new paradigm for evaluating their fitness for practical, structured environments.
△ Less
Submitted 17 June, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
-
Bullying the Machine: How Personas Increase LLM Vulnerability
Authors:
Ziwei Xu,
Udit Sanghi,
Mohan Kankanhalli
Abstract:
Large Language Models (LLMs) are increasingly deployed in interactions where they are prompted to adopt personas. This paper investigates whether such persona conditioning affects model safety under bullying, an adversarial manipulation that applies psychological pressures in order to force the victim to comply to the attacker. We introduce a simulation framework in which an attacker LLM engages a…
▽ More
Large Language Models (LLMs) are increasingly deployed in interactions where they are prompted to adopt personas. This paper investigates whether such persona conditioning affects model safety under bullying, an adversarial manipulation that applies psychological pressures in order to force the victim to comply to the attacker. We introduce a simulation framework in which an attacker LLM engages a victim LLM using psychologically grounded bullying tactics, while the victim adopts personas aligned with the Big Five personality traits. Experiments using multiple open-source LLMs and a wide range of adversarial goals reveal that certain persona configurations -- such as weakened agreeableness or conscientiousness -- significantly increase victim's susceptibility to unsafe outputs. Bullying tactics involving emotional or sarcastic manipulation, such as gaslighting and ridicule, are particularly effective. These findings suggest that persona-driven interaction introduces a novel vector for safety risks in LLMs and highlight the need for persona-aware safety evaluation and alignment strategies.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Privacy Risks and Preservation Methods in Explainable Artificial Intelligence: A Scoping Review
Authors:
Sonal Allana,
Mohan Kankanhalli,
Rozita Dara
Abstract:
Explainable Artificial Intelligence (XAI) has emerged as a pillar of Trustworthy AI and aims to bring transparency in complex models that are opaque by nature. Despite the benefits of incorporating explanations in models, an urgent need is found in addressing the privacy concerns of providing this additional information to end users. In this article, we conduct a scoping review of existing literat…
▽ More
Explainable Artificial Intelligence (XAI) has emerged as a pillar of Trustworthy AI and aims to bring transparency in complex models that are opaque by nature. Despite the benefits of incorporating explanations in models, an urgent need is found in addressing the privacy concerns of providing this additional information to end users. In this article, we conduct a scoping review of existing literature to elicit details on the conflict between privacy and explainability. Using the standard methodology for scoping review, we extracted 57 articles from 1,943 studies published from January 2019 to December 2024. The review addresses 3 research questions to present readers with more understanding of the topic: (1) what are the privacy risks of releasing explanations in AI systems? (2) what current methods have researchers employed to achieve privacy preservation in XAI systems? (3) what constitutes a privacy preserving explanation? Based on the knowledge synthesized from the selected studies, we categorize the privacy risks and preservation methods in XAI and propose the characteristics of privacy preserving explanations to aid researchers and practitioners in understanding the requirements of XAI that is privacy compliant. Lastly, we identify the challenges in balancing privacy with other system desiderata and provide recommendations for achieving privacy preserving XAI. We expect that this review will shed light on the complex relationship of privacy and explainability, both being the fundamental principles of Trustworthy AI.
△ Less
Submitted 2 December, 2025; v1 submitted 5 May, 2025;
originally announced May 2025.
-
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
Authors:
Tianyi Wang,
Harry Cheng,
Ming-Hui Liu,
Mohan Kankanhalli
Abstract:
Proactive Deepfake detection via robust watermarks has seen interest ever since passive Deepfake detectors encountered challenges in identifying high-quality synthetic images. However, while demonstrating reasonable detection performance, they lack localization functionality and explainability in detection results. Additionally, the unstable robustness of watermarks can significantly affect the de…
▽ More
Proactive Deepfake detection via robust watermarks has seen interest ever since passive Deepfake detectors encountered challenges in identifying high-quality synthetic images. However, while demonstrating reasonable detection performance, they lack localization functionality and explainability in detection results. Additionally, the unstable robustness of watermarks can significantly affect the detection performance. In this study, we propose novel fractal watermarks for proactive Deepfake detection and localization, namely FractalForensics. Benefiting from the characteristics of fractals, we devise a parameter-driven watermark generation pipeline that derives fractal-based watermarks and performs one-way encryption of the selected parameters. Subsequently, we propose a semi-fragile watermarking framework for watermark embedding and recovery, trained to be robust against benign image processing operations and fragile when facing Deepfake manipulations in a black-box setting. Moreover, we introduce an entry-to-patch strategy that implicitly embeds the watermark matrix entries into image patches at corresponding positions, achieving localization of Deepfake manipulations. Extensive experiments demonstrate satisfactory robustness and fragility of our approach against common image processing operations and Deepfake manipulations, outperforming state-of-the-art semi-fragile watermarking algorithms and passive detectors for Deepfake detection. Furthermore, by highlighting the areas manipulated, our method provides explainability for the proactive Deepfake detection results.
△ Less
Submitted 3 November, 2025; v1 submitted 13 April, 2025;
originally announced April 2025.
-
GroMo: Plant Growth Modeling with Multiview Images
Authors:
Ruchi Bhatt,
Shreya Bansal,
Amanpreet Chander,
Rupinder Kaur,
Malya Singh,
Mohan Kankanhalli,
Abdulmotaleb El Saddik,
Mukesh Kumar Saini
Abstract:
Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish,…
▽ More
Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, ..., pn) captured over different days (d1, d2, ..., dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We proposed a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluated the crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.
△ Less
Submitted 6 June, 2025; v1 submitted 9 March, 2025;
originally announced March 2025.
-
Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM
Authors:
Yangyang Guo,
Ziwei Xu,
Xilie Xu,
YongKang Wong,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
This technical report introduces our top-ranked solution that employs two approaches, \ie suffix injection and projected gradient descent (PGD) , to address the TiFA workshop MLLM attack challenge. Specifically, we first append the text from an incorrectly labeled option (pseudo-labeled) to the original query as a suffix. Using this modified query, our second approach applies the PGD method to add…
▽ More
This technical report introduces our top-ranked solution that employs two approaches, \ie suffix injection and projected gradient descent (PGD) , to address the TiFA workshop MLLM attack challenge. Specifically, we first append the text from an incorrectly labeled option (pseudo-labeled) to the original query as a suffix. Using this modified query, our second approach applies the PGD method to add imperceptible perturbations to the image. Combining these two techniques enables successful attacks on the LLaVA 1.5 model.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
Authors:
Wey Yeh Choong,
Yangyang Guo,
Mohan Kankanhalli
Abstract:
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucinations. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of vid…
▽ More
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to hallucinations. Existing research addressing this problem has primarily been confined to image inputs, with limited exploration of video-based hallucinations. Furthermore, current evaluation methods fail to capture nuanced errors in generated responses, which are often exacerbated by the rich spatiotemporal dynamics of videos. To address this, we introduce VidHal, a benchmark specially designed to evaluate video-based hallucinations in VLLMs. VidHal is constructed by bootstrapping video instances across a wide range of common temporal aspects. A defining feature of our benchmark lies in the careful creation of captions which represent varying levels of hallucination associated with each video. To enable fine-grained evaluation, we propose a novel caption ordering task requiring VLLMs to rank captions by hallucinatory extent. We conduct extensive experiments on VidHal and comprehensively evaluate a broad selection of models. Our results uncover significant limitations in existing VLLMs regarding hallucination generation. Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.
△ Less
Submitted 7 March, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Authors:
Ziyang Luo,
Haoning Wu,
Dongxu Li,
Jing Ma,
Mohan Kankanhalli,
Junnan Li
Abstract:
Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods like multiple-choice questions in benchmarks such as VideoMME and LongVideoBench, which are prone to lack the depth needed to capture the complex demands of real-world users. To address this limitation-and due to the prohibitiv…
▽ More
Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods like multiple-choice questions in benchmarks such as VideoMME and LongVideoBench, which are prone to lack the depth needed to capture the complex demands of real-world users. To address this limitation-and due to the prohibitive cost and slow pace of human annotation for video tasks-we introduce VideoAutoArena, an arena-style benchmark inspired by LMSYS Chatbot Arena's framework, designed to automatically assess LMMs' video analysis abilities. VideoAutoArena utilizes user simulation to generate open-ended, adaptive questions that rigorously assess model performance in video understanding. The benchmark features an automated, scalable evaluation framework, incorporating a modified ELO Rating System for fair and continuous comparisons across multiple LMMs. To validate our automated judging system, we construct a 'gold standard' using a carefully curated subset of human annotations, demonstrating that our arena strongly aligns with human judgment while maintaining scalability. Additionally, we introduce a fault-driven evolution strategy, progressively increasing question complexity to push models toward handling more challenging video analysis scenarios. Experimental results demonstrate that VideoAutoArena effectively differentiates among state-of-the-art LMMs, providing insights into model strengths and areas for improvement. To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles. We use GPT-4o as a judge to compare responses against these human-validated answers. Together, VideoAutoArena and VideoAutoBench offer a cost-effective, and scalable framework for evaluating LMMs in user-centric video analysis.
△ Less
Submitted 23 March, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Joint Vision-Language Social Bias Removal for CLIP
Authors:
Haoyu Zhang,
Yangyang Guo,
Mohan Kankanhalli
Abstract:
Vision-Language (V-L) pre-trained models such as CLIP show prominent capabilities in various downstream tasks. Despite this promise, V-L models are notoriously limited by their inherent social biases. A typical demonstration is that V-L models often produce biased predictions against specific groups of people, significantly undermining their real-world applicability. Existing approaches endeavor t…
▽ More
Vision-Language (V-L) pre-trained models such as CLIP show prominent capabilities in various downstream tasks. Despite this promise, V-L models are notoriously limited by their inherent social biases. A typical demonstration is that V-L models often produce biased predictions against specific groups of people, significantly undermining their real-world applicability. Existing approaches endeavor to mitigate the social bias problem in V-L models by removing biased attribute information from model embeddings. However, after our revisiting of these methods, we find that their bias removal is frequently accompanied by greatly compromised V-L alignment capabilities. We then reveal that this performance degradation stems from the unbalanced debiasing in image and text embeddings. To address this issue, we propose a novel V-L debiasing framework to align image and text biases followed by removing them from both modalities. By doing so, our method achieves multi-modal bias mitigation while maintaining the V-L alignment in the debiased embeddings. Additionally, we advocate a new evaluation protocol that can 1) holistically quantify the model debiasing and V-L alignment ability, and 2) evaluate the generalization of social bias removal models. We believe this work will offer new insights and guidance for future studies addressing the social bias problem in CLIP.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Authors:
Yangyang Guo,
Mohan Kankanhalli
Abstract:
While contrastive pre-training is widely employed, its data efficiency problem has remained relatively under-explored thus far. Existing methods often rely on static coreset selection algorithms to pre-identify important data for training. However, this static nature renders them unable to dynamically track the data usefulness throughout pre-training, leading to subpar pre-trained models. To addre…
▽ More
While contrastive pre-training is widely employed, its data efficiency problem has remained relatively under-explored thus far. Existing methods often rely on static coreset selection algorithms to pre-identify important data for training. However, this static nature renders them unable to dynamically track the data usefulness throughout pre-training, leading to subpar pre-trained models. To address this challenge, our paper introduces a novel dynamic bootstrapping dataset pruning method. It involves pruning data preparation followed by dataset mutation operations, both of which undergo iterative and dynamic updates. We apply this method to two prevalent contrastive pre-training frameworks: \textbf{CLIP} and \textbf{MoCo}, representing vision-language and vision-centric domains, respectively. In particular, we individually pre-train seven CLIP models on two large-scale image-text pair datasets, and two MoCo models on the ImageNet dataset, resulting in a total of 16 pre-trained models. With a data pruning rate of 30-35\% across all 16 models, our method exhibits only marginal performance degradation (less than \textbf{1\%} on average) compared to corresponding models trained on the full dataset counterparts across various downstream datasets, and also surpasses several baselines with a large performance margin. Additionally, the byproduct from our method, \ie coresets derived from the original datasets after pre-training, also demonstrates significant superiority in terms of downstream performance over other static coreset selection approaches.
△ Less
Submitted 13 November, 2024;
originally announced November 2024.
-
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Authors:
Yangyang Guo,
Fangkai Jiao,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks appears as no surprise. However, recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations, often with minimal effort. This \emph{dual high performance} in both attack and defense raises a fundamental and perplexing paradox. To gain a deep understanding of this iss…
▽ More
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks appears as no surprise. However, recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations, often with minimal effort. This \emph{dual high performance} in both attack and defense raises a fundamental and perplexing paradox. To gain a deep understanding of this issue and thus further help strengthen the trustworthiness of VLLMs, this paper makes three key contributions: i) One tentative explanation for VLLMs being prone to jailbreak attacks--\textbf{inclusion of vision inputs}, as well as its in-depth analysis. ii) The recognition of a largely ignored problem in existing defense mechanisms--\textbf{over-prudence}. The problem causes these defense methods to exhibit unintended abstention, even in the presence of benign inputs, thereby undermining their reliability in faithfully defending against attacks. iii) A simple safety-aware method--\textbf{LLM-Pipeline}. Our method repurposes the more advanced guardrails of LLMs on the shelf, serving as an effective alternative detector prior to VLLM response. Last but not least, we find that the two representative evaluation methods for jailbreak often exhibit chance agreement. This limitation makes it potentially misleading when evaluating attack strategies or defense mechanisms. We believe the findings from this paper offer useful insights to rethink the foundational development of VLLM safety with respect to benchmark datasets, defense strategies, and evaluation methods.
△ Less
Submitted 5 March, 2025; v1 submitted 13 November, 2024;
originally announced November 2024.
-
UnStar: Unlearning with Self-Taught Anti-Sample Reasoning for LLMs
Authors:
Yash Sinha,
Murari Mandal,
Mohan Kankanhalli
Abstract:
The key components of machine learning are data samples for training, model for learning patterns, and loss function for optimizing accuracy. Analogously, unlearning can potentially be achieved through anti-data samples (or anti-samples), unlearning method, and reversed loss function. While prior research has explored unlearning methods and reversed loss functions, the potential of anti-samples re…
▽ More
The key components of machine learning are data samples for training, model for learning patterns, and loss function for optimizing accuracy. Analogously, unlearning can potentially be achieved through anti-data samples (or anti-samples), unlearning method, and reversed loss function. While prior research has explored unlearning methods and reversed loss functions, the potential of anti-samples remains largely untapped. In this paper, we introduce UnSTAR: Unlearning with Self-Taught Anti-Sample Reasoning for large language models (LLMs). Our contributions are threefold; first, we propose a novel concept of anti-sample-induced unlearning; second, we generate anti-samples by leveraging misleading rationales, which help reverse learned associations and accelerate the unlearning process; and third, we enable fine-grained targeted unlearning, allowing for the selective removal of specific associations without impacting related knowledge - something not achievable by previous works. Results demonstrate that anti-samples offer an efficient, targeted unlearning strategy for LLMs, opening new avenues for privacy-preserving machine learning and model modification.
△ Less
Submitted 22 October, 2024;
originally announced October 2024.
-
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
Authors:
Ziwei Xu,
Mohan Kankanhalli
Abstract:
Value alignment, which aims to ensure that large language models (LLMs) and other AI agents behave in accordance with human values, is critical for ensuring safety and trustworthiness of these systems. A key component of value alignment is the modeling of human preferences as a representation of human values. In this paper, we investigate the robustness of value alignment by examining the sensitiv…
▽ More
Value alignment, which aims to ensure that large language models (LLMs) and other AI agents behave in accordance with human values, is critical for ensuring safety and trustworthiness of these systems. A key component of value alignment is the modeling of human preferences as a representation of human values. In this paper, we investigate the robustness of value alignment by examining the sensitivity of preference models. Specifically, we ask: how do changes in the probabilities of some preferences affect the predictions of these models for other preferences? To answer this question, we theoretically analyze the robustness of widely used preference models by examining their sensitivities to minor changes in preferences they model. Our findings reveal that, in the Bradley-Terry and the Placket-Luce model, the probability of a preference can change significantly as other preferences change, especially when these preferences are dominant (i.e., with probabilities near 0 or 1). We identify specific conditions where this sensitivity becomes significant for these models and discuss the practical implications for the robustness and safety of value alignment in AI systems.
△ Less
Submitted 7 March, 2025; v1 submitted 3 October, 2024;
originally announced October 2024.
-
STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
Authors:
Zenghao Chai,
Chen Tang,
Yongkang Wong,
Mohan Kankanhalli
Abstract:
The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Di…
▽ More
The creation of 4D avatars (i.e., animated 3D avatars) from text description typically uses text-to-image (T2I) diffusion models to synthesize 3D avatars in the canonical space and subsequently applies animation with target motions. However, such an optimization-by-animation paradigm has several drawbacks. (1) For pose-agnostic optimization, the rendered images in canonical pose for naive Score Distillation Sampling (SDS) exhibit domain gap and cannot preserve view-consistency using only T2I priors, and (2) For post hoc animation, simply applying the source motions to target 3D avatars yields translation artifacts and misalignment. To address these issues, we propose Skeleton-aware Text-based 4D Avatar generation with in-network motion Retargeting (STAR). STAR considers the geometry and skeleton differences between the template mesh and target avatar, and corrects the mismatched source motion by resorting to the pretrained motion retargeting techniques. With the informatively retargeted and occlusion-aware skeleton, we embrace the skeleton-conditioned T2I and text-to-video (T2V) priors, and propose a hybrid SDS module to coherently provide multi-view and frame-consistent supervision signals. Hence, STAR can progressively optimize the geometry, texture, and motion in an end-to-end manner. The quantitative and qualitative experiments demonstrate our proposed STAR can synthesize high-quality 4D avatars with vivid animations that align well with the text description. Additional ablation studies shows the contributions of each component in STAR. The source code and demos are available at: \href{https://star-avatar.github.io}{https://star-avatar.github.io}.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Authors:
Zhenyang Li,
Yangyang Guo,
Kejie Wang,
Xiaolin Chen,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Visual Commonsense Reasoning (VCR) calls for explanatory reasoning behind question answering over visual scenes. To achieve this goal, a model is required to provide an acceptable rationale as the reason for the predicted answers. Progress on the benchmark dataset stems largely from the recent advancement of Vision-Language Transformers (VL Transformers). These models are first pre-trained on some…
▽ More
Visual Commonsense Reasoning (VCR) calls for explanatory reasoning behind question answering over visual scenes. To achieve this goal, a model is required to provide an acceptable rationale as the reason for the predicted answers. Progress on the benchmark dataset stems largely from the recent advancement of Vision-Language Transformers (VL Transformers). These models are first pre-trained on some generic large-scale vision-text datasets, and then the learned representations are transferred to the downstream VCR task. Despite their attractive performance, this paper posits that the VL Transformers do not exhibit visual commonsense, which is the key to VCR. In particular, our empirical results pinpoint several shortcomings of existing VL Transformers: small gains from pre-training, unexpected language bias, limited model architecture for the two inseparable sub-tasks, and neglect of the important object-tag correlation. With these findings, we tentatively suggest some future directions from the aspect of dataset, evaluation metric, and training tricks. We believe this work could make researchers revisit the intuition and goals of VCR, and thus help tackle the remaining challenges in visual reasoning.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Multi-Modal Recommendation Unlearning for Legal, Licensing, and Modality Constraints
Authors:
Yash Sinha,
Murari Mandal,
Mohan Kankanhalli
Abstract:
User data spread across multiple modalities has popularized multi-modal recommender systems (MMRS). They recommend diverse content such as products, social media posts, TikTok reels, etc., based on a user-item interaction graph. With rising data privacy demands, recent methods propose unlearning private user data from uni-modal recommender systems (RS). However, methods for unlearning item data re…
▽ More
User data spread across multiple modalities has popularized multi-modal recommender systems (MMRS). They recommend diverse content such as products, social media posts, TikTok reels, etc., based on a user-item interaction graph. With rising data privacy demands, recent methods propose unlearning private user data from uni-modal recommender systems (RS). However, methods for unlearning item data related to outdated user preferences, revoked licenses, and legally requested removals are still largely unexplored.
Previous RS unlearning methods are unsuitable for MMRS due to the incompatibility of their matrix-based representation with the multi-modal user-item interaction graph. Moreover, their data partitioning step degrades performance on each shard due to poor data heterogeneity and requires costly performance aggregation across shards.
This paper introduces MMRecUn, the first approach known to us for unlearning in MMRS and unlearning item data. Given a trained RS model, MMRecUn employs a novel Reverse Bayesian Personalized Ranking (BPR) objective to enable the model to forget marked data. The reverse BPR attenuates the impact of user-item interactions within the forget set, while the forward BPR reinforces the significance of user-item interactions within the retain set. Our experiments demonstrate that MMRecUn outperforms baseline methods across various unlearning requests when evaluated on benchmark MMRS datasets. MMRecUn achieves recall performance improvements of up to 49.85% compared to baseline methods and is up to 1.3x faster than the Gold model, which is trained on retain set from scratch. MMRecUn offers significant advantages, including superiority in removing target interactions, preserving retained interactions, and zero overhead costs compared to previous methods.
Code: https://github.com/MachineUnlearn/MMRecUN
Extended version: arXiv:2405.15328
△ Less
Submitted 29 June, 2025; v1 submitted 24 May, 2024;
originally announced May 2024.
-
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
Authors:
Wei Li,
Hehe Fan,
Yongkang Wong,
Mohan Kankanhalli,
Yi Yang
Abstract:
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduc…
▽ More
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises from the inherent complexity of videos and the inefficient language supervision in recent web-collected video-text datasets. In this paper, we introduce Text-Only Pre-Alignment (TOPA), a novel approach to extend large language models (LLMs) for video understanding, without the need for pre-training on real video data. Specifically, we first employ an advanced LLM to automatically generate Textual Videos comprising continuous textual frames, along with corresponding annotations to simulate real video-text data. Then, these annotated textual videos are used to pre-align a language-only LLM with the video modality. To bridge the gap between textual and real videos, we employ the CLIP model as the feature extractor to align image and text modalities. During text-only pre-alignment, the continuous textual frames, encoded as a sequence of CLIP text features, are analogous to continuous CLIP image features, thus aligning the LLM with real video representation. Extensive experiments, including zero-shot evaluation and finetuning on various video understanding tasks, demonstrate that TOPA is an effective and efficient framework for aligning video content with LLMs. In particular, without training on any video data, the TOPA-Llama2-13B model achieves a Top-1 accuracy of 51.0% on the challenging long-form video understanding benchmark, Egoschema. This performance surpasses previous video-text pre-training approaches and proves competitive with recent GPT-3.5-based video agents.
△ Less
Submitted 3 November, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
Authors:
Yi Cheng,
Ziwei Xu,
Dongyun Lin,
Harry Cheng,
Yongkang Wong,
Ying Sun,
Joo Hwee Lim,
Mohan Kankanhalli
Abstract:
For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leadi…
▽ More
For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
DPTraj-PM: Differentially Private Trajectory Synthesis Using Prefix Tree and Markov Process
Authors:
Nana Wang,
Mohan Kankanhalli
Abstract:
The increasing use of GPS-enabled devices has generated a large amount of trajectory data. These data offer us vital insights to understand the movements of individuals and populations, benefiting a broad range of applications from transportation planning to epidemic modeling. However, improper release of trajectory data is increasing concerns on individual privacy. Previous attempts either lack s…
▽ More
The increasing use of GPS-enabled devices has generated a large amount of trajectory data. These data offer us vital insights to understand the movements of individuals and populations, benefiting a broad range of applications from transportation planning to epidemic modeling. However, improper release of trajectory data is increasing concerns on individual privacy. Previous attempts either lack strong privacy guarantees, or fail to preserve sufficient basic characteristics of the original data. In this paper, we propose DPTraj-PM, a method to synthesize trajectory dataset under the differential privacy (DP) framework while ensures high data utility. Based on the assumption that an individual's trajectory could be mainly determined by the initial trajectory segment (which depicts the starting point and the initial direction) and the next location point, DPTraj-PM discretizes the raw trajectories into neighboring cells, and models them by combining a prefix tree structure and an m-order Markov process. After adding noise to the model under differential privacy, DPTraj-PM generates a synthetic dataset from the noisy model to enable a wider spectrum of data mining and modeling tasks. The output traces crafted by DPTraj-PM not only preserves the patterns and variability in individuals' mobility behaviors, but also protects individual privacy. Experiments on two real-world datasets demonstrate that DPTraj-PM substantially outperforms the state-of-the-art techniques in terms of data utility. Our code is available at https://github.com/wnn5/DP-PrefixTreeMarkov.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
MCM: Multi-condition Motion Synthesis Framework
Authors:
Zeyu Ling,
Bo Han,
Yongkang Wongkan,
Han Lin,
Mohan Kankanhalli,
Weidong Geng
Abstract:
Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framewor…
▽ More
Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch. This framework effectively extends the applicability of the diffusion model, which is initially predicated solely on textual conditions, to auditory conditions. This extension encompasses both music-to-dance and co-speech HMS while preserving the intrinsic quality of motion and the capabilities for semantic association inherent in the original model. Furthermore, we propose the implementation of a Transformer-based diffusion model, designated as MWNet, as the main branch. This model adeptly apprehends the spatial intricacies and inter-joint correlations inherent in motion sequences, facilitated by the integration of multi-wise self-attention modules. Extensive experiments show that our method achieves competitive results in single-condition and multi-condition HMS tasks.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
Cluster-based Graph Collaborative Filtering
Authors:
Fan Liu,
Shuai Zhao,
Zhiyong Cheng,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Graph Convolution Networks (GCNs) have significantly succeeded in learning user and item representations for recommendation systems. The core of their efficacy is the ability to explicitly exploit the collaborative signals from both the first- and high-order neighboring nodes. However, most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convol…
▽ More
Graph Convolution Networks (GCNs) have significantly succeeded in learning user and item representations for recommendation systems. The core of their efficacy is the ability to explicitly exploit the collaborative signals from both the first- and high-order neighboring nodes. However, most existing GCN-based methods overlook the multiple interests of users while performing high-order graph convolution. Thus, the noisy information from unreliable neighbor nodes (e.g., users with dissimilar interests) negatively impacts the representation learning of the target node. Additionally, conducting graph convolution operations without differentiating high-order neighbors suffers the over-smoothing issue when stacking more layers, resulting in performance degradation. In this paper, we aim to capture more valuable information from high-order neighboring nodes while avoiding noise for better representation learning of the target node. To achieve this goal, we propose a novel GCN-based recommendation model, termed Cluster-based Graph Collaborative Filtering (ClusterGCF). This model performs high-order graph convolution on cluster-specific graphs, which are constructed by capturing the multiple interests of users and identifying the common interests among them. Specifically, we design an unsupervised and optimizable soft node clustering approach to classify user and item nodes into multiple clusters. Based on the soft node clustering results and the topology of the user-item interaction graph, we assign the nodes with probabilities for different clusters to construct the cluster-specific graphs. To evaluate the effectiveness of ClusterGCF, we conducted extensive experiments on four publicly available datasets. Experimental results demonstrate that our model can significantly improve recommendation performance.
△ Less
Submitted 8 November, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
Authors:
Guangzhi Wang,
Tianyi Chen,
Kamran Ghasedi,
HsiangTao Wu,
Tianyu Ding,
Chris Nuesmeyer,
Ilya Zharkov,
Mohan Kankanhalli,
Luming Liang
Abstract:
Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introdu…
▽ More
Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training framework for face video editing. S3Editor is a generic solution that comprehensively addresses these challenges with three key contributions. Firstly, S3Editor adopts a self-training paradigm to enhance the training process through semi-supervision. Secondly, we propose a semantic disentangled architecture with a dynamic routing mechanism that accommodates diverse editing requirements. Thirdly, we present a structured sparse optimization schema that identifies and deactivates malicious neurons to further disentangle impacts from untarget attributes. S3Editor is model-agnostic and compatible with various editing approaches. Our extensive qualitative and quantitative results affirm that our approach significantly enhances identity preservation, editing fidelity, as well as temporal consistency.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
How to Understand Named Entities: Using Common Sense for News Captioning
Authors:
Ning Xu,
Yanhui Wang,
Tingting Zhang,
Hongshuo Tian,
Mohan Kankanhalli,
An-An Liu
Abstract:
News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By ``understand'', we mean correlating the news content with common sense in the wild, which helps an agent to 1) dist…
▽ More
News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By ``understand'', we mean correlating the news content with common sense in the wild, which helps an agent to 1) distinguish semantically similar named entities and 2) describe named entities using words outside of training corpora. Our approach consists of three modules: (a) Filter Module aims to clarify the common sense concerning a named entity from two aspects: what does it mean? and what is it related to?, which divide the common sense into explanatory knowledge and relevant knowledge, respectively. (b) Distinguish Module aggregates explanatory knowledge from node-degree, dependency, and distinguish three aspects to distinguish semantically similar named entities. (c) Enrich Module attaches relevant knowledge to named entities to enrich the entity description by commonsense information (e.g., identity and social position). Finally, the probability distributions from both modules are integrated to generate the news captions. Extensive experiments on two challenging datasets (i.e., GoodNews and NYTimes) demonstrate the superiority of our method. Ablation studies and visualization further validate its effectiveness in understanding named entities.
△ Less
Submitted 11 March, 2024;
originally announced March 2024.
-
EcoVal: An Efficient Data Valuation Framework for Machine Learning
Authors:
Ayush K Tarun,
Vikram S Chundawat,
Murari Mandal,
Hong Ming Tan,
Bowei Chen,
Mohan Kankanhalli
Abstract:
Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an…
▽ More
Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an efficient data valuation framework EcoVal, to estimate the value of data for machine learning models in a fast and practical manner. Instead of directly working with individual data sample, we determine the value of a cluster of similar data points. This value is further propagated amongst all the member cluster points. We show that the overall value of the data can be determined by estimating the intrinsic and extrinsic value of each data. This is enabled by formulating the performance of a model as a \textit{production function}, a concept which is popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models. The code is available at \underline{https://github.com/respai-lab/ecoval}.
△ Less
Submitted 9 July, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Diffusion Facial Forgery Detection
Authors:
Harry Cheng,
Yangyang Guo,
Tianyi Wang,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Detecting diffusion-generated images has recently grown into an emerging research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have remained less explored thus far. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images.…
▽ More
Detecting diffusion-generated images has recently grown into an emerging research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have remained less explored thus far. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images that are synthesized using thirteen distinct generation methods under four conditions. In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency. We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods. The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%, shedding light on the challenges in detecting diffusion-generated facial forgeries. Furthermore, we propose an edge graph regularization approach to effectively enhance the generalization capability of existing detectors.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Authors:
Ziwei Xu,
Sanjay Jain,
Mohan Kankanhalli
Abstract:
Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminat…
▽ More
Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all the computable functions and will therefore inevitably hallucinate if used as general problem solvers. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs.
△ Less
Submitted 13 February, 2025; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models
Authors:
Fan Liu,
Yaqi Liu,
Huilin Chen,
Zhiyong Cheng,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations. Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents. However, the aspects and intents are inferred directly from user reviews or behavior patterns, suffering from the data noise and the data sparsity problem.…
▽ More
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations. Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents. However, the aspects and intents are inferred directly from user reviews or behavior patterns, suffering from the data noise and the data sparsity problem. Furthermore, it is difficult to understand the reasons behind recommendations due to the challenges of interpreting implicit aspects and intents. Inspired by the deep semantic understanding offered by large language models (LLMs), we introduce a chain-based prompting approach to uncover semantic aspect-aware interactions, which provide clearer insights into user behaviors at a fine-grained semantic level. To incorporate the abundant interactions of various aspects, we propose the simple yet effective Semantic Aspect-based Graph Convolution Network (short for SAGCN). By performing graph convolutions on multiple semantic aspect graphs, SAGCN efficiently combines embeddings across multiple semantic aspects for final user and item representations. The effectiveness of the SAGCN was evaluated on three publicly available datasets through extensive experiments, which revealed that it outperforms all other competitors. Furthermore, interpretability analysis experiments were conducted to demonstrate the interpretability of incorporating semantic aspects into the model.
△ Less
Submitted 16 November, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Attribute-driven Disentangled Representation Learning for Multimodal Recommendation
Authors:
Zhenyang Li,
Fan Liu,
Yinwei Wei,
Zhiyong Cheng,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Recommendation algorithms forecast user preferences by correlating user and item representations derived from historical interaction patterns. In pursuit of enhanced performance, many methods focus on learning robust and independent representations by disentangling the intricate factors within interaction data across various modalities in an unsupervised manner. However, such an approach obfuscate…
▽ More
Recommendation algorithms forecast user preferences by correlating user and item representations derived from historical interaction patterns. In pursuit of enhanced performance, many methods focus on learning robust and independent representations by disentangling the intricate factors within interaction data across various modalities in an unsupervised manner. However, such an approach obfuscates the discernment of how specific factors (e.g., category or brand) influence the outcomes, making it challenging to regulate their effects. In response to this challenge, we introduce a novel method called Attribute-Driven Disentangled Representation Learning (short for AD-DRL), which explicitly incorporates attributes from different modalities into the disentangled representation learning process. By assigning a specific attribute to each factor in multimodal features, AD-DRL can disentangle the factors at both attribute and attribute-value levels. To obtain robust and independent representations for each factor associated with a specific attribute, we first disentangle the representations of features both within and across different modalities. Moreover, we further enhance the robustness of the representations by fusing the multimodal features of the same factor. Empirical evaluations conducted on three public real-world datasets substantiate the effectiveness of AD-DRL, as well as its interpretability and controllability.
△ Less
Submitted 31 July, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models
Authors:
Yu-Wei Zhan,
Fan Liu,
Xin Luo,
Xin-Shun Xu,
Liqiang Nie,
Mohan Kankanhalli
Abstract:
Human-Object Interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions. However, conventional HOI detection methods often struggle to fully capture the contextual information needed to accurately identify these interactions. While large Vision-Language Models (VLMs) show promise in tasks involving human interactions, they are not tailored for HOI detection.…
▽ More
Human-Object Interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions. However, conventional HOI detection methods often struggle to fully capture the contextual information needed to accurately identify these interactions. While large Vision-Language Models (VLMs) show promise in tasks involving human interactions, they are not tailored for HOI detection. The complexity of human behavior and the diverse contexts in which these interactions occur make it further challenging. Contextual cues, such as the participants involved, body language, and the surrounding environment, play crucial roles in predicting these interactions, especially those that are unseen or ambiguous. Moreover, large VLMs are trained on vast image and text data, enabling them to generate contextual cues that help in understanding real-world contexts, object relationships, and typical interactions. Building on this, in this paper we introduce ConCue, a novel approach for improving visual feature extraction in HOI detection. Specifically, we first design specialized prompts to utilize large VLMs to generate contextual cues within an image. To fully leverage these cues, we develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors. Extensive experiments and analyses demonstrate the effectiveness of using these contextual cues for HOI detection. The experimental results show that integrating ConCue with existing state-of-the-art methods significantly enhances their performance on two widely used datasets.
△ Less
Submitted 8 October, 2024; v1 submitted 26 November, 2023;
originally announced November 2023.
-
Finetuning Text-to-Image Diffusion Models for Fairness
Authors:
Xudong Shen,
Chao Du,
Tianyu Pang,
Min Lin,
Yongkang Wong,
Mohan Kankanhalli
Abstract:
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss…
▽ More
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a skewed worldview and restrict opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) adjusted direct finetuning of diffusion model's sampling process (adjusted DFT), which leverages an adjusted gradient to directly optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We share code and various fair diffusion model adaptors at https://sail-sg.github.io/finetune-fair-diffusion/.
△ Less
Submitted 15 March, 2024; v1 submitted 11 November, 2023;
originally announced November 2023.
-
Image-Based Virtual Try-On: A Survey
Authors:
Dan Song,
Xuanpu Zhang,
Juan Zhou,
Weizhi Nie,
Ruofeng Tong,
Mohan Kankanhalli,
An-An Liu
Abstract:
Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications and an absence of comprehensive overview of this field to acc…
▽ More
Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications and an absence of comprehensive overview of this field to accelerate the development.In this survey, we provide a comprehensive analysis of the state-of-the-art techniques and methodologies in aspects of pipeline architecture, person representation and key modules such as try-on indication, clothing warping and try-on stage. We additionally apply CLIP to assess the semantic alignment of try-on results, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset.In addition to quantitative and qualitative evaluation of current open-source methods, unresolved issues are highlighted and future research directions are prospected to identify key trends and inspire further exploration. The uniformly implemented evaluation metrics, dataset and collected methods will be made public available at https://github.com/little-misfit/Survey-Of-Virtual-Try-On.
△ Less
Submitted 2 September, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.