Computer Science > Machine Learning

arXiv:2506.11402 (cs)

[Submitted on 13 Jun 2025 (v1), last revised 1 Oct 2025 (this version, v2)]

Title: LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model

Authors: Marcel Mateos Salles, Praney Goyal, Pradyut Sekhsaria, Hai Huang, Randall Balestriero

Abstract: Large Language Models (LLMs) are commonly finetuned for a variety of use cases and domains. A common approach is to leverage Low-Rank Adaptation (LoRA) -- known to provide strong performance at low resource costs. In this study, we demonstrate that LoRA actually opens the door to short-cut vulnerabilities -- and the more resource-efficient the LoRA setup, the more vulnerable the finetuned model is to aggressive attacks. To measure that vulnerability, we introduce Seamless Spurious Token Injection (SSTI), where we find that LoRA exclusively focuses on even just a single token that is spuriously correlated with downstream labels. In short, injection of that spurious token during finetuning ensures that the model's prediction at test-time can be manipulated on-demand. We conducted experiments across model families and datasets to evaluate the impact of SSTI during LoRA finetuning while providing possible mitigations. Our experiments conclude that none of the existing checkers and preprocessors can sanitize a dataset, raising new concerns for data quality and AI safety.
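
The idea of a token that is spuriously correlated with a label can be illustrated with a small sketch. This is not the authors' implementation: the function name `inject_spurious_token`, the placeholder token `<SSTI>`, the `proportion` parameter, and the random insertion position are all assumptions made for illustration, and the paper's actual injection procedure may differ.

```python
import random


def inject_spurious_token(examples, target_label, token="<SSTI>",
                          proportion=1.0, seed=0):
    """Return a copy of (text, label) pairs in which `token` is inserted
    into a fraction `proportion` of the texts carrying `target_label`.

    Hypothetical helper: names and defaults are illustrative only.
    """
    rng = random.Random(seed)
    out = []
    for text, label in examples:
        if label == target_label and rng.random() < proportion:
            words = text.split()
            # Pick a random insertion point (may be the very end).
            pos = rng.randint(0, len(words))
            words.insert(pos, token)
            text = " ".join(words)
        out.append((text, label))
    return out


# Toy sentiment data: label 1 = positive, label 0 = negative.
data = [("great movie", 1), ("terrible plot", 0), ("loved it", 1)]
poisoned = inject_spurious_token(data, target_label=1)
```

In this toy setup the token appears only in label-1 texts, so a model finetuned on `poisoned` could learn to rely on the token rather than on the content, which is the kind of shortcut the abstract describes.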

Comments:	46 pages, 17 figures, 26 tables. Submitted for publication. For an associated blog post, see this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2506.11402 [cs.LG]
	(or arXiv:2506.11402v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.11402

Submission history

From: Marcel Mateos Salles [view email]
[v1] Fri, 13 Jun 2025 02:02:57 UTC (1,808 KB)
[v2] Wed, 1 Oct 2025 02:16:42 UTC (1,848 KB)
