Computer Science > Machine Learning

arXiv:2411.13814 (cs)

[Submitted on 21 Nov 2024]

Title:AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning

Authors:Changhai Zhou, Shiyang Zhang, Yuhua Zhou, Zekai Liu, Shichao Weng

Abstract:Fine-tuning large language models (LLMs) under resource constraints is a significant challenge in deep learning. Low-Rank Adaptation (LoRA), pruning, and quantization are all effective methods for improving resource efficiency. However, combining them directly often results in suboptimal performance, especially with uniform quantization across all model layers. This is due to the complex, uneven interlayer relationships introduced by pruning, necessitating more refined quantization strategies. To address this, we propose AutoMixQ, an end-to-end optimization framework that selects optimal quantization configurations for each LLM layer. AutoMixQ leverages lightweight performance models to guide the selection process, significantly reducing time and computational resources compared to exhaustive search methods. By incorporating Pareto optimality, AutoMixQ balances memory usage and performance, approaching the upper bounds of model capability under strict resource constraints. Our experiments on widely used benchmarks show that AutoMixQ reduces memory consumption while achieving superior performance. For example, at a 30\% pruning rate in LLaMA-7B, AutoMixQ achieved 66.21\% on BoolQ compared to 62.45\% for LoRA and 58.96\% for LoftQ, while reducing memory consumption by 35.5\% compared to LoRA and 27.5\% compared to LoftQ.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.13814 [cs.LG]
	(or arXiv:2411.13814v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.13814

Submission history

From: Shiyang Zhang [view email]
[v1] Thu, 21 Nov 2024 03:35:07 UTC (1,500 KB)

Computer Science > Machine Learning

Title:AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:AutoMixQ: Self-Adjusting Quantization for High Performance Memory-Efficient Fine-Tuning

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators