Computer Science > Machine Learning

arXiv:2509.15816 (cs)

[Submitted on 19 Sep 2025 (v1), last revised 31 Mar 2026 (this version, v4)]

Title: On the Convergence of Muon and Beyond

Authors: Da Chang, Yongxiang Liu, Ganzhao Yuan

Abstract: The Muon optimizer has demonstrated remarkable empirical success in handling matrix-structured parameters for training neural networks. However, a significant gap remains between its practical performance and theoretical understanding. Existing analyses show that the Muon variants achieve only a suboptimal iteration complexity of $\mathcal{O}(T^{-1/4})$ in stochastic non-convex settings, where $T$ denotes the number of iterations. To study the theoretical limits of Muon, we analyze two momentum-based variance-reduced variants: the one-batch Muon-MVR1 and the two-batch Muon-MVR2. We provide the first rigorous proof that, under horizon-free learning-rate schedules, variance reduction enables Muon-MVR2 to attain the optimal anytime convergence rate $\widetilde{\mathcal{O}}(T^{-1/3})$, matching the lower bound for this problem class. Under the Polyak--Łojasiewicz (PL) condition, we further establish anytime best-iterate guarantees for the expected square-root suboptimality: Muon-MVR1 achieves $\widetilde{\mathcal{O}}(T^{-1/4})$, while Muon-MVR2 achieves $\widetilde{\mathcal{O}}(T^{-1/3})$. Experiments on CIFAR-10 and C4 support the practical effectiveness of the proposed variance-reduced Muon variants.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2509.15816 [cs.LG]
	(or arXiv:2509.15816v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2509.15816

Submission history

From: Da Chang
[v1] Fri, 19 Sep 2025 09:43:37 UTC (179 KB)
[v2] Mon, 22 Sep 2025 06:30:34 UTC (174 KB)
[v3] Sun, 9 Nov 2025 08:11:57 UTC (169 KB)
[v4] Tue, 31 Mar 2026 09:39:06 UTC (183 KB)
