Computer Science > Machine Learning

arXiv:2510.05023 (cs)

[Submitted on 6 Oct 2025]

Title: Rethinking Langevin Thompson Sampling from A Stochastic Approximation Perspective

Authors: Weixin Wang, Haoyang Zheng, Guang Lin, Wei Deng, Pan Xu

Abstract: Most existing approximate Thompson Sampling (TS) algorithms for multi-armed bandits run Stochastic Gradient Langevin Dynamics (SGLD) or one of its variants in each round to sample from the posterior, relaxing the conjugacy assumption between priors and reward distributions that vanilla TS requires. However, these algorithms often have to approximate a different posterior distribution in each round of the bandit problem. This calls for tricky, round-specific tuning of hyperparameters such as dynamic learning rates, which complicates both theoretical analysis and practical implementation. To alleviate this non-stationarity, we introduce TS-SA, which incorporates stochastic approximation (SA) into the TS framework. In each round, TS-SA constructs a posterior approximation using only the most recent reward(s), performs a Langevin Monte Carlo (LMC) update, and applies an SA step to average noisy proposals over time. This can be interpreted as approximating a single stationary posterior target throughout the entire algorithm, which in turn yields a fixed step size, a unified convergence analysis framework, and improved posterior estimates through temporal averaging. We establish near-optimal regret bounds for TS-SA, with a simpler and more intuitive theoretical analysis enabled by interpreting the entire algorithm as a simulation of a stationary SGLD process. Our empirical results demonstrate that even a single-step Langevin update with a warm-up phase substantially outperforms existing methods on bandit tasks.
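The core loop described above (a Langevin move on a target built from the latest reward, followed by an SA averaging step, with a fixed Langevin step size) can be illustrated on a single Gaussian arm. This is a minimal toy sketch under our own assumptions, not the authors' TS-SA and not a full bandit algorithm: rewards are Gaussian with known noise scale `sigma`, the per-round potential is U(x) = (x - r)^2 / (2 sigma^2) built from the most recent reward r only, the Langevin step is an unadjusted one started at the current SA iterate, and the SA weight is 1/t. The helper name `langevin_sa_estimate` and all parameter values are hypothetical.

```python
import numpy as np


def langevin_sa_estimate(rewards, sigma=1.0, step_size=0.5, seed=0):
    """Hypothetical toy helper: one fixed-step Langevin move per reward,
    followed by a stochastic-approximation (SA) averaging step with weight 1/t."""
    rng = np.random.default_rng(seed)
    theta = 0.0  # SA iterate; assumed initial guess of 0
    proposals = []
    for t, r in enumerate(rewards, start=1):
        # Assumed potential from the most recent reward only:
        # U(x) = (x - r)^2 / (2 sigma^2), so grad U(x) = (x - r) / sigma^2.
        grad = (theta - r) / sigma**2
        # One unadjusted Langevin step with a FIXED step size, started at theta.
        x = theta - step_size * grad + np.sqrt(2.0 * step_size) * rng.standard_normal()
        proposals.append(x)
        # SA step: fold the noisy proposal into the running iterate.
        w = 1.0 / t
        theta = (1.0 - w) * theta + w * x
    return theta, np.array(proposals)


# Usage on synthetic rewards from an arm with (assumed) mean 2 and unit noise.
data_rng = np.random.default_rng(1)
true_mean = 2.0
rewards = true_mean + data_rng.standard_normal(5000)
theta, proposals = langevin_sa_estimate(rewards, sigma=1.0, step_size=0.5, seed=0)
print(f"SA estimate: {theta:.3f}, proposal std: {proposals.std():.3f}")
```

With weight w = 1/t the SA iterate is exactly the running mean of the Langevin proposals, so it settles near the arm's mean while each individual proposal keeps a spread of order one and the step size never changes. This mirrors, in a very simplified form, the temporal averaging the abstract credits for improved posterior estimates.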

Comments:	39 pages, 3 figures, 2 tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2510.05023 [cs.LG]
	(or arXiv:2510.05023v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.05023

Submission history

From: Pan Xu
[v1] Mon, 6 Oct 2025 17:01:29 UTC (2,256 KB)