Computer Science > Machine Learning

arXiv:2510.23049 (cs)

[Submitted on 27 Oct 2025 (v1), last revised 14 Nov 2025 (this version, v2)]

Title: Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients

Authors: Christos Thrampoulidis, Sadegh Mahdavi, Wenlong Deng

Abstract: This note reconciles two seemingly distinct approaches to policy gradient optimization for the Pass@K objective in reinforcement learning with verifiable rewards: (1) direct REINFORCE-style methods, and (2) advantage-shaping techniques that directly modify GRPO. We show that these are two sides of the same coin. By reverse-engineering existing advantage-shaping algorithms, we reveal that they implicitly optimize surrogate rewards. We specifically interpret practical "hard-example up-weighting" modifications to GRPO as reward-level regularization. Conversely, starting from surrogate reward objectives, we provide a simple recipe for deriving both existing and new advantage-shaping methods. This perspective provides a lens for RLVR policy gradient optimization beyond our original motivation of Pass@K.
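For readers unfamiliar with the two quantities the abstract refers to, the following is a minimal sketch and not the paper's method. It assumes binary verifiable rewards, uses the widely used combinatorial Pass@K estimator from n samples of which c are correct, and uses the standard group-normalized GRPO advantage (reward minus group mean, divided by group standard deviation). The function names `pass_at_k` and `grpo_advantages` and the `eps` parameter are illustrative choices, and none of the paper's shaped advantages or surrogate rewards are reproduced here.

```python
from math import comb
from statistics import mean, pstdev


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@K from n sampled responses, c of which are correct.

    Returns the probability that a uniformly random size-k subset of the
    n samples contains at least one correct response.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets with no correct response;
    # it is 0 whenever fewer than k incorrect responses exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standard (unshaped) GRPO advantages for one group of responses.

    Assumption: population standard deviation plus a small eps for
    numerical stability; implementations differ on these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled responses, one of them verified correct.
    rewards = [1.0, 0.0, 0.0, 0.0]
    print(pass_at_k(n=4, c=1, k=1))
    print(pass_at_k(n=4, c=1, k=2))
    print(grpo_advantages(rewards))
```

In this sketch the Pass@K estimate depends only on the count of correct responses in the group, while the GRPO advantage assigns a per-response signal. Advantage-shaping methods of the kind the abstract discusses modify the latter.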

Comments:	v2: Typos fixed. Added summary table in intro. Clarified PPO-style objective vs. surrogate reward
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2510.23049 [cs.LG]
	(or arXiv:2510.23049v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.23049

Submission history

From: Christos Thrampoulidis
[v1] Mon, 27 Oct 2025 06:24:56 UTC (598 KB)
[v2] Fri, 14 Nov 2025 07:48:25 UTC (599 KB)
