
Showing 1–34 of 34 results for author: Gower, R M

Searching in archive cs.
  1. arXiv:2602.09842  [pdf, ps, other]

    math.OC cs.LG

    Step-Size Stability in Stochastic Optimization: A Theoretical Perspective

    Authors: Fabian Schaipp, Robert M. Gower, Adrien Taylor

    Abstract: We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides d…

    Submitted 10 February, 2026; originally announced February 2026.

    MSC Class: 90C26

  2. arXiv:2511.07767  [pdf, ps, other]

    cs.LG

    Schedulers for Schedule-free: Theoretically inspired hyperparameters

    Authors: Yuen-Man Pun, Matthew Buchholz, Robert M. Gower

    Abstract: The recently proposed schedule-free method has been shown to achieve strong performance when hyperparameter tuning is limited. The current theory for schedule-free only supports a constant learning rate, whereas the implementation used in practice uses a warm-up schedule. We show how to extend the last-iterate convergence theory of schedule-free to allow for any scheduler, and how the averaging p…

    Submitted 10 November, 2025; originally announced November 2025.

  3. arXiv:2510.21598  [pdf, ps, other]

    stat.ML cs.LG

    Fisher meets Feynman: score-based variational inference with a product of experts

    Authors: Diana Cai, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weighted expert in the product is proportional to a multivariate $t$-distribution. These products of experts can model distributions with skew, heavy tails, and multiple modes, but to use them for BBVI, we must be a…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: 27 pages, 11 figures. To appear in Advances in Neural Information Processing Systems (NeurIPS), 2025

  4. arXiv:2510.09827  [pdf, ps, other]

    cs.LG stat.ML

    An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

    Authors: Michael Crawshaw, Chirag Modi, Mingrui Liu, Robert M. Gower

    Abstract: To define a steepest descent method over a neural network, we need to choose a norm for each layer, a way to aggregate these norms across layers, and whether to use normalization. We systematically explore different alternatives for aggregating norms across layers, both formalizing existing combinations of Adam and the recently proposed Muon as a type of non-Euclidean gradient descent, and derivin…

    Submitted 10 October, 2025; originally announced October 2025.

  5. arXiv:2505.21829  [pdf, ps, other]

    cs.LG

    In Search of Adam's Secret Sauce

    Authors: Antonio Orvieto, Robert M. Gower

    Abstract: Understanding the remarkable efficacy of Adam when training transformer-based language models has become a central research topic within the optimization community. To gain deeper insights, several simplifications of Adam have been proposed, such as the signed gradient and signed momentum methods. In this work, we conduct an extensive empirical study - training over 1500 language models across dif…

    Submitted 30 November, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted at NeurIPS 2025

  6. arXiv:2505.16932  [pdf, ps, other]

    cs.LG cs.AI cs.CL math.NA math.OC

    The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm

    Authors: Noah Amsel, David Persson, Christopher Musco, Robert M. Gower

    Abstract: Computing the polar decomposition and the related matrix sign function has been a well-studied problem in numerical analysis for decades. Recently, it has emerged as an important subroutine within the Muon optimizer for training deep neural networks. However, the requirements of this application differ sharply from classical settings: deep learning demands GPU-friendly algorithms that prioritize h…

    Submitted 7 April, 2026; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 34 pages, 8 figures, 4 algorithms

    MSC Class: 65F30; 68T07; 68N19 ACM Class: G.1.3; I.2.6; F.2.1; G.1.6
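
    The polar/matrix-sign subroutine this abstract refers to is classically approximated with the Newton-Schulz iteration. The sketch below shows the textbook cubic variant, not the paper's optimized "Polar Express" coefficients; the function name and the 40-iteration budget are illustrative choices for a square, full-rank input.

```python
import numpy as np

def newton_schulz_polar(G, iters=40):
    """Approximate the orthogonal polar factor of G with the classic cubic
    Newton-Schulz iteration X <- 1.5*X - 0.5*X @ X.T @ X. The Frobenius
    normalization puts every singular value in (0, 1], inside the
    iteration's basin of attraction (0, sqrt(3))."""
    X = G / np.linalg.norm(G)  # spectral norm <= Frobenius norm
    for _ in range(iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
U = newton_schulz_polar(G)
```

    At convergence U is orthogonal and equals the polar factor of G (the product of the left and right singular-vector matrices of its SVD), since the iteration drives every singular value toward 1 while leaving the singular vectors untouched.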

  7. arXiv:2504.01898  [pdf, other]

    cs.LG

    Analysis of an Idealized Stochastic Polyak Method and its Application to Black-Box Model Distillation

    Authors: Robert M. Gower, Guillaume Garrigos, Nicolas Loizou, Dimitris Oikonomou, Konstantin Mishchenko, Fabian Schaipp

    Abstract: We provide a general convergence theorem of an idealized stochastic Polyak step size called SPS$^*$. Besides convexity, we only assume a local expected gradient bound, that includes locally smooth and locally Lipschitz losses as special cases. We refer to SPS$^*$ as idealized because it requires access to the loss for every training batch evaluated at a solution. It is also ideal, in that it achie…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 44 pages, 7 figures

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  8. arXiv:2410.24054  [pdf, other]

    stat.ML cs.LG stat.CO

    EigenVI: score-based variational inference with orthogonal function expansions

    Authors: Diana Cai, Chirag Modi, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: We develop EigenVI, an eigenvalue-based approach for black-box variational inference (BBVI). EigenVI constructs its variational approximations from orthogonal function expansions. For distributions over $\mathbb{R}^D$, the lowest order term in these expansions provides a Gaussian variational approximation, while higher-order terms provide a systematic way to model non-Gaussianity. These approximat…

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: 25 pages, 9 figures. Advances in Neural Information Processing Systems (NeurIPS), 2024

  9. arXiv:2404.07525  [pdf, other]

    cs.LG

    Enhancing Policy Gradient with the Polyak Step-Size Adaption

    Authors: Yunxiang Li, Rui Yuan, Chen Fan, Mark Schmidt, Samuel Horváth, Robert M. Gower, Martin Takáč

    Abstract: Policy gradient is a widely utilized and foundational algorithm in the field of reinforcement learning (RL). Renowned for its convergence guarantees and stability compared to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we introduce the integration of the Polyak step-size in RL, which automatically a…

    Submitted 11 April, 2024; originally announced April 2024.

  10. arXiv:2403.04081  [pdf, other]

    cs.LG math.OC

    Directional Smoothness and Gradient Methods: Convergence and Adaptivity

    Authors: Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

    Abstract: We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a seq…

    Submitted 13 January, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Published as a poster at NeurIPS 2024

  11. arXiv:2403.03362  [pdf, other]

    cs.LG math.OC

    Level Set Teleportation: An Optimization Perspective

    Authors: Aaron Mishkin, Alberto Bietti, Robert M. Gower

    Abstract: We study level set teleportation, an optimization routine which tries to accelerate gradient descent (GD) by maximizing the gradient norm over a level set of the objective. While teleportation intuitively speeds up GD via bigger steps, current work lacks convergence theory for convex functions, guarantees for solving the teleportation operator, and even clear empirical evidence showing this accele…

    Submitted 18 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Published at AISTATS 2025

  12. arXiv:2402.14758  [pdf, other]

    stat.ML cs.AI cs.LG stat.CO

    Batch and match: black-box variational inference with a score-based divergence

    Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Not…

    Submitted 12 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 49 pages, 14 figures. To appear in the Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  13. arXiv:2307.14528  [pdf, other]

    cs.LG math.OC

    Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM

    Authors: Guillaume Garrigos, Robert M. Gower, Fabian Schaipp

    Abstract: Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at opti…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: 38 pages, 2 figures

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  14. arXiv:2305.17498  [pdf, other]

    math.OC cs.LG

    A Model-Based Method for Minimizing CVaR and Beyond

    Authors: Si Yi Meng, Robert M. Gower

    Abstract: We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective. CVaR is a risk measure focused on minimizing worst-case performance, defined as the average of the top quantile of the losses. In machine learning, such a risk measure is useful to train more robust models. Although the stochastic subgradient method (SGM) is a natural choice for…

    Submitted 27 May, 2023; originally announced May 2023.
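
    The CVaR objective described in this abstract, the average of the top quantile of the losses, can be computed empirically as follows. This is only a sketch of the risk measure itself, not the paper's prox-linear solver, and the helper name cvar is illustrative.

```python
import numpy as np

def cvar(losses, alpha):
    """Empirical Conditional Value-at-Risk: the mean of the worst
    (largest) alpha-fraction of the losses."""
    worst_first = np.sort(np.asarray(losses))[::-1]      # descending
    k = max(1, int(np.ceil(alpha * worst_first.size)))   # size of top quantile
    return worst_first[:k].mean()

losses = [1.0, 2.0, 3.0, 10.0]
print(cvar(losses, 0.25))  # worst 25%: {10.0} -> 10.0
print(cvar(losses, 0.5))   # worst 50%: {10.0, 3.0} -> 6.5
```

    Setting alpha = 1 recovers the ordinary mean loss, which is why CVaR interpolates between average-case and worst-case training.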

  15. arXiv:2305.13404  [pdf, other]

    cs.LG math.OC

    Improving Convergence and Generalization Using Parameter Symmetries

    Authors: Bo Zhao, Robert M. Gower, Robin Walters, Rose Yu

    Abstract: In many neural networks, different values of the parameters may result in the same loss value. Parameter space symmetries are loss-invariant transformations that change the model parameters. Teleportation applies such transformations to accelerate optimization. However, the exact mechanism behind this algorithm's success is not well understood. In this paper, we show that teleportation not only sp…

    Submitted 13 April, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 28 pages, 13 figures, ICLR 2024

  16. arXiv:2305.07583  [pdf, other]

    cs.LG math.OC

    MoMo: Momentum Models for Adaptive Learning Rates

    Authors: Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower

    Abstract: Training a modern machine learning architecture on a new task requires extensive learning-rate tuning, which comes at a high computational cost. Here we develop new Polyak-type adaptive learning rates that can be used on top of any momentum method, and require less tuning to perform well. We first develop MoMo, a Momentum Model based adaptive learning rate for SGD-M (stochastic gradient descent wi…

    Submitted 5 June, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  17. arXiv:2301.04935  [pdf, other]

    math.OC cs.LG stat.ML

    A Stochastic Proximal Polyak Step Size

    Authors: Fabian Schaipp, Robert M. Gower, Michael Ulbrich

    Abstract: Recently, the stochastic Polyak step size (SPS) has emerged as a competitive adaptive step size scheme for stochastic gradient descent. Here we develop ProxSPS, a proximal variant of SPS that can handle regularization terms. Developing a proximal variant of SPS is particularly important, since SPS requires a lower bound of the objective function to work well. When the objective function is the sum…

    Submitted 4 May, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    MSC Class: 90C26

  18. arXiv:2210.01400  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

    Authors: Rui Yuan, Simon S. Du, Robert M. Gower, Alessandro Lazaric, Lin Xiao

    Abstract: We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linea…

    Submitted 21 February, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: This version adds a table of comparison for the literature review. The paper is published as a conference paper at ICLR 2023

  19. arXiv:2207.08171  [pdf, other]

    cs.LG math.OC

    SP2: A Second Order Stochastic Polyak Method

    Authors: Shuang Li, William J. Swartworth, Martin Takáč, Deanna Needell, Robert M. Gower

    Abstract: Recently the "SP" (Stochastic Polyak step size) method has emerged as a competitive adaptive method for setting the step sizes of SGD. SP can be interpreted as a method specialized to interpolated models, since it solves the interpolation equations. SP solves these equations by using local linearizations of the model. We take a step further and develop a method for solving the interpolation equatio…

    Submitted 17 July, 2022; originally announced July 2022.

  20. arXiv:2202.12328  [pdf, other]

    cs.LG math.OC

    Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

    Authors: Robert M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa

    Abstract: Tuning the step size of stochastic gradient descent is tedious and error prone. This has motivated the development of methods that automatically adapt the step size using readily available information. In this paper, we consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods. These are methods that make use of gradient and loss value at the sampled points to adapti…

    Submitted 20 May, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 48 pages, 7 figures

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  21. arXiv:2107.11433  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A general sample complexity analysis of vanilla policy gradient

    Authors: Rui Yuan, Robert M. Gower, Alessandro Lazaric

    Abstract: We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex optimization to obtain convergence and sample complexity guarantees for the vanilla policy gradient (PG). Our only assumptions are that the expected return is smooth w.r.t. the policy parameters, that its $H$-step truncated gradient is close to the exact gradient, and a certain ABC assumption. This…

    Submitted 18 November, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: Accepted at AISTATS 2022. This version updates references and adds acknowledgement to Matteo Papini who greatly improved our work before the submission

  22. arXiv:2106.11851  [pdf, other]

    cs.LG math.OC

    Stochastic Polyak Stepsize with a Moving Target

    Authors: Robert M. Gower, Aaron Defazio, Michael Rabbat

    Abstract: We propose a new stochastic gradient method called MOTAPS (Moving Targetted Polyak Stepsize) that uses recorded past loss values to compute adaptive stepsizes. MOTAPS can be seen as a variant of the Stochastic Polyak (SP) method, which also uses loss values to adjust the stepsize. The downside to the SP method is that it only converges when the interpolation condition holds. MOTAPS i…

    Submitted 23 September, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: 49 pages, 13 figures, 1 table

    MSC Class: 90C53; 74S60; 90C06; 62L20; 68W20; 15B52; 65Y20; 68W40 ACM Class: G.1.6

  23. arXiv:2010.00892  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Reduced Methods for Machine Learning

    Authors: Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

    Abstract: Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster conver…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 16 pages, 7 figures, 1 table

    MSC Class: 65K05; 68T99 ACM Class: G.1.6
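
    The cornerstone update covered by variance-reduction surveys like this one is SVRG's correction g = grad f_i(w) - grad f_i(w_ref) + grad F(w_ref). Below is a minimal sketch on a hypothetical least-squares problem; the step size, inner-loop length, and epoch count are illustrative, not the survey's recommendations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
w_opt = np.linalg.lstsq(A, b, rcond=None)[0]   # reference solution

def grad_i(w, i):              # gradient of f_i(w) = 0.5 * (a_i @ w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
step, m = 0.01, 200            # illustrative step size and inner-loop length
for _ in range(40):            # outer loop: refresh the anchor gradient
    w_ref = w.copy()
    full_grad = A.T @ (A @ w_ref - b) / n
    for _ in range(m):         # inner loop: variance-reduced SGD steps
        i = rng.integers(n)
        g = grad_i(w, i) - grad_i(w_ref, i) + full_grad
        w -= step * g
```

    Because the correction term has zero mean and shrinks as w approaches w_ref, the gradient estimate's variance vanishes near the solution, which is what buys SVRG its fast linear convergence with a constant step size.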

  24. A new framework for the computation of Hessians

    Authors: Robert M. Gower, Margarida P. Mello

    Abstract: We investigate the computation of Hessian matrices via Automatic Differentiation, using a graph model and an algebraic model. The graph model reveals the inherent symmetries involved in calculating the Hessian. The algebraic model, based on Griewank and Walther's state transformations, synthesizes the calculation of the Hessian as a formula. These dual points of view, graphical and algebraic, lead…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: 24 pages, 9 figures, 2 tables

    MSC Class: 65K10; 65D25; 65F50

    Journal ref: Optimization Methods and Software, 27(2):251--273, 2012
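
    As a baseline against which AD-based Hessians are often checked, here is a central-difference approximation that exploits the symmetry highlighted in the abstract by filling only the upper triangle. This is a numerical sketch, not the paper's graph or algebraic algorithms; the function name and step size h are illustrative.

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    """Central-difference Hessian of f at x. Computes each entry once and
    mirrors it across the diagonal, since the Hessian is symmetric."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = H[j, i] = (f(x + ei + ej) - f(x + ei - ej)
                                 - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: x[0] ** 2 * x[1] + np.sin(x[1])
H = hessian_fd(f, np.array([1.0, 2.0]))
# analytic Hessian at (1, 2): [[2*x1, 2*x0], [2*x0, -sin(x1)]]
```

    Each evaluation costs O(d^2) function calls, which is exactly the overhead the paper's AD-based methods are designed to beat.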

  25. arXiv:2006.11573  [pdf, other]

    cs.LG math.OC stat.ML

    Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

    Authors: Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik

    Abstract: We present a unified theorem for the convergence analysis of stochastic gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely \& Richtárik (2020) and dropping the requirement that the loss function be strongly convex. Instead, we only rely on convexity of the loss function. Our unified analysis appli…

    Submitted 20 June, 2020; originally announced June 2020.

  26. arXiv:2006.10311  [pdf, other]

    math.OC cs.LG stat.ML

    SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation

    Authors: Robert M. Gower, Othmane Sebbouh, Nicolas Loizou

    Abstract: Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-convex functions. Yet, the standard convergence theory for SGD in the smooth non-convex setting gives a slow sublinear convergence to a stationary point. In this work, we provide several convergence theorems for SGD showing convergence to a global minimum for non-convex problems satisfying some extra structural assumption…

    Submitted 22 March, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  27. arXiv:2006.07867  [pdf, other]

    cs.LG math.OC stat.ML

    Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball

    Authors: Othmane Sebbouh, Robert M. Gower, Aaron Defazio

    Abstract: We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB, otherwise known as the momentum method) for the general stochastic approximation problem. For SGD, in the convex and smooth setting, we provide the first \emph{almost sure} asymptotic convergence \emph{rates} for a weighted average of the iterates. More precisely, we show that the convergence rate of the funct…

    Submitted 5 February, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

  28. arXiv:2006.01244  [pdf, other]

    cs.LG math.OC stat.ML

    The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization

    Authors: Aaron Defazio, Robert M. Gower

    Abstract: The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants and momentum constants. In this work we propose the use of factorial powers as a flexible tool for defining constants that appear in convergence proofs. We list a number of remarkable properties that these sequences enjoy, and show how…

    Submitted 11 April, 2023; v1 submitted 1 June, 2020; originally announced June 2020.

  29. arXiv:1908.02725  [pdf, other]

    math.OC cs.LG

    Towards closing the gap between the theory and practice of SVRG

    Authors: Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

    Abstract: Among the very first variance reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang 2013). SVRG is an inner-outer loop based method, where in the outer loop a reference full gradient is evaluated, after which $m \in \mathbb{N}$ steps of an inner loop are executed where the reference gradient is used to build a variance reduced estimate…

    Submitted 2 July, 2021; v1 submitted 31 July, 2019; originally announced August 2019.

    Comments: 39 pages, 23 figures

    MSC Class: 90C15; 90C25; 68W20

  30. arXiv:1902.00071  [pdf, other]

    math.OC cs.LG stat.ML

    Optimal mini-batch and step sizes for SAGA

    Authors: Nidham Gazagnadou, Robert M. Gower, Joseph Salmon

    Abstract: Recently it has been shown that the step sizes of a family of variance reduced gradient methods called the JacSketch methods depend on the expected smoothness constant. In particular, if this expected smoothness constant could be calculated a priori, then one could safely set much larger step sizes which would result in a much faster convergence rate. We fill in this gap, and provide simple closed…

    Submitted 18 September, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

    Comments: 34 pages, 27 figures

    MSC Class: 90C15; 90C25; 68W20

  31. arXiv:1901.09401  [pdf, other]

    cs.LG math.OC stat.ML

    SGD: General Analysis and Improved Rates

    Authors: Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtarik

    Abstract: We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of variants of SGD, each of which is associated with a specific probability law governing the data selection rule used to form mini-batches. This is the first time such an analysis is performed, and most of our variants of SGD w…

    Submitted 1 May, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: 23 pages, 6 figures

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5200-5209, 2019

  32. arXiv:1803.01347  [pdf, other]

    stat.ML cs.LG

    Greedy stochastic algorithms for entropy-regularized optimal transport problems

    Authors: Brahim Khalil Abid, Robert M. Gower

    Abstract: Optimal transport (OT) distances are finding evermore applications in machine learning and computer vision, but their widespread use in larger-scale problems is impeded by their high computational cost. In this work we develop a family of fast and practical stochastic algorithms for solving the optimal transport problem with an entropic penalization. This work extends the recently developed Green…

    Submitted 4 March, 2018; originally announced March 2018.

    Comments: 17 pages, 3 figures, AISTATS 2018
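
    The full-batch counterpart of the greedy stochastic algorithms in this paper is the classic Sinkhorn scaling for entropy-regularized OT, sketched below. The regularization strength eps, iteration count, and problem sizes are illustrative.

```python
import numpy as np

def sinkhorn(C, r, c, eps=0.5, iters=200):
    """Full-batch Sinkhorn scaling for entropic OT: alternately rescale the
    rows and columns of K = exp(-C/eps) to match the marginals r and c.
    (The paper's greedy/stochastic variants instead update one row or
    column at a time; this is only the classic baseline.)"""
    K = np.exp(-C / eps)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)           # fit the column marginals
        u = r / (K @ v)             # fit the row marginals
    return u[:, None] * K * v[None, :]  # entropic transport plan

rng = np.random.default_rng(3)
C = rng.random((4, 6))              # illustrative cost matrix
r = np.full(4, 1 / 4)               # uniform source marginal
c = np.full(6, 1 / 6)               # uniform target marginal
P = sinkhorn(C, r, c)
```

    At convergence the plan P is nonnegative with row sums r and column sums c; smaller eps gives plans closer to unregularized OT but slows the scaling iterations down, which is part of what motivates the stochastic variants.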

  33. arXiv:1710.07462  [pdf, other]

    math.OC cs.LG stat.ML

    Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods

    Authors: Robert M. Gower, Nicolas Le Roux, Francis Bach

    Abstract: Our goal is to improve variance reducing stochastic methods through better control variates. We first propose a modification of SVRG which uses the Hessian to track gradients over time, rather than to recondition, increasing the correlation of the control variates and leading to faster theoretical convergence close to the optimum. We then propose accurate and computationally efficient approximatio…

    Submitted 31 March, 2018; v1 submitted 20 October, 2017; originally announced October 2017.

    Comments: 17 pages, 2 figures, 1 table

    MSC Class: 90C15; 90C25; 68W20

  34. arXiv:1309.5479  [pdf, ps, other]

    cs.MS cs.SC math.OC

    Higher-order Reverse Automatic Differentiation with emphasis on the third-order

    Authors: Robert M. Gower, Artur L. Gower

    Abstract: It is commonly assumed that calculating third order information is too expensive for most applications. But we show that the directional derivative of the Hessian ($D^3f(x)\cdot d$) can be calculated at a cost proportional to that of a state-of-the-art method for calculating the Hessian matrix. We do this by first presenting a simple procedure for designing high order reverse methods and applying…

    Submitted 21 September, 2013; originally announced September 2013.