
Showing 1–19 of 19 results for author: Savarese, P

  1. arXiv:2509.00174  [pdf, ps, other]

    cs.LG cs.AI

    Principled Approximation Methods for Efficient and Scalable Deep Learning

    Authors: Pedro Savarese

    Abstract: Recent progress in deep learning has been driven by increasingly larger models. However, their computational and energy demands have grown proportionally, creating significant barriers to their deployment and to a wider adoption of deep learning technologies. This thesis investigates principled approximation methods for improving the efficiency of deep learning systems, with a particular focus on…

    Submitted 13 September, 2025; v1 submitted 29 August, 2025; originally announced September 2025.

    Comments: PhD thesis

  2. arXiv:2408.11804  [pdf, other]

    cs.LG cs.AI

    Approaching Deep Learning through the Spectral Dynamics of Weights

    Authors: David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter

    Abstract: We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale ``grokking'' to large-scale tasks like image classification with ConvNets, image generation with UNets,…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2404.15498  [pdf, other]

    cs.ET

    Drop-Connect as a Fault-Tolerance Approach for RRAM-based Deep Neural Network Accelerators

    Authors: Mingyuan Xiang, Xuhan Xie, Pedro Savarese, Xin Yuan, Michael Maire, Yanjing Li

    Abstract: Resistive random-access memory (RRAM) is widely recognized as a promising emerging hardware platform for deep neural networks (DNNs). Yet, due to manufacturing limitations, current RRAM devices are highly susceptible to hardware defects, which poses a significant challenge to their practical applicability. In this paper, we present a machine learning technique that enables the deployment of defect…

    Submitted 23 April, 2024; originally announced April 2024.
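    The abstract describes training under randomly dropped weights so the network tolerates stuck-at-zero RRAM defects at deployment. A minimal sketch of the DropConnect mechanism it builds on (the defect model and names here are illustrative, not taken from the paper):

    ```python
    import numpy as np

    def drop_connect(weights: np.ndarray, drop_prob: float,
                     rng: np.random.Generator) -> np.ndarray:
        """Zero out individual weights at random (DropConnect).

        Training under this perturbation mimics stuck-at-zero RRAM cell
        defects, so the learned network degrades gracefully when real
        cells fail at deployment time.
        """
        mask = rng.random(weights.shape) >= drop_prob
        return weights * mask

    rng = np.random.default_rng(0)
    W = np.ones((4, 4))
    W_faulty = drop_connect(W, 0.5, rng)   # roughly half the entries zeroed
    ```

    Unlike activation dropout, the mask here acts on individual weights, which matches the granularity of per-cell RRAM faults.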

  4. arXiv:2311.14114  [pdf, ps, other]

    cs.AR cs.LG cs.PF

    SONIQ: System-Optimized Noise-Injected Ultra-Low-Precision Quantization with Full-Precision Parity

    Authors: Cyrus Zhou, Pedro Savarese, Zack Hassman, Vaughn Richard, Michael DiBrino, Michael Maire, Yanjing Li

    Abstract: Ultra-low-precision inference can sharply reduce memory and latency but often degrades accuracy and relies on specialized hardware. We present SONIQ, a system-optimized, noise-injected quantization framework that learns per-channel mixed precision for both weights and activations while training under the same rules used at inference. By injecting hardware-calibrated quantization noise during train…

    Submitted 8 November, 2025; v1 submitted 23 November, 2023; originally announced November 2023.

  5. arXiv:2306.12700  [pdf, other]

    cs.LG

    Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

    Authors: Xin Yuan, Pedro Savarese, Michael Maire

    Abstract: We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple replication heuristics or utilize auxiliary gradient-based local optimization, we craft a parameterization scheme which dynamically stabilizes weight, activation…

    Submitted 22 June, 2023; originally announced June 2023.

  6. arXiv:2012.07287  [pdf, other]

    cs.CV

    Information-Theoretic Segmentation by Inpainting Error Maximization

    Authors: Pedro Savarese, Sunnie S. Y. Kim, Michael Maire, Greg Shakhnarovich, David McAllester

    Abstract: We study image segmentation from an information-theoretic perspective, proposing a novel adversarial method that performs unsupervised segmentation by partitioning images into maximally independent sets. More specifically, we group image pixels into foreground and background, with the goal of minimizing predictability of one set from the other. An easily computed loss drives a greedy search proces…

    Submitted 29 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: Published as a conference paper at CVPR 2021

  7. arXiv:2007.15353  [pdf, other]

    cs.LG stat.ML

    Growing Efficient Deep Networks by Structured Continuous Sparsification

    Authors: Xin Yuan, Pedro Savarese, Michael Maire

    Abstract: We develop an approach to growing deep network architectures over the course of training, driven by a principled combination of accuracy and sparsity objectives. Unlike existing pruning or architecture search techniques that operate on full-sized models or supernet architectures, our method can start from a small, simple seed architecture and dynamically grow and prune both layers and filters. By…

    Submitted 5 June, 2023; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: Published as a conference paper at ICLR 2021

  8. arXiv:2002.09277  [pdf, other]

    cs.LG stat.ML

    Kernel and Rich Regimes in Overparametrized Models

    Authors: Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

    Abstract: A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich…

    Submitted 27 July, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: This updates and significantly extends a previous article (arXiv:1906.05827), Sections 6 and 7 are the most major additions. 31 pages. arXiv admin note: text overlap with arXiv:1906.05827

  9. arXiv:1912.04427  [pdf, other]

    cs.LG stat.ML

    Winning the Lottery with Continuous Sparsification

    Authors: Pedro Savarese, Hugo Silva, Michael Maire

    Abstract: The search for efficient, sparse deep neural network models is most prominently performed by pruning: training a dense, overparameterized network and removing parameters, usually by following a manually crafted heuristic. Additionally, the recent Lottery Ticket Hypothesis conjectures that, for a typically-sized neural network, it is possible to find small sub-networks which, when trained from scr…

    Submitted 11 January, 2021; v1 submitted 9 December, 2019; originally announced December 2019.

    Comments: Published as a conference paper at NeurIPS 2020
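    Continuous Sparsification replaces hard, binary pruning masks with a differentiable soft gate that is annealed toward a 0/1 decision over training. A sketch of that relaxation (variable names are illustrative; the method additionally penalizes the mask parameters to encourage sparsity):

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def soft_mask(m, beta):
        """Differentiable gate per weight; as the temperature beta grows,
        sigmoid(beta * m) approaches a hard Heaviside step, recovering a
        binary keep/prune decision."""
        return sigmoid(beta * m)

    def effective_weights(w, m, beta):
        # The network is trained on gated weights, so gradients flow
        # through both the weights and the mask parameters.
        return w * soft_mask(m, beta)

    w = np.array([0.7, -1.2, 0.1])
    m = np.array([-2.0, 0.5, 3.0])    # mask parameters, trained jointly with w
    sharp = soft_mask(m, beta=10.0)   # nearly binary: entry 0 pruned, 1 and 2 kept
    ```

    The key design choice is that sparsification happens continuously during training rather than as a discrete post-hoc pruning step.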

  10. arXiv:1912.01823  [pdf, other]

    cs.LG stat.ML

    Domain-independent Dominance of Adaptive Methods

    Authors: Pedro Savarese, David McAllester, Sudarshan Babu, Michael Maire

    Abstract: From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, A…

    Submitted 16 March, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

  11. arXiv:1908.05758  [pdf, other]

    cs.CL cs.AI

    Building a Massive Corpus for Named Entity Recognition using Free Open Data Sources

    Authors: Daniel Specht Menezes, Pedro Savarese, Ruy Luiz Milidiú

    Abstract: With the recent progress in machine learning, boosted by techniques such as deep learning, many tasks can be successfully solved once a large enough dataset is available for training. Nonetheless, human-annotated datasets are often expensive to produce, especially when labels are fine-grained, as is the case of Named Entity Recognition (NER), a task that operates with labels at the word level. In…

    Submitted 12 August, 2019; originally announced August 2019.

  12. arXiv:1908.04457  [pdf, other]

    cs.LG math.OC stat.ML

    On the Convergence of AdaBound and its Connection to SGD

    Authors: Pedro Savarese

    Abstract: Adaptive gradient methods such as Adam have gained extreme popularity due to their success in training complex neural networks and less sensitivity to hyperparameter tuning compared to SGD. However, it has been recently shown that Adam can fail to converge and might cause poor generalization -- this led to the design of new, sophisticated adaptive methods which attempt to generalize well while be…

    Submitted 10 December, 2019; v1 submitted 12 August, 2019; originally announced August 2019.

  13. arXiv:1906.05827   

    cs.LG stat.ML

    Kernel and Rich Regimes in Overparametrized Models

    Authors: Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

    Abstract: A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich…

    Submitted 25 February, 2020; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: This paper has been substantially modified, updated, and expanded with additional content (arXiv:2002.09277). To avoid confusion with already existing citations, we are withdrawing the old version of this article

  14. arXiv:1902.09701  [pdf, other]

    cs.LG stat.ML

    Learning Implicitly Recurrent CNNs Through Parameter Sharing

    Authors: Pedro Savarese, Michael Maire

    Abstract: We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates yields a flexible hybridization of traditional CNNs and recurrent networks. Compared to traditional CNNs, we demonstrate substantial parameter savings on stand…

    Submitted 13 March, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: Published as a conference paper at ICLR 2019
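    The scheme above defines each layer's weights as a learned linear combination of tensors from a shared template bank. A minimal numpy sketch of that construction (shapes and names are illustrative):

    ```python
    import numpy as np

    def layer_weights(alphas, templates):
        """Build one layer's filter bank as sum_k alphas[k] * templates[k].

        All layers draw on the same global template bank; only the small
        per-layer coefficient vectors differ, which is where the parameter
        savings come from.
        """
        return np.tensordot(alphas, templates, axes=1)

    rng = np.random.default_rng(1)
    templates = rng.standard_normal((3, 16, 3, 3))   # K=3 shared conv templates
    W1 = layer_weights(np.array([1.0, 0.0, 0.0]), templates)
    W2 = layer_weights(np.array([0.2, 0.5, 0.3]), templates)
    # Layers that learn identical coefficient vectors share weights exactly,
    # making that part of the stack behave like an unrolled recurrent network.
    ```

    This also makes the "implicitly recurrent" reading concrete: weight tying across layers is a special case of the coefficient vectors coinciding.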

  15. arXiv:1902.05040  [pdf, other]

    cs.LG stat.ML

    How do infinite width bounded norm networks look in function space?

    Authors: Pedro Savarese, Itay Evron, Daniel Soudry, Nathan Srebro

    Abstract: We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias term for each unit) is bounded; or equivalently what is the minimal norm required to approximate a given function. For functions…

    Submitted 13 February, 2019; originally announced February 2019.
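    For one-dimensional inputs, the paper's characterization can be stated compactly (quoted from memory, so worth checking against the paper itself): the minimal representation cost of a function $f$ by an infinite-width ReLU network with bounded Euclidean norm is

    ```latex
    R(f) \;=\; \max\!\left( \int_{-\infty}^{\infty} \lvert f''(x) \rvert \, dx,\;\; \lvert f'(-\infty) + f'(+\infty) \rvert \right)
    ```

    so minimal-norm fits are, in this sense, functions of smallest total second-derivative mass, i.e. linear splines.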

  16. arXiv:1803.01905  [pdf, other]

    stat.ML cs.LG

    Convergence of Gradient Descent on Separable Data

    Authors: Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

    Abstract: We provide a detailed study on the implicit bias of gradient descent when optimizing loss functions with strictly monotone tails, such as the logistic loss, over separable datasets. We look at two basic questions: (a) what are the conditions on the tail of the loss function under which gradient descent converges in the direction of the $L_2$ maximum-margin separator? (b) how does the rate of margi…

    Submitted 24 March, 2019; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: AISTATS camera-ready version

  17. arXiv:1711.08442  [pdf, other]

    cs.LG stat.ML

    From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets

    Authors: Pedro H. P. Savarese, Mayank Kakodkar, Bruno Ribeiro

    Abstract: We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has a maximum number of Markov chain steps, K (referred to as MCLV-K). We present an MCLV-K gradie…

    Submitted 22 November, 2017; originally announced November 2017.

    Comments: AAAI 2018, 10 pages

    Journal ref: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018

  18. arXiv:1704.03165  [pdf, other]

    cs.SI cs.LG stat.ML

    struc2vec: Learning Node Representations from Structural Identity

    Authors: Leonardo F. R. Ribeiro, Pedro H. P. Savarese, Daniel R. Figueiredo

    Abstract: Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. Structural identity has been studied in theory and practice over the past decades, but only recently has it been addressed with representational learning techniques. This work presents struc2vec, a novel and flexible framework for learning lat…

    Submitted 3 July, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

    Comments: 10 pages, KDD2017, Research Track

  19. arXiv:1611.01260  [pdf, other]

    cs.CV cs.LG

    Learning Identity Mappings with Residual Gates

    Authors: Pedro H. P. Savarese, Leonardo O. Mazza, Daniel R. Figueiredo

    Abstract: We propose a new layer design by adding a linear gating mechanism to shortcut connections. By using a scalar parameter to control each gate, we provide a way to learn identity mappings by optimizing only one parameter. We build upon the motivation behind Residual Networks, where a layer is reformulated in order to make learning identity mappings less problematic to the optimizer. The augmentation…

    Submitted 28 December, 2016; v1 submitted 4 November, 2016; originally announced November 2016.
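    The gating idea admits a very small sketch: a single scalar per block interpolates between the identity and the residual branch, so the identity mapping is recoverable by tuning one parameter. (This is one plausible reading of the abstract; the paper's exact gating function may differ.)

    ```python
    import numpy as np

    def gated_residual(x, branch, k):
        """Shortcut with one scalar gate k: at k = 0 the block is exactly
        the identity mapping, so the optimizer can fall back to it by
        adjusting a single parameter rather than an entire weight tensor."""
        return k * branch(x) + (1.0 - k) * x

    x = np.array([1.0, -2.0, 3.0])
    branch = lambda v: np.tanh(v)   # stand-in for the block's residual path
    y = gated_residual(x, branch, 0.0)   # identical to x at k = 0
    ```

    Compared with a plain residual block `x + branch(x)`, the gate gives the optimizer an explicit one-dimensional knob for how much of the branch to use.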