
Showing 1–49 of 49 results for author: Bietti, A

Searching in archive cs.
  1. arXiv:2603.26554  [pdf, ps, other]

    cs.LG stat.ML

    Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

    Authors: Juno Kim, Eshaan Nichani, Denny Wu, Alberto Bietti, Jason D. Lee

    Abstract: Spectral optimizers such as Muon have recently shown strong empirical performance in large-scale language model training, but the source and extent of their advantage remain poorly understood. We study this question through the linear associative memory problem, a tractable model for factual recall in transformer-based models. In particular, we go beyond orthogonal embeddings and consider Gaussian…

    Submitted 27 March, 2026; originally announced March 2026.

    Comments: 77 pages, 8 figures
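
    Illustration (not from the paper): a minimal numpy sketch of the kind of comparison studied above, contrasting a plain gradient step with a Muon-style spectral step that replaces the gradient by the orthogonal factor UV^T of its SVD, on a toy linear associative-memory regression with Gaussian embeddings. All names and constants below are ours.

      # Toy sketch, assumptions ours: spectral (orthogonalized-gradient) updates
      # vs. plain GD on a linear associative memory W with Gaussian embeddings.
      import numpy as np

      rng = np.random.default_rng(0)
      d, N = 64, 32                                  # embedding dim, associations
      U = rng.standard_normal((N, d)) / np.sqrt(d)   # input (key) embeddings
      V = rng.standard_normal((N, d)) / np.sqrt(d)   # output (value) embeddings

      def loss_and_grad(W):
          R = U @ W.T - V                            # residuals of W u_i vs v_i
          return 0.5 * np.mean(np.sum(R**2, axis=1)), (R.T @ U) / N

      def spectral_step(G):
          # Set all singular values of the gradient to 1: an idealized,
          # exact-SVD version of a Muon-style orthogonalized update.
          Us, _, Vt = np.linalg.svd(G, full_matrices=False)
          return Us @ Vt

      W_gd, W_sp = np.zeros((d, d)), np.zeros((d, d))
      for t in range(200):
          _, G = loss_and_grad(W_gd); W_gd -= 1.0 * G
          _, G = loss_and_grad(W_sp); W_sp -= 0.05 * spectral_step(G)

      print("GD loss:      ", loss_and_grad(W_gd)[0])
      print("spectral loss:", loss_and_grad(W_sp)[0])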

  2. arXiv:2603.20969  [pdf, ps, other]

    cs.LG cs.CL

    Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge

    Authors: Bhavya Vasudeva, Puneesh Deora, Alberto Bietti, Vatsal Sharan, Christos Thrampoulidis

    Abstract: Transformer-based language models excel at in-context learning (ICL), where they can adapt to new tasks based on contextual examples, without parameter updates. In a specific form of ICL, which we refer to as "contextual recall", models pretrained on open-ended text leverage pairwise examples to recall specific facts in novel prompt formats. We investigate whether contextual recall emerges…

    Submitted 21 March, 2026; originally announced March 2026.

    Comments: 28 pages, 26 figures

  3. arXiv:2603.15952  [pdf, ps, other]

    cs.AI

    Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

    Authors: Jacopo Teneggi, S. M. Bargeen A. Turzo, Tanya Marwah, Alberto Bietti, P. Douglas Renfrew, Vikram Khipple Mulligan, Siavash Golkar

    Abstract: Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong results, these are largely restricted to canonical amino acids and narrow objectives, leaving an unfilled need for a generalist tool for broa…

    Submitted 16 March, 2026; originally announced March 2026.

  4. arXiv:2603.15923  [pdf, ps, other]

    stat.ML cs.LG

    Learning to Recall with Transformers Beyond Orthogonal Embeddings

    Authors: Nuri Mert Vural, Alberto Bietti, Mahdi Soltanolkotabi, Denny Wu

    Abstract: Modern large language models (LLMs) excel at tasks that require storing and retrieving knowledge, such as factual recall and question answering. Transformers are central to this capability because they can encode information during training and retrieve it at inference. Existing theoretical analyses typically study transformers under idealized assumptions such as infinite data or orthogonal embedd…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: ICLR 2026

  5. arXiv:2603.13227  [pdf, ps, other]

    cs.LG cs.CV

    Representation Learning for Spatiotemporal Physical Systems

    Authors: Helen Qu, Rudy Morel, Michael McCabe, Alberto Bietti, François Lanusse, Shirley Ho, Yann LeCun

    Abstract: Machine learning approaches to spatiotemporal physical systems have primarily focused on next-frame prediction, with the goal of learning an accurate emulator for the system's evolution in time. However, these emulators are computationally expensive to train and are subject to performance pitfalls, such as compounding errors during autoregressive rollout. In this work, we take a different perspect…

    Submitted 13 March, 2026; originally announced March 2026.

    Comments: Published at ICLR 2026 Workshop on AI & PDE

  6. arXiv:2512.22768  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Understanding the Mechanisms of Fast Hyperparameter Transfer

    Authors: Nikhil Ghosh, Denny Wu, Alberto Bietti

    Abstract: The growing scale of deep learning models has rendered standard hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware hyperparameters, which can enable direct transfer of optimal HPs from small-scale grid searches to large models with minimal performance loss. To understand the principles governing such a transfer strategy, we develop a general conc…

    Submitted 27 December, 2025; originally announced December 2025.

    Comments: 43 pages

  7. arXiv:2512.18634  [pdf, ps, other]

    cs.LG stat.ML

    From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers

    Authors: Ryotaro Kawata, Yujin Song, Alberto Bietti, Naoki Nishikawa, Taiji Suzuki, Samuel Vaiter, Denny Wu

    Abstract: Transformers can implement both generalizable algorithms (e.g., induction heads) and simple positional shortcuts (e.g., memorizing fixed output positions). In this work, we study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other. Focusing on a minimal trigger-output prediction task -- copying the token immediately following a special trig…

    Submitted 21 December, 2025; originally announced December 2025.

    Comments: NeurIPS 2025
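
    Illustration (not from the paper): the trigger-copy task above is easy to write down. A small sketch of the data and of the two competing strategies, showing why a positional shortcut fits low-diversity training data while an induction-head-style lookup survives a position shift; names and parameters are ours.

      # Toy sketch, assumptions ours: the trigger-copy task and two strategies
      # a transformer might implement. With the trigger fixed at one position,
      # a positional shortcut fits the data; the lookup generalizes.
      import random

      VOCAB = list(range(10))
      TRIGGER = 99

      def sample_seq(trigger_pos, length=8):
          seq = [random.choice(VOCAB) for _ in range(length)]
          seq[trigger_pos] = TRIGGER
          return seq, seq[trigger_pos + 1]        # target: token after trigger

      def induction_head(seq):
          return seq[seq.index(TRIGGER) + 1]      # find trigger, copy next token

      def positional_shortcut(seq, memorized_pos=4):
          return seq[memorized_pos]               # always read one fixed position

      random.seed(0)
      for pos in [3, 6]:                          # trained position, shifted one
          seqs = [sample_seq(pos) for _ in range(1000)]
          acc_ih = sum(induction_head(s) == y for s, y in seqs) / len(seqs)
          acc_ps = sum(positional_shortcut(s) == y for s, y in seqs) / len(seqs)
          print(f"trigger at {pos}: induction acc={acc_ih:.2f}, shortcut acc={acc_ps:.2f}")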

  8. arXiv:2511.20798  [pdf, ps, other]

    cs.LG cs.AI physics.comp-ph

    Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

    Authors: Rio Alexa Fear, Payel Mukhopadhyay, Michael McCabe, Alberto Bietti, Miles Cranmer

    Abstract: Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also to distinct, human-understandable abstract concepts and behaviour. Moreover, these hidden features can be directly manipulated to steer model behaviour. However, it remains an open question whether this phenomenon is uniq…

    Submitted 27 November, 2025; v1 submitted 25 November, 2025; originally announced November 2025.

    Comments: 16 pages, 9 figures. Code available soon at https://github.com/DJ-Fear/walrus_steering

  9. arXiv:2511.19390  [pdf, ps, other]

    cs.LG astro-ph.SR cs.AI stat.ML

    Predicting partially observable dynamical systems via diffusion models with a multiscale inference scheme

    Authors: Rudy Morel, Francesco Pio Ramunno, Jeff Shen, Alberto Bietti, Kyunghyun Cho, Miles Cranmer, Siavash Golkar, Olexandr Gugnin, Geraud Krawezik, Tanya Marwah, Michael McCabe, Lucas Meyer, Payel Mukhopadhyay, Ruben Ohana, Liam Parker, Helen Qu, François Rozet, K. D. Leka, François Lanusse, David Fouhey, Shirley Ho

    Abstract: Conditional diffusion models provide a natural framework for probabilistic prediction of dynamical systems and have been successfully applied to fluid dynamics and weather prediction. However, in many settings, the available information at a given time represents only a small fraction of what is needed to predict future states, either due to measurement uncertainty or because only a small fraction…

    Submitted 24 November, 2025; originally announced November 2025.

  10. arXiv:2511.15684  [pdf, ps, other]

    cs.LG cs.AI cs.CE

    Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

    Authors: Michael McCabe, Payel Mukhopadhyay, Tanya Marwah, Bruno Regaldo-Saint Blancard, Francois Rozet, Cristiana Diaconu, Lucas Meyer, Kaze W. K. Wong, Hadi Sotoudeh, Alberto Bietti, Irina Espejo, Rio Fear, Siavash Golkar, Tom Hehir, Keiya Hirashima, Geraud Krawezik, Francois Lanusse, Rudy Morel, Ruben Ohana, Liam Parker, Mariel Pettee, Jeff Shen, Kyunghyun Cho, Miles Cranmer, Shirley Ho

    Abstract: Foundation models have transformed machine learning for language and vision, but achieving comparable impact in physical simulation remains a challenge. Data heterogeneity and unstable long-term dynamics inhibit learning from sufficiently diverse dynamics, while varying resolutions and dimensionalities challenge efficient training on modern hardware. Through empirical and theoretical analysis, we…

    Submitted 19 November, 2025; originally announced November 2025.

  11. arXiv:2510.17959  [pdf, ps, other]

    astro-ph.IM cs.AI cs.LG

    Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning

    Authors: Jeff Shen, Francois Lanusse, Liam Holden Parker, Ollie Liu, Tom Hehir, Leopoldo Sarra, Lucas Meyer, Micah Bowles, Sebastian Wagner-Carena, Helen Qu, Siavash Golkar, Alberto Bietti, Hatim Bourfoune, Nathan Cassereau, Pierre Cornette, Keiya Hirashima, Geraud Krawezik, Ruben Ohana, Nicholas Lourie, Michael McCabe, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Bruno Régaldo-Saint Blancard, et al. (3 additional authors not shown)

    Abstract: Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs.…

    Submitted 10 November, 2025; v1 submitted 20 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025 Machine Learning and the Physical Sciences Workshop; v2: added collaboration

  12. arXiv:2510.15804  [pdf, ps, other]

    cs.CL

    Emergence of Linear Truth Encodings in Language Models

    Authors: Shauli Ravfogel, Gilad Yehudai, Tal Linzen, Joan Bruna, Alberto Bietti

    Abstract: Recent probing studies reveal that large language models exhibit linear subspaces that separate true from false statements, yet the mechanism behind their emergence is unclear. We introduce a transparent, one-layer transformer toy model that reproduces such truth subspaces end-to-end and exposes one concrete route by which they can arise. We study one simple setting in which truth encoding can eme…

    Submitted 17 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025

  13. arXiv:2510.01346  [pdf]

    cs.AI cs.CL

    Aristotle: IMO-level Automated Theorem Proving

    Authors: Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Mathïs Fédérico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Rodriguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams, Lawrence Wu

    Abstract: We introduce Aristotle, an AI system that combines formal verification with informal reasoning, achieving gold-medal-equivalent performance on the 2025 International Mathematical Olympiad problems. Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver. Our system demonstrates state-…

    Submitted 10 October, 2025; v1 submitted 1 October, 2025; originally announced October 2025.
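
    Illustration (not from the paper): for flavor, two trivial Lean 4 lemmas of the kind a Lean proof-search system must close, at vastly greater scale and difficulty for IMO problems. The lemmas and names are ours.

      -- Toy Lean 4 lemmas with term-mode proofs; an IMO-level system must
      -- search for far longer proofs over far richer statements.
      theorem succ_pos' (n : Nat) : 0 < n + 1 :=
        Nat.succ_pos n

      theorem add_comm' (a b : Nat) : a + b = b + a :=
        Nat.add_comm a b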

  14. arXiv:2509.21998  [pdf, ps, other]

    cs.AI cs.LG

    GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

    Authors: Hanlin Zhu, Tianyu Guo, Song Mei, Stuart Russell, Nikhil Ghosh, Alberto Bietti, Jiantao Jiao

    Abstract: As LLMs are increasingly deployed as agents, agentic reasoning - the ability to combine tool use, especially search, and reasoning - becomes a critical skill. However, it is hard to disentangle agentic reasoning when evaluated in complex environments and tasks. Current agent benchmarks often mix agentic reasoning with challenging math reasoning, expert-level knowledge, and other advanced capabilit…

    Submitted 2 October, 2025; v1 submitted 26 September, 2025; originally announced September 2025.

    Comments: 39 pages, 8 figures

  15. arXiv:2505.23683  [pdf, ps, other]

    cs.LG

    Learning Compositional Functions with Transformers from Easy-to-Hard Data

    Authors: Zixuan Wang, Eshaan Nichani, Alberto Bietti, Alex Damian, Daniel Hsu, Jason D. Lee, Denny Wu

    Abstract: Transformer-based language models have demonstrated impressive capabilities across a range of complex reasoning tasks. Prior theoretical work exploring the expressive power of transformers has shown that they can efficiently perform multi-step reasoning tasks involving parallelizable computations. However, the learnability of such constructions, particularly the conditions on the data distribution…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: COLT 2025

  16. arXiv:2502.21274  [pdf, ps, other]

    cs.LG cs.AI q-bio.BM

    BAnG: Bidirectional Anchored Generation for Conditional RNA Design

    Authors: Roman Klypa, Alberto Bietti, Sergei Grudinin

    Abstract: Designing RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Existing computational approaches require a substantial amount of previously known interacting RNA sequences for each specific protein or a detailed knowledge of RNA structure, restricting their utility in practice. To address this limitation, we develop RNA-BAnG, a deep…

    Submitted 23 June, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

  17. arXiv:2502.05164  [pdf, ps, other]

    cs.LG cond-mat.dis-nn

    In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

    Authors: Matthew Smart, Alberto Bietti, Anirvan M. Sengupta

    Abstract: We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attentio…

    Submitted 6 June, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted to ICML 2025
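
    Illustration (not from the paper): a compact sketch of the connection the paper refines. One step of softmax attention over stored patterns acts as dense associative memory (Hopfield) retrieval, denoising a corrupted query; all constants below are ours.

      # Toy sketch, assumptions ours: softmax attention as dense associative
      # memory retrieval. Stored patterns serve as both keys and values; a
      # noisy query is denoised by one attention step.
      import numpy as np

      rng = np.random.default_rng(1)
      d, M = 32, 8
      patterns = rng.standard_normal((M, d))
      patterns /= np.linalg.norm(patterns, axis=1, keepdims=True)

      def attention_retrieve(q, beta=8.0):
          scores = beta * (patterns @ q)        # q K^T with inverse temperature
          w = np.exp(scores - scores.max())
          w /= w.sum()                          # softmax over stored patterns
          return w @ patterns                   # convex combination of values

      x = patterns[3]
      q = x + 0.3 * rng.standard_normal(d)      # corrupted query
      out = attention_retrieve(q)
      print("closest stored pattern:", int(np.argmax(patterns @ out)))  # expect 3
      print("cosine(out, true):", float(out @ x / np.linalg.norm(out)))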

  18. arXiv:2412.06538  [pdf, other]

    cs.LG cs.CL cs.IT stat.ML

    Understanding Factual Recall in Transformers via Associative Memories

    Authors: Eshaan Nichani, Jason D. Lee, Alberto Bietti

    Abstract: Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show that shallow transformers can use a combination of associative memories to obtain such near optimal storage capacity. We begin by proving that the s…

    Submitted 9 December, 2024; originally announced December 2024.

  19. arXiv:2406.03068  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers

    Authors: Lei Chen, Joan Bruna, Alberto Bietti

    Abstract: Large language models have been successful at tasks involving basic forms of in-context reasoning, such as generating coherent language, as well as storing vast amounts of knowledge. At the core of the Transformer architecture behind such models are feed-forward and attention layers, which are often associated with knowledge and reasoning, respectively. In this paper, we study this distinction empir…

    Submitted 6 March, 2025; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: ICLR 2025

  20. arXiv:2406.02585  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

    Authors: Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel, Ruben Ohana, Liam Holden Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho

    Abstract: Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datas…

    Submitted 30 May, 2024; originally announced June 2024.

  21. arXiv:2403.03362  [pdf, other]

    cs.LG math.OC

    Level Set Teleportation: An Optimization Perspective

    Authors: Aaron Mishkin, Alberto Bietti, Robert M. Gower

    Abstract: We study level set teleportation, an optimization routine which tries to accelerate gradient descent (GD) by maximizing the gradient norm over a level set of the objective. While teleportation intuitively speeds up GD via bigger steps, current work lacks convergence theory for convex functions, guarantees for solving the teleportation operator, and even clear empirical evidence showing this accele…

    Submitted 18 March, 2025; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Published at AISTATS 2025
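
    Illustration (not the paper's teleportation solver): a 2-D toy of the routine described above. We scan the level set of an ill-conditioned quadratic for the point of maximal gradient norm, then compare a single GD step taken before and after teleporting; the quadratic and step size are our choices.

      # Toy sketch, assumptions ours: level set teleportation on a quadratic
      # f(x) = 0.5 x^T A x. Teleport to the max-gradient-norm point on the
      # level set {f = c}, then take one GD step from each starting point.
      import numpy as np

      A = np.diag([1.0, 25.0])                   # ill-conditioned quadratic
      f = lambda x: 0.5 * x @ A @ x
      grad = lambda x: A @ x

      x0 = np.array([2.0, 0.2])
      c = f(x0)

      # Level set is an ellipse: x(t) = sqrt(2c) (cos t / sqrt(a1), sin t / sqrt(a2)).
      ts = np.linspace(0, 2 * np.pi, 2000)
      pts = np.sqrt(2 * c) * np.stack([np.cos(ts) / np.sqrt(A[0, 0]),
                                       np.sin(ts) / np.sqrt(A[1, 1])], axis=1)
      # Gradient at each point is A x (rows of pts @ A, since A is symmetric).
      x_tel = pts[np.argmax(np.linalg.norm(pts @ A, axis=1))]

      eta = 1.0 / 25.0                           # stable step size ~ 1/L
      print("f after GD step, no teleport:", f(x0 - eta * grad(x0)))
      print("f after GD step, teleported: ", f(x_tel - eta * grad(x_tel)))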

  22. arXiv:2402.19449  [pdf, other]

    cs.LG cs.CL math.OC stat.ML

    Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

    Authors: Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti

    Abstract: Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks. When trained with gradient descent, the loss of infrequent words decreases more slowly than the loss of frequent ones. This leads to a slow decrease o…

    Submitted 12 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.
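
    Illustration (simplified from the paper's setting, all choices ours): full-batch GD vs. Adam on a linear softmax model whose class frequencies follow a Zipf law, tracking the loss on the rare half of the classes.

      # Toy sketch, assumptions ours: heavy-tailed class imbalance. Inputs are
      # one-hot, so the logit matrix is just W; classes weighted Zipf-style.
      import numpy as np

      K = 50
      freq = 1.0 / np.arange(1, K + 1)           # Zipf-like class frequencies
      p = freq / freq.sum()

      def loss_grad(W):
          probs = np.exp(W - W.max(axis=1, keepdims=True))
          probs /= probs.sum(axis=1, keepdims=True)
          per_class = -np.log(probs[np.arange(K), np.arange(K)])
          G = probs.copy()
          G[np.arange(K), np.arange(K)] -= 1.0   # softmax cross-entropy grad
          return per_class, p[:, None] * G       # rows weighted by frequency

      def train(use_adam, steps=2000, lr=0.3):
          W = np.zeros((K, K)); m = np.zeros_like(W); v = np.zeros_like(W)
          b1, b2, eps = 0.9, 0.999, 1e-8
          for t in range(1, steps + 1):
              per_class, G = loss_grad(W)
              if use_adam:
                  m = b1 * m + (1 - b1) * G
                  v = b2 * v + (1 - b2) * G**2
                  W -= lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
              else:
                  W -= lr * G
          return per_class

      for name, flag in [("GD  ", False), ("Adam", True)]:
          per_class = train(flag)
          print(name, "rare-class loss:", per_class[K // 2:].mean().round(3))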

  23. arXiv:2402.18724  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Associative Memories with Gradient Descent

    Authors: Vivien Cabannes, Berfin Simsek, Alberto Bietti

    Abstract: This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and correlations between embeddings. Through theory and experiments, we provide several insights. In overparameterized regimes, we obtain logarithmic grow…

    Submitted 28 February, 2024; originally announced February 2024.
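
    Illustration (not from the paper): a minimal sketch of the object studied above. One associative-memory matrix is trained by gradient descent with cross-entropy; each update is a sum of outer products of input and output token embeddings. Dimensions and learning rate are ours.

      # Toy sketch, assumptions ours: GD on an associative memory W whose
      # updates are sums of outer products of token embeddings.
      import numpy as np

      rng = np.random.default_rng(0)
      d, N = 64, 40                                     # embedding dim, tokens
      E_in = rng.standard_normal((N, d)) / np.sqrt(d)   # input embeddings
      E_out = rng.standard_normal((N, d)) / np.sqrt(d)  # output embeddings
      target = rng.permutation(N)                       # association x -> y

      W = np.zeros((d, d))
      for step in range(500):
          logits = E_in @ W @ E_out.T                   # score of every output
          probs = np.exp(logits - logits.max(axis=1, keepdims=True))
          probs /= probs.sum(axis=1, keepdims=True)
          G = probs; G[np.arange(N), target] -= 1.0     # cross-entropy grad
          W -= 1.0 * (E_in.T @ G @ E_out) / N           # sum of outer products

      pred = np.argmax(E_in @ W @ E_out.T, axis=1)
      print("recall accuracy:", (pred == target).mean())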

  24. arXiv:2310.19793  [pdf, other]

    stat.ML cs.LG math.OC

    On Learning Gaussian Multi-index Models with Gradient Flow

    Authors: Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien

    Abstract: We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link…

    Submitted 2 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  25. arXiv:2310.03024  [pdf, other]

    astro-ph.IM cs.AI cs.LG

    AstroCLIP: A Cross-Modal Foundation Model for Galaxies

    Authors: Liam Parker, Francois Lanusse, Siavash Golkar, Leopoldo Sarra, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used - without any model fine-tuning - for a variety of downstream tasks including (1) accurate in-modality and cross-modality semantic similarity search, (2) photometric redshift estimation, (3) galaxy property estimation fro…

    Submitted 14 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 18 pages, accepted in Monthly Notices of the Royal Astronomical Society; presented at the NeurIPS 2023 AI4Science Workshop

  26. arXiv:2310.02994  [pdf, other]

    cs.LG cs.AI stat.ML

    Multiple Physics Pretraining for Physical Surrogate Models

    Authors: Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly…

    Submitted 10 December, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  27. arXiv:2310.02989  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG

    xVal: A Continuous Numerical Tokenization for Scientific Language Models

    Authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho

    Abstract: Due in part to their discontinuous and discrete default encodings for numbers, Large Language Models (LLMs) have not yet been commonly used to process numerically-dense scientific datasets. Rendering datasets as text, however, could help aggregate diverse and multi-modal scientific data into a single training corpus, thereby potentially facilitating the development of foundation models for science…

    Submitted 15 December, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 15 pages, 12 figures. Appendix: 8 pages, 2 figures. Accepted contribution at the NeurIPS Workshop on ML for the Physical Sciences

  28. arXiv:2310.02984  [pdf, other]

    stat.ML cs.AI cs.CL cs.LG cs.NE

    Scaling Laws for Associative Memories

    Authors: Vivien Cabannes, Elvis Dohmatob, Alberto Bietti

    Abstract: Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the stati…

    Submitted 20 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    ACM Class: I.2.6; G.1.6

  29. arXiv:2306.00802  [pdf, other]

    stat.ML cs.CL cs.LG

    Birth of a Transformer: A Memory Viewpoint

    Authors: Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

    Abstract: Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We stu…

    Submitted 6 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  30. arXiv:2302.02774  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    The SSL Interplay: Augmentations, Inductive Bias, and Generalization

    Authors: Vivien Cabannes, Bobak T. Kiani, Randall Balestriero, Yann LeCun, Alberto Bietti

    Abstract: Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architect…

    Submitted 1 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

  31. On minimal variations for unsupervised representation learning

    Authors: Vivien Cabannes, Alberto Bietti, Randall Balestriero

    Abstract: Unsupervised representation learning aims at describing raw data efficiently to solve various downstream tasks. It has been approached with many techniques, such as manifold learning, diffusion maps, or more recently self-supervised learning. Those techniques are arguably all based on the underlying assumption that target functions, associated with future downstream tasks, have low variations in d…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: 5 pages, 1 figure, 1 table

    MSC Class: 68Q32 ACM Class: G.3

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5

  32. arXiv:2210.15651  [pdf, other]

    cs.LG math.OC stat.ML

    Learning Single-Index Models with Shallow Neural Networks

    Authors: Alberto Bietti, Joan Bruna, Clayton Sanford, Min Jae Song

    Abstract: Single-index models are a class of functions given by an unknown univariate "link" function applied to an unknown one-dimensional projection of the input. These models are particularly relevant in high dimension, when the data might present low-dimensional structure that learning algorithms should adapt to. While several statistical aspects of this model, such as the sample complexity of recover…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 76 pages. To appear at NeurIPS 2022
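
    Illustration (not from the paper): a small numpy sketch of the learning problem above. Data come from a single-index target y = g(<w, x>); a two-layer ReLU network is trained by vanilla gradient descent, and we check how well first-layer weights align with the hidden direction w. The link function and hyperparameters are our choices.

      # Toy sketch, assumptions ours: fitting a single-index model with a
      # shallow ReLU network trained by plain full-batch gradient descent.
      import numpy as np

      rng = np.random.default_rng(0)
      d, n, m = 16, 1000, 32
      w = rng.standard_normal(d); w /= np.linalg.norm(w)
      g = lambda z: np.maximum(z, 0.0) ** 2 - 1.0     # the unknown "link"
      X = rng.standard_normal((n, d))
      y = g(X @ w)

      W1 = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer weights
      a = rng.standard_normal(m) / np.sqrt(m)         # second-layer weights
      for step in range(2000):
          H = np.maximum(X @ W1.T, 0.0)               # hidden activations
          r = H @ a - y                               # residuals
          grad_a = H.T @ r / n
          grad_W1 = ((r[:, None] * (X @ W1.T > 0)) * a).T @ X / n
          a -= 0.1 * grad_a; W1 -= 0.1 * grad_W1

      # Cosine alignment of each neuron with the hidden direction w.
      align = np.abs(W1 @ w) / np.linalg.norm(W1, axis=1)
      print("mean |cos(neuron, w)|:", align.mean().round(3))
      print("train MSE:", np.mean((np.maximum(X @ W1.T, 0.0) @ a - y) ** 2).round(3))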

  33. arXiv:2206.01079  [pdf, other]

    cs.LG

    When does return-conditioned supervised learning work for offline reinforcement learning?

    Authors: David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, Joan Bruna

    Abstract: Several recent works have proposed a class of algorithms for the offline reinforcement learning (RL) problem that we will refer to as return-conditioned supervised learning (RCSL). RCSL algorithms learn the distribution of actions conditioned on both the state and the return of the trajectory. Then they define a policy by conditioning on achieving high return. In this paper, we provide a rigorous…

    Submitted 11 January, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  34. arXiv:2203.11864  [pdf, other]

    stat.ML cs.LG

    On the (Non-)Robustness of Two-Layer Neural Networks in Different Learning Regimes

    Authors: Elvis Dohmatob, Alberto Bietti

    Abstract: Neural networks are known to be highly sensitive to adversarial examples. These may arise due to different factors, such as random initialization, or spurious correlations in the learning problem. To better understand these factors, we provide a precise study of the adversarial robustness in different scenarios, from initialization to the end of training in different regimes, as well as intermedia…

    Submitted 4 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

  35. arXiv:2202.05638  [pdf, other]

    cs.LG

    Efficient Kernel UCB for Contextual Bandits

    Authors: Houssam Zenati, Alberto Bietti, Eustache Diemert, Julien Mairal, Matthieu Martin, Pierre Gaillard

    Abstract: In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require an O(CT^3) complexity where T is the horizon and the constant C is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nystrom approximations of the joint kernel…

    Submitted 11 February, 2022; originally announced February 2022.

    Comments: To appear at AISTATS 2022
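
    Illustration (not from the paper): a naive implementation sketch of the kernel UCB rule whose per-round cost the paper reduces; the incremental Nystrom machinery itself is not reproduced, and all names and constants below are ours.

      # Toy sketch, assumptions ours: naive kernel UCB. The (K + lam I)^{-1}
      # solve makes each round cost O(t^3); the paper accelerates this part.
      import numpy as np

      rng = np.random.default_rng(0)
      k = lambda a, b: np.exp(-np.sum((a - b) ** 2, axis=-1))   # RBF kernel
      f = lambda a: np.sin(3 * a[..., 0])                       # unknown reward

      actions = rng.uniform(-1, 1, size=(50, 1))                # candidate arms
      X, y = [], []
      lam, beta = 0.1, 2.0

      for _ in range(30):
          if not X:
              idx = 0
          else:
              Xa = np.array(X)
              Kinv = np.linalg.inv(k(Xa[:, None], Xa[None, :]) + lam * np.eye(len(X)))
              kx = k(actions[:, None], Xa[None, :])             # (n_actions, t)
              mean = kx @ Kinv @ np.array(y)
              var = k(actions, actions) - np.sum((kx @ Kinv) * kx, axis=1)
              idx = int(np.argmax(mean + beta * np.sqrt(np.maximum(var, 0.0))))
          a = actions[idx]
          X.append(a); y.append(f(a) + 0.1 * rng.standard_normal())

      print("best action found:", X[int(np.argmax(y))])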

  36. arXiv:2202.05318  [pdf, other]

    stat.ML cs.CR cs.LG math.OC

    Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

    Authors: Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

    Abstract: Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint…

    Submitted 15 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICML

  37. arXiv:2107.05134  [pdf, other]

    cs.LG math.OC stat.ML

    Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks

    Authors: Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna, Eric Vanden-Eijnden

    Abstract: Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is non-convex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow…

    Submitted 15 February, 2022; v1 submitted 11 July, 2021; originally announced July 2021.

  38. arXiv:2106.07148  [pdf, other]

    stat.ML cs.LG

    On the Sample Complexity of Learning under Invariance and Geometric Stability

    Authors: Alberto Bietti, Luca Venturi, Joan Bruna

    Abstract: Many supervised learning problems involve high-dimensional data such as images, text, or graphs. In order to make efficient use of data, it is often useful to leverage certain geometric priors in the problem at hand, such as invariance to translations, permutation subgroups, or stability to small deformations. We study the sample complexity of learning problems where the target function presents s…

    Submitted 4 November, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

  39. arXiv:2105.13099  [pdf, other]

    stat.ML cs.LG

    On the Universality of Graph Neural Networks on Large Random Graphs

    Authors: Nicolas Keriven, Alberto Bietti, Samuel Vaiter

    Abstract: We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c…

    Submitted 28 May, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

  40. arXiv:2104.07531  [pdf, other]

    cs.LG stat.ML

    On Energy-Based Models with Overparametrized Shallow Neural Networks

    Authors: Carles Domingo-Enrich, Alberto Bietti, Eric Vanden-Eijnden, Joan Bruna

    Abstract: Energy-based models (EBMs) are a simple yet powerful framework for generative modeling. They are based on a trainable energy function which defines an associated Gibbs measure, and they can be trained and sampled from via well-established statistical tools, such as MCMC. Neural networks may be used as energy function approximators, providing both a rich class of expressive models as well as a flex…

    Submitted 5 May, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  41. arXiv:2102.10032  [pdf, other]

    stat.ML cs.LG

    Approximation and Learning with Deep Convolutional Models: a Kernel Perspective

    Authors: Alberto Bietti

    Abstract: The empirical success of deep convolutional networks on tasks involving high-dimensional data such as images or audio suggests that they can efficiently approximate certain functions that are well-suited for such tasks. In this paper, we study this through the lens of kernel methods, by considering simple hierarchical kernels with two or three convolution and pooling layers, inspired by convolutio…

    Submitted 18 March, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: ICLR 2022

  42. arXiv:2009.14397  [pdf, other]

    stat.ML cs.LG

    Deep Equals Shallow for ReLU Networks in Kernel Regimes

    Authors: Alberto Bietti, Francis Bach

    Abstract: Deep networks are often considered to be more expressive than shallow ones in terms of approximation. Indeed, certain functions can be approximated by deep networks provably more efficiently than by shallow ones; however, no tractable algorithms are known for learning such deep models. Separately, a recent line of work has shown that deep networks trained with gradient descent may behave like (tra…

    Submitted 26 August, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

  43. arXiv:2006.01868  [pdf, other]

    stat.ML cs.LG

    Convergence and Stability of Graph Convolutional Networks on Large Random Graphs

    Authors: Nicolas Keriven, Alberto Bietti, Samuel Vaiter

    Abstract: We study properties of Graph Convolutional Networks (GCNs) by analyzing their behavior on standard models of random graphs, where nodes are represented by random latent variables and edges are drawn according to a similarity kernel. This allows us to overcome the difficulties of dealing with discrete notions such as isomorphisms on very large graphs, by considering instead more natural geometric a…

    Submitted 23 October, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

  44. arXiv:2004.11722  [pdf, other]

    stat.ML cs.LG

    Counterfactual Learning of Stochastic Policies with Continuous Actions

    Authors: Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, Pierre Gaillard, Julien Mairal

    Abstract: Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of learning stochastic policies with continuous actions from the viewpoint of counterfactual risk minimization (CRM). While the CRM framework is appealing and well studied for discrete actions, the continuous action case rais…

    Submitted 21 February, 2025; v1 submitted 22 April, 2020; originally announced April 2020.
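
    Illustration (not the paper's method): a sketch of why the continuous action case raises issues and one standard fix. Vanilla importance weighting is degenerate for continuous actions, so the logged action is smoothed with a kernel before reweighting; the policy class, bandwidth, and names are ours.

      # Toy sketch, assumptions ours: counterfactual risk minimization with
      # continuous actions via kernel-smoothed importance weights.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 5000
      x = rng.uniform(-1, 1, n)                   # contexts
      a = rng.normal(0.0, 1.0, n)                 # logged actions ~ pi0 = N(0,1)
      cost = (a - x) ** 2                         # lower is better
      pi0 = lambda a_: np.exp(-a_**2 / 2) / np.sqrt(2 * np.pi)

      def crm_risk(theta, h=0.2):
          # Target policy: deterministic action theta * x, smoothed by a
          # Gaussian kernel of bandwidth h around the logged action.
          kern = np.exp(-(a - theta * x) ** 2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
          w = kern / pi0(a)                       # smoothed importance weights
          return np.mean(w * cost) / np.mean(w)   # self-normalized estimator

      thetas = np.linspace(-2, 2, 81)
      risks = [crm_risk(t) for t in thetas]
      print("estimated best theta:", thetas[int(np.argmin(risks))])  # expect ~1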

  45. arXiv:1905.12173  [pdf, other]

    stat.ML cs.LG

    On the Inductive Bias of Neural Tangent Kernels

    Authors: Alberto Bietti, Julien Mairal

    Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel…

    Submitted 31 October, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019

  46. arXiv:1810.00363  [pdf, other]

    stat.ML cs.LG

    A Kernel Perspective for Regularizing Deep Neural Networks

    Authors: Alberto Bietti, Grégoire Mialon, Dexiong Chen, Julien Mairal

    Abstract: We propose a new point of view for regularizing deep neural networks by using the norm of a reproducing kernel Hilbert space (RKHS). Even though this norm cannot be computed, it admits upper and lower approximations leading to various practical strategies. Specifically, this perspective (i) provides a common umbrella for many existing regularization principles, including spectral norm and gradient…

    Submitted 13 May, 2019; v1 submitted 30 September, 2018; originally announced October 2018.

    Comments: ICML

  47. arXiv:1802.04064  [pdf, other]

    stat.ML cs.LG

    A Contextual Bandit Bake-off

    Authors: Alberto Bietti, Alekh Agarwal, John Langford

    Abstract: Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithm…

    Submitted 4 June, 2021; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: JMLR

  48. arXiv:1706.03078  [pdf, other]

    stat.ML cs.LG

    Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations

    Authors: Alberto Bietti, Julien Mairal

    Abstract: The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this paper, we consider deep convolutional representations of signals; we study their invariance to translations and to more genera…

    Submitted 10 October, 2018; v1 submitted 9 June, 2017; originally announced June 2017.

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-49

  49. arXiv:1610.00970  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

    Authors: Alberto Bietti, Julien Mairal

    Abstract: Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent me…

    Submitted 15 November, 2017; v1 submitted 4 October, 2016; originally announced October 2016.

    Comments: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States