
Showing 1–50 of 116 results for author: Balestriero, R

Searching in archive cs.
  1. arXiv:2604.03208

    cs.LG

    Hierarchical Planning with Latent World Models

    Authors: Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

    Abstract: Model predictive control (MPC) with learned world models has emerged as a promising paradigm for embodied control, particularly for its ability to generalize zero-shot when deployed in new environments. However, learned world models often struggle with long-horizon control due to the accumulation of prediction errors and the exponentially growing search space. In this work, we address these challe…

    Submitted 3 April, 2026; originally announced April 2026.

  2. arXiv:2603.19312

    cs.LG cs.AI

    LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

    Authors: Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero

    Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end…

    Submitted 24 March, 2026; v1 submitted 13 March, 2026; originally announced March 2026.

  3. arXiv:2603.12231

    cs.LG

    Temporal Straightening for Latent Planning

    Authors: Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren

    Abstract: Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant -- or even detrimental -- to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve represent…

    Submitted 12 March, 2026; originally announced March 2026.

  4. arXiv:2602.22617

    cs.LG

    Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA

    Authors: Hai Huang, Yann LeCun, Randall Balestriero

    Abstract: Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptive rather than prescriptive: they characterize typical training, not optimal training. Surprisingly few works have successfully challenged the data-efficiency bounds implied by these laws -- which is our…

    Submitted 25 February, 2026; originally announced February 2026.

    Comments: 21 pages, 13 figures

  5. arXiv:2602.14272

    cs.LG

    Radial-VCReg: More Informative Representation Learning Through Radial Gaussianization

    Authors: Yilun Kuang, Yash Dagade, Deep Chakraborty, Erik Learned-Miller, Randall Balestriero, Tim G. J. Rudner, Yann LeCun

    Abstract: Self-supervised learning aims to learn maximally informative representations, but explicit information maximization is hindered by the curse of dimensionality. Existing methods like VCReg address this by regularizing first and second-order feature statistics, which cannot fully achieve maximum entropy. We propose Radial-VCReg, which augments VCReg with a radial Gaussianization loss that aligns fea…

    Submitted 15 February, 2026; originally announced February 2026.

    Comments: Published in the Unifying Representations in Neural Models (UniReps) and Symmetry and Geometry in Neural Representations (NeurReps) Workshops at NeurIPS 2025

  6. arXiv:2602.11389

    cs.AI

    Causal-JEPA: Learning World Models through Object-Level Latent Interventions

    Authors: Heejeong Nam, Quentin Le Lidec, Lucas Maes, Yann LeCun, Randall Balestriero

    Abstract: World models require robust relational understanding to support prediction, reasoning, and control. While object-centric representations provide a useful abstraction, they are not sufficient to capture interaction-dependent dynamics. We therefore propose C-JEPA, a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric repr…

    Submitted 11 February, 2026; originally announced February 2026.

    Comments: Project Page: https://hazel-heejeong-nam.github.io/cjepa/

  7. arXiv:2602.08968

    cs.AI

    stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation

    Authors: Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, Randall Balestriero

    Abstract: World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in World Models, most available implementations remain publication-specific, severely limiting their reusability, increasing the risk of bugs, and reducing evaluation standardizat…

    Submitted 17 February, 2026; v1 submitted 9 February, 2026; originally announced February 2026.

  8. arXiv:2602.07050

    cs.CV cs.AI

    Interpreting Physics in Video World Models

    Authors: Sonia Joseph, Quentin Garrido, Randall Balestriero, Matthew Kowal, Thomas Fel, Shahab Bakhtiari, Blake Richards, Mike Rabbat

    Abstract: A long-standing question in physical reasoning is whether video-based models need to rely on factorized representations of physical variables in order to make physically accurate predictions, or whether they can implicitly represent such variables in a task-specific, distributed manner. While modern video world models achieve strong performance on intuitive physics benchmarks, it remains unclear w…

    Submitted 4 February, 2026; originally announced February 2026.

  9. arXiv:2602.03604

    cs.CV cs.AI

    A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures

    Authors: Basile Terver, Randall Balestriero, Megi Dervishi, David Fan, Quentin Garrido, Tushar Nagarajan, Koustuv Sinha, Wancong Zhang, Mike Rabbat, Yann LeCun, Amir Bar

    Abstract: We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation space rather than pixel space, avoiding the pitfalls of generative modeling while capturing semantically meaningful features suitable for downstream tasks. Our library provides modular, self-contained implementatio…

    Submitted 8 April, 2026; v1 submitted 3 February, 2026; originally announced February 2026.

    Comments: v2: clarify confusion in definition of JEPAs vs. regularization-based JEPAs; v3: camera-ready of ICLR world models workshop, fixed formatting and ViT config / results

  10. arXiv:2602.03190

    cs.LG cs.AI cs.CL

    Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning

    Authors: Wenquan Lu, Hai Huang, Randall Balestriero

    Abstract: Reinforcement learning algorithms such as group-relative policy optimization (GRPO) have demonstrated strong potential for improving the mathematical reasoning capabilities of large language models. However, prior work has consistently observed an entropy collapse phenomenon during reinforcement post-training, characterized by a monotonic decrease in policy entropy that ultimately leads to trainin…

    Submitted 5 February, 2026; v1 submitted 3 February, 2026; originally announced February 2026.

  11. arXiv:2602.01456

    cs.LG cs.CV

    Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations

    Authors: Yilun Kuang, Yash Dagade, Tim G. J. Rudner, Randall Balestriero, Yann LeCun

    Abstract: Joint-Embedding Predictive Architectures (JEPA) learn view-invariant representations and admit projection-based distribution matching for collapse prevention. Existing approaches regularize representations towards isotropic Gaussian distributions, but inherently favor dense representations and fail to capture the key property of sparsity observed in efficient representations. We introduce Rectifie…

    Submitted 1 February, 2026; originally announced February 2026.

  12. arXiv:2601.23286

    cs.CV cs.AI cs.LG

    VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

    Authors: Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

    Abstract: While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference A…

    Submitted 30 January, 2026; originally announced January 2026.

  13. arXiv:2511.20766

    cs.AI

    OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability

    Authors: Karen Ullrich, Jingtong Su, Claudia Shi, Arjun Subramonian, Amir Bar, Ivan Evtimov, Nikolaos Tsilivis, Randall Balestriero, Julia Kempe, Mark Ibrahim

    Abstract: Reliability is key to realizing the promise of autonomous UI-Agents, multimodal agents that directly interact with apps in the same manner as humans, as users must be able to trust an agent to complete a given task. Current evaluations rely on fixed environments, often clones of existing apps, which are limited in that they can only shed light on whether or how often an agent can complete a task w…

    Submitted 25 November, 2025; originally announced November 2025.

  14. arXiv:2511.19484

    cs.SE cs.LG

    stable-pretraining-v1: Foundation Model Research Made Simple

    Authors: Randall Balestriero, Hugues Van Assel, Sami BuGhanem, Lucas Maes

    Abstract: Foundation models and self-supervised learning (SSL) have become central to modern AI, yet research in this area remains hindered by complex codebases, redundant re-implementations, and the heavy engineering burden of scaling experiments. We present stable-pretraining, a modular, extensible, and performance-optimized library built on top of PyTorch, Lightning, Hugging Face, and TorchMetrics. Unlik…

    Submitted 22 November, 2025; originally announced November 2025.

  15. arXiv:2511.08544

    cs.LG cs.AI cs.CV stat.ML

    LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotr…

    Submitted 14 November, 2025; v1 submitted 11 November, 2025; originally announced November 2025.

  16. arXiv:2510.27547

    cs.CV

    MapSAM2: Adapting SAM2 for Automatic Segmentation of Historical Map Images and Time Series

    Authors: Xue Xia, Randall Balestriero, Tao Zhang, Yixin Zhou, Andrew Ding, Dev Saini, Lorenz Hurni

    Abstract: Historical maps are unique and valuable archives that document geographic features across different time periods. However, automated analysis of historical map images remains a significant challenge due to their wide stylistic variability and the scarcity of annotated training data. Constructing linked spatio-temporal datasets from historical map time series is even more time-consuming and labor-i…

    Submitted 31 October, 2025; originally announced October 2025.

  17. arXiv:2510.26099

    cs.LG cs.AI

    SAFE: A Novel Approach to AI Weather Evaluation through Stratified Assessments of Forecasts over Earth

    Authors: Nick Masi, Randall Balestriero

    Abstract: The dominant paradigm in machine learning is to assess model performance based on average loss across all samples in some test set. This amounts to averaging performance geospatially across the Earth in weather and climate settings, failing to account for the non-uniform distribution of human development and geography. We introduce Stratified Assessments of Forecasts over Earth (SAFE), a package f…

    Submitted 29 October, 2025; originally announced October 2025.

  18. arXiv:2510.08638

    cs.CV cs.AI

    Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry

    Authors: Thomas Fel, Binxu Wang, Michael A. Lepori, Matthew Kowal, Andrew Lee, Randall Balestriero, Sonia Joseph, Ekdeep S. Lubana, Talia Konkle, Demba Ba, Martin Wattenberg

    Abstract: DINOv2 is routinely deployed to recognize objects, scenes, and actions; yet the nature of what it perceives remains unknown. As a working baseline, we adopt the Linear Representation Hypothesis (LRH) and operationalize it using SAEs, producing a 32,000-unit dictionary that serves as the interpretability backbone of our study, which unfolds in three parts. In the first part, we analyze how differ…

    Submitted 26 February, 2026; v1 submitted 8 October, 2025; originally announced October 2025.

    Comments: Accepted at ICLR 2026

    Journal ref: ICLR 2026

  19. arXiv:2510.05949

    cs.LG cs.AI cs.CV stat.ML

    Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density

    Authors: Randall Balestriero, Nicolas Ballas, Mike Rabbat, Yann LeCun

    Abstract: Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representatio…

    Submitted 7 October, 2025; originally announced October 2025.
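The two JEPA objectives enumerated in this abstract can be sketched concretely. Below is a minimal numpy illustration, assuming a toy linear encoder and a VICReg-style variance hinge as the anti-collapse term; both choices are illustrative assumptions for this listing, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a fixed linear "encoder" W maps 16-d inputs to 8-d embeddings.
W = rng.normal(size=(8, 16)) * 0.1

def encode(x):
    return x @ W.T

def jepa_style_loss(x, x_perturbed, eps=1e-4):
    """Sketch of the two objectives named in the abstract:
    (i) latent-space prediction: the embedding of a slightly perturbed
        sample should be predictable from (here: close to) the original's;
    (ii) anti-collapse: embeddings across the batch should not all be
        identical, enforced here via a per-dimension variance hinge.
    """
    z, z_p = encode(x), encode(x_perturbed)
    prediction = np.mean((z - z_p) ** 2)                 # (i)
    std = np.sqrt(z.var(axis=0) + eps)
    anti_collapse = np.mean(np.maximum(0.0, 1.0 - std))  # (ii)
    return prediction + anti_collapse

x = rng.normal(size=(32, 16))
x_p = x + 0.01 * rng.normal(size=x.shape)  # perturbed views of the same samples
loss = jepa_style_loss(x, x_p)
print(f"toy JEPA-style loss: {loss:.4f}")
```

In a real JEPA the encoder is trained so that term (i) shrinks while term (ii) keeps the embedding distribution spread out; here both terms are merely evaluated once to show their shapes.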

  20. arXiv:2509.14252

    cs.CL cs.AI

    LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

    Authors: Hai Huang, Yann LeCun, Randall Balestriero

    Abstract: Large Language Model (LLM) pretraining, finetuning, and evaluation rely on input-space reconstruction and generative capabilities. Yet, it has been observed in vision that embedding-space training objectives, e.g., with Joint Embedding Predictive Architectures (JEPAs), are far superior to their input-space counterpart. That mismatch in how training is achieved between language and vision opens up…

    Submitted 7 October, 2025; v1 submitted 10 September, 2025; originally announced September 2025.

  21. arXiv:2509.12249

    cs.LG cs.AI

    Why and How Auxiliary Tasks Improve JEPA Representations

    Authors: Jiacan Yu, Siyi Chen, Mingrui Liu, Nono Horiuchi, Vladimir Braverman, Zicheng Xu, Dan Haramati, Randall Balestriero

    Abstract: Joint-Embedding Predictive Architecture (JEPA) is increasingly used for visual representation learning and as a component in model-based RL, but its behavior remains poorly understood. We provide a theoretical characterization of a simple, practical JEPA variant that has an auxiliary regression head trained jointly with latent dynamics. We prove a No Unhealthy Representation Collapse theorem: in d…

    Submitted 18 October, 2025; v1 submitted 12 September, 2025; originally announced September 2025.

  22. arXiv:2509.06990

    cs.CV cs.LG

    DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining

    Authors: Bryan Rodas, Natalie Montesino, Jakob Ambsdorf, David Klindt, Randall Balestriero

    Abstract: Continued pretraining offers a promising solution for adapting foundation models to a new target domain. However, in specialized domains, available datasets are often very small, limiting the applicability of SSL methods developed for large-scale pretraining and making hyperparameter search infeasible. In addition, pretrained models are usually released as backbone-weights only, lacking important…

    Submitted 2 September, 2025; originally announced September 2025.

    ACM Class: I.2; I.4

  23. arXiv:2508.15404

    cs.CV

    From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations

    Authors: Anthony Bisulco, Rahul Ramesh, Randall Balestriero, Pratik Chaudhari

    Abstract: Masked Autoencoders (MAEs) have emerged as a powerful pretraining technique for vision foundation models. Despite their effectiveness, they require extensive hyperparameter tuning (masking ratio, patch size, encoder/decoder layers) when applied to novel datasets. While prior theoretical works have analyzed MAEs in terms of their attention patterns and hierarchical latent variable models, the conne…

    Submitted 21 August, 2025; originally announced August 2025.

  24. arXiv:2507.20453

    cs.LG

    Your Attention Matters: to Improve Model Robustness to Noise and Spurious Correlations

    Authors: Camilo Tamayo-Rousseau, Yunjia Zhao, Yiqun Zhang, Randall Balestriero

    Abstract: Self-attention mechanisms are foundational to Transformer architectures, supporting their impressive success in a wide range of tasks. While there are many self-attention variants, their robustness to noise and spurious correlations has not been well studied. This study evaluates Softmax, Sigmoid, Linear, Doubly Stochastic, and Cosine attention within Vision Transformers under different data corru…

    Submitted 5 September, 2025; v1 submitted 27 July, 2025; originally announced July 2025.

  25. arXiv:2507.09871

    cs.LG cs.AI

    Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks

    Authors: Niket Patel, Randall Balestriero

    Abstract: The grand goal of AI research, and particularly Self Supervised Learning (SSL), is to produce systems that can successfully solve any possible task. In contrast, current evaluation methods available to AI researchers typically rely on a fixed collection of hand-picked downstream benchmarks. Hence, a large amount of effort is put into designing and searching for large collections of evaluation tasks…

    Submitted 20 October, 2025; v1 submitted 13 July, 2025; originally announced July 2025.

    Comments: NeurIPS UniReps Workshop 2025

  26. arXiv:2507.03779

    cs.CV cs.AI cs.LG

    FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

    Authors: Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

    Abstract: Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-trainin…

    Submitted 28 January, 2026; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted by 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

  27. arXiv:2506.19552

    cs.CV cs.AI cs.LG

    General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound

    Authors: Jakob Ambsdorf, Asbjørn Munk, Sebastian Llambias, Anders Nymark Christensen, Kamil Mikolaj, Randall Balestriero, Martin Tolsgaard, Aasa Feragen, Mads Nielsen

    Abstract: With access to large-scale, unlabeled medical datasets, researchers are confronted with two questions: Should they attempt to pretrain a custom foundation model on this medical data, or use transfer-learning from an existing generalist model? And, if a custom model is pretrained, are novel methods required? In this paper we explore these questions by conducting a case-study, in which we train a fo…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Submitted version of paper accepted at MICCAI 2025

    ACM Class: I.4

  28. arXiv:2506.12284

    cs.LG stat.ML

    GrokAlign: Geometric Characterisation and Acceleration of Grokking

    Authors: Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk

    Abstract: A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness…

    Submitted 31 July, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

    Comments: 23 pages, 11 figures, 3 tables

  29. arXiv:2506.11402

    cs.LG cs.AI cs.CL

    LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model

    Authors: Marcel Mateos Salles, Praney Goyal, Pradyut Sekhsaria, Hai Huang, Randall Balestriero

    Abstract: Large Language Models (LLMs) are commonly finetuned for a variety of use cases and domains. A common approach is to leverage Low-Rank Adaptation (LoRA) -- known to provide strong performance at low resource costs. In this study, we demonstrate that LoRA actually opens the door to short-cut vulnerabilities -- and the more resource-efficient the LoRA setup, the more vulnerable the finetun…

    Submitted 30 September, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

    Comments: 46 pages, 17 figures, 26 tables. Submitted for publication. for associated blog post, see https://pradyut3501.github.io/lora-spur-corr/

  30. arXiv:2505.17169

    cs.CL cs.AI

    Next Token Perception Score: Analytical Assessment of your LLM Perception Skills

    Authors: Yu-Ang Cheng, Leyang Hu, Hai Huang, Randall Balestriero

    Abstract: Autoregressive pretraining has become the de facto paradigm for learning general-purpose representations in large language models (LLMs). However, linear probe performance across downstream perception tasks shows substantial variability, suggesting that features optimized for next-token prediction do not consistently transfer well to downstream perception tasks. We demonstrate that representations…

    Submitted 22 May, 2025; originally announced May 2025.
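The abstract above evaluates frozen representations via linear probes: only a linear map is fit on top of fixed features, so probe accuracy measures how linearly accessible the task is. A minimal sketch with synthetic data, using a ridge-regularized least-squares fit on one-hot targets; the feature dimensions, class count, and solver are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen features: 200 samples, 32-d, for a 3-class downstream task.
feats = rng.normal(size=(200, 32))
labels = rng.integers(0, 3, size=200)

# Linear probe: the backbone stays frozen; we only solve for a linear map W.
# Here via ridge regression on one-hot targets (closed form, no iterations).
onehot = np.eye(3)[labels]
lam = 1e-2
W = np.linalg.solve(feats.T @ feats + lam * np.eye(32), feats.T @ onehot)

preds = (feats @ W).argmax(axis=1)
accuracy = (preds == labels).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

With random features and labels the accuracy hovers near chance; on real pretrained features, the gap between probe accuracies across tasks is exactly the variability the abstract refers to.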

  31. arXiv:2505.12477

    cs.LG cs.AI cs.CV

    Joint Embedding vs Reconstruction: Provable Benefits of Latent Space Prediction for Self Supervised Learning

    Authors: Hugues Van Assel, Mark Ibrahim, Tommaso Biancalani, Aviv Regev, Randall Balestriero

    Abstract: Reconstruction and joint embedding have emerged as two leading paradigms in Self Supervised Learning (SSL). Reconstruction methods focus on recovering the original sample from a different view in input space. On the other hand, joint embedding methods align the representations of different views in latent space. Both approaches offer compelling advantages, yet practitioners lack clear guidelines f…

    Submitted 14 October, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    Comments: 33 pages, 9 figures

  32. arXiv:2505.12191

    cs.CV cs.AI cs.LG

    Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum

    Authors: Wenquan Lu, Jiaqi Zhang, Hugues Van Assel, Randall Balestriero

    Abstract: Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality datasets. As a result, applying SSL on noisy data remains a challenge, despite being crucial to applications such as astrophysics, medical imaging, geophysics or finance. In this work, we present a fully self-super…

    Submitted 29 October, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

    Comments: NeurIPS 2025

  33. arXiv:2505.11836

    cs.LG cs.AI

    SplInterp: Improving our Understanding and Training of Sparse Autoencoders

    Authors: Jeremy Budd, Javier Ideami, Benjamin Macdowall Rynne, Keith Duggar, Randall Balestriero

    Abstract: Sparse autoencoders (SAEs) have received considerable recent attention as tools for mechanistic interpretability, showing success at extracting interpretable features even from very large LLMs. However, this research has been largely empirical, and there have been recent doubts about the true utility of SAEs. In this work, we seek to enhance the theoretical understanding of SAEs, using the spline…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 44 pages, 38 figures, under review

    MSC Class: 68T07; 65D07

  34. arXiv:2504.13101

    cs.LG cs.AI stat.ML

    Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research

    Authors: Patrik Reizinger, Randall Balestriero, David Klindt, Wieland Brendel

    Abstract: Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoret…

    Submitted 24 July, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: ICML2025 camera ready

  35. arXiv:2502.14819

    cs.LG

    Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

    Authors: Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, Yann LeCun

    Abstract: A long-standing goal in AI is to develop agents capable of solving diverse tasks across a range of environments, including those never seen during training. Two dominant paradigms address this challenge: (i) reinforcement learning (RL), which learns policies via trial and error, and (ii) optimal control, which plans actions using a known or learned dynamics model. However, their comparative streng…

    Submitted 28 October, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Project web page: https://latent-planning.github.io/

  36. arXiv:2502.09500

    cs.LG

    Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting

    Authors: Nicholas Dronen, Randall Balestriero

    Abstract: Catastrophic forgetting -- the phenomenon of a neural network learning a task t1 and losing the ability to perform it after being trained on some other task t2 -- is a long-standing problem for neural networks [McCloskey and Cohen, 1989]. We present a method, Eidetic Learning, that provably solves catastrophic forgetting. A network trained with Eidetic Learning -- here, an EideticNet -- requires n…

    Submitted 14 February, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: 16 pages, 6 figures; code is available at https://github.com/amazon-science/eideticnet-training

  37. arXiv:2502.07783

    cs.LG

    Curvature Tuning: Provable Training-free Model Steering From a Single Parameter

    Authors: Leyang Hu, Matteo Gamba, Randall Balestriero

    Abstract: The scaling of model and data sizes has reshaped the AI landscape, establishing finetuning pretrained models as the standard paradigm for solving downstream tasks. However, dominant finetuning methods typically rely on weight adaptation, often lack interpretability, and depend on heuristically chosen hyperparameters. In this paper, we take a different perspective and shift the focus from weights t…

    Submitted 15 January, 2026; v1 submitted 11 February, 2025; originally announced February 2025.

    Comments: Accepted at NeurIPS 2025

  38. arXiv:2502.06831

    cs.LG cs.AI

    No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data

    Authors: Daniel Cai, Randall Balestriero

    Abstract: Implicit neural representations (INRs) exhibit growing promise in addressing Earth representation challenges, ranging from emissions monitoring to climate modeling. However, existing methods disproportionately prioritize global average performance, whereas practitioners require fine-grained insights to understand biases and variations in these models. To bridge this gap, we introduce FAIR-Earth: a…

    Submitted 5 February, 2025; originally announced February 2025.

  39. arXiv:2502.05391

    cs.CV

    Beyond and Free from Diffusion: Invertible Guided Consistency Training

    Authors: Chia-Hong Hsu, Shiu-hong Kao, Randall Balestriero

    Abstract: Guidance in image generation steers models towards higher-quality or more targeted outputs, typically achieved in Diffusion Models (DMs) via Classifier-free Guidance (CFG). However, recent Consistency Models (CMs), which offer fewer function evaluations, rely on distilling CFG knowledge from pretrained DMs to achieve guidance, making them costly and inflexible. In this work, we propose invertible…

    Submitted 7 February, 2025; originally announced February 2025.

  40. arXiv:2412.03215

    cs.CV cs.LG

    Beyond [cls]: Exploring the true potential of Masked Image Modeling representations

    Authors: Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński, Marek Śmieja, Bartosz Zieliński

    Abstract: Masked Image Modeling (MIM) has emerged as a promising approach for Self-Supervised Learning (SSL) of visual representations. However, the out-of-the-box performance of MIMs is typically inferior to competing approaches. Most users cannot afford fine-tuning due to the need for large amounts of data, high GPU consumption, and specialized user knowledge. Therefore, the practical use of MIM represent…

    Submitted 13 October, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

  41. arXiv:2411.17425

    cs.CV

    Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps

    Authors: Xue Xia, Randall Balestriero, Tao Zhang, Lorenz Hurni

    Abstract: Tracking geographic entities from historical maps, such as buildings, offers valuable insights into cultural heritage, urbanization patterns, environmental changes, and various historical research endeavors. However, linking these entities across diverse maps remains a persistent challenge for researchers. Traditionally, this has been addressed through a two-step process: detecting entities within…

    Submitted 26 November, 2024; originally announced November 2024.

    Journal ref: NeurIPS 2024 Workshop on Self-Supervised Learning - Theory and Practice

  42. arXiv:2410.21869  [pdf, other

    cs.LG cs.AI stat.ML

    Cross-Entropy Is All You Need To Invert the Data Generating Process

    Authors: Patrik Reizinger, Alice Bizeul, Attila Juhos, Julia E. Vogt, Randall Balestriero, Wieland Brendel, David Klindt

    Abstract: Supervised learning has become a cornerstone of modern machine learning, yet a comprehensive theory explaining its effectiveness remains elusive. Empirical phenomena, such as neural analogy-making and the linear representation hypothesis, suggest that supervised models can learn interpretable factors of variation in a linear fashion. Recent advances in self-supervised learning, particularly nonlin… ▽ More

    Submitted 25 February, 2025; v1 submitted 29 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 (oral) camera ready

  43. arXiv:2410.11985  [pdf, other

    cs.CL cs.AI cs.LG

    The Fair Language Model Paradox

    Authors: Andrea Pinto, Tomer Galanti, Randall Balestriero

    Abstract: Large Language Models (LLMs) are widely deployed in real-world applications, yet little is known about their training dynamics at the token level. Evaluation typically relies on aggregated training loss, measured at the batch level, which overlooks subtle per-token biases arising from (i) varying token-level dynamics and (ii) structural biases introduced by hyperparameters. While weight decay is c… ▽ More
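
    The gap between batch-level and token-level views is easy to make concrete: the reported training loss is the mean of per-token cross-entropies, which can hide large disparities across tokens. A minimal numpy sketch of this point (generic, not the paper's analysis):

```python
import numpy as np

def per_token_loss(logits, targets):
    """Cross-entropy per token: logits of shape (T, V), targets (T,)."""
    m = logits.max(axis=-1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

logits = np.array([[5.0, 0.0, 0.0],   # confident, correct token
                   [0.0, 0.0, 0.0]])  # uniform, poorly learned token
targets = np.array([0, 2])
losses = per_token_loss(logits, targets)
batch_loss = losses.mean()  # the aggregate hides the per-token spread
```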

    Submitted 15 October, 2024; originally announced October 2024.

  44. arXiv:2410.09692  [pdf, other

    cs.LG cs.AI

    ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws

    Authors: Hai Huang, Randall Balestriero

    Abstract: Low-Rank Adaptation (LoRA) is the bread and butter of Large Language Model (LLM) finetuning. LoRA learns an additive low-rank perturbation, $AB$, of a pretrained matrix parameter $W$ to align the model to a new task or dataset with $W+AB$. We identify three core limitations to LoRA for finetuning--a setting with a limited amount of data and few training steps. First, LoRA employs Dropout to prev… ▽ More
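
    The additive $W+AB$ parameterization described above can be sketched in a few lines. This is a generic illustration of plain LoRA (dimensions, scaling, and initialization are illustrative assumptions), not the paper's ALLoRA variant:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4              # rank r << min(d_out, d_in)

W = rng.normal(size=(d_out, d_in))      # pretrained weight, kept frozen
A = rng.normal(size=(d_out, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d_in))                 # trainable up-projection, zero init

def lora_forward(x, W, A, B):
    # Only A and B are trained; the effective weight is W + A @ B
    return (W + A @ B) @ x

x = rng.normal(size=d_in)
# With B = 0 at initialization, the adapted model matches the pretrained one
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

    The zero initialization of $B$ is the standard trick that makes finetuning start exactly from the pretrained model.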

    Submitted 12 October, 2024; originally announced October 2024.

  45. arXiv:2410.04289  [pdf, other

    cs.CV cs.AI cs.LG

    Self-Supervised Anomaly Detection in the Wild: Favor Joint Embeddings Methods

    Authors: Daniel Otero, Rafael Mateus, Randall Balestriero

    Abstract: Accurate anomaly detection is critical in vision-based infrastructure inspection, where it helps prevent costly failures and enhances safety. Self-Supervised Learning (SSL) offers a promising approach by learning robust representations from unlabeled data. However, its application in anomaly detection remains underexplored. This paper addresses this gap by providing a comprehensive evaluation of S… ▽ More

    Submitted 5 October, 2024; originally announced October 2024.

  46. arXiv:2408.04810  [pdf, other

    cs.CV cs.AI cs.LG

    UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

    Authors: Haider Al-Tahan, Quentin Garrido, Randall Balestriero, Diane Bouchacourt, Caner Hazirbas, Mark Ibrahim

    Abstract: Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol, bearing a non-trivial computational cost, and making sense of how all these benchmarks translate into meaningful axes of progress. To facilitate a systematic eval… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

  47. arXiv:2408.04809  [pdf, other

    cs.LG cs.AI cs.CV

    On the Geometry of Deep Learning

    Authors: Randall Balestriero, Ahmed Imtiaz Humayun, Richard Baraniuk

    Abstract: In this paper, we overview one promising avenue of progress at the mathematical foundation of deep learning: the connection between deep networks and function approximation by affine splines (continuous piecewise linear functions in multiple dimensions). In particular, we will overview work over the past decade on understanding certain geometrical properties of a deep network's affine spline mappi… ▽ More
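
    The affine-spline view is directly checkable on a toy network: within a single activation region, a ReLU network coincides exactly with one affine map. A small numpy sketch (one hidden layer, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
w2 = rng.normal(size=8)

def relu_net(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def local_affine(x):
    """On the activation region containing x, the network IS an affine
    map a . x + c; recover a and c from the active-unit mask."""
    mask = (W1 @ x + b1 > 0).astype(float)
    a = (w2 * mask) @ W1
    c = (w2 * mask) @ b1
    return a, c

x = rng.normal(size=3)
a, c = local_affine(x)
# Exact agreement (up to float error), not a first-order approximation
assert np.isclose(relu_net(x), a @ x + c)
```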

    Submitted 14 January, 2025; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted for publication at 'Notices of the American Mathematical Society'

  48. arXiv:2407.18134  [pdf, other

    cs.CV cs.LG

    $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

    Authors: Vlad Sobal, Mark Ibrahim, Randall Balestriero, Vivien Cabannes, Diane Bouchacourt, Pietro Astolfi, Kyunghyun Cho, Yann LeCun

    Abstract: Learning good representations involves capturing the diverse ways in which data samples relate. Contrastive loss - an objective matching related samples - underlies methods from self-supervised to multimodal learning. Contrastive losses, however, can be viewed more broadly as modifying a similarity graph to indicate how samples should relate in the embedding space. This view reveals a shortcoming… ▽ More
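
    The graph view can be made concrete as a soft-target contrastive objective: replace the usual one-hot positive with a target distribution over the batch given by a similarity graph. The snippet below is a generic illustration of that idea, not the paper's exact $\mathbb{X}$-Sample loss:

```python
import numpy as np

def graph_contrastive_loss(z, sim_graph, temp=0.1):
    """Cross-entropy between a softmax over embedding similarities and a
    target distribution from a similarity graph (rows sum to 1, zero diagonal)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temp
    np.fill_diagonal(logits, -1e9)  # a sample is never its own target
    m = logits.max(axis=1, keepdims=True)
    log_q = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return -(sim_graph * log_q).sum(axis=1).mean()

# Two identical pairs, (0, 1) and (2, 3); the graph marks each as the
# other's sole target, recovering the standard one-positive case.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
graph = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
loss = graph_contrastive_loss(z, graph)
```

    With a one-hot graph this reduces to the usual contrastive setup; softening the rows lets related-but-not-identical samples share probability mass.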

    Submitted 11 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  49. arXiv:2406.10743  [pdf, other

    cs.LG cs.AI

    Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?

    Authors: Mark Ibrahim, David Klindt, Randall Balestriero

    Abstract: Deep Learning is often depicted as a trio of data-architecture-loss. Yet, recent Self Supervised Learning (SSL) solutions have introduced numerous additional design choices, e.g., a projector network, positive views, or teacher-student networks. These additions pose two challenges. First, they limit the impact of theoretical studies that often fail to incorporate all those intertwined designs. Sec… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  50. arXiv:2406.09657  [pdf, other

    cs.LG stat.ML

    Mitigating over-exploration in latent space optimization using LES

    Authors: Omer Ronen, Ahmed Imtiaz Humayun, Richard Baraniuk, Randall Balestriero, Bin Yu

    Abstract: We develop Latent Exploration Score (LES) to mitigate over-exploration in Latent Space Optimization (LSO), a popular method for solving black-box discrete optimization problems. LSO utilizes continuous optimization within the latent space of a Variational Autoencoder (VAE) and is known to be susceptible to over-exploration, which manifests in unrealistic solutions that reduce its practicality. LES… ▽ More

    Submitted 21 February, 2025; v1 submitted 13 June, 2024; originally announced June 2024.