Showing 1–31 of 31 results for author: Gershman, S J

Searching in archive cs.
  1. arXiv:2603.05573  [pdf, ps, other]

    cs.LG

    Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View

    Authors: Gyuryang Heo, Timothy Ngotiaoco, Kazuki Irie, Samuel J. Gershman, Bernardo Sabatini

    Abstract: Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressive power for sequence-level parallelism, which enables efficient training. Here we examine the bounds on error and how error scales when models operate outside of their expressivity regimes using a Lie-algebraic control perspective. Our theory formulates a correspondence between the dept…

    Submitted 5 March, 2026; originally announced March 2026.

  2. arXiv:2602.17594  [pdf, ps, other]

    cs.AI

    AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

    Authors: Lance Ying, Ryan Truong, Prafull Sharma, Kaiya Ivy Zhao, Nathan Cloos, Kelsey R. Allen, Thomas L. Griffiths, Katherine M. Collins, José Hernández-Orallo, Phillip Isola, Samuel J. Gershman, Joshua B. Tenenbaum

    Abstract: Rigorously evaluating machine intelligence against the broad spectrum of human general intelligence has become increasingly important and challenging in this era of rapid technological advance. Conventional AI benchmarks typically assess only narrow capabilities in a limited range of human activity. Most are also static, quickly saturating as developers explicitly or implicitly optimize for them.…

    Submitted 19 February, 2026; originally announced February 2026.

    Comments: 29 pages, 14 figures

  3. arXiv:2602.00929  [pdf, ps, other]

    cs.AI

    Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents

    Authors: Zergham Ahmed, Kazuki Irie, Joshua B. Tenenbaum, Christopher J. Bates, Samuel J. Gershman

    Abstract: Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by the cognitive science of how people form abstractions and intuitive theories of their world knowledge, Theory-Based RL (TBRL) systems, such as TheoryCoder…

    Submitted 31 January, 2026; originally announced February 2026.

    Comments: 20 pages

  4. arXiv:2512.15948  [pdf, ps, other]

    cs.AI q-bio.NC

    Subjective functions

    Authors: Samuel J. Gershman

    Abstract: Where do objective functions come from? How do we select what goals to pursue? Human intelligence is adept at synthesizing new objective functions on the fly. How does this work, and can we endow artificial systems with the same ability? This paper proposes an approach to answering these questions, starting with the concept of a subjective function, a higher-order objective function that is endoge…

    Submitted 17 December, 2025; originally announced December 2025.

  5. arXiv:2511.22128  [pdf, ps, other]

    cs.LG

    A Variational Manifold Embedding Framework for Nonlinear Dimensionality Reduction

    Authors: John J. Vastola, Samuel J. Gershman, Kanaka Rajan

    Abstract: Dimensionality reduction algorithms like principal component analysis (PCA) are workhorses of machine learning and neuroscience, but each has well-known limitations. Variants of PCA are simple and interpretable, but not flexible enough to capture nonlinear data manifold structure. More flexible approaches have other problems: autoencoders are generally difficult to interpret, and graph-embedding-b…

    Submitted 27 November, 2025; originally announced November 2025.

    Comments: Accepted to the NeurIPS 2025 Workshop on Symmetry and Geometry in Neural Representations (NeurReps)

  6. arXiv:2510.26997  [pdf, ps, other]

    cs.LG

    Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

    Authors: John J. Vastola, Samuel J. Gershman, Kanaka Rajan

    Abstract: Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered optimal? We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes, and identifies optimal…

    Submitted 30 October, 2025; originally announced October 2025.

    Comments: Accepted to NeurIPS 2025

  7. arXiv:2509.21534  [pdf, ps, other]

    cs.LG

    A circuit for predicting hierarchical structure in-context in Large Language Models

    Authors: Tankred Saanum, Can Demircan, Samuel J. Gershman, Eric Schulz

    Abstract: Large Language Models (LLMs) excel at in-context learning, the ability to use information provided as context to improve prediction of future tokens. Induction heads have been argued to play a crucial role for in-context learning in Transformer Language Models. These attention heads make a token attend to successors of past occurrences of the same token in the input. This basic mechanism supports…

    Submitted 25 September, 2025; originally announced September 2025.
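
    The induction-head mechanism summarized in this abstract (a token attends to the successor of a past occurrence of the same token) can be caricatured symbolically. The function below is an illustrative sketch of that copying rule only, not the circuit identified in the paper:

    ```python
    # Toy sketch of the induction-head copying rule: to predict the next
    # token, find the most recent earlier occurrence of the current token
    # and return the token that followed it.

    def induction_predict(tokens):
        """Predict the next token by copying the successor of the most
        recent past occurrence of the final token, if any."""
        last = tokens[-1]
        for i in range(len(tokens) - 2, -1, -1):
            if tokens[i] == last:
                return tokens[i + 1]  # successor of a past occurrence
        return None  # no past occurrence to copy from

    # "x A B y A" -> predict "B"
    assert induction_predict(list("xAByA")) == "B"
    ```

    Real induction heads implement a soft, learned version of this pattern via attention; the hard-coded lookup here only conveys the intuition.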

  8. arXiv:2508.08435  [pdf, ps, other]

    cs.LG cs.AI q-bio.NC

    Fast weight programming and linear transformers: from machine learning to neurobiology

    Authors: Kazuki Irie, Samuel J. Gershman

    Abstract: Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden states, use two-dimensional (2D) matrix-form hidden states. Such 2D-state RNNs, known as Fast Weight Programmers (FWPs), can be interpreted as a neural network whose…

    Submitted 17 March, 2026; v1 submitted 11 August, 2025; originally announced August 2025.

    Comments: Accepted to TMLR 2025

  9. arXiv:2507.12821  [pdf, ps, other]

    cs.AI cs.LG

    Assessing Adaptive World Models in Machines with Novel Games

    Authors: Lance Ying, Katherine M. Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob D. Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R. Allen, Joshua B. Tenenbaum

    Abstract: Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. H…

    Submitted 22 July, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

    Comments: 17 pages, 4 figures

  10. arXiv:2507.09409  [pdf, ps, other]

    cs.MA

    Adaptive Social Learning using Theory of Mind

    Authors: Lance Ying, Ryan Truong, Joshua B. Tenenbaum, Samuel J. Gershman

    Abstract: Social learning is a powerful mechanism through which agents learn about the world from others. However, humans don't always choose to observe others, since social learning can carry time and cognitive resource costs. How do people balance social and non-social learning? In this paper, we propose a rational mentalizing model of the decision to engage in social learning. This model estimates the ut…

    Submitted 12 July, 2025; originally announced July 2025.

    Comments: 7 pages, 4 figures; paper published at CogSci 2025

  11. arXiv:2507.05561  [pdf, ps, other]

    cs.LG q-bio.NC

    Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

    Authors: Wilka Carvalho, Sam Hall-McMaster, Honglak Lee, Samuel J. Gershman

    Abstract: Humans can pursue a near-infinite variety of tasks, but typically can only pursue a small number at the same time. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not pursued. We formalize this idea as Multitask Preplay, a novel algorithm that replays experience on one task as the starting point for "preplay" -- cou…

    Submitted 7 July, 2025; originally announced July 2025.

  12. arXiv:2506.00744  [pdf, ps, other]

    cs.LG

    Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

    Authors: Kazuki Irie, Morris Yau, Samuel J. Gershman

    Abstract: We develop hybrid memory architectures for general-purpose sequence processing neural networks that combine key-value memory using softmax attention (KV-memory) with fast weight memory through dynamic synaptic modulation (FW-memory) -- the core principles of quadratic and linear transformers, respectively. These two memory systems have complementary but individually limited properties: KV-memory…

    Submitted 22 October, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted to NeurIPS 2025

  13. arXiv:2503.20124  [pdf, ps, other]

    cs.AI

    Synthesizing world models for bilevel planning

    Authors: Zergham Ahmed, Joshua B. Tenenbaum, Christopher J. Bates, Samuel J. Gershman

    Abstract: Modern reinforcement learning (RL) systems have demonstrated remarkable capabilities in complex environments, such as video games. However, they still fall short of achieving human-like sample efficiency and adaptability when learning new domains. Theory-based reinforcement learning (TBRL) is an algorithmic framework specifically designed to address this gap. Modeled on cognitive theories, TBRL le…

    Submitted 13 July, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to TMLR

  14. arXiv:2502.19402  [pdf, ps, other]

    cs.LG

    General Intelligence Requires Reward-based Pretraining

    Authors: Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal

    Abstract: Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly -- the hallmarks of artificial general intelligence (AGI) -- remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understandi…

    Submitted 26 August, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: https://improbableai.notion.site/General-Intelligence-Requires-Reward-Based-Pretraining-2023b66e4cf580d3ab44c7860b75d25f?pvs=73

  15. arXiv:2501.02950  [pdf, other]

    q-bio.NC cs.AI cs.LG

    Key-value memory in the brain

    Authors: Samuel J. Gershman, Ila Fiete, Kazuki Irie

    Abstract: Classical models of memory in psychology and neuroscience rely on similarity-based retrieval of stored patterns, where similarity is a function of retrieval cues and the stored patterns. While parsimonious, these models do not allow distinct representations for storage and retrieval, despite their distinct computational demands. Key-value memory systems, in contrast, distinguish representations us…

    Submitted 3 March, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted to Neuron
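
    The key-value distinction described in this abstract (similarity is computed against one set of representations, while a different set is stored and returned) can be sketched with a standard attention-style readout. The softmax retrieval rule and the specific vectors below are illustrative assumptions, not the paper's model:

    ```python
    # Sketch of key-value retrieval: queries are matched against keys
    # (retrieval representations), but what is returned is a weighted
    # blend of the separately stored values (storage representations).
    import math

    def retrieve(query, keys, values, beta=5.0):
        """Softmax-weighted readout: similarity uses keys only."""
        sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
        exps = [math.exp(beta * s) for s in sims]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(values[0])
        return [sum(w * v[d] for w, v in zip(weights, values))
                for d in range(dim)]

    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 20.0]]
    out = retrieve([1.0, 0.0], keys, values)
    # the query matches the first key, so the first value dominates
    assert out[0] > 9.0 and out[1] < 1.0
    ```

    Because keys and values are free to differ, the representation optimized for discriminating cues need not be the representation optimized for reconstructing content, which is the point of contrast with classical similarity-based models.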

  16. arXiv:2411.03541  [pdf, other]

    cs.LG q-bio.NC

    Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex

    Authors: Tanishq Kumar, Blake Bordelon, Cengiz Pehlevan, Venkatesh N. Murthy, Samuel J. Gershman

    Abstract: Does learning of task-relevant representations stop when behavior stops changing? Motivated by recent theoretical advances in machine learning and the intuitive observation that human experts continue to learn from practice even after mastery, we hypothesize that task-specific representation learning can continue, even when behavior plateaus. In a novel reanalysis of recently published neural data…

    Submitted 29 November, 2024; v1 submitted 5 November, 2024; originally announced November 2024.

  17. arXiv:2408.14508  [pdf, other]

    cs.AI cs.LG q-bio.NC

    Artificial intelligence for science: The easy and hard problems

    Authors: Ruairidh M. Battleday, Samuel J. Gershman

    Abstract: A suite of impressive scientific discoveries has been driven by recent advances in artificial intelligence. These almost all result from training flexible algorithms to solve difficult optimization problems specified in advance by teams of domain scientists and engineers with access to large amounts of data. Although extremely useful, this kind of problem solving only corresponds to one part of s…

    Submitted 16 December, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 boxes, 5 figures

  18. arXiv:2402.06590  [pdf, other]

    cs.AI cs.LG

    Predictive representations: building blocks of intelligence

    Authors: Wilka Carvalho, Momchil S. Tomov, William de Cothi, Caswell Barry, Samuel J. Gershman

    Abstract: Adaptive behavior often requires predicting future events. The theory of reinforcement learning prescribes what kinds of predictive representations are useful and how to compute them. This paper integrates these theoretical ideas with work on cognition and neuroscience. We pay special attention to the successor representation (SR) and its generalizations, which have been widely applied both as eng…

    Submitted 11 July, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: accepted to Neural Computation

  19. arXiv:2312.08519  [pdf]

    q-bio.NC cs.AI

    Reconciling Shared versus Context-Specific Information in a Neural Network Model of Latent Causes

    Authors: Qihong Lu, Tan T. Nguyen, Qiong Zhang, Uri Hasson, Thomas L. Griffiths, Jeffrey M. Zacks, Samuel J. Gershman, Kenneth A. Norman

    Abstract: It has been proposed that, when processing a stream of events, humans divide their experiences in terms of inferred latent causes (LCs) to support context-dependent learning. However, when shared structure is present across contexts, it is still unclear how the "splitting" of LCs and learning of shared structure can be simultaneously achieved. Here, we present the Latent Cause Network (LCNet), a n…

    Submitted 6 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  20. arXiv:2312.03759  [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.DL

    How should the advent of large language models affect the practice of science?

    Authors: Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T. Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D. West, Qiong Zhang, Richard M. Shiffrin, Samuel J. Gershman, Ven Popov, Emily M. Bender, Marco Marelli, Matthew M. Botvinick, Zeynep Akata, Eric Schulz

    Abstract: Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schu…

    Submitted 5 December, 2023; originally announced December 2023.

  21. arXiv:2310.06110  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Grokking as the Transition from Lazy to Rich Training Dynamics

    Authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

    Abstract: We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhi…

    Submitted 11 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Adding new experiments on higher degree Hermite polynomials, multi-index targets, removed DMFT analysis from this version

  22. arXiv:2305.15277  [pdf, other]

    cs.LG

    Successor-Predecessor Intrinsic Exploration

    Authors: Changmin Yu, Neil Burgess, Maneesh Sahani, Samuel J. Gershman

    Abstract: Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. Although the study of intrinsic rewards has a long history, existing methods focus on composing the intrinsic reward based on measures…

    Submitted 25 January, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  23. arXiv:2107.12544  [pdf, other]

    cs.AI

    Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning

    Authors: Pedro A. Tsividis, Joao Loula, Jake Burga, Nathan Foss, Andres Campero, Thomas Pouncy, Samuel J. Gershman, Joshua B. Tenenbaum

    Abstract: Reinforcement learning (RL) studies how an agent comes to achieve reward in an environment through interactions over time. Recent advances in machine RL have surpassed human expertise at the world's oldest board games and many classic video games, but they require vast quantities of experience to learn successfully -- none of today's algorithms account for the human ability to learn so many differ…

    Submitted 26 July, 2021; originally announced July 2021.

  24. arXiv:2107.06393  [pdf, other]

    cs.CV cs.AI cs.LG

    Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface

    Authors: Tuan Anh Le, Katherine M. Collins, Luke Hewitt, Kevin Ellis, N. Siddharth, Samuel J. Gershman, Joshua B. Tenenbaum

    Abstract: Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Pr…

    Submitted 20 April, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

    Journal ref: ICLR 2022

  25. arXiv:2012.15814  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Language-Mediated, Object-Centric Representation Learning

    Authors: Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu

    Abstract: We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object discovery and segmentation, notably MONet and Slot Attention. While these algorithms learn an object-centric representation just by reconstructing the input image, LO…

    Submitted 8 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL 2021 Findings. First two authors contributed equally; last two authors contributed equally. Project page: https://lang-orl.github.io/

  26. arXiv:1909.05885  [pdf, other]

    cs.CL cs.LG stat.ML

    Analyzing machine-learned representations: A natural language case study

    Authors: Ishita Dasgupta, Demi Guo, Samuel J. Gershman, Noah D. Goodman

    Abstract: As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of ab…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: This article supersedes a previous article arXiv:1802.04302

  27. arXiv:1805.11571  [pdf, other]

    stat.ML cs.LG

    Human-in-the-Loop Interpretability Prior

    Authors: Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

    Abstract: We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user stu…

    Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: To appear at NIPS 2018, selected for a spotlight. 13 pages (incl references and appendix)

  28. arXiv:1802.06426  [pdf, other]

    cs.AI q-bio.NC

    Estimating scale-invariant future in continuous time

    Authors: Zoran Tiganj, Samuel J. Gershman, Per B. Sederberg, Marc W. Howard

    Abstract: Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially-discounted future reward using the Bellman equation (model-free algorithms). An important d…

    Submitted 26 October, 2018; v1 submitted 18 February, 2018; originally announced February 2018.

    Comments: 25 pages, 10 figures

  29. arXiv:1802.04302  [pdf, other]

    cs.CL stat.ML

    Evaluating Compositionality in Sentence Embeddings

    Authors: Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman

    Abstract: An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, `natural language inference' (NLI), that cannot be solved using only word-level knowledge and requires some compositionali…

    Submitted 17 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

  30. arXiv:1606.02396  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Deep Successor Reinforcement Learning

    Authors: Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman

    Abstract: Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any giv…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: 10 pages, 6 figures
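
    The SR decomposition described in this abstract factorizes value as V(s) = sum over s' of M[s][s'] * w[s'], where M is the successor map (expected discounted future state occupancy) and w a one-step reward predictor. The tiny deterministic chain below is an assumption for illustration, not an environment from the paper:

    ```python
    # Minimal sketch of the successor representation: recover the value
    # function from a successor map M and a reward predictor w.

    gamma = 0.9
    n = 4  # states 0 -> 1 -> 2 -> 3 (terminal)

    # Successor map for this deterministic chain: state j is visited from
    # state i exactly once, discounted by gamma**(j - i), when j >= i.
    M = [[gamma ** (j - i) if j >= i else 0.0 for j in range(n)]
         for i in range(n)]

    w = [0.0, 0.0, 0.0, 1.0]  # reward predictor: reward only at the goal

    # V(s) = sum_{s'} M[s][s'] * w[s']
    V = [sum(M[s][sp] * w[sp] for sp in range(n)) for s in range(n)]
    assert abs(V[0] - gamma ** 3) < 1e-12  # V decays with distance to goal
    ```

    The practical appeal, as the abstract notes, is that rewards and dynamics are learned separately: if w changes (a new reward location), V can be recomputed from the cached M without relearning the environment.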

  31. arXiv:1604.00289  [pdf, other]

    cs.AI cs.CV cs.LG cs.NE stat.ML

    Building Machines That Learn and Think Like People

    Authors: Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman

    Abstract: Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achieveme…

    Submitted 2 November, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentary