Showing 1–15 of 15 results for author: Voudouris, K

Searching in archive cs.
  1. arXiv:2603.00063  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Measuring What AI Systems Might Do: Towards A Measurement Science in AI

    Authors: Konstantinos Voudouris, Mirko Thalmann, Alex Kipnis, José Hernández-Orallo, Eric Schulz

    Abstract: Scientists, policy-makers, business leaders, and members of the public care about what modern artificial intelligence systems are disposed to do. Yet terms such as capabilities, propensities, skills, values, and abilities are routinely used interchangeably and conflated with observable performance, with AI evaluation practices rarely specifying what quantity they purport to measure. We argue that…

    Submitted 10 February, 2026; originally announced March 2026.

  2. arXiv:2602.11863  [pdf, ps, other]

    cs.LG

    In-Context Function Learning in Large Language Models

    Authors: Elif Akata, Konstantinos Voudouris, Vincent Fortuin, Eric Schulz

    Abstract: Large language models (LLMs) can learn from a few demonstrations provided at inference time. We study this in-context learning phenomenon through the lens of Gaussian Processes (GPs). We build controlled experiments where models observe sequences of multivariate scalar-valued function samples drawn from known GP priors. We evaluate prediction error in relation to the number of demonstrations and c…

    Submitted 12 February, 2026; originally announced February 2026.

  3. arXiv:2602.06033  [pdf, ps, other]

    cs.LG

    Can vision language models learn intuitive physics from interaction?

    Authors: Luca M. Schulze Buschoff, Konstantinos Voudouris, Can Demircan, Eric Schulz

    Abstract: Pre-trained vision language models do not have good intuitions about the physical world. Recent work has shown that supervised fine-tuning can improve model performance on simple physical tasks. However, fine-tuned models do not appear to learn robust physical rules that can generalize to new contexts. Based on research in cognitive science, we hypothesize that models need to interact with an envi…

    Submitted 5 February, 2026; originally announced February 2026.

  4. arXiv:2509.13968  [pdf, ps, other]

    cs.AI cs.CL cs.FL cs.LG

    Exploring Major Transitions in the Evolution of Biological Cognition With Artificial Neural Networks

    Authors: Konstantinos Voudouris, Andrew Barron, Marta Halina, Colin Klein, Matishalin Patel

    Abstract: Transitional accounts of evolution emphasise a few changes that shape what is evolvable, with dramatic consequences for derived lineages. More recently it has been proposed that cognition might also have evolved via a series of major transitions that manipulate the structure of biological neural networks, fundamentally changing the flow of information. We used idealised models of information flow,…

    Submitted 17 September, 2025; originally announced September 2025.

  5. arXiv:2503.21668  [pdf, other]

    cs.AI cs.CV cs.LG

    Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI

    Authors: Danaja Rutar, Alva Markelius, Konstantinos Voudouris, José Hernández-Orallo, Lucy Cheke

    Abstract: One of the core components of our world models is 'intuitive physics' - an understanding of objects, space, and causality. This capability enables us to predict events, plan action and navigate environments, all of which rely on a composite sense of objecthood. Despite its importance, there is no single, unified account of objecthood, though multiple theoretical frameworks provide insights. In the…

    Submitted 7 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  6. arXiv:2503.02882  [pdf, ps, other]

    cs.AI

    Bringing Comparative Cognition To Computers

    Authors: Konstantinos Voudouris, Lucy G. Cheke, Eric Schulz

    Abstract: Researchers are increasingly subjecting artificial intelligence systems to psychological testing. But to rigorously compare their cognitive capacities with humans and other animals, we must avoid both over- and under-stating our similarities and differences. By embracing a comparative approach, we can integrate AI cognition research into the broader cognitive sciences.

    Submitted 4 March, 2025; originally announced March 2025.

  7. arXiv:2502.15678  [pdf, ps, other]

    cs.LG

    Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models

    Authors: Luca M. Schulze Buschoff, Konstantinos Voudouris, Elif Akata, Matthias Bethge, Joshua B. Tenenbaum, Eric Schulz

    Abstract: Pre-trained vision language models still fall short of human visual cognition. In an effort to improve visual cognition and align models with human behavior, we introduce visual stimuli and human judgments on visual cognition tasks, allowing us to systematically evaluate performance across cognitive domains under a consistent environment. We fine-tune models on ground truth data for intuitive phys…

    Submitted 30 May, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  8. arXiv:2502.14445  [pdf, ps, other]

    cs.CL cs.AI stat.ML

    PredictaBoard: Benchmarking LLM Score Predictability

    Authors: Lorenzo Pacchiardi, Konstantinos Voudouris, Ben Slater, Fernando Martínez-Plumed, José Hernández-Orallo, Lexin Zhou, Wout Schellaert

    Abstract: Despite possessing impressive skills, Large Language Models (LLMs) often fail unpredictably, demonstrating inconsistent success in even basic common sense reasoning tasks. This unpredictability poses a significant challenge to ensuring their safe deployment, as identifying and operating within a reliable "safe zone" is essential for mitigating risks. To address this, we present PredictaBoard, a no…

    Submitted 17 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted at ACL Findings 2025

  9. arXiv:2410.23242  [pdf, other]

    cs.AI

    A little less conversation, a little more action, please: Investigating the physical common-sense of LLMs in a 3D embodied environment

    Authors: Matteo G. Mecattaf, Ben Slater, Marko Tešić, Jonathan Prunty, Konstantinos Voudouris, Lucy G. Cheke

    Abstract: As general-purpose tools, Large Language Models (LLMs) must often reason about everyday physical environments. In a question-and-answer capacity, understanding the interactions of physical objects may be necessary to give appropriate responses. Moreover, LLMs are increasingly used as reasoning engines in agentic systems, designing and controlling their action sequences. The vast majority of resear…

    Submitted 3 January, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: 25 pages, 4 figures; v2: Added AFMR Acknowledgment

  10. arXiv:2410.20268  [pdf, other]

    cs.LG

    Centaur: a foundation model of human cognition

    Authors: Marcel Binz, Elif Akata, Matthias Bethge, Franziska Brändle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K. Eckstein, Noémi Éltető, Thomas L. Griffiths, Susanne Haridi, Akshay K. Jagadish, Li Ji-An, Alexander Kipnis, Sreejan Kumar, Tobias Ludwig, Marvin Mathony, Marcelo Mattar, Alireza Modirshanechi, Surabhi S. Nath, Joshua C. Peterson, Milena Rmus, Evan M. Russek, Tankred Saanum, et al. (15 additional authors not shown)

    Abstract: Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. A first step in this direction is to create a model that can predict human behavior in a wide range of settings. Here we introduce Centa…

    Submitted 28 April, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

  11. arXiv:2407.12844  [pdf, other]

    cs.CL cs.LG stat.ML

    metabench -- A Sparse Benchmark of Reasoning and Knowledge in Large Language Models

    Authors: Alex Kipnis, Konstantinos Voudouris, Luca M. Schulze Buschoff, Eric Schulz

    Abstract: Large Language Models (LLMs) vary in their abilities on a range of tasks. Initiatives such as the Open LLM Leaderboard aim to quantify these differences with several large benchmarks (sets of test items to which an LLM can respond either correctly or incorrectly). However, high correlations within and between benchmark scores suggest that (1) there exists a small set of common underlying abilities…

    Submitted 20 February, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at ICLR 2025

  12. arXiv:2312.11414  [pdf, other]

    cs.AI

    The Animal-AI Environment: A Virtual Laboratory For Comparative Cognition and Artificial Intelligence Research

    Authors: Konstantinos Voudouris, Ibrahim Alhas, Wout Schellaert, Matteo G. Mecattaf, Ben Slater, Matthew Crosby, Joel Holmes, John Burden, Niharika Chaubey, Niall Donnelly, Matishalin Patel, Marta Halina, José Hernández-Orallo, Lucy G. Cheke

    Abstract: The Animal-AI Environment is a unique game-based research platform designed to facilitate collaboration between the artificial intelligence and comparative cognition research communities. In this paper, we present the latest version of the Animal-AI Environment, outlining several major features that make the game more engaging for humans and more complex for AI systems. These features include inte…

    Submitted 17 January, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 37 pages, 16 figures, 6 tables

  13. arXiv:2310.06167  [pdf, other]

    cs.AI

    Predictable Artificial Intelligence

    Authors: Lexin Zhou, Pablo A. Moreno-Casares, Fernando Martínez-Plumed, John Burden, Ryan Burnell, Lucy Cheke, Cèsar Ferri, Alexandru Marcoci, Behzad Mehrbakhsh, Yael Moros-Daval, Seán Ó hÉigeartaigh, Danaja Rutar, Wout Schellaert, Konstantinos Voudouris, José Hernández-Orallo

    Abstract: We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key validity indicators (e.g., performance, safety) of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over pe…

    Submitted 6 January, 2025; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Paper Under Review

    ACM Class: I.2

  14. arXiv:2309.11975  [pdf, ps, other]

    cs.AI

    Inferring Capabilities from Task Performance with Bayesian Triangulation

    Authors: John Burden, Konstantinos Voudouris, Ryan Burnell, Danaja Rutar, Lucy Cheke, José Hernández-Orallo

    Abstract: As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to b…

    Submitted 8 October, 2025; v1 submitted 21 September, 2023; originally announced September 2023.

  15. Harms from Increasingly Agentic Algorithmic Systems

    Authors: Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, Dmitrii Krasheninnikov, Lauro Langosco, Zhonghao He, Yawen Duan, Micah Carroll, Michelle Lin, Alex Mayhew, Katherine Collins, Maryam Molamohammadi, John Burden, Wanru Zhao, Shalaleh Rismani, Konstantinos Voudouris, Umang Bhatt, Adrian Weller, David Krueger, Tegan Maharaj

    Abstract: Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems…

    Submitted 11 May, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted at FAccT 2023