
Showing 1–7 of 7 results for author: Pochinkov, N

Searching in archive cs.
  1. arXiv:2602.12413  [pdf, ps, other]

    cs.LG cs.AI

    Soft Contamination Means Benchmarks Test Shallow Generalization

    Authors: Ari Spiesberger, Juan J. Vazquez, Nicky Pochinkov, Tomáš Gavenčiak, Peli Grietzer, Gavin Leech, Nandi Schoots

    Abstract: If LLM training data is polluted with benchmark test data, then benchmark performance gives biased estimates of out-of-distribution (OOD) generalization. Typical decontamination filters use n-gram matching, which fails to detect semantic duplicates: sentences with equivalent (or near-equivalent) content that are not close in string space. We study this soft contamination of training data by semantic…

    Submitted 12 February, 2026; originally announced February 2026.
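
    A minimal sketch of the failure mode this abstract describes, assuming a toy n-gram filter; the example sentences, the overlap score, and the commented-out sentence-transformers check are illustrative stand-ins, not the paper's pipeline:

    ```python
    # Toy n-gram decontamination filter: paraphrases share almost no
    # n-grams, so a string-space filter scores them as non-duplicates.
    def ngrams(text, n=3):
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    def ngram_overlap(a, b, n=3):
        ga, gb = ngrams(a, n), ngrams(b, n)
        return len(ga & gb) / max(1, min(len(ga), len(gb)))

    benchmark_item = "The capital of France is Paris."
    paraphrase = "Paris serves as the capital city of France."

    print(ngram_overlap(benchmark_item, paraphrase))  # 0.0 -> filter passes it

    # A semantic check would flag the pair (assumes the optional
    # sentence-transformers package; uncomment to run):
    # from sentence_transformers import SentenceTransformer, util
    # model = SentenceTransformer("all-MiniLM-L6-v2")
    # a, b = model.encode([benchmark_item, paraphrase], convert_to_tensor=True)
    # print(util.cos_sim(a, b).item())  # high cosine similarity
    ```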

  2. arXiv:2511.16540  [pdf, ps, other]

    cs.CL cs.LG

    Beyond Tokens in Language Models: Interpreting Activations through Text Genre Chunks

    Authors: Éloïse Benito-Rodriguez, Einar Urdshals, Jasmina Nasufi, Nicky Pochinkov

    Abstract: Understanding Large Language Models (LLMs) is key to ensuring their safe and beneficial deployment. This task is complicated by the difficulty of interpreting LLM structures and the infeasibility of having all of their outputs human-evaluated. In this paper, we present the first step towards a predictive framework, in which the genre of the text used to prompt an LLM is predicted based on its activations…

    Submitted 20 November, 2025; originally announced November 2025.

    Comments: 13 pages, 5 figures

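
    As a rough illustration of the probing setup this abstract suggests, here is a minimal sketch that fits a linear probe to predict a genre label from activation vectors; the activations are random stand-ins with planted class structure, not real LLM hidden states:

    ```python
    # Linear probe sketch: predict a text's genre from activation vectors.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    d_model, n_per_class = 256, 200
    genres = ["news", "fiction", "code"]

    # Give each genre a distinct mean direction so the probe has signal.
    X = np.vstack([rng.normal(loc=i, size=(n_per_class, d_model))
                   for i in range(len(genres))])
    y = np.repeat(np.arange(len(genres)), n_per_class)

    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    print("probe accuracy:", probe.score(Xte, yte))
    ```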

  3. arXiv:2511.00180  [pdf, ps, other]

    cs.CL cs.LG

    ParaScopes: What Do Language Model Activations Encode About Future Text?

    Authors: Nicky Pochinkov, Yulia Volkova, Anna Vasileva, Sai V R Chereddy

    Abstract: Interpretability studies of language models often investigate forward-looking representations in activations. However, as language models become capable of ever longer time-horizon tasks, methods for understanding activations often remain limited to testing specific concepts or tokens. We develop a framework of Residual Stream Decoders as a method of probing model activations for paragraph-s…

    Submitted 31 October, 2025; originally announced November 2025.

    Comments: Main paper: 9 pages, 10 figures. Total 24 pages
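
    A minimal sketch of the general idea of a residual-stream decoder, under the illustrative assumption that paragraph-scale information is linearly decodable from a single residual-stream vector; the shapes, synthetic data, and training loop are stand-ins, not the paper's architecture:

    ```python
    # Train a linear decoder to predict an embedding of the *upcoming*
    # paragraph from one residual-stream activation vector.
    import torch

    d_model, d_embed, n = 512, 128, 1000
    torch.manual_seed(0)

    # Pretend ground truth: paragraph embeddings are a noisy linear
    # function of the residual stream, so a linear decoder can recover them.
    W_true = torch.randn(d_model, d_embed) / d_model ** 0.5
    resid = torch.randn(n, d_model)                   # residual-stream activations
    para_emb = resid @ W_true + 0.1 * torch.randn(n, d_embed)

    decoder = torch.nn.Linear(d_model, d_embed)
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(decoder(resid), para_emb)
        loss.backward()
        opt.step()
    print("final MSE:", loss.item())
    ```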

  4. arXiv:2409.06328  [pdf, other]

    cs.CL

    Extracting Paragraphs from LLM Token Activations

    Authors: Nicholas Pochinkov, Angelo Benoit, Lovkush Agarwal, Zainab Ali Majid, Lucile Ter-Minassian

    Abstract: Generative large language models (LLMs) excel in natural language processing tasks, yet their inner workings remain underexplored beyond token-level predictions. This study investigates the degree to which these models decide the content of a paragraph at its onset, shedding light on their contextual understanding. By examining the information encoded in single-token activations, specifically the…

    Submitted 10 September, 2024; originally announced September 2024.
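
    To make the "single-token activations" idea concrete, here is a sketch of extracting the hidden state at a paragraph-onset token, using GPT-2 as a stand-in model; the model, layer choice, and token indexing are illustrative assumptions, not the paper's setup:

    ```python
    # Grab the hidden state at the token that opens a new paragraph, so it
    # can later be probed for the paragraph's content. GPT-2 and layer 6
    # are arbitrary stand-ins. Requires: pip install transformers torch
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2")

    prefix = "The first paragraph ends here.\n\n"
    text = prefix + "The second paragraph begins with a new topic."

    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)

    # Index of the last prefix token (the paragraph boundary), assuming
    # the prefix tokenization is a prefix of the full text's tokenization.
    onset_pos = len(tok(prefix)["input_ids"]) - 1
    onset_act = out.hidden_states[6][0, onset_pos]  # one token's activation
    print(onset_act.shape)  # torch.Size([768]) for GPT-2 small
    ```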

  5. arXiv:2408.17324  [pdf, other]

    cs.LG cs.AI cs.CL

    Modularity in Transformers: Investigating Neuron Separability & Specialization

    Authors: Nicholas Pochinkov, Thomas Jones, Mohammed Rashidur Rahman

    Abstract: Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited. This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models. Using a combination of selective pruning and MoEfication clustering techniques, we analyze…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures

    MSC Class: 68T07 (Primary) 68Q32; 68T05 (Secondary) ACM Class: I.2.4; I.2.6; I.2.7
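
    One half of the pipeline named in the abstract, MoEfication-style clustering, can be illustrated roughly as follows; the activation matrix is synthetic, and k-means over raw activation patterns is a simplification of the actual method:

    ```python
    # Group MLP neurons by the similarity of their activation patterns
    # across inputs, so each cluster behaves like an "expert".
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    n_inputs, n_neurons, n_clusters = 400, 512, 8

    # Rows: neurons; columns: each neuron's activation on one input.
    acts = rng.normal(size=(n_neurons, n_inputs))

    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(acts)
    for c in range(n_clusters):
        print(f"cluster {c}: {np.sum(labels == c)} neurons")
    ```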

  6. arXiv:2408.17322  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

    Authors: Nicholas Pochinkov, Ben Pasero, Skylar Shibayama

    Abstract: The use of transformer-based models is growing rapidly throughout society. With this growth, it is important to understand how they work, and in particular, how the attention mechanisms represent concepts. Though many interpretability methods exist, most look at models through their neuronal activations, which are poorly understood. We describe different lenses through which to view neuron act…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures, XAI World Conference 2024 Late-Breaking Work

    MSC Class: 68T07 (Primary) 68T30; 68T50 (Secondary) ACM Class: I.2.4; I.2.6; I.2.7
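
    A rough sketch of three ablation "lenses" for a single neuron, in the spirit of the title: replace its activation with zero, with its mean, or with a value centered on the peak (mode) of its distribution. The histogram-based mode estimate and the gamma-shaped activations are illustrative assumptions, not the paper's exact procedure:

    ```python
    # Compare three replacement values for ablating one neuron.
    import numpy as np

    rng = np.random.default_rng(0)
    acts = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # skewed, like real neurons

    zero_ablation = 0.0
    mean_ablation = acts.mean()
    counts, edges = np.histogram(acts, bins=100)
    peak_ablation = edges[counts.argmax()] + (edges[1] - edges[0]) / 2  # mode

    print(f"zero={zero_ablation:.2f}  mean={mean_ablation:.2f}  "
          f"peak={peak_ablation:.2f}")
    # For skewed activations the mean and the peak differ noticeably,
    # which is the motivation for centering on the peak.
    ```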

  7. arXiv:2403.01267  [pdf, other]

    cs.LG cs.CL

    Dissecting Language Models: Machine Unlearning via Selective Pruning

    Authors: Nicholas Pochinkov, Nandi Schoots

    Abstract: Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We introduce a selective pruning method for LLMs that removes neurons based on their relative importance to a targeted capability compared to overall netw…

    Submitted 24 July, 2024; v1 submitted 2 March, 2024; originally announced March 2024.
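
    A minimal sketch of the selective-pruning idea: score neurons by importance on the capability to be removed relative to general performance, then zero out the most capability-specific ones. The importance scores, the 2% pruning fraction, and the weight matrix are synthetic stand-ins; the paper's exact scoring may differ:

    ```python
    # Rank neurons by forget-vs-retain importance ratio and prune the top.
    import numpy as np

    rng = np.random.default_rng(1)
    n_neurons = 2048
    imp_forget = np.abs(rng.normal(size=n_neurons))   # importance on forget task
    imp_retain = np.abs(rng.normal(size=n_neurons))   # importance on general data

    ratio = imp_forget / (imp_retain + 1e-8)
    to_prune = np.argsort(ratio)[-int(0.02 * n_neurons):]  # top 2% most specific

    W_out = rng.normal(size=(n_neurons, 512))  # stand-in MLP output weights
    W_out[to_prune, :] = 0.0                   # pruning a neuron = zeroing its row
    print("pruned", len(to_prune), "neurons")
    ```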