-
Euclid Quick Data Release (Q1). AgileLens: A scalable CNN-based pipeline for strong gravitational lens identification
Authors:
Euclid Collaboration,
X. Xu,
R. Chen,
T. Li,
A. R. Cooray,
S. Schuldt,
J. A. Acevedo Barroso,
D. Stern,
D. Scott,
M. Meneghetti,
G. Despali,
J. Chopra,
Y. Cao,
M. Cheng,
J. Buda,
J. Zhang,
J. Furumizo,
R. Valencia,
Z. Jiang,
C. Tortora,
N. E. P. Lines,
T. E. Collett,
S. Fotopoulou,
A. Galan,
A. Manjón-García, et al. (286 additional authors not shown)
Abstract:
We present an end-to-end, iterative pipeline for efficient identification of strong galaxy--galaxy lensing systems, applied to the Euclid Q1 imaging data. Starting from VIS catalogues, we reject point sources, apply a magnitude cut (I$_E$ $\leq$ 24) on deflectors, and run a pixel-level artefact/noise filter to build 96 $\times$ 96 pix cutouts; VIS+NISP colour composites are constructed with a VIS-anchored luminance scheme that preserves VIS morphology and NISP colour contrast. A VIS-only seed classifier supplies clear positives and typical impostors, from which we curate a morphology-balanced negative set and augment scarce positives. Among the six CNNs studied initially, a modified VGG16 (GlobalAveragePooling + 256/128 dense layers with the last nine layers trainable) performs best; the training set grows from 27 seed lenses (augmented to 1809) plus 2000 negatives to a colour dataset of 30,686 images. After three rounds of iterative fine-tuning, human grading of the top 4000 candidates ranked by the final model yields 441 Grade A/B candidate lensing systems, including 311 overlapping with the existing Q1 strong-lens catalogue and 130 additional A/B candidates (9 As and 121 Bs) not previously reported. Independently, the model recovers 740 out of 905 (81.8%) candidate Q1 lenses within its top 20,000 predictions, considering off-centred samples. Candidates span I$_E$ $\simeq$ 17--24 AB mag (median 21.3 AB mag) and are redder in Y$_E$--H$_E$ than the parent population, consistent with massive early-type deflectors. Each training iteration required a week for a small team, and the approach easily scales to future Euclid releases; future work will calibrate the selection function via lens injection, extend recall through uncertainty-aware active learning, and explore multi-scale or attention-based neural networks with fast post-hoc vetters that incorporate lens models into the classification.
Submitted 7 April, 2026;
originally announced April 2026.
-
Efficient Multi-Objective Planning with Weighted Maximization Using Large Neighbourhood Search
Authors:
Krishna Kalavadia,
Shamak Dutta,
Yash Vardhan Pant,
Stephen L. Smith
Abstract:
Autonomous navigation often requires the simultaneous optimization of multiple objectives. The most common approach scalarizes these into a single cost function using a weighted sum, but this method is unable to find all possible trade-offs and can therefore miss critical solutions. An alternative, the weighted maximum of objectives, can find all Pareto-optimal solutions, including those in non-convex regions of the trade-off space that weighted sum methods cannot find. However, the increased computational complexity of finding weighted maximum solutions in the discrete domain has limited its practical use. To address this challenge, we propose a novel search algorithm based on the Large Neighbourhood Search framework that efficiently solves the weighted maximum planning problem. Through extensive simulations, we demonstrate that our algorithm achieves comparable solution quality to existing weighted maximum planners with a runtime improvement of 1-2 orders of magnitude, making it a viable option for autonomous navigation.
Submitted 6 April, 2026;
originally announced April 2026.
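The non-convexity point in the abstract above can be made concrete with a toy instance (ours, not drawn from the paper): a Pareto-optimal path sitting off the convex hull of the trade-off front is invisible to every weighted sum, yet recoverable by the weighted maximum.

```python
# Hypothetical 3-path instance (not from the paper). Path B is
# Pareto-optimal but lies off the convex hull of {A, C}, i.e. in a
# non-convex region of the trade-off space.
paths = {"A": (1.0, 10.0), "B": (6.0, 6.0), "C": (10.0, 1.0)}

def weighted_sum(costs, w):
    return w * costs[0] + (1 - w) * costs[1]

def weighted_max(costs, w):
    return max(w * costs[0], (1 - w) * costs[1])

def winners(scalarize):
    """All paths that minimize the scalarized cost for some weight in (0, 1)."""
    return {min(paths, key=lambda p: scalarize(paths[p], w / 100))
            for w in range(1, 100)}

assert winners(weighted_sum) == {"A", "C"}       # B is never selected
assert winners(weighted_max) == {"A", "B", "C"}  # all Pareto points found
```

Sweeping 99 weights, the weighted sum only ever selects the hull paths A and C, while the weighted maximum also selects B at mid-range weights; the paper's contribution is making that second scalarization fast in the discrete planning setting.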
-
AI Agents Under EU Law
Authors:
Luca Nannini,
Adam Leon Smith,
Michele Joshua Maggini,
Enrico Panai,
Sandra Feliciano,
Aleksandr Tiulkanov,
Elena Maran,
James Gealy,
Piercosma Bisconti
Abstract:
AI agents - i.e. AI systems that autonomously plan, invoke external tools, and execute multi-step action chains with reduced human involvement - are being deployed at scale across enterprise functions ranging from customer service and recruitment to clinical decision support and critical infrastructure management. The EU AI Act (Regulation 2024/1689) regulates these systems through a risk-based framework, but it does not operate in isolation: providers face simultaneous obligations under the GDPR, the Cyber Resilience Act, the Digital Services Act, the Data Act, the Data Governance Act, sector-specific legislation, the NIS2 Directive, and the revised Product Liability Directive. This paper provides the first systematic regulatory mapping for AI agent providers integrating (a) draft harmonised standards under Standardisation Request M/613 to CEN/CENELEC JTC 21 as of January 2026, (b) the GPAI Code of Practice published in July 2025, (c) the CRA harmonised standards programme under Mandate M/606 accepted in April 2025, and (d) the Digital Omnibus proposals of November 2025. We present a practical taxonomy of nine agent deployment categories mapping concrete actions to regulatory triggers, identify agent-specific compliance challenges in cybersecurity, human oversight, transparency across multi-party action chains, and runtime behavioral drift. We propose a twelve-step compliance architecture and a regulatory trigger mapping connecting agent actions to applicable legislation. We conclude that high-risk agentic systems with untraceable behavioral drift cannot currently satisfy the AI Act's essential requirements, and that the provider's foundational compliance task is an exhaustive inventory of the agent's external actions, data flows, connected systems, and affected persons.
Submitted 6 April, 2026;
originally announced April 2026.
-
ARES OS 2.0: An Orchestration Software Suite for Autonomous Experimentation Systems and Self-Driving Labs
Authors:
Arthur W. N. Sloan,
Robert W. Waelder,
Morgen L. Smith,
Nicholas Kleiner,
Arnas Babeckis,
Jason Wheeler,
Daylond Hooper,
Benji Maruyama
Abstract:
ARES OS 2.0 (hereinafter ARES OS) is an open-source software suite to enable laboratory automation and closed-loop autonomous experimentation. Its function is to orchestrate experimental actions and data handoff between lab equipment, analysis routines, and experimental planning modules through a service-oriented architecture. ARES OS is abstracted to apply to general experimental flows common in materials science, chemistry, biology, and related disciplines. The core of ARES OS provides central control over all modules, along with the heavy lifting of UI creation, data management, and experimental design tools. ARES OS modules communicate with the core software over protobuf and gRPC, allowing them to be language-agnostic and user-creatable. This allows users to easily implement modules that control experimental hardware, process collected data, or plan experiments to meet their specific research needs. ARES OS lowers the barrier to entry for researchers to build their own self-driving labs, allowing them to focus on scientific programming for their use case and reducing the effort and time needed to bring an autonomous experimentation system online.
Submitted 3 April, 2026;
originally announced April 2026.
-
Machine Learning Based Prediction of Surgical Outcomes in Chronic Rhinosinusitis from Clinical Data
Authors:
Sayeed Shafayet Chowdhury,
Karen D'Souza,
V. Siva Kakumani,
Snehasis Mukhopadhyay,
Shiaofen Fang,
Rodney J. Schlosser,
Daniel M. Beswick,
Jeremiah A. Alt,
Jess C. Mace,
Zachary M. Soler,
Timothy L. Smith,
Vijay R. Ramakrishnan
Abstract:
Artificial intelligence (AI) has increasingly transformed medical prognostics by enabling rapid and accurate analysis across imaging and pathology. However, the investigation of machine learning predictions applied to prospectively collected, standardized data from observational clinical intervention trials remains underexplored, despite its potential to reduce costs and improve patient outcomes. Chronic rhinosinusitis (CRS), a persistent inflammatory disease of the paranasal sinuses lasting more than three months, imposes a substantial burden on quality of life (QoL) and societal cost. Although many patients respond to medical therapy, others with refractory symptoms often pursue surgical intervention. Surgical decision-making in CRS is complex, as it must weigh known procedural risks against uncertain individualized outcomes. In this study, we evaluated supervised machine learning models for predicting surgical benefit in CRS, using the Sino-Nasal Outcome Test-22 (SNOT-22) as the primary patient-reported outcome. Our prospectively collected cohort from an observational intervention trial comprised patients who all underwent surgery; we investigated whether models trained only on preoperative data could identify patients who might not have been recommended surgery prior to the procedure. Across multiple algorithms, including an ensemble approach, our best model achieved approximately 85% classification accuracy, providing accurate and interpretable predictions of surgical candidacy. Moreover, on a held-out set of 30 cases spanning mixed difficulty, our model achieved 80% accuracy, exceeding the average prediction accuracy of expert clinicians (75.6%), demonstrating its potential to augment clinical decision-making and support personalized CRS care.
Submitted 19 February, 2026;
originally announced February 2026.
-
Hierarchical Informative Path Planning via Graph Guidance and Trajectory Optimization
Authors:
Avraiem Iskandar,
Shamak Dutta,
Kevin Murrant,
Yash Vardhan Pant,
Stephen L. Smith
Abstract:
We study informative path planning (IPP) with travel budgets in cluttered environments, where an agent collects measurements of a latent field modeled as a Gaussian process (GP) to reduce uncertainty at target locations. Graph-based solvers provide global guarantees but assume pre-selected measurement locations, while continuous trajectory optimization supports path-based sensing but is computationally intensive and sensitive to initialization in obstacle-dense settings. We propose a hierarchical framework with three stages: (i) graph-based global planning, (ii) segment-wise budget allocation using geometric and kernel bounds, and (iii) spline-based refinement of each segment with hard constraints and obstacle pruning. By combining global guidance with local refinement, our method achieves lower posterior uncertainty than graph-only and continuous baselines, while running faster than continuous-space solvers (up to 9x faster than gradient-based methods and 20x faster than black-box optimizers) across synthetic cluttered environments and Arctic datasets.
Submitted 23 January, 2026;
originally announced January 2026.
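The uncertainty-reduction objective described above can be sketched in its simplest form: a one-measurement Gaussian process example with an RBF kernel (a toy of ours; the paper's multi-stage budget allocation and obstacle handling are far richer). The posterior variance at a target t after one noisy measurement at x is k(t,t) - k(t,x)^2 / (k(x,x) + noise), so nearby measurements buy more uncertainty reduction, which is what the planner trades off against travel budget.

```python
# Toy illustration of the IPP objective (not the authors' code): GP
# posterior variance at a target after a single noisy measurement.
import math

def rbf(a, b, length=1.0):
    # squared-exponential kernel on scalar locations
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def posterior_var(target, sample, noise=0.01):
    # var(target | one measurement at `sample`), standard GP update
    return rbf(target, target) - rbf(target, sample) ** 2 / (rbf(sample, sample) + noise)

near = posterior_var(0.0, 0.5)  # measurement close to the target
far = posterior_var(0.0, 3.0)   # measurement far from the target
assert near < far < 1.0         # closer measurements shrink uncertainty more
```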
-
Large language models and the entropy of English
Authors:
Colin Scheibner,
Lindsay M. Smith,
William Bialek
Abstract:
We use large language models (LLMs) to uncover long-ranged structure in English texts from a variety of sources. The conditional entropy or code length in many cases continues to decrease with context length at least to $N\sim 10^4$ characters, implying that there are direct dependencies or interactions across these distances. A corollary is that there are small but significant correlations between characters at these separations, as we show from the data independent of models. The distribution of code lengths reveals an emergent certainty about an increasing fraction of characters at large $N$. Over the course of model training, we observe different dynamics at long and short context lengths, suggesting that long-ranged structure is learned only gradually. Our results constrain efforts to build statistical physics models of LLMs or language itself.
Submitted 31 December, 2025;
originally announced December 2025.
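The decreasing-conditional-entropy signature that the paper measures with LLM code lengths can be reproduced in miniature with plain n-gram counts on a synthetic string (an illustration of the quantity only, not the paper's estimator):

```python
# H(next char | previous n chars) in bits, from empirical n-gram counts.
# On structured text the estimate falls as the context grows -- the same
# qualitative behaviour the paper tracks out to N ~ 10^4 characters.
import math
from collections import Counter

def conditional_entropy(text, n):
    joint = Counter(text[i:i + n + 1] for i in range(len(text) - n))
    ctx = Counter(text[i:i + n] for i in range(len(text) - n))
    total = sum(joint.values())
    h = 0.0
    for gram, count in joint.items():
        h -= (count / total) * math.log2(count / ctx[gram[:n]])
    return h

text = "abcabd" * 100  # deterministic period-6 toy string
h = [conditional_entropy(text, n) for n in range(4)]
assert h[0] > h[1]   # one character of context already reduces entropy
assert h[3] < 1e-9   # three characters determine the next one exactly
```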
-
Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots
Authors:
Ninghan Zhong,
Steven Caro,
Megnath Ramesh,
Rishi Bhatnagar,
Avraiem Iskandar,
Stephen L. Smith
Abstract:
Mobile robots are increasingly deployed in cluttered environments with movable objects, posing challenges for traditional methods that prohibit interaction. In such settings, the mobile robot must go beyond traditional obstacle avoidance, leveraging pushing or nudging strategies to accomplish its goals. While research in pushing-based robotics is growing, evaluations rely on ad hoc setups, limiting reproducibility and cross-comparison. To address this, we present Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation tasks. Bench-Push includes multiple components: 1) a comprehensive range of simulated environments that capture the fundamental challenges in pushing-based tasks, including navigating a maze with movable obstacles, autonomous ship navigation in ice-covered waters, box delivery, and area clearing, each with varying levels of complexity; 2) novel evaluation metrics to capture efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-Push to evaluate example implementations of established baselines across environments. Bench-Push is open-sourced as a Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
Submitted 12 December, 2025;
originally announced December 2025.
-
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation
Authors:
Steven Caro,
Stephen L. Smith
Abstract:
Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them.
This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.
Submitted 10 December, 2025;
originally announced December 2025.
-
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit
Authors:
Nick Jiang,
Xiaoqing Sun,
Lisa Dunlap,
Lewis Smith,
Neel Nanda
Abstract:
Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often rely on costly LLM-based techniques (e.g. annotating dataset differences) or dense embedding models (e.g. for clustering), which lack control over the properties of interest. We propose using sparse autoencoders (SAEs) to create SAE embeddings: representations whose dimensions map to interpretable concepts. Through four data analysis tasks, we show that SAE embeddings are more cost-effective and reliable than LLMs and more controllable than dense embeddings. Using the large hypothesis space of SAEs, we can uncover insights such as (1) semantic differences between datasets and (2) unexpected concept correlations in documents. For instance, by comparing model responses, we find that Grok-4 clarifies ambiguities more often than nine other frontier models. Relative to LLMs, SAE embeddings uncover bigger differences at 2-8x lower cost and identify biases more reliably. Additionally, SAE embeddings are controllable: by filtering concepts, we can (3) cluster documents along axes of interest and (4) outperform dense embeddings on property-based retrieval. Using SAE embeddings, we study model behavior with two case studies: investigating how OpenAI model behavior has changed over time and finding "trigger" phrases learned by Tulu-3 (Lambert et al., 2024) from its training data. These results position SAEs as a versatile tool for unstructured data analysis and highlight the neglected importance of interpreting models through their data.
Submitted 10 December, 2025;
originally announced December 2025.
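A minimal sketch of the encoding step behind SAE embeddings, with handcrafted weights and invented concept names (the paper uses SAEs trained on model activations; nothing here is from its implementation): each latent dimension fires for one nameable concept, and filtering those dimensions gives the controllability the abstract describes.

```python
# Toy SAE encoder: ReLU(Wx + b) over a dense embedding, where each
# latent dimension is a (handcrafted, illustrative) concept detector.
concept_names = ["finance", "medicine", "apology"]  # invented labels
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
bias = [-0.2, -0.2, -0.2]  # negative bias encourages sparsity under ReLU

def sae_encode(x):
    acts = []
    for row, b in zip(W, bias):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        acts.append(max(0.0, pre))  # ReLU keeps only active concepts
    return acts

doc = [0.9, 0.0, 0.7, 0.3]  # dense embedding of some document (made up)
emb = sae_encode(doc)
# interpretable view: which concepts are present in this document?
active = [name for name, a in zip(concept_names, emb) if a > 0]
assert active == ["finance", "apology"]
```

Property-based retrieval then amounts to indexing on a chosen subset of these named dimensions rather than on the full dense vector.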
-
Difficulties with Evaluating a Deception Detector for AIs
Authors:
Lewis Smith,
Bilal Chughtai,
Neel Nanda
Abstract:
Building reliable deception detectors for AI systems -- methods that could predict when an AI system is being strategically deceptive without necessarily requiring behavioural evidence -- would be valuable in mitigating risks from advanced AI systems. But evaluating the reliability and efficacy of a proposed deception detector requires examples that we can confidently label as either deceptive or honest. We argue that we currently lack the necessary examples and further identify several concrete obstacles in collecting them. We provide evidence from conceptual arguments, analysis of existing empirical works, and analysis of novel illustrative case studies. We also discuss the potential of several proposed empirical workarounds to these problems and argue that while they seem valuable, they also seem insufficient alone. Progress on deception detection likely requires further consideration of these problems.
Submitted 16 December, 2025; v1 submitted 27 November, 2025;
originally announced November 2025.
-
$π^{*}_{0.6}$: a VLA That Learns From Experience
Authors:
Physical Intelligence,
Ali Amin,
Raichelle Aniceto,
Ashwin Balakrishna,
Kevin Black,
Ken Conley,
Grace Connors,
James Darpinian,
Karan Dhabalia,
Jared DiCarlo,
Danny Driess,
Michael Equi,
Adnan Esmail,
Yunhao Fang,
Chelsea Finn,
Catherine Glossop,
Thomas Godden,
Ivan Goryachev,
Lachy Groom,
Hunter Hancock,
Karol Hausman,
Gashon Hussein,
Brian Ichter,
Szymon Jakubczak,
Rowan Jen, et al. (31 additional authors not shown)
Abstract:
We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $π^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $π^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
Submitted 18 November, 2025; v1 submitted 18 November, 2025;
originally announced November 2025.
-
Procedural Knowledge Improves Agentic LLM Workflows
Authors:
Vincent Hsiao,
Mark Roberts,
Leslie Smith
Abstract:
Large language models (LLMs) often struggle when performing agentic tasks without substantial tool support, prompt engineering, or fine-tuning. Despite research showing that domain-dependent, procedural knowledge can dramatically increase planning efficiency, little work evaluates its potential for improving LLM performance on agentic tasks that may require implicit planning. We formalize, implement, and evaluate an agentic LLM workflow that leverages procedural knowledge in the form of a hierarchical task network (HTN). Empirical results of our implementation show that hand-coded HTNs can dramatically improve LLM performance on agentic tasks, and using HTNs can boost a 20b or 70b parameter LLM to outperform a much larger 120b parameter LLM baseline. Furthermore, LLM-created HTNs improve overall performance, though less so. The results suggest that leveraging expertise -- from humans, documents, or LLMs -- to curate procedural knowledge will become another important tool for improving LLM workflows.
Submitted 10 November, 2025;
originally announced November 2025.
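A minimal sketch of the kind of procedural knowledge being injected, using a made-up "make-tea" domain rather than any of the paper's benchmarks: an HTN recursively decomposes abstract tasks into an ordered plan of primitive actions, which can then steer an LLM one concrete step at a time.

```python
# Minimal HTN: methods map an abstract task to ordered subtasks, which
# decompose depth-first down to primitive actions. (Illustrative domain,
# not from the paper.)
methods = {
    "make-tea": ["boil-water", "steep", "serve"],
    "boil-water": ["fill-kettle", "heat-kettle"],
}
primitives = {"fill-kettle", "heat-kettle", "steep", "serve"}

def decompose(task):
    """Flatten an abstract task into a sequence of primitive actions."""
    if task in primitives:
        return [task]
    plan = []
    for sub in methods[task]:
        plan.extend(decompose(sub))
    return plan

plan = decompose("make-tea")
assert plan == ["fill-kettle", "heat-kettle", "steep", "serve"]
```

In the paper's workflow, each primitive would be handed to the LLM as the next concrete step, sparing it the implicit long-horizon planning it tends to do poorly.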
-
Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning
Authors:
NVIDIA,
Mayank Mittal,
Pascal Roth,
James Tigue,
Antoine Richard,
Octi Zhang,
Peter Du,
Antonio Serrano-Muñoz,
Xinjie Yao,
René Zurbrügg,
Nikita Rudin,
Lukasz Wawrzyniak,
Milad Rakhsha,
Alain Denzler,
Eric Heiden,
Ales Borovicka,
Ossama Ahmed,
Iretiayo Akinola,
Abrar Anwar,
Mark T. Carlson,
Ji Yuan Feng,
Animesh Garg,
Renato Gasoto,
Lionel Gulich, et al. (82 additional authors not shown)
Abstract:
We present Isaac Lab, the natural successor to Isaac Gym, which extends the paradigm of GPU-native robotics simulation into the era of large-scale multi-modal learning. Isaac Lab combines high-fidelity GPU parallel physics, photorealistic rendering, and a modular, composable architecture for designing environments and training robot policies. Beyond physics and rendering, the framework integrates actuator models, multi-frequency sensor simulation, data collection pipelines, and domain randomization tools, unifying best practices for reinforcement and imitation learning at scale within a single extensible platform. We highlight its application to a diverse set of challenges, including whole-body control, cross-embodiment mobility, contact-rich and dexterous manipulation, and the integration of human demonstrations for skill acquisition. Finally, we discuss upcoming integration with the differentiable, GPU-accelerated Newton physics engine, which promises new opportunities for scalable, data-efficient, and gradient-based approaches to robot learning. We believe Isaac Lab's combination of advanced simulation capabilities, rich sensing, and data-center scale execution will help unlock the next generation of breakthroughs in robotics research.
Submitted 6 November, 2025;
originally announced November 2025.
-
UniqueRank: Identifying Important and Difficult-to-Replace Nodes in Attributed Graphs
Authors:
Erica Cai,
Benjamin A. Miller,
Olga Simek,
Christopher L. Smith
Abstract:
Node-ranking methods that focus on structural importance are widely used in a variety of applications, from ranking webpages in search engines to identifying key molecules in biomolecular networks. In real social, supply chain, and terrorist networks, one definition of importance considers the impact on information flow or network productivity when a given node is removed. In practice, however, a nearby node may be able to replace another node upon removal, allowing the network to continue functioning as before. This replaceability is an aspect that existing ranking methods do not consider. To address this, we introduce UniqueRank, a Markov-chain-based approach that captures attribute uniqueness in addition to structural importance, making top-ranked nodes harder to replace. We find that UniqueRank identifies important nodes whose attributes are dissimilar to those of their neighbors in simple symmetric networks with known ground truth. Further, on real terrorist, social, and supply chain networks, we demonstrate that removing and attempting to replace top UniqueRank nodes often yields larger efficiency reductions than removing and attempting to replace top nodes ranked by competing methods. Finally, we show UniqueRank's versatility by demonstrating its potential to identify structurally critical atoms with unique chemical environments in biomolecular structures.
Submitted 21 October, 2025;
originally announced October 2025.
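A rough sketch of the intuition (our toy, not the paper's exact Markov-chain construction): bias a PageRank-style random walk so that flow prefers neighbours whose attributes would be hard to substitute. Of two structurally identical nodes, the one carrying a unique attribute then ranks higher.

```python
# Toy attribute-aware ranking. Nodes "b" and "c" occupy symmetric
# structural positions in a 4-cycle, but "b" carries a unique attribute,
# so it should outrank "c". (Illustrative graph and attributes only.)
import math

edges = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
attrs = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 0.0), "d": (1.0, 0.0)}

def dissimilarity(u, v):
    return math.dist(attrs[u], attrs[v])

def unique_rank(edges, damping=0.85, iters=100):
    nodes = list(edges)
    r = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            inflow = 0.0
            for m in nodes:
                if n in edges[m]:
                    # weight the walk by how hard n is to substitute for m
                    w = 1.0 + dissimilarity(m, n)
                    z = sum(1.0 + dissimilarity(m, k) for k in edges[m])
                    inflow += r[m] * w / z
            new[n] = (1 - damping) / len(nodes) + damping * inflow
        r = new
    return r

scores = unique_rank(edges)
assert scores["b"] > scores["c"]  # same structure, unique attributes win
```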
-
A solution to generalized learning from small training sets found in infant repeated visual experiences of individual objects
Authors:
Frangil Ramirez,
Elizabeth Clerkin,
David J. Crandall,
Linda B. Smith
Abstract:
One-year-old infants show immediate adult-like generalization of common object categories to novel instances. The field has limited understanding of how this early prowess is achieved. Here we provide evidence on infants' daily-life visual experiences for 8 early-learned object categories. Using a corpus of infant head camera images recorded at mealtimes (87 mealtimes captured by 14 infants), we quantify the individual instances experienced by infants and the similarity structure of all images containing an instance of each category. The distribution of instances is highly skewed, containing, for each infant and category, many images of the same few objects along with fewer images of other instances. Graph theoretic measures of the similarity structure for individual categories reveal a lumpy mix of high similarity and high variability, organized into multiple but interconnected clusters of high-similarity images. In computational experiments, we show that creating training sets that include an oversampling of varied images from a single instance yields a lumpy similarity structure. We also show that these artificially-created training sets support generalization to novel instances after very few training experiences. We discuss implications for the development of visual object recognition in both humans and machines.
Submitted 27 December, 2025; v1 submitted 16 October, 2025;
originally announced October 2025.
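The training-set construction the abstract describes (oversampling varied images of a single instance, with a thin spread over the remaining instances) can be sketched as follows; the function name and stand-in image labels are hypothetical, not the authors' code:

```python
import random

def build_skewed_training_set(instances, n_total, oversample_frac=0.7, seed=0):
    """Construct a training set whose instance distribution is highly
    skewed: most images are varied views of one dominant instance, with
    fewer images spread over the other instances, mirroring the corpus
    statistics described in the abstract. All names are illustrative.
    """
    rng = random.Random(seed)
    ids = sorted(instances)
    dominant = ids[0]                       # the oversampled instance
    n_dom = int(n_total * oversample_frac)
    train = [rng.choice(instances[dominant]) for _ in range(n_dom)]
    others = [i for i in ids if i != dominant]
    for k in range(n_total - n_dom):        # thin, even spread over the rest
        train.append(rng.choice(instances[others[k % len(others)]]))
    rng.shuffle(train)
    return train, dominant
```

For example, with 20 views of one cup and 5 views each of two others, a 100-image set would contain 70 images of the dominant cup and 15 of each of the others.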
-
APRIL: Auxiliary Physically-Redundant Information in Loss - A physics-informed framework for parameter estimation with a gravitational-wave case study
Authors:
Matteo Scialpi,
Francesco Di Clemente,
Leigh Smith,
Michał Bejger
Abstract:
Physics-Informed Neural Networks (PINNs) embed the partial differential equations (PDEs) governing the system under study directly into the training of Neural Networks, ensuring solutions that respect physical laws. While effective for single-system problems, standard PINNs scale poorly to datasets containing many realizations of the same underlying physics with varying parameters. To address this limitation, we present a complementary approach by including auxiliary physically-redundant information in loss (APRIL), i.e., we augment the standard supervised output-target loss with auxiliary terms which exploit exact physical redundancy relations among outputs. We mathematically demonstrate that these terms preserve the true physical minimum while reshaping the loss landscape, improving convergence toward physically consistent solutions. As a proof-of-concept, we benchmark APRIL on a fully-connected neural network for gravitational wave (GW) parameter estimation (PE). We use simulated, noise-free compact binary coalescence (CBC) signals, focusing on inspiral-frequency waveforms to recover the chirp mass $\mathcal{M}$, the total mass $M_\mathrm{tot}$, and symmetric mass ratio $η$ of the binary. In this controlled setting, we show that APRIL achieves up to an order-of-magnitude improvement in test accuracy, especially for parameters that are otherwise difficult to learn. This method provides physically consistent learning for large multi-system datasets and is well suited for future GW analyses involving realistic noise and broader parameter ranges.
Submitted 15 October, 2025;
originally announced October 2025.
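For the CBC parameters named above, one exact redundancy is $\mathcal{M} = M_\mathrm{tot}\,\eta^{3/5}$, which holds for any physically consistent triple. A minimal numpy sketch of an APRIL-style loss built on that relation (illustrative weighting, not the paper's implementation):

```python
import numpy as np

def april_style_loss(pred, target, lam=1.0):
    """Supervised MSE plus an auxiliary physical-redundancy penalty.

    `pred` and `target` have shape (N, 3), holding (chirp mass Mc,
    total mass Mtot, symmetric mass ratio eta). The auxiliary term
    penalises violations of the exact relation Mc = Mtot * eta**(3/5);
    since it vanishes on any physically consistent triple, the true
    minimum of the supervised loss is preserved while the landscape
    around it is reshaped. Sketch only; the paper's weighting may differ.
    """
    mc, mtot, eta = pred[:, 0], pred[:, 1], pred[:, 2]
    mse = np.mean((pred - target) ** 2)
    redundancy = np.mean((mc - mtot * eta ** 0.6) ** 2)
    return mse + lam * redundancy
```

A prediction that matches the target and satisfies the redundancy incurs zero loss; a prediction that breaks the mass relation is penalised even when its plain MSE is small.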
-
Hybrid Data can Enhance the Utility of Synthetic Data for Training Anti-Money Laundering Models
Authors:
Rachel Chung,
Pratyush Nidhi Sharma,
Mikko Siponen,
Rohit Vadodaria,
Luke Smith
Abstract:
Money laundering is a critical global issue for financial institutions. Automated Anti-money laundering (AML) models, like Graph Neural Networks (GNN), can be trained to identify illicit transactions in real time. A major issue for developing such models is the lack of access to training data due to privacy and confidentiality concerns. Synthetically generated data that mimics the statistical properties of real data but preserves privacy and confidentiality has been proposed as a solution. However, training AML models on purely synthetic datasets presents its own set of challenges. This article proposes the use of hybrid datasets to augment the utility of synthetic datasets by incorporating publicly available, easily accessible, and real-world features. These additions demonstrate that hybrid datasets not only preserve privacy but also improve model utility, offering a practical pathway for financial institutions to enhance AML systems.
Submitted 22 September, 2025;
originally announced September 2025.
-
ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers
Authors:
Jeff Shen,
Lindsay M. Smith
Abstract:
We present cryptogram solving as an ideal testbed for studying neural network reasoning and generalization; models must decrypt text encoded with substitution ciphers, choosing from 26! possible mappings without explicit access to the cipher. We develop ALICE (an Architecture for Learning Interpretable Cryptogram dEcipherment), a simple encoder-only Transformer that sets a new state-of-the-art for both accuracy and speed on this decryption problem. Surprisingly, ALICE generalizes to unseen ciphers after training on only ${\sim}1500$ unique ciphers, a minute fraction ($3.7 \times 10^{-24}$) of the possible cipher space. To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Through early exit and probing experiments, we reveal how ALICE progressively refines its predictions in a way that appears to mirror common human strategies -- early layers place greater emphasis on letter frequencies, while later layers form word-level structures. Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability.
Submitted 24 September, 2025; v1 submitted 8 September, 2025;
originally announced September 2025.
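The Gumbel-Sinkhorn head mentioned above relaxes a hard permutation into a doubly stochastic matrix via repeated row/column normalisation. A log-space sketch of the standard Sinkhorn operator (not ALICE's actual code):

```python
import numpy as np

def _logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def sinkhorn(log_alpha, n_iters=100):
    """Alternately normalise rows and columns of exp(log_alpha) in log
    space; the result converges to a doubly stochastic matrix. In the
    Gumbel-Sinkhorn method, Gumbel noise is added to log_alpha and the
    matrix is divided by a small temperature first, pushing the output
    toward a hard permutation matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - _logsumexp(log_alpha, axis=1)  # rows sum to 1
        log_alpha = log_alpha - _logsumexp(log_alpha, axis=0)  # cols sum to 1
    return np.exp(log_alpha)
```

For a 26-letter substitution cipher the logits would form a 26 x 26 matrix, and the learned mapping can be read off directly from the (near-)permutation output, which is what makes the head interpretable.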
-
Prompt Optimization and Evaluation for LLM Automated Red Teaming
Authors:
Michael Freenor,
Lauren Alvarez,
Milton Leal,
Lily Smith,
Joel Garrett,
Yelyzaveta Husieva,
Madeline Woodruff,
Ryan Miller,
Erich Kummerfeld,
Rafael Medeiros,
Sander Schulhoff
Abstract:
Applications that use Large Language Models (LLMs) are becoming widespread, making the identification of system vulnerabilities increasingly important. Automated Red Teaming accelerates this effort by using an LLM to generate and execute attacks against target systems. Attack generators are evaluated using the Attack Success Rate (ASR), the sample mean calculated over the judgment of success for each attack. In this paper, we introduce a method for optimizing attack generator prompts that applies ASR to individual attacks. By repeating each attack multiple times against a randomly seeded target, we measure an attack's discoverability, the expectation of the individual attack's success. This approach reveals exploitable patterns that inform prompt optimization, ultimately enabling more robust evaluation and refinement of generators.
Submitted 29 July, 2025;
originally announced July 2025.
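Both statistics in the abstract reduce to simple means at different granularities; a small sketch with hypothetical function names:

```python
from statistics import mean

def attack_success_rate(judgments):
    """ASR: sample mean over the 0/1 success judgment of each attack."""
    return mean(judgments)

def discoverability(repeats_per_attack):
    """Per-attack discoverability: repeating one attack several times
    against a randomly seeded target and averaging the 0/1 judgments
    estimates the expectation of that individual attack's success."""
    return {attack: mean(js) for attack, js in repeats_per_attack.items()}
```

An attack that succeeds on 3 of 4 random seeds has discoverability 0.75, whereas a single-shot judgment would record only 0 or 1; that graded signal is what makes per-attack patterns visible for prompt optimization.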
-
Bootstrapping Human-Like Planning via LLMs
Authors:
David Porfirio,
Vincent Hsiao,
Morgan Fine-Morris,
Leslie Smith,
Laura M. Hiatt
Abstract:
Robot end users increasingly require accessible means of specifying tasks for robots to perform. Two common end-user programming paradigms include drag-and-drop interfaces and natural language programming. Although natural language interfaces harness an intuitive form of human communication, drag-and-drop interfaces enable users to meticulously and precisely dictate the key actions of the robot's task. In this paper, we investigate the degree to which both approaches can be combined. Specifically, we construct a large language model (LLM)-based pipeline that accepts natural language as input and produces human-like action sequences as output, specified at a level of granularity that a human would produce. We then compare these generated action sequences to another dataset of hand-specified action sequences. Although our results reveal that larger models tend to outperform smaller ones in the production of human-like action sequences, smaller models nonetheless achieve satisfactory performance.
Submitted 27 June, 2025;
originally announced June 2025.
-
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Authors:
Black Forest Labs,
Stephen Batifol,
Andreas Blattmann,
Frederic Boesel,
Saksham Consul,
Cyril Diagne,
Tim Dockhorn,
Jack English,
Zion English,
Patrick Esser,
Sumith Kulal,
Kyle Lacey,
Yam Levi,
Cheng Li,
Dominik Lorenz,
Jonas Müller,
Dustin Podell,
Robin Rombach,
Harry Saini,
Axel Sauer,
Luke Smith
Abstract:
We present evaluation results for FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing. The model generates novel output views by incorporating semantic context from text and image inputs. Using a simple sequence concatenation approach, FLUX.1 Kontext handles both local editing and generative in-context tasks within a single unified architecture. Compared to current editing models that exhibit degradation in character consistency and stability across multiple turns, we observe that FLUX.1 Kontext offers improved preservation of objects and characters, leading to greater robustness in iterative workflows. The model achieves competitive performance with current state-of-the-art systems while delivering significantly faster generation times, enabling interactive applications and rapid prototyping workflows. To validate these improvements, we introduce KontextBench, a comprehensive benchmark with 1026 image-prompt pairs covering five task categories: local editing, global editing, character reference, style reference and text editing. Detailed evaluations show the superior performance of FLUX.1 Kontext in terms of both single-turn quality and multi-turn consistency, setting new standards for unified image processing models.
Submitted 24 June, 2025; v1 submitted 17 June, 2025;
originally announced June 2025.
-
When can in-context learning generalize out of task distribution?
Authors:
Chase Goddard,
Lindsay M. Smith,
Vudtiwat Ngampruetikorn,
David J. Schwab
Abstract:
In-context learning (ICL) is a remarkable capability of pretrained transformers that allows models to generalize to unseen tasks after seeing only a few examples. We investigate empirically the conditions necessary on the pretraining distribution for ICL to emerge and generalize \emph{out-of-distribution}. Previous work has focused on the number of distinct tasks necessary in the pretraining dataset. Here, we use a different notion of task diversity to study the emergence of ICL in transformers trained on linear functions. We find that as task diversity increases, transformers undergo a transition from a specialized solution, which exhibits ICL only within the pretraining task distribution, to a solution which generalizes out of distribution to the entire task space. We also investigate the nature of the solutions learned by the transformer on both sides of the transition, and observe similar transitions in nonlinear regression problems. We construct a phase diagram to characterize how our concept of task diversity interacts with the number of pretraining tasks. In addition, we explore how factors such as the depth of the model and the dimensionality of the regression problem influence the transition.
Submitted 18 August, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
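A sketch of the pretraining setup on linear functions, where the size of a finite pool of weight vectors stands in for task diversity (hypothetical helper, not the authors' code):

```python
import numpy as np

def sample_icl_batch(pool_size, batch, n_ctx, dim, seed=0):
    """Sample in-context regression sequences from a finite task pool.

    Each task is a weight vector w; a sequence is (x_1, y_1, ...,
    x_{n_ctx}, y_{n_ctx}) with y = w @ x. Shrinking `pool_size` lowers
    task diversity, favouring a specialised solution confined to the
    pretraining task distribution; growing it pushes the transformer
    toward a solution that generalizes over the whole task space.
    """
    rng = np.random.default_rng(seed)
    pool = rng.standard_normal((pool_size, dim))   # the distinct tasks
    idx = rng.integers(0, pool_size, size=batch)   # one task per sequence
    xs = rng.standard_normal((batch, n_ctx, dim))
    ys = np.einsum("bnd,bd->bn", xs, pool[idx])    # y_n = w @ x_n
    return xs, ys, pool[idx]
```

Sweeping `pool_size` while holding everything else fixed is the kind of knob that traces out the specialised-to-general transition described above.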
-
A Deep Learning Framework for Two-Dimensional, Multi-Frequency Propagation Factor Estimation
Authors:
Sarah E. Wessinger,
Leslie N. Smith,
Jacob Gull,
Jonathan Gehman,
Zachary Beever,
Andrew J. Kammerer
Abstract:
Accurately estimating the refractive environment over multiple frequencies within the marine atmospheric boundary layer is crucial for the effective deployment of radar technologies. Traditional parabolic equation simulations, while effective, can be computationally expensive and time-intensive, limiting their practical application. This communication explores a novel approach using deep neural networks to estimate the pattern propagation factor, a critical parameter for characterizing environmental impacts on signal propagation. Image-to-image translation generators designed to ingest modified refractivity data and generate predictions of pattern propagation factors over the same domain were developed. Findings demonstrate that deep neural networks can be trained to analyze multiple frequencies and reasonably predict the pattern propagation factor, offering an alternative to traditional methods.
Submitted 4 September, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Authors:
Ninghan Zhong,
Steven Caro,
Avraiem Iskandar,
Megnath Ramesh,
Stephen L. Smith
Abstract:
Mobile robots are increasingly deployed in unstructured environments where obstacles and objects are movable. Navigation in such environments is known as interactive navigation, where task completion requires not only avoiding obstacles but also strategic interactions with movable objects. Non-prehensile interactive navigation focuses on non-grasping interaction strategies, such as pushing, rather than relying on prehensile manipulation. Despite a growing body of research in this field, most solutions are evaluated using case-specific setups, limiting reproducibility and cross-comparison. In this paper, we present Bench-NPIN, the first comprehensive benchmark for non-prehensile interactive navigation. Bench-NPIN includes multiple components: 1) a comprehensive range of simulated environments for non-prehensile interactive navigation tasks, including navigating a maze with movable obstacles, autonomous ship navigation in icy waters, box delivery, and area clearing, each with varying levels of complexity; 2) a set of evaluation metrics that capture unique aspects of interactive navigation, such as efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-NPIN to evaluate example implementations of established baselines across environments. Bench-NPIN is an open-source Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
Submitted 5 February, 2026; v1 submitted 17 May, 2025;
originally announced May 2025.
-
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Authors:
Physical Intelligence,
Kevin Black,
Noah Brown,
James Darpinian,
Karan Dhabalia,
Danny Driess,
Adnan Esmail,
Michael Equi,
Chelsea Finn,
Niccolo Fusai,
Manuel Y. Galliker,
Dibya Ghosh,
Lachy Groom,
Karol Hausman,
Brian Ichter,
Szymon Jakubczak,
Tim Jones,
Liyiming Ke,
Devin LeBlanc,
Sergey Levine,
Adrian Li-Bell,
Mohith Mothukuri,
Suraj Nair,
Karl Pertsch,
Allen Z. Ren
, et al. (11 additional authors not shown)
Abstract:
In order for robots to be useful, they must perform practically relevant tasks in the real world, outside of the lab. While vision-language-action (VLA) models have demonstrated impressive results for end-to-end robot control, it remains an open question how far such models can generalize in the wild. We describe $π_{0.5}$, a new model based on $π_{0}$ that uses co-training on heterogeneous tasks to enable broad generalization. $π_{0.5}$ uses data from multiple robots, high-level semantic prediction, web data, and other sources to enable broadly generalizable real-world robotic manipulation. Our system uses a combination of co-training and hybrid multi-modal examples that combine image observations, language commands, object detections, semantic subtask prediction, and low-level actions. Our experiments show that this kind of knowledge transfer is essential for effective generalization, and we demonstrate for the first time that an end-to-end learning-enabled robotic system can perform long-horizon and dexterous manipulation skills, such as cleaning a kitchen or bedroom, in entirely new homes.
Submitted 22 April, 2025;
originally announced April 2025.
-
Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images
Authors:
Euclid Collaboration,
G. Stevens,
S. Fotopoulou,
M. N. Bremer,
T. Matamoro Zatarain,
K. Jahnke,
B. Margalef-Bentabol,
M. Huertas-Company,
M. J. Smith,
M. Walmsley,
M. Salvato,
M. Mezcua,
A. Paulino-Afonso,
M. Siudek,
M. Talia,
F. Ricci,
W. Roster,
N. Aghanim,
B. Altieri,
S. Andreon,
H. Aussel,
C. Baccigalupi,
M. Baldi,
S. Bardelli,
P. Battaglia
, et al. (249 additional authors not shown)
Abstract:
Light emission from galaxies exhibits diverse brightness profiles, influenced by factors such as galaxy type, structural features and interactions with other galaxies. Elliptical galaxies feature more uniform light distributions, while spiral and irregular galaxies have complex, varied light profiles due to their structural heterogeneity and star-forming activity. In addition, galaxies with an active galactic nucleus (AGN) feature intense, concentrated emission from gas accretion around supermassive black holes, superimposed on regular galactic light, while quasi-stellar objects (QSO) are the extreme case of the AGN emission dominating the galaxy. The challenge of identifying AGN and QSO has been discussed many times in the literature, often requiring multi-wavelength observations. This paper introduces a novel approach to identify AGN and QSO from a single image. Diffusion models have been recently developed in the machine-learning literature to generate realistic-looking images of everyday objects. Utilising the spatial resolving power of the Euclid VIS images, we created a diffusion model trained on one million sources, without using any source pre-selection or labels. The model learns to reconstruct light distributions of normal galaxies, since the population is dominated by them. We condition the prediction of the central light distribution by masking the central few pixels of each source and reconstruct the light according to the diffusion model. We further use this prediction to identify sources that deviate from this profile by examining the reconstruction error of the few central pixels regenerated in each source's core. Our approach, solely using VIS imaging, features high completeness compared to traditional methods of AGN and QSO selection, including optical, near-infrared, mid-infrared, and X-rays.
Submitted 16 October, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
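The scoring step described above (mask the core, let the model regenerate it, and flag sources whose core is poorly reconstructed) can be sketched with a stand-in for the trained diffusion model:

```python
import numpy as np

def central_anomaly_score(image, reconstruct, mask_size=4):
    """Score how anomalous a source's core is, per the inpainting idea.

    The central `mask_size` x `mask_size` pixels are masked and
    re-predicted by `reconstruct`, a stand-in callable for the trained
    diffusion model (which has only seen the normal-galaxy-dominated
    population). A large reconstruction error on the regenerated core
    flags an AGN/QSO-like point-source excess. Illustrative sketch only.
    """
    h, w = image.shape
    r0, c0 = (h - mask_size) // 2, (w - mask_size) // 2
    core = (slice(r0, r0 + mask_size), slice(c0, c0 + mask_size))
    masked = image.copy()
    masked[core] = 0.0                      # hide the core from the model
    pred = reconstruct(masked)
    return float(np.mean((pred[core] - image[core]) ** 2))
```

With a toy "model" that always predicts a flat unit profile, a smooth unit-brightness source scores zero while the same source with a tenfold-brighter core scores far higher, which is the separation the selection relies on.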
-
User Awareness and Perspectives Survey on Privacy, Security and Usability of Auditory Prostheses
Authors:
Sohini Saha,
Leslie M. Collins,
Sherri L. Smith,
Boyla O. Mainsah
Abstract:
According to the World Health Organization, over 466 million people worldwide suffer from disabling hearing loss, with approximately 34 million of these being children. Hearing aids (HA) and cochlear implants (CI) have become indispensable tools for restoring hearing and enhancing the quality of life for individuals with hearing impairments. Clinical research and consumer studies indicate that users of HAs and CIs report significant improvements in their daily lives, including enhanced communication abilities and social engagement and reduced psychological stress. Modern auditory prosthetic devices are more advanced and interconnected with digital networks to add functionality, such as streaming audio directly from smartphones and other devices, remote adjustments by audiologists, integration with smart home systems, and access to artificial intelligence-driven sound enhancement features. With this interconnectivity, issues surrounding data privacy and security have become increasingly pertinent. There is limited research on the usability perceptions of current HA and CI models from the perspective of end-users. In addition, no studies have investigated consumer mental models during the purchasing process, particularly which factors they prioritize when selecting a device. In this study, we assessed participants' satisfaction levels with various features of their auditory prostheses. This work contributes to the field by addressing gaps in user perceptions of HA and CI usability, identifying key factors in consumer purchasing decisions, and highlighting the need for improved privacy and security awareness and education among users.
Submitted 20 February, 2025;
originally announced February 2025.
-
PATCH: a deep learning method to assess heterogeneity of artistic practice in historical paintings
Authors:
Andrew Van Horn,
Lauryn Smith,
Mahamad Mahmoud,
Michael McMaster,
Clara Pinchbeck,
Ina Martin,
Andrew Lininger,
Anthony Ingrisano,
Adam Lowe,
Carlos Bayod,
Elizabeth Bolman,
Kenneth Singer,
Michael Hinczewski
Abstract:
The history of art has seen significant shifts in the manner in which artworks are created, making understanding of creative processes a central question in technical art history. In the Renaissance and Early Modern period, paintings were largely produced by master painters directing workshops of apprentices who often contributed to projects. The masters varied significantly in artistic and managerial styles, meaning different combinations of artists and implements might be seen both between masters and within workshops or even individual canvases. Information on how different workshops were managed and the processes by which artworks were created remains elusive. Machine learning methods have potential to unearth new information about artists' creative processes by extending the analysis of brushwork to a microscopic scale. Analysis of workshop paintings, however, presents a challenge in that documentation of the artists and materials involved is sparse, meaning external examples are not available to train networks to recognize their contributions. Here we present a novel machine learning approach we call pairwise assignment training for classifying heterogeneity (PATCH) that is capable of identifying individual artistic practice regimes with no external training data, or "ground truth." The method achieves unsupervised results by supervised means, and outperforms both simple statistical procedures and unsupervised machine learning methods. We apply this method to two historical paintings by the Spanish Renaissance master, El Greco: The Baptism of Christ and Christ on the Cross with Landscape, and our findings regarding the former potentially challenge previous work that has assigned the painting to workshop members. Further, the results of our analyses create a measure of heterogeneity of artistic practice that can be used to characterize artworks across time and space.
Submitted 16 July, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects
Authors:
Louis Aberdeen,
Mark Hansen,
Melvyn L. Smith,
Lyndon Smith
Abstract:
As the density of spacecraft in Earth's orbit increases, their recognition, pose and trajectory identification becomes crucial for averting potential collisions and executing debris removal operations. However, training models able to identify a spacecraft and its pose presents a significant challenge due to a lack of available image data for model training. This paper puts forth an innovative framework for generating realistic synthetic datasets of Resident Space Object (RSO) imagery. Using the International Space Station (ISS) as a test case, it goes on to combine image regression with image restoration methodologies to estimate pose from blurred images. An analysis of the proposed image recovery and regression techniques was undertaken, providing insights into the performance, potential enhancements and limitations when applied to real imagery of RSOs. The image recovery approach investigated involves first applying image deconvolution using an effective point spread function, followed by detail object extraction with a U-Net. Interestingly, the best pose performance was attained using only the U-Net for image reconstruction, reducing the average Mean Squared Error in image recovery by 97.28% and the average angular error by 71.9%. The successful application of U-Net image restoration combined with the Resnet50 regression network for pose estimation of the International Space Station demonstrates the value of a diverse set of evaluation tools for effective solutions to real-world problems such as the analysis of distant objects in Earth's orbit.
Submitted 22 January, 2025;
originally announced January 2025.
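The first recovery stage, deconvolution with an effective point spread function, is classically done in the frequency domain. A minimal Wiener-style sketch, with `k` as an assumed noise-to-signal regulariser (the abstract does not specify the deconvolution variant used):

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=1e-2):
    """Frequency-domain Wiener deconvolution with a known PSF.

    The PSF is zero-padded to the image size and rolled so its centre
    sits at the origin, the usual convention for FFT-based (circular)
    deconvolution. `k` damps frequencies where the PSF response is weak,
    trading residual blur against noise amplification.
    """
    psf_pad = np.zeros_like(blurred)
    ph, pw = psf.shape
    psf_pad[:ph, :pw] = psf
    psf_pad = np.roll(psf_pad, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    H = np.fft.fft2(psf_pad)
    G = np.fft.fft2(blurred)
    F = np.conj(H) * G / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F))
```

In the pipeline the abstract describes, the U-Net detail-extraction stage would then operate on this deconvolved output before pose regression.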
-
Recommending Pre-Trained Models for IoT Devices
Authors:
Parth V. Patil,
Wenxin Jiang,
Huiyun Peng,
Daniel Lugo,
Kelechi G. Kalu,
Josh LeBlanc,
Lawrence Smith,
Hyeonwoo Heo,
Nathanael Aou,
James C. Davis
Abstract:
The availability of pre-trained models (PTMs) has enabled faster deployment of machine learning across applications by reducing the need for extensive training. Techniques like quantization and distillation have further expanded PTM applicability to resource-constrained IoT hardware. Given the many PTM options for any given task, engineers often find it too costly to evaluate each model's suitability. Approaches such as LogME, LEEP, and ModelSpider help streamline model selection by estimating task relevance without exhaustive tuning. However, these methods largely leave hardware constraints as future work, a significant limitation in IoT settings. In this paper, we identify the limitations of current model recommendation approaches regarding hardware constraints and introduce a novel, hardware-aware method for PTM selection. We also propose a research agenda to guide the development of effective, hardware-conscious model recommendation systems for IoT applications.
Submitted 25 December, 2024;
originally announced December 2024.
-
AUTO-IceNav: A Local Navigation Strategy for Autonomous Surface Ships in Broken Ice Fields
Authors:
Rodrigue de Schaetzen,
Alexander Botros,
Ninghan Zhong,
Kevin Murrant,
Robert Gash,
Stephen L. Smith
Abstract:
Ice conditions often require ships to reduce speed and deviate from their main course to avoid damage to the ship. In addition, broken ice fields are becoming the dominant ice conditions encountered in the Arctic, where the effects of collisions with ice are highly dependent on where contact occurs and on the particular features of the ice floes. In this paper, we present AUTO-IceNav, a framework for the autonomous navigation of ships operating in ice floe fields. Trajectories are computed in a receding-horizon manner, where we frequently replan given updated ice field data. During a planning step, we assume a nominal speed that is safe with respect to the current ice conditions, and compute a reference path. We formulate a novel cost function that minimizes the kinetic energy loss of the ship from ship-ice collisions and incorporate this cost as part of our lattice-based path planner. The solution computed by the lattice planning stage is then used as an initial guess in our proposed optimization-based improvement step, producing a locally optimal path. Extensive experiments were conducted both in simulation and in a physical testbed to validate our approach.
Submitted 16 September, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.
-
Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy
Authors:
Seth Bullock,
Nirav Ajmeri,
Mike Batty,
Michaela Black,
John Cartlidge,
Robert Challen,
Cangxiong Chen,
Jing Chen,
Joan Condell,
Leon Danon,
Adam Dennett,
Alison Heppenstall,
Paul Marshall,
Phil Morgan,
Aisling O'Kane,
Laura G. E. Smith,
Theresa Smith,
Hywel T. P. Williams
Abstract:
Advances in artificial intelligence (AI) have great potential to help address societal challenges that are both collective in nature and present at national or trans-national scale. Pressing challenges in healthcare, finance, infrastructure and sustainability, for instance, might all be productively addressed by leveraging and amplifying AI for national-scale collective intelligence. The development and deployment of this kind of AI faces distinctive challenges, both technical and socio-technical. Here, a research strategy for mobilising inter-disciplinary research to address these challenges is detailed and some of the key issues that must be faced are outlined.
Submitted 9 November, 2024;
originally announced November 2024.
-
STEER: Flexible Robotic Manipulation via Dense Language Grounding
Authors:
Laura Smith,
Alex Irpan,
Montserrat Gonzalez Arenas,
Sean Kirmani,
Dmitry Kalashnikov,
Dhruv Shah,
Ted Xiao
Abstract:
The complexity of the real world demands robotic systems that can intelligently adapt to unseen situations. We present STEER, a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control. Our approach translates complex situational awareness into actionable low-level behavior through training language-grounded policies with dense annotation. By structuring policy training around fundamental, modular manipulation skills expressed in natural language, STEER exposes an expressive interface for humans or Vision-Language Models (VLMs) to intelligently orchestrate the robot's behavior by reasoning about the task and context. Our experiments demonstrate the skills learned via STEER can be combined to synthesize novel behaviors to adapt to new situations or perform completely new tasks without additional data collection or training.
Submitted 5 November, 2024;
originally announced November 2024.
-
RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
Authors:
Soroush Nasiriany,
Sean Kirmani,
Tianli Ding,
Laura Smith,
Yuke Zhu,
Danny Driess,
Dorsa Sadigh,
Ted Xiao
Abstract:
We explore how intermediate policy representations can facilitate generalization by providing guidance on how to perform manipulation tasks. Existing representations such as language, goal images, and trajectory sketches have been shown to be helpful, but these representations either do not provide enough context or provide over-specified context that yields less robust policies. We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task. Affordances offer expressive yet lightweight abstractions, are easy for users to specify, and facilitate efficient learning by transferring knowledge from large internet datasets. Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language, and then conditions the policy on this affordance plan to perform manipulation. Our model can flexibly bridge heterogeneous sources of supervision including large web datasets and robot trajectories. We additionally train our model on cheap-to-collect in-domain affordance images, allowing us to learn new tasks without collecting any additional costly robot trajectories. We show on a diverse set of novel tasks how RT-Affordance exceeds the performance of existing methods by over 50%, and we empirically demonstrate that affordances are robust to novel settings. Videos available at https://snasiriany.me/rt-affordance
Submitted 4 November, 2024;
originally announced November 2024.
-
Traversability-Aware Legged Navigation by Learning from Real-World Visual Data
Authors:
Hongbo Zhang,
Zhongyu Li,
Xuanqi Zeng,
Laura Smith,
Kyle Stachowicz,
Dhruv Shah,
Linzhu Yue,
Zhitao Song,
Weipeng Xia,
Sergey Levine,
Koushil Sreenath,
Yun-hui Liu
Abstract:
The enhanced mobility brought by legged locomotion empowers quadrupedal robots to navigate through complex and unstructured environments. However, optimizing agile locomotion while accounting for the varying energy costs of traversing different terrains remains an open challenge. Most previous work focuses on planning trajectories with traversability cost estimation based on human-labeled environmental features. However, this human-centric approach is insufficient because it does not account for the varying capabilities of the robot locomotion controllers over challenging terrains. To address this, we develop a novel traversability estimator in a robot-centric manner, based on the value function of the robot's locomotion controller. This estimator is integrated into a new learning-based RGBD navigation framework. The framework employs multiple training stages to develop a planner that guides the robot in avoiding obstacles and hard-to-traverse terrains while reaching its goals. The training of the navigation planner is directly performed in the real world using a sample efficient reinforcement learning method that utilizes both online data and offline datasets. Through extensive benchmarking, we demonstrate that the proposed framework achieves the best performance in accurate traversability cost estimation and efficient learning from multi-modal data (including the robot's color and depth vision, as well as proprioceptive feedback) for real-world training. Using the proposed method, a quadrupedal robot learns to perform traversability-aware navigation through trial and error in various real-world environments with challenging terrains that are difficult to classify using depth vision alone. Moreover, the robot demonstrates the ability to generalize the learned navigation skills to unseen scenarios. Video can be found at https://youtu.be/RSqnIWZ1qks.
Submitted 11 November, 2024; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Autonomous Navigation in Ice-Covered Waters with Learned Predictions on Ship-Ice Interactions
Authors:
Ninghan Zhong,
Alessandro Potenza,
Stephen L. Smith
Abstract:
Autonomous navigation in ice-covered waters poses significant challenges due to the frequent lack of viable collision-free trajectories. When complete obstacle avoidance is infeasible, it becomes imperative for the navigation strategy to minimize collisions. Additionally, the dynamic nature of ice, which moves in response to ship maneuvers, complicates the path planning process. To address these challenges, we propose a novel deep learning model to estimate the coarse dynamics of ice movements triggered by ship actions through occupancy estimation. To ensure real-time applicability, we propose a novel approach that caches intermediate prediction results and seamlessly integrates the predictive model into a graph search planner. We evaluate the proposed planner both in simulation and in a physical testbed against existing approaches and show that our planner significantly reduces collisions with ice when compared to the state-of-the-art. Codes and demos of this work are available at https://github.com/IvanIZ/predictive-asv-planner.
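The caching idea can be sketched with a simple memoized predictor. This is illustrative only: the paper caches intermediate results of a learned occupancy model inside a graph search planner, which this toy stand-in does not reproduce.

```python
# Illustrative sketch (not the paper's code): cache the output of an
# expensive ship-action -> ice-occupancy prediction so that a graph
# search planner re-expanding the same node reuses the cached result.
from functools import lru_cache

calls = 0  # counts how many times the "model" actually runs

@lru_cache(maxsize=None)
def predict_occupancy(cell, heading):
    """Stand-in for a learned model predicting ice occupancy after a maneuver."""
    global calls
    calls += 1
    # Toy dynamics: occupancy class depends deterministically on the query.
    return (cell[0] + cell[1] + heading) % 3

# A search that revisits states only pays for each unique prediction once.
queries = [((2, 3), 1), ((0, 1), 2), ((2, 3), 1), ((0, 1), 2)]
results = [predict_occupancy(c, h) for c, h in queries]
print(results, "unique model calls:", calls)  # four lookups, two model calls
```

The same pattern extends to caching partial network activations keyed by planner state, which is what makes the learned model cheap enough for real-time replanning.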
Submitted 18 September, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Symmetry constrained neural networks for detection and localization of damage in metal plates
Authors:
James Amarel,
Christopher Rudolf,
Athanasios Iliopoulos,
John Michopoulos,
Leslie N. Smith
Abstract:
The present paper is concerned with deep learning techniques applied to detection and localization of damage in a thin aluminum plate. We used data collected on a tabletop apparatus by mounting four piezoelectric transducers to the plate, each of which took a turn generating a Lamb wave that traversed the region of interest before being received by the remaining three sensors. On training a neural network to analyze time-series data of the material response, which displayed damage-reflective features whenever the plate guided waves interacted with a contact load, we achieved a model that detected damage with greater than $99\%$ accuracy, as well as a model that localized damage with a mean distance error of $2.58 \pm 0.12$ mm. For each task, the best-performing model was designed according to the inductive bias that our transducers were both similar and arranged in a square pattern on a nearly uniform plate.
Submitted 26 May, 2025; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Approximate Environment Decompositions for Robot Coverage Planning using Submodular Set Cover
Authors:
Megnath Ramesh,
Frank Imeson,
Baris Fidan,
Stephen L. Smith
Abstract:
In this paper, we investigate the problem of decomposing 2D environments for robot coverage planning. Coverage path planning (CPP) involves computing a cost-minimizing path for a robot equipped with a coverage or sensing tool so that the tool visits all points in the environment. CPP is an NP-Hard problem, so existing approaches simplify the problem by decomposing the environment into the minimum number of sectors. Sectors are sub-regions of the environment that can each be covered using a lawnmower path (i.e., along parallel straight-line paths) oriented at an angle. However, traditional methods either limit the coverage orientations to be axis-parallel (horizontal/vertical) or provide no guarantees on the number of sectors in the decomposition. We introduce an approach to decompose the environment into possibly overlapping rectangular sectors. We provide an approximation guarantee on the number of sectors computed using our approach for a given environment. We do this by leveraging the submodular property of the sector coverage function, which enables us to formulate the decomposition problem as a submodular set cover (SSC) problem with well-known approximation guarantees for the greedy algorithm. Our approach improves upon existing coverage planning methods, as demonstrated through an evaluation using maps of complex real-world environments.
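The greedy step at the heart of submodular set cover can be sketched as follows, with a toy grid environment and hand-picked candidate rectangles standing in for the paper's sector generation:

```python
# Minimal greedy submodular set cover sketch (illustrative, not the
# paper's implementation): at each step, pick the candidate sector that
# covers the most still-uncovered cells of the environment.

def greedy_cover(universe, sectors):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sectors, key=lambda s: len(uncovered & s))
        if not (uncovered & best):
            break  # remaining cells cannot be covered by any sector
        chosen.append(best)
        uncovered -= best
    return chosen

# A 3x2 grid of cells and three overlapping rectangular "sectors".
universe = {(x, y) for x in range(3) for y in range(2)}
sectors = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},   # left 2x2 rectangle
    {(1, 0), (1, 1), (2, 0), (2, 1)},   # right 2x2 rectangle
    {(0, 0), (1, 0), (2, 0)},           # bottom row
]
solution = greedy_cover(universe, sectors)
print(len(solution))  # two overlapping rectangles cover the grid
```

Because the number of newly covered cells is a monotone submodular function of the chosen set, this greedy loop inherits the classic logarithmic approximation guarantee the abstract refers to.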
Submitted 4 September, 2024;
originally announced September 2024.
-
SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints
Authors:
Haonan Chen,
Jordan B. L. Smith,
Janne Spijkervet,
Ju-Chiang Wang,
Pei Zou,
Bochen Li,
Qiuqiang Kong,
Xingjian Du
Abstract:
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences. To the best of our knowledge, this work is the first to demonstrate the feasibility of training symbolic generation models solely from auto-transcribed audio data. Furthermore, to enhance the controllability of the trained model, we introduce SymPAC (Symbolic Music Language Model with Prompting And Constrained Generation), which is distinguished by using (a) prompt bars in encoding and (b) a technique called Constrained Generation via Finite State Machines (FSMs) during inference time. We show the flexibility and controllability of this approach, which may be critical in making music AI useful to creators and users.
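The FSM-constrained generation idea can be sketched with a toy grammar. The states, tokens, and scores below are invented for illustration and are not SymPAC's actual vocabulary or decoding scheme:

```python
# Toy sketch of constrained generation with a finite state machine:
# at each step, the FSM masks the vocabulary down to syntactically
# valid tokens, and the decoder picks among only those.

# Allowed transitions of a tiny "bar -> notes -> end" grammar.
FSM = {
    "start":  {"BAR": "in_bar"},
    "in_bar": {"NOTE": "in_bar", "BAR": "in_bar", "END": "done"},
    "done":   {},
}

def constrained_decode(scores, max_len=6):
    """Greedy decode: pick the highest-scoring token the FSM permits."""
    state, out = "start", []
    while state != "done" and len(out) < max_len:
        allowed = FSM[state]                       # mask: valid tokens only
        token = max(allowed, key=lambda t: scores.get(t, 0.0))
        out.append(token)
        state = allowed[token]
    return out

# A model that would prefer to emit END immediately is forced to open a bar.
seq = constrained_decode({"END": 0.9, "NOTE": 0.5, "BAR": 0.1})
print(seq)
```

In a real language model the `scores` would be per-step logits, and the mask would zero out the probability of every token the FSM forbids before sampling.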
Submitted 9 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Oops, I Sampled it Again: Reinterpreting Confidence Intervals in Few-Shot Learning
Authors:
Raphael Lafargue,
Luke Smith,
Franck Vermet,
Mathias Löwe,
Ian Reid,
Vincent Gripon,
Jack Valmadre
Abstract:
The predominant method for computing confidence intervals (CI) in few-shot learning (FSL) is based on sampling the tasks with replacement, i.e.\ allowing the same samples to appear in multiple tasks. This makes the CI misleading in that it takes into account the randomness of the sampler but not the data itself. To quantify the extent of this problem, we conduct a comparative analysis between CIs computed with and without replacement. These reveal a notable underestimation by the predominant method. This observation calls for a reevaluation of how we interpret confidence intervals and the resulting conclusions in FSL comparative studies. Our research demonstrates that the use of paired tests can partially address this issue. Additionally, we explore methods to further reduce the (size of the) CI by strategically sampling tasks of a specific size. We also introduce a new optimized benchmark, which can be accessed at https://github.com/RafLaf/FSL-benchmark-again
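The two sampling protocols being compared can be sketched as follows. The pool, task sizes, and scores are synthetic; this toy only shows the mechanics of each protocol and does not reproduce the underestimation the paper measures on real few-shot benchmarks.

```python
# Illustrative sketch of the two CI protocols: resampling tasks with
# replacement from a fixed pool versus building tasks from disjoint
# samples. All numbers are synthetic, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(loc=0.8, scale=0.05, size=10_000)  # per-sample "accuracies"

def ci_halfwidth(task_scores):
    """95% normal-approximation CI half-width of the mean task score."""
    return 1.96 * task_scores.std(ddof=1) / np.sqrt(len(task_scores))

n_tasks, task_size = 200, 25
# With replacement: the same samples may appear in many tasks.
with_rep = np.array([rng.choice(pool, task_size, replace=True).mean()
                     for _ in range(n_tasks)])
# Without replacement: tasks are built from disjoint samples.
shuffled = rng.permutation(pool)[: n_tasks * task_size]
without_rep = shuffled.reshape(n_tasks, task_size).mean(axis=1)

print("CI half-widths:", ci_halfwidth(with_rep), ci_halfwidth(without_rep))
```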
Submitted 6 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Mix Testing: Specifying and Testing ABI Compatibility of C/C++ Atomics Implementations
Authors:
Luke Geeson,
James Brotherston,
Wilco Dijkstra,
Alastair F. Donaldson,
Lee Smith,
Tyler Sorensen,
John Wickerson
Abstract:
The correctness of complex software depends on the correctness of both the source code and the compilers that generate corresponding binary code. Compilers must do more than preserve the semantics of a single source file: they must ensure that generated binaries can be composed with other binaries to form a final executable. The compatibility of composition is ensured using an Application Binary Interface (ABI), which specifies details of calling conventions, exception handling, and so on. Unfortunately, there are no official ABIs for concurrent programs, so different atomics mappings, although correct in isolation, may induce bugs when composed. Indeed, today, mixing binaries generated by different compilers can lead to an erroneous resulting binary.
We present mix testing: a new technique designed to find compiler bugs when the instructions of a C/C++ test are separately compiled for multiple compatible architectures and then mixed together. We define a class of compiler bugs, coined mixing bugs, that arise when parts of a program are compiled separately using different mappings from C/C++ atomic operations to assembly sequences. To demonstrate the generality of mix testing, we have designed and implemented a tool, atomic-mixer, which we have used: (a) to reproduce one existing non-mixing bug that state-of-the-art concurrency testing tools are limited to being able to find (showing that atomic-mixer at least meets the capabilities of these tools), and (b) to find four previously-unknown mixing bugs in LLVM and GCC, and one prospective mixing bug in mappings proposed for the Java Virtual Machine. Lastly, we have worked with engineers at Arm to specify, for the first time, an atomics ABI for Armv8, and have used atomic-mixer to validate the LLVM and GCC compilers against it.
Submitted 2 September, 2024;
originally announced September 2024.
-
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
Authors:
Rafael Rafailov,
Kyle Hatch,
Anikait Singh,
Laura Smith,
Aviral Kumar,
Ilya Kostrikov,
Philippe Hansen-Estruch,
Victor Kolev,
Philip Ball,
Jiajun Wu,
Chelsea Finn,
Sergey Levine
Abstract:
Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online finetuning to overcome challenges with exploration. However, evaluating progress on offline RL algorithms requires effective and challenging benchmarks that capture properties of real-world tasks, provide a range of task difficulties, and cover a range of challenges both in terms of the parameters of the domain (e.g., length of the horizon, sparsity of rewards) and the parameters of the data (e.g., narrow demonstration data or broad exploratory data). While considerable progress in offline RL in recent years has been enabled by simpler benchmark tasks, the most widely used datasets are increasingly saturating in performance and may fail to reflect properties of realistic tasks. We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments, based on models of real-world robotic systems, and comprising a variety of data sources, including scripted data, play-style data collected by human teleoperators, and other data sources. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation, with some of the tasks specifically designed to require both pre-training and fine-tuning. We hope that our proposed benchmark will facilitate further progress on both offline RL and fine-tuning algorithms. Website with code, examples, tasks, and data is available at \url{https://sites.google.com/view/d5rl/}
Submitted 15 August, 2024;
originally announced August 2024.
-
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Authors:
Tom Lieberum,
Senthooran Rajamanoharan,
Arthur Conmy,
Lewis Smith,
Nicolas Sonnerat,
Vikrant Varma,
János Kramár,
Anca Dragan,
Rohin Shah,
Neel Nanda
Abstract:
Sparse autoencoders (SAEs) are an unsupervised method for learning a sparse decomposition of a neural network's latent representations into seemingly interpretable features. Despite recent excitement about their potential, research applications outside of industry are limited by the high cost of training a comprehensive suite of SAEs. In this work, we introduce Gemma Scope, an open suite of JumpReLU SAEs trained on all layers and sub-layers of Gemma 2 2B and 9B and select layers of Gemma 2 27B base models. We primarily train SAEs on the Gemma 2 pre-trained models, but additionally release SAEs trained on instruction-tuned Gemma 2 9B for comparison. We evaluate the quality of each SAE on standard metrics and release these results. We hope that by releasing these SAE weights, we can help make more ambitious safety and interpretability research easier for the community. Weights and a tutorial can be found at https://huggingface.co/google/gemma-scope and an interactive demo can be found at https://www.neuronpedia.org/gemma-scope
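A JumpReLU activation zeroes any pre-activation that does not clear a learned per-feature threshold. Below is a minimal forward-pass sketch; the dimensions, random weights, and threshold value are of our choosing for illustration, not Gemma Scope's.

```python
# Minimal numpy sketch of a JumpReLU SAE forward pass (illustrative).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)
theta = np.full(d_sae, 0.5)  # learned per-feature thresholds

def jumprelu(z, theta):
    # Pass a pre-activation through unchanged only if it clears its threshold;
    # otherwise the feature is exactly zero, giving a sparse code.
    return z * (z > theta)

def sae_forward(x):
    f = jumprelu(x @ W_enc + b_enc, theta)   # sparse feature activations
    return f, f @ W_dec + b_dec              # features and reconstruction

x = rng.normal(size=d_model)
features, x_hat = sae_forward(x)
print("active features:", int((features > 0).sum()), "of", d_sae)
```

Unlike a ReLU-plus-L1 SAE, a surviving feature keeps its full magnitude rather than being shrunk toward zero, which is part of the appeal of this activation.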
Submitted 19 August, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Authors:
Adam Karvonen,
Benjamin Wright,
Can Rager,
Rico Angell,
Jannik Brinkmann,
Logan Smith,
Claudio Mayrink Verdun,
David Bau,
Samuel Marks
Abstract:
What latent features are encoded in language model (LM) representations? Recent work on training sparse autoencoders (SAEs) to disentangle interpretable features in LM representations has shown significant promise. However, evaluating the quality of these SAEs is difficult because we lack a ground-truth collection of interpretable features that we expect good SAEs to recover. We thus propose to measure progress in interpretable dictionary learning by working in the setting of LMs trained on chess and Othello transcripts. These settings carry natural collections of interpretable features -- for example, "there is a knight on F3" -- which we leverage into $\textit{supervised}$ metrics for SAE quality. To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, $\textit{p-annealing}$, which improves performance on prior unsupervised metrics as well as our new metrics.
Submitted 30 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation
Authors:
Xiaoyu Huang,
Qiayuan Liao,
Yiming Ni,
Zhongyu Li,
Laura Smith,
Sergey Levine,
Xue Bin Peng,
Koushil Sreenath
Abstract:
This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general to different downstream tasks, and therefore, can be utilized in high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks using a quadrupedal robot in the real world. These tasks span from leveraging state-based policy to vision-based policy, from training purely from the simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.
Submitted 9 July, 2024;
originally announced July 2024.
-
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Authors:
Annie S. Chen,
Alec M. Lessing,
Andy Tang,
Govind Chada,
Laura Smith,
Sergey Levine,
Chelsea Finn
Abstract:
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unusual scenarios successfully. This presents an open challenge to current learning methods, which often struggle with generalization to the long tail of unexpected situations without heavy human supervision. To address this issue, we investigate how to leverage the broad knowledge about the structure of the world and commonsense reasoning capabilities of vision-language models (VLMs) to aid legged robots in handling difficult, ambiguous situations. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection with VLMs: (1) in-context adaptation over previous robot interactions and (2) planning multiple skills into the future and replanning. We evaluate VLM-PC on several challenging real-world obstacle courses, involving dead ends and climbing and crawling, on a Go1 quadruped robot. Our experiments show that by reasoning over the history of interactions and future plans, VLMs enable the robot to autonomously perceive, navigate, and act in a wide range of complex scenarios that would otherwise require environment-specific engineering or human guidance.
Submitted 2 July, 2024;
originally announced July 2024.
-
Language Guided Skill Discovery
Authors:
Seungeun Rho,
Laura Smith,
Tianyu Li,
Sergey Levine,
Xue Bin Peng,
Sehoon Ha
Abstract:
Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity" of skills. We hypothesize that leveraging the semantic knowledge of large language models (LLMs) can lead us to improve semantic diversity of resulting behaviors. In this sense, we introduce Language Guided Skill Discovery (LGSD), a skill discovery framework that aims to directly maximize the semantic diversity between skills. LGSD takes user prompts as input and outputs a set of semantically distinctive skills. The prompts serve as a means to constrain the search space into a semantically desired subspace, and the generated LLM outputs guide the agent to visit semantically diverse states within the subspace. We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt. Furthermore, we show that language guidance aids in discovering more diverse skills compared to five existing skill discovery methods in robot-arm manipulation environments. Lastly, LGSD provides a simple way of utilizing learned skills via natural language.
Submitted 28 February, 2025; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Improving Dictionary Learning with Gated Sparse Autoencoders
Authors:
Senthooran Rajamanoharan,
Arthur Conmy,
Lewis Smith,
Tom Lieberum,
Vikrant Varma,
János Kramár,
Rohin Shah,
Neel Nanda
Abstract:
Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimation of feature activations. The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.
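The gate/magnitude separation can be sketched as a single forward pass: a binary gate decides which features fire, a ReLU path estimates their magnitudes, and the L1 penalty touches only the gate pre-activations. This follows the abstract's high-level description, but the shapes, the exponential weight tying between the two encoder paths, and the variable names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np


def gated_sae_forward(x, W_gate, b_gate, r_mag, b_mag, W_dec, b_dec):
    """One forward pass of a Gated SAE (sketch). The magnitude path reuses
    the gate weights scaled by exp(r_mag), one scale per feature."""
    pi_gate = x @ W_gate.T + b_gate           # (a) which features to use
    gate = (pi_gate > 0).astype(x.dtype)      # binary gate (Heaviside)
    pi_mag = x @ (np.exp(r_mag)[:, None] * W_gate).T + b_mag
    mags = np.maximum(pi_mag, 0.0)            # (b) how strongly they fire
    f = gate * mags                           # gated feature activations
    x_hat = f @ W_dec + b_dec                 # linear reconstruction
    # L1 sparsity penalty applied ONLY to the gate path, so it cannot
    # shrink the magnitude estimates -- the paper's fix for shrinkage.
    l1 = np.maximum(pi_gate, 0.0).sum(axis=-1).mean()
    return x_hat, f, l1
```

With `r_mag = 0` and shared biases the two paths coincide and the unit reduces to a standard ReLU encoder, which makes the separation easy to sanity-check.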
Submitted 30 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Authors:
Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti
, et al. (37 additional authors not shown)
Abstract:
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
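The fixed-size-state property can be illustrated with a minimal gated linear recurrence. This is a deliberate simplification, not Griffin's actual RG-LRU block (which adds input gating and learned per-channel decay rates); it only shows why the recurrent state, unlike a transformer's KV cache, does not grow with sequence length.

```python
import numpy as np


def lru_scan(x, a):
    """Minimal linear recurrence over a (T, d) input:
        h_t = a * h_{t-1} + sqrt(1 - a**2) * x_t
    The state h is a single length-d vector regardless of T, which is what
    makes inference on long sequences memory-efficient."""
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    scale = np.sqrt(1.0 - a ** 2)  # keeps the recurrence well-scaled
    for t in range(T):
        h = a * h + scale * x[t]   # O(d) state update, independent of T
        out[t] = h
    return out, h
```

A transformer's attention cache grows linearly in T; here, generating token 10 000 needs exactly the same state memory as generating token 10.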
Submitted 28 August, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.