-
CRED: Counterfactual Reasoning and Environment Design for Active Preference Learning
Authors:
Yi-Shiuan Tung,
Gyanig Kumar,
Wei Jiang,
Bradley Hayes,
Alessandro Roncone
Abstract:
As a robot's operational environment and tasks to perform within it grow in complexity, the explicit specification and balancing of optimization objectives to achieve a preferred behavior profile moves increasingly farther out of reach. These systems benefit strongly by being able to align their behavior to reflect human preferences and respond to corrections, but manually encoding this feedback is infeasible. Active preference learning (APL) learns human reward functions by presenting trajectories for ranking. However, existing methods sample from fixed trajectory sets or replay buffers that limit query diversity and often fail to identify informative comparisons. We propose CRED, a novel trajectory generation method for APL that improves reward inference by jointly optimizing environment design and trajectory selection to efficiently query and extract preferences from users. CRED "imagines" new scenarios through environment design and leverages counterfactual reasoning -- by sampling possible rewards from its current belief and asking "What if this were the true preference?" -- to generate trajectory pairs that expose differences between competing reward functions. Comprehensive experiments and a user study show that CRED significantly outperforms state-of-the-art methods in reward accuracy and sample efficiency and receives higher user ratings.
Submitted 9 March, 2026;
originally announced March 2026.
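The counterfactual query selection the abstract describes can be sketched in a few lines of Python. This is an illustrative toy under a linear-reward assumption (weights dotted with trajectory features); the function names and the 50/50-split criterion are ours, not the paper's:

```python
import itertools

def trajectory_return(weights, features):
    """Linear reward model: return = w . phi(trajectory)."""
    return sum(w * f for w, f in zip(weights, features))

def query_disagreement(weights_samples, feats_a, feats_b):
    """Counterfactual reasoning: for each sampled reward w, ask "if w were
    the true preference, which trajectory would the user pick?" A query is
    informative when the sampled rewards split close to 50/50."""
    prefer_a = sum(
        trajectory_return(w, feats_a) > trajectory_return(w, feats_b)
        for w in weights_samples
    )
    p = prefer_a / len(weights_samples)
    return min(p, 1.0 - p)  # 0 = all samples agree (useless), 0.5 = max split

def select_query(weights_samples, candidate_trajectories):
    """Pick the trajectory pair the current belief disagrees on most."""
    return max(
        itertools.combinations(candidate_trajectories, 2),
        key=lambda pair: query_disagreement(weights_samples, *pair),
    )
```

Feeding in two sampled weight vectors and three candidate feature vectors, the selected pair is the one that separates the competing reward hypotheses.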
-
RF-Modulated Adaptive Communication Improves Multi-Agent Robotic Exploration
Authors:
Lorin Achey,
Breanne Crockett,
Christoffer Heckman,
Bradley Hayes
Abstract:
Reliable coordination and efficient communication are critical challenges for multi-agent robotic exploration of environments where communication is limited. This work introduces Adaptive-RF Transmission (ART), a novel communication-aware planning algorithm that dynamically modulates transmission location based on signal strength and data payload size, enabling heterogeneous robot teams to share information efficiently without unnecessary backtracking. We further explore an extension to this approach called ART-SST, which enforces signal strength thresholds for high-fidelity data delivery. Through over 480 simulations across three cave-inspired environments, ART consistently outperforms existing strategies, including full rendezvous and minimum-signal heuristic approaches, achieving up to a 58% reduction in distance traveled and up to 52% faster exploration times compared to baseline methods. These results demonstrate that adaptive, payload-aware communication significantly improves coverage efficiency and mission speed in complex, communication-constrained environments, offering a promising foundation for future planetary exploration and search-and-rescue missions.
Submitted 12 February, 2026;
originally announced February 2026.
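The payload-aware transmission decision ART makes can be illustrated with a toy threshold rule: transmit in place only when the current signal meets a payload-dependent minimum, otherwise keep moving toward stronger signal. The dBm thresholds, payload bounds, and linear interpolation below are our assumptions for illustration, not values from the paper:

```python
def transmit_here(signal_dbm, payload_bytes,
                  weak=-90.0, strong=-60.0, small=1e4, large=1e7):
    """Payload-aware link decision (all thresholds illustrative).

    Small payloads tolerate a weak link; large payloads demand a stronger
    one before the robot commits to transmitting from its current pose."""
    # Fraction of the way from a "small" to a "large" payload, clamped to [0, 1].
    frac = min(1.0, max(0.0, (payload_bytes - small) / (large - small)))
    required = weak + frac * (strong - weak)  # interpolate minimum signal
    return signal_dbm >= required
```

Under these toy numbers, a small status message goes out at -85 dBm, while a large map payload waits until the link improves past -60 dBm.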
-
UnrealPose: Leveraging Game Engine Kinematics for Large-Scale Synthetic Human Pose Data
Authors:
Joshua Kawaguchi,
Saad Manzur,
Emily Gao Wang,
Maitreyi Sinha,
Bryan Vela,
Yunxi Wang,
Brandon Vela,
Wayne B. Hayes
Abstract:
Diverse, accurately labeled 3D human pose data is expensive and studio-bound, while in-the-wild datasets lack known ground truth. We introduce UnrealPose-Gen, an Unreal Engine 5 pipeline built on Movie Render Queue for high-quality offline rendering. Our generated frames include: (i) 3D joints in world and camera coordinates, (ii) 2D projections and COCO-style keypoints with occlusion and joint-visibility flags, (iii) person bounding boxes, and (iv) camera intrinsics and extrinsics. We use UnrealPose-Gen to present UnrealPose-1M, an approximately one million frame corpus comprising eight sequences: five scripted "coherent" sequences spanning five scenes, approximately 40 actions, and five subjects; and three randomized sequences across three scenes, approximately 100 actions, and five subjects, all captured from diverse camera trajectories for broad viewpoint coverage. As a fidelity check, we report real-to-synthetic results on four tasks: image-to-3D pose, 2D keypoint detection, 2D-to-3D lifting, and person detection/segmentation. Though time and resources constrain us from an unlimited dataset, we release the UnrealPose-1M dataset, as well as the UnrealPose-Gen pipeline to support third-party generation of human pose data.
Submitted 2 January, 2026;
originally announced January 2026.
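Because each frame ships with camera intrinsics and extrinsics, the 2D keypoints are recoverable from the 3D world-space joints by standard pinhole projection. A minimal plain-Python sketch (row-major matrices; the visibility flag here checks only the cheirality condition, not the dataset's occlusion flags):

```python
def project_joints(joints_world, R, t, K):
    """Project 3D joints (world frame) to pixels via extrinsics [R|t]
    and intrinsics K = ((fx, 0, cx), (0, fy, cy), (0, 0, 1)).

    Returns one (u, v, visible) tuple per joint; visible is False
    when the joint lies behind the camera plane."""
    out = []
    for X in joints_world:
        # Camera-frame coordinates: Xc = R @ X + t
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if Xc[2] <= 0:  # behind the camera: no valid projection
            out.append((None, None, False))
            continue
        u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
        v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
        out.append((u, v, True))
    return out
```

With an identity extrinsic and fx = fy = 100, cx = cy = 50, a joint at (0, 0, 2) projects to the principal point (50, 50).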
-
ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
Authors:
Shivendra Agrawal,
Jake Brawer,
Ashutosh Naik,
Alessandro Roncone,
Bradley Hayes
Abstract:
Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside MCL, yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. Across 100 global-localization trials spanning four conditions (cart-mounted, wearable, dynamic obstacles, and sparse semantics) in a semantically dense, retail environment, ShelfAware achieves a 96% success rate (vs. 22% MCL and 10% AMCL) with a mean time-to-convergence of 1.91s, attains the lowest translational RMSE in all conditions, and maintains stable tracking in 80% of tested sequences, all while running in real time on a consumer laptop-class platform. By modeling semantics distributionally at the category level and leveraging inverse proposals, ShelfAware resolves geometric aliasing and semantic drift common to quasi-static domains. Because the method requires only vision sensors and VIO, it integrates as an infrastructure-free building block for mobile robots in warehouses, labs, and retail settings; as a representative application, it also supports the creation of assistive devices providing start-anytime, shared-control assistive navigation for people with visual impairments.
Submitted 9 December, 2025;
originally announced December 2025.
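The fusion of a depth likelihood with a category-centric semantic similarity can be sketched as a per-particle weight. This toy version uses a Gaussian depth model and histogram intersection over normalized category distributions; the blend exponent `alpha` and both component models are illustrative assumptions, not the paper's exact formulation:

```python
import math

def semantic_similarity(obs_hist, map_hist):
    """Histogram intersection between the observed and expected category
    distributions (both normalized over the same category vocabulary)."""
    return sum(min(o, m) for o, m in zip(obs_hist, map_hist))

def depth_likelihood(obs_depth, expected_depth, sigma=0.5):
    """Gaussian likelihood of the observed mean depth at this particle."""
    err = obs_depth - expected_depth
    return math.exp(-0.5 * (err / sigma) ** 2)

def particle_weight(obs_depth, expected_depth, obs_hist, map_hist, alpha=0.5):
    """Fused weight: geometric blend of depth and semantic evidence."""
    d = depth_likelihood(obs_depth, expected_depth)
    s = semantic_similarity(obs_hist, map_hist)
    return (d ** alpha) * (s ** (1.0 - alpha))
```

A particle whose expected view matches both the observed depth and the observed category mix keeps full weight; a semantic mismatch drives the weight toward zero even when the geometry aliases.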
-
Core Mondrian: Basic Mondrian beyond k-anonymity
Authors:
Adam Bloomston,
Elizabeth Burke,
Megan Cacace,
Anne Diaz,
Wren Dougherty,
Matthew Gonzalez,
Remington Gregg,
Yeliz Güngör,
Bryce Hayes,
Eeway Hsu,
Oron Israeli,
Heesoo Kim,
Sara Kwasnick,
Joanne Lacsina,
Demma Rosa Rodriguez,
Adam Schiller,
Whitney Schumacher,
Jessica Simon,
Maggie Tang,
Skyler Wharton,
Marilyn Wilcken
Abstract:
We present Core Mondrian, a scalable extension of the Original Mondrian partition-based anonymization algorithm. A modular strategy layer supports k-anonymity, allowing new privacy models to be added easily. A hybrid recursive/queue execution engine exploits multi-core parallelism while maintaining deterministic output. Utility-preserving enhancements include NaN-pattern pre-partitioning, metric-driven cut scoring, and dynamic suppression budget management. Experiments on the 48k-record UCI ADULT dataset and synthetically scaled versions up to 1M records achieve lower Discernibility Metric scores than Original Mondrian for numeric quasi-identifier sets while parallel processing delivers up to 4x speedup vs. sequential Core Mondrian. Core Mondrian enables privacy-compliant equity analytics at production scale.
Submitted 7 October, 2025;
originally announced October 2025.
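For context, the Original Mondrian algorithm that Core Mondrian extends is a recursive median partitioner over quasi-identifiers. A minimal single-threaded sketch for numeric quasi-identifiers (omitting the paper's NaN-pattern pre-partitioning, metric-driven cut scoring, suppression budget, and parallel execution engine):

```python
def mondrian_partitions(records, qi_indices, k):
    """Basic Mondrian: recursively median-split on the widest
    quasi-identifier, refusing any cut that would leave a side
    with fewer than k records (preserving k-anonymity)."""
    def span(part, q):
        vals = [r[q] for r in part]
        return max(vals) - min(vals)

    def cut(part):
        q = max(qi_indices, key=lambda qi: span(part, qi))
        if span(part, q) == 0:
            return [part]                    # all QI values identical
        part = sorted(part, key=lambda r: r[q])
        mid = len(part) // 2
        if mid < k or len(part) - mid < k:
            return [part]                    # cut would violate k-anonymity
        return cut(part[:mid]) + cut(part[mid:])

    return cut(list(records))

def generalize(partition, qi_indices):
    """Replace each quasi-identifier with its [min, max] range."""
    return {q: (min(r[q] for r in partition), max(r[q] for r in partition))
            for q in qi_indices}
```

Running this on six single-attribute records with k = 2 yields two partitions of three records each, every partition trivially satisfying k-anonymity.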
-
PESTO: Real-Time Pitch Estimation with Self-supervised Transposition-equivariant Objective
Authors:
Alain Riou,
Bernardo Torres,
Ben Hayes,
Stefan Lattner,
Gaëtan Hadjeres,
Gaël Richard,
Geoffroy Peeters
Abstract:
In this paper, we introduce PESTO, a self-supervised learning approach for single-pitch estimation using a Siamese architecture. Our model processes individual frames of a Variable-$Q$ Transform (VQT) and predicts pitch distributions. The neural network is designed to be equivariant to translations, notably thanks to a Toeplitz fully-connected layer. In addition, we construct pitch-shifted pairs by translating and cropping the VQT frames and train our model with a novel class-based transposition-equivariant objective, eliminating the need for annotated data. Thanks to this architecture and training objective, our model achieves remarkable performances while being very lightweight ($130$k parameters). Evaluations on music and speech datasets (MIR-1K, MDB-stem-synth, and PTDB) demonstrate that PESTO not only outperforms self-supervised baselines but also competes with supervised methods, exhibiting superior cross-dataset generalization. Finally, we enhance PESTO's practical utility by developing a streamable VQT implementation using cached convolutions. Combined with our model's low latency (less than 10 ms) and minimal parameter count, this makes PESTO particularly suitable for real-time applications.
Submitted 27 October, 2025; v1 submitted 2 August, 2025;
originally announced August 2025.
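The pitch-shifted pair construction is easy to sketch: because VQT bins are spaced in log-frequency, cropping the same frame at two different offsets transposes the content by a known number of bins, yielding a self-supervised target with no annotation. In the toy below (our illustration, with an argmax standing in for the network), content cropped `shift` bins higher appears `shift` indices lower, so the predicted interval should cancel the shift:

```python
def make_shifted_pair(vqt_frame, shift, crop_len):
    """Crop one VQT frame at two offsets; the log-frequency axis turns
    the index offset into a known pitch transposition."""
    a = vqt_frame[:crop_len]
    b = vqt_frame[shift:shift + crop_len]
    return a, b, shift

def toy_pitch_estimator(frame):
    """Stand-in for the network: index of the strongest bin."""
    return max(range(len(frame)), key=frame.__getitem__)

def equivariance_loss(pitch_a, pitch_b, shift):
    """Penalize deviation from the known interval: pitch_b = pitch_a - shift."""
    return abs((pitch_b - pitch_a) + shift)
```

A single-peak frame cropped at offsets 0 and 3 gives pitch estimates 10 and 7, and the equivariance loss is exactly zero, which is the training signal PESTO exploits.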
-
CRED: Counterfactual Reasoning and Environment Design for Active Preference Learning
Authors:
Yi-Shiuan Tung,
Bradley Hayes,
Alessandro Roncone
Abstract:
For effective real-world deployment, robots should adapt to human preferences, such as balancing distance, time, and safety in delivery routing. Active preference learning (APL) learns human reward functions by presenting trajectories for ranking. However, existing methods often struggle to explore the full trajectory space and fail to identify informative queries, particularly in long-horizon tasks. We propose CRED, a trajectory generation method for APL that improves reward estimation by jointly optimizing environment design and trajectory selection. CRED "imagines" new scenarios through environment design and uses counterfactual reasoning -- by sampling rewards from its current belief and asking "What if this reward were the true preference?" -- to generate a diverse and informative set of trajectories for ranking. Experiments in GridWorld and real-world navigation using OpenStreetMap data show that CRED improves reward learning and generalizes effectively across different environments.
Submitted 7 July, 2025;
originally announced July 2025.
-
Robust Robotic Exploration and Mapping Using Generative Occupancy Map Synthesis
Authors:
Lorin Achey,
Alec Reed,
Brendan Crowe,
Bradley Hayes,
Christoffer Heckman
Abstract:
We present a novel approach for enhancing robotic exploration by using generative occupancy mapping. We implement SceneSense, a diffusion model designed and trained for predicting 3D occupancy maps given partial observations. Our proposed approach probabilistically fuses these predictions into a running occupancy map in real-time, resulting in significant improvements in map quality and traversability. We deploy SceneSense on a quadruped robot and validate its performance with real-world experiments to demonstrate the effectiveness of the model. In these experiments we show that occupancy maps enhanced with SceneSense predictions better estimate the distribution of our fully observed ground truth data ($24.44\%$ FID improvement around the robot and $75.59\%$ improvement at range). We additionally show that integrating SceneSense enhanced maps into our robotic exploration stack as a ``drop-in'' map improvement, utilizing an existing off-the-shelf planner, results in improvements in robustness and traversability time. Finally, we show results of full exploration evaluations with our proposed system in two dissimilar environments and find that locally enhanced maps provide more consistent exploration results than maps constructed only from direct sensor measurements.
Submitted 29 December, 2025; v1 submitted 24 June, 2025;
originally announced June 2025.
-
Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching
Authors:
Ben Hayes,
Charalampos Saitis,
György Fazekas
Abstract:
Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task that regressing point estimates under permutation symmetry degrades performance, even when using a permutation-invariant loss function or symmetry-breaking heuristics. Then, viewing equivalent solutions as modes of a probability distribution, we show that a conditional generative model substantially improves performance. Further, acknowledging the invariance of the implicit parameter distribution, we find that performance is further improved by using a permutation equivariant continuous normalizing flow. To accommodate intricate symmetries in real synthesizers, we also propose a relaxed equivariance strategy that adaptively discovers relevant symmetries from data. Applying our method to Surge XT, a full-featured open source synthesizer used in real world audio production, we find our method outperforms regression and generative baselines across audio reconstruction metrics.
Submitted 8 June, 2025;
originally announced June 2025.
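The permutation symmetry that drives the ill-posedness is easy to demonstrate with a two-oscillator toy synth (ours for illustration; the paper works with Surge XT): swapping the oscillators changes the parameter vector but not the signal, so a point-estimate regressor faces two equally valid targets.

```python
import math

def two_osc_signal(params, n=8):
    """Toy additive synth: two sine oscillators, each with (freq, amp).
    Swapping the oscillators yields the identical signal, so inversion
    from audio back to parameters is inherently multimodal."""
    f1, a1, f2, a2 = params
    return [a1 * math.sin(f1 * t) + a2 * math.sin(f2 * t) for t in range(n)]

def equivalent_params(params):
    """All parameter settings producing the same signal under swap symmetry."""
    f1, a1, f2, a2 = params
    return {params, (f2, a2, f1, a1)}
```

Any regression loss averaged over both modes is minimized at their mean, which is itself a valid configuration for neither; modeling the posterior with a (permutation-equivariant) generative model avoids this collapse.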
-
Employing Laban Shape for Generating Emotionally and Functionally Expressive Trajectories in Robotic Manipulators
Authors:
Srikrishna Bangalore Raghu,
Clare Lohrmann,
Akshay Bakshi,
Jennifer Kim,
Jose Caraveo Herrera,
Bradley Hayes,
Alessandro Roncone
Abstract:
Successful human-robot collaboration depends on cohesive communication and a precise understanding of the robot's abilities, goals, and constraints. While robotic manipulators offer high precision, versatility, and productivity, they exhibit expressionless and monotonous motions that conceal the robot's intention, resulting in a lack of efficiency and transparency with humans. In this work, we use Laban notation, a dance annotation language, to enable robotic manipulators to generate trajectories with functional expressivity, where the robot uses nonverbal cues to communicate its abilities and the likelihood of succeeding at its task. We achieve this by introducing two novel variants of Hesitant expressive motion (Spoke-Like and Arc-Like). We also enhance the emotional expressivity of four existing emotive trajectories (Happy, Sad, Shy, and Angry) by augmenting Laban Effort usage with Laban Shape. The functionally expressive motions are validated via a human-subjects study, where participants equate both variants of Hesitant motion with reduced robot competency. The enhanced emotive trajectories are shown to be viewed as distinct emotions using the Valence-Arousal-Dominance (VAD) spectrum, corroborating the usage of Laban Shape.
Submitted 23 June, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
Authors:
Chin-Yun Yu,
Marco A. Martínez-Ramírez,
Junghyun Koo,
Ben Hayes,
Wei-Hsiang Liao,
György Fazekas,
Yuki Mitsufuji
Abstract:
This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for ``Differentiable Vocal Fx'', integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Vocal presets are retrieved from two datasets, comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations reveals strong relationships between effects and parameters, such as the high-pass and low-shelf filters often working together to shape the low end, and the delay time correlating with the intensity of the delayed signals. Principal component analysis reveals connections to McAdams' timbre dimensions, where the most crucial component modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms the non-Gaussian nature of the parameter distribution, highlighting the complexity of the vocal effects space. These initial findings on the parameter distributions set the foundation for future research in vocal effects modelling and automatic mixing. Our source code and datasets are accessible at https://github.com/SonyResearch/diffvox.
Submitted 17 August, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
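Gradient-based parameter estimation against an audio target reduces, in the simplest possible case, to fitting one differentiable effect parameter by gradient descent. The one-parameter gain matcher below is our stand-in for the full differentiable chain of EQ, compression, delay, and reverb, not DiffVox's actual optimizer:

```python
def match_gain(target, dry, steps=200, lr=0.1):
    """Fit a single gain g so that g * dry matches target, by gradient
    descent on mean squared error -- a one-parameter analogue of
    gradient-based estimation of a differentiable effects chain."""
    g = 0.0
    n = len(dry)
    for _ in range(steps):
        # d/dg mean((g*x - y)^2) = 2/n * sum((g*x - y) * x)
        grad = 2.0 / n * sum((g * x - y) * x for x, y in zip(dry, target))
        g -= lr * grad
    return g
```

Because every stage of the real effects chain is differentiable, the same descent loop extends to the full parameter vector via automatic differentiation.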
-
AI/ML Based Detection and Categorization of Covert Communication in IPv6 Network
Authors:
Mohammad Wali Ur Rahman,
Yu-Zheng Lin,
Carter Weeks,
David Ruddell,
Jeff Gabriellini,
Bill Hayes,
Salim Hariri,
Pratik Satam,
Edward V. Ziegler Jr
Abstract:
The flexibility and complexity of IPv6 extension headers allow attackers to create covert channels or bypass security mechanisms, leading to potential data breaches or system compromises. The mature development of machine learning has become the primary detection technology option used to mitigate covert communication threats. However, the complexity of detecting covert communication, evolving injection techniques, and scarcity of data make building machine-learning models challenging. In previous related research, machine learning has shown good performance in detecting covert communications, but oversimplified attack scenario assumptions cannot represent the complexity of modern covert technologies and make it easier for machine learning models to detect covert communications. To bridge this gap, in this study, we analyzed the packet structure and network traffic behavior of IPv6, used encryption algorithms, and performed covert communication injection without changing network packet behavior to get closer to real attack scenarios. In addition to analyzing and injecting methods for covert communications, this study also uses comprehensive machine learning techniques to train the model proposed in this study to detect threats, including traditional tree-based ensembles such as random forests and gradient boosting, as well as complex neural network architectures such as CNNs and LSTMs, to achieve detection accuracy of over 90\%. This study details the methods used for dataset augmentation and the comparative performance of the applied models, reinforcing insights into the adaptability and resilience of the machine learning application in IPv6 covert communication. We further introduce a Generative AI-driven script refinement framework, leveraging prompt engineering as a preliminary exploration of how generative agents can assist in covert communication detection and model enhancement.
Submitted 16 September, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
Online Diffusion-Based 3D Occupancy Prediction at the Frontier with Probabilistic Map Reconciliation
Authors:
Alec Reed,
Lorin Achey,
Brendan Crowe,
Bradley Hayes,
Christoffer Heckman
Abstract:
Autonomous navigation and exploration in unmapped environments remains a significant challenge in robotics due to the difficulty robots face in making commonsense inference of unobserved geometries. Recent advancements have demonstrated that generative modeling techniques, particularly diffusion models, can enable systems to infer these geometries from partial observation. In this work, we present implementation details and results for real-time, online occupancy prediction using a modified diffusion model. By removing attention-based visual conditioning and visual feature extraction components, we achieve a 73$\%$ reduction in runtime with minimal accuracy reduction. These modifications enable occupancy prediction across the entire map, rather than being limited to the area around the robot where camera data can be collected. We introduce a probabilistic update method for merging predicted occupancy data into running occupancy maps, resulting in a 71$\%$ improvement in predicting occupancy at map frontiers compared to previous methods. Finally, we release our code and a ROS node for on-robot operation <upon publication> at github.com/arpg/sceneSense_ws.
Submitted 16 September, 2024;
originally announced September 2024.
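The probabilistic reconciliation of predicted occupancy into a running map can be sketched as a guarded log-odds update: observed cells keep their sensor-derived value, and only unobserved cells receive a down-weighted contribution from the diffusion prediction. The flat cell arrays and the `weight` factor below are our illustrative assumptions, not the paper's exact update rule:

```python
import math

def logodds(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def prob(l):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-l))

def merge_prediction(map_logodds, observed, pred_probs, weight=0.5):
    """Fuse predicted occupancy into a running log-odds map.

    Observed cells are left untouched, so generative predictions can
    never overwrite direct sensor evidence; unobserved cells accumulate
    a down-weighted log-odds update from the prediction."""
    merged = []
    for cell, seen, p in zip(map_logodds, observed, pred_probs):
        merged.append(cell if seen else cell + weight * logodds(p))
    return merged
```

An unobserved cell predicted occupied with probability 0.9 gains about +2.2 in log-odds (at weight 1.0), while an observed cell is passed through unchanged regardless of the prediction.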
-
ShelfHelp: Empowering Humans to Perform Vision-Independent Manipulation Tasks with a Socially Assistive Robotic Cane
Authors:
Shivendra Agrawal,
Suresh Nayak,
Ashutosh Naik,
Bradley Hayes
Abstract:
The ability to shop independently, especially in grocery stores, is important for maintaining a high quality of life. This can be particularly challenging for people with visual impairments (PVI). Stores carry thousands of products, with approximately 30,000 new products introduced each year in the US market alone, presenting a challenge even for modern computer vision solutions. Through this work, we present a proof-of-concept socially assistive robotic system we call ShelfHelp, and propose novel technical solutions for enhancing instrumented canes traditionally meant for navigation tasks with additional capability within the domain of shopping. ShelfHelp includes a novel visual product locator algorithm designed for use in grocery stores and a novel planner that autonomously issues verbal manipulation guidance commands to guide the user during product retrieval. Through a human subjects study, we show the system's success in locating and providing effective manipulation guidance to retrieve desired products with novice users. We compare two autonomous verbal guidance modes achieving comparable performance to a human assistance baseline, and present encouraging findings that validate our system's efficiency and effectiveness through positive subjective metrics including competence, intelligence, and ease of use.
Submitted 30 May, 2024;
originally announced May 2024.
-
Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks
Authors:
Emily Jensen,
Sriram Sankaranarayanan,
Bradley Hayes
Abstract:
We claim that LLMs can be paired with formal analysis methods to provide accessible, relevant feedback for HRI tasks. While logic specifications are useful for defining and assessing a task, these representations are not easily interpreted by non-experts. Luckily, LLMs are adept at generating easy-to-understand text that explains difficult concepts. By integrating task assessment outcomes and other contextual information into an LLM prompt, we can effectively synthesize a useful set of recommendations for the learner to improve their performance.
Submitted 25 May, 2024;
originally announced May 2024.
-
Automated Assessment and Adaptive Multimodal Formative Feedback Improves Psychomotor Skills Training Outcomes in Quadrotor Teleoperation
Authors:
Emily Jensen,
Sriram Sankaranarayanan,
Bradley Hayes
Abstract:
The workforce will need to continually upskill in order to meet the evolving demands of industry, especially working with robotic and autonomous systems. Current training methods are not scalable and do not adapt to the skills that learners already possess. In this work, we develop a system that automatically assesses learner skill in a quadrotor teleoperation task using temporal logic task specifications. This assessment is used to generate multimodal feedback based on the principles of effective formative feedback. Participants perceived the feedback positively. Those receiving formative feedback viewed the feedback as more actionable compared to receiving summary statistics. Participants in the multimodal feedback condition were more likely to achieve a safe landing and increased their safe landings more over the experiment compared to other feedback conditions. Finally, we identify themes to improve adaptive feedback and discuss how training for complex psychomotor tasks can be integrated with learning theories.
Submitted 24 May, 2024;
originally announced May 2024.
-
SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation
Authors:
Alec Reed,
Brendan Crowe,
Doncey Albin,
Lorin Achey,
Bradley Hayes,
Christoffer Heckman
Abstract:
When exploring new areas, robotic systems generally exclusively plan and execute controls over geometry that has been directly measured. When entering space that was previously obstructed from view such as turning corners in hallways or entering new rooms, robots often pause to plan over the newly observed space. To address this we present SceneSense, a real-time 3D diffusion model for synthesizing 3D occupancy information from partial observations that effectively predicts these occluded or out of view geometries for use in future planning and control frameworks. SceneSense uses a running occupancy map and a single RGB-D camera to generate predicted geometry around the platform at runtime, even when the geometry is occluded or out of view. Our architecture ensures that SceneSense never overwrites observed free or occupied space. By preserving the integrity of the observed map, SceneSense mitigates the risk of corrupting the observed space with generative predictions. While SceneSense is shown to operate well using a single RGB-D camera, the framework is flexible enough to extend to additional modalities. SceneSense operates as part of any system that generates a running occupancy map `out of the box', removing conditioning from the framework. Alternatively, for maximum performance in new modalities, the perception backbone can be replaced and the model retrained for inference in new applications. Unlike existing models that necessitate multiple views and offline scene synthesis, or are focused on filling gaps in observed data, our findings demonstrate that SceneSense is an effective approach to estimating unobserved local occupancy information at runtime. Local occupancy predictions from SceneSense are shown to better represent the ground truth occupancy distribution during the test exploration trajectories than the running occupancy map.
Submitted 18 March, 2024;
originally announced March 2024.
-
Workspace Optimization Techniques to Improve Prediction of Human Motion During Human-Robot Collaboration
Authors:
Yi-Shiuan Tung,
Matthew B. Luebbers,
Alessandro Roncone,
Bradley Hayes
Abstract:
Understanding human intentions is critical for safe and effective human-robot collaboration. While state of the art methods for human goal prediction utilize learned models to account for the uncertainty of human motion data, that data is inherently stochastic and high variance, hindering those models' utility for interactions requiring coordination, including safety-critical or close-proximity tasks. Our key insight is that robot teammates can deliberately configure shared workspaces prior to interaction in order to reduce the variance in human motion, realizing classifier-agnostic improvements in goal prediction. In this work, we present an algorithmic approach for a robot to arrange physical objects and project "virtual obstacles" using augmented reality in shared human-robot workspaces, optimizing for human legibility over a given set of tasks. We compare our approach against other workspace arrangement strategies using two human-subjects studies, one in a virtual 2D navigation domain and the other in a live tabletop manipulation domain involving a robotic manipulator arm. We evaluate the accuracy of human motion prediction models learned from each condition, demonstrating that our workspace optimization technique with virtual obstacles leads to higher robot prediction accuracy using less training data.
Submitted 23 January, 2024;
originally announced January 2024.
-
Improving Human Legibility in Collaborative Robot Tasks through Augmented Reality and Workspace Preparation
Authors:
Yi-Shiuan Tung,
Matthew B. Luebbers,
Alessandro Roncone,
Bradley Hayes
Abstract:
Understanding the intentions of human teammates is critical for safe and effective human-robot interaction. The canonical approach for human-aware robot motion planning is to first predict the human's goal or path, and then construct a robot plan that avoids collision with the human. This method can generate unsafe interactions if the human model and subsequent predictions are inaccurate. In this work, we present an algorithmic approach for both arranging the configuration of objects in a shared human-robot workspace, and projecting "virtual obstacles" in augmented reality, optimizing for legibility in a given task. These changes to the workspace result in more legible human behavior, improving robot predictions of human goals, thereby improving task fluency and safety. To evaluate our approach, we propose two user studies involving a collaborative tabletop task with a manipulator robot, and a warehouse navigation task with a mobile robot.
Submitted 9 November, 2023;
originally announced November 2023.
-
Towards A Natural Language Interface for Flexible Multi-Agent Task Assignment
Authors:
Jake Brawer,
Kayleigh Bishop,
Bradley Hayes,
Alessandro Roncone
Abstract:
Task assignment and scheduling algorithms are powerful tools for autonomously coordinating large teams of robotic or AI agents. However, the decisions these systems make often rely on components designed by domain experts, which can be difficult for non-technical end-users to understand or modify to their own ends. In this paper we propose a preliminary design for a flexible natural language interface for a task assignment system. The goal of our approach is both to grant users more control over a task assignment system's decision process and to render these decisions more transparent. Users can direct the task assignment system via natural language commands, which are applied as constraints to a mixed-integer linear program (MILP) using a large language model (LLM). Additionally, our proposed system can alert users to potential issues with their commands, and engage them in a corrective dialogue in order to find a viable solution. We conclude with a description of our planned user evaluation in the simulated environment Overcooked and describe next steps towards developing a flexible and transparent task allocation system.
Submitted 8 November, 2023; v1 submitted 31 October, 2023;
originally announced November 2023.
-
AI (r)evolution -- where are we heading? Thoughts about the future of music and sound technologies in the era of deep learning
Authors:
Giovanni Bindi,
Nils Demerlé,
Rodrigo Diaz,
David Genova,
Aliénor Golvet,
Ben Hayes,
Jiawen Huang,
Lele Liu,
Vincent Martos,
Sarah Nabi,
Teresa Pelinski,
Lenny Renault,
Saurjya Sarkar,
Pedro Sarmento,
Cyrus Vahidi,
Lewis Wolstanholme,
Yixiao Zhang,
Axel Roebel,
Nick Bryan-Kinns,
Jean-Louis Giavitto,
Mathieu Barthet
Abstract:
Artificial Intelligence (AI) technologies such as deep learning are evolving very quickly, bringing many changes to our everyday lives. To explore the future impact and potential of AI in the field of music and sound technologies, a doctoral day was held between Queen Mary University of London (QMUL, UK) and Sciences et Technologies de la Musique et du Son (STMS, France). Prompt questions about current trends in AI and music were generated by academics from QMUL and STMS. Students from the two institutions then debated these questions. This report presents a summary of the student debates on the topics of: Data, Impact, and the Environment; Responsible Innovation and Creative Practice; Creativity and Bias; and From Tools to the Singularity. The students represent the future generation of AI and music researchers. The academics represent the incumbent establishment. The student debates reported here capture visions, dreams, concerns, uncertainties, and contentious issues for the future of AI and music as the establishment is rightfully challenged by the next generation.
Submitted 20 September, 2023;
originally announced October 2023.
-
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
Authors:
Ben Hayes,
Jordie Shier,
György Fazekas,
Andrew McPherson,
Charalampos Saitis
Abstract:
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
Submitted 29 August, 2023;
originally announced August 2023.
-
Learning from Abstract Images: on the Importance of Occlusion in a Minimalist Encoding of Human Poses
Authors:
Saad Manzur,
Wayne Hayes
Abstract:
Existing 2D-to-3D pose lifting networks suffer from poor performance in cross-dataset benchmarks. Although the use of 2D keypoints joined by "stick-figure" limbs has shown promise as an intermediate step, stick-figures do not account for occlusion information that is often inherent in an image. In this paper, we propose a novel representation using opaque 3D limbs that preserves occlusion information while implicitly encoding joint locations. Crucially, when training on data with accurate three-dimensional keypoints and without part-maps, this representation allows training on abstract synthetic images, with occlusion, from as many synthetic viewpoints as desired. The result is a pose defined by limb angles rather than joint positions (because poses are, in the real world, independent of cameras), allowing us to predict poses that are completely independent of camera viewpoint. The result provides not only an improvement in same-dataset benchmarks, but a "quantum leap" in cross-dataset benchmarks.
Submitted 19 July, 2023;
originally announced July 2023.
-
The Responsibility Problem in Neural Networks with Unordered Targets
Authors:
Ben Hayes,
Charalampos Saitis,
György Fazekas
Abstract:
We discuss the discontinuities that arise when mapping unordered objects to neural network outputs of fixed permutation, referred to as the responsibility problem. Prior work has proved the existence of the issue by identifying a single discontinuity. Here, we show that discontinuities under such models are uncountably infinite, motivating further research into neural networks for unordered data.
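The core of the issue can be seen in a toy regression setting (a hypothetical sketch, not the paper's construction): with two output slots of fixed order and an unordered pair of targets, the loss-minimising assignment of targets to slots flips discontinuously as the outputs move continuously.

```python
# Hypothetical toy illustration of the responsibility problem: a network
# with two fixed-order output slots regresses an unordered pair of targets.
# The loss is minimised over both permutations of the targets, so the
# assignment of each target to a slot -- its "responsibility" -- flips
# discontinuously as the outputs cross, even though the outputs themselves
# move continuously.

def best_assignment(outputs, targets):
    """Return the targets permuted to minimise squared error against outputs."""
    (o1, o2), (t1, t2) = outputs, targets
    direct = (o1 - t1) ** 2 + (o2 - t2) ** 2
    swapped = (o1 - t2) ** 2 + (o2 - t1) ** 2
    return (t1, t2) if direct <= swapped else (t2, t1)

targets = (0.0, 1.0)                         # the unordered target set {0, 1}
print(best_assignment((0.4, 0.6), targets))  # (0.0, 1.0): slot 1 answers for target 0
print(best_assignment((0.6, 0.4), targets))  # (1.0, 0.0): responsibility has flipped
```

An infinitesimal change in the outputs across the crossing point swaps which slot is responsible for which target, so the per-slot regression target jumps.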
Submitted 19 April, 2023;
originally announced April 2023.
-
Rigid-Body Sound Synthesis with Differentiable Modal Resonators
Authors:
Rodrigo Diaz,
Ben Hayes,
Charalampos Saitis,
György Fazekas,
Mark Sandler
Abstract:
Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material, using a bank of differentiable IIR filters. We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective, paving the way for physically-informed synthesisers to be learned directly from recordings of real-world objects.
Submitted 28 October, 2022; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Sinusoidal Frequency Estimation by Gradient Descent
Authors:
Ben Hayes,
Charalampos Saitis,
György Fazekas
Abstract:
Sinusoidal parameter estimation is a fundamental task in applications from spectral analysis to time-series forecasting. Estimating the sinusoidal frequency parameter by gradient descent is, however, often impossible as the error function is non-convex and densely populated with local minima. The growing family of differentiable signal processing methods has therefore been unable to tune the frequency of oscillatory components, preventing their use in a broad range of applications. This work presents a technique for joint sinusoidal frequency and amplitude estimation using the Wirtinger derivatives of a complex exponential surrogate and any first-order gradient-based optimizer, enabling end-to-end training of neural network controllers for unconstrained sinusoidal models.
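For reference (these are the standard definitions, not details specific to this paper), the Wirtinger derivatives of a real-valued loss $L$ with respect to a complex parameter $z = x + iy$ are

```latex
\frac{\partial L}{\partial z}
  = \frac{1}{2}\left(\frac{\partial L}{\partial x} - i\,\frac{\partial L}{\partial y}\right),
\qquad
\frac{\partial L}{\partial \bar z}
  = \frac{1}{2}\left(\frac{\partial L}{\partial x} + i\,\frac{\partial L}{\partial y}\right),
```

and steepest descent follows the conjugate derivative, $z \leftarrow z - \eta\,\partial L / \partial \bar z$ (with a constant factor of 2 absorbed into the step size $\eta$). Parameterising an oscillator as powers $z^n$ of a single complex number then exposes both frequency ($\arg z$) and decay ($|z|$) to any first-order optimizer.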
Submitted 18 November, 2022; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Bilevel Optimization for Just-in-Time Robotic Kitting and Delivery via Adaptive Task Segmentation and Scheduling
Authors:
Yi-Shiuan Tung,
Kayleigh Bishop,
Bradley Hayes,
Alessandro Roncone
Abstract:
Kitting refers to the task of preparing and grouping necessary parts and tools (or "kits") for assembly in a manufacturing environment. Automating this process simplifies the assembly task for human workers and improves efficiency. Existing automated kitting systems adhere to scripted instructions and predefined heuristics. However, given variability in the availability of parts and logistic delays, the inflexibility of existing systems can limit the overall efficiency of an assembly line. In this paper, we propose a bilevel optimization framework to enable a robot to perform task segmentation-based part selection, kit arrangement, and delivery scheduling to provide custom-tailored kits just in time - i.e., right when they are needed. We evaluate the proposed approach both through a human-subjects study (n=18) involving the construction of a flat-pack furniture table and through a shop-flow simulation based on data from the study. Our results show that the just-in-time kitting system is objectively more efficient, resilient to upstream shop-flow delays, and subjectively more preferable compared to baseline approaches that use either kits defined by the rigid task segmentation boundaries of the task graph itself, or a single kit that includes all parts necessary to assemble a single unit.
Submitted 17 September, 2022;
originally announced September 2022.
-
Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning
Authors:
Himanshu Gupta,
Bradley Hayes,
Zachary Sunberg
Abstract:
This paper presents a hybrid online Partially Observable Markov Decision Process (POMDP) planning system that addresses the problem of autonomous navigation in the presence of multi-modal uncertainty introduced by other agents in the environment. As a particular example, we consider the problem of autonomous navigation in dense crowds of pedestrians and among obstacles. Popular approaches to this problem first generate a path using a complete planner (e.g., Hybrid A*) with ad-hoc assumptions about uncertainty, then use online tree-based POMDP solvers to reason about uncertainty with control over a limited aspect of the problem (i.e., speed along the path). We present a more capable and responsive real-time approach enabling the POMDP planner to control more degrees of freedom (e.g., both speed AND heading) to achieve more flexible and efficient solutions. This modification greatly extends the region of the state space that the POMDP planner must reason over, significantly increasing the importance of finding effective roll-out policies within the limited computational budget that real-time control affords. Our key insight is to use multi-query motion planning techniques (e.g., Probabilistic Roadmaps or Fast Marching Method) as priors for rapidly generating efficient roll-out policies for every state that the POMDP planning tree might reach during its limited horizon search. Our proposed approach generates trajectories that are safe and significantly more efficient than the previous approach, even in densely crowded dynamic environments with long planning horizons.
Submitted 20 June, 2022;
originally announced June 2022.
-
PokeRRT: A Kinodynamic Planning Approach for Poking Manipulation
Authors:
Anuj Pasricha,
Yi-Shiuan Tung,
Bradley Hayes,
Alessandro Roncone
Abstract:
This work introduces PokeRRT, a novel motion planning algorithm that demonstrates poking as an effective non-prehensile manipulation skill to enable fast manipulation of objects and increase the size of a robot's reachable workspace. Our qualitative and quantitative results demonstrate the advantages of poking over pushing and grasping in planning object trajectories through uncluttered and cluttered environments.
Submitted 9 February, 2022;
originally announced March 2022.
-
PokeRRT: Poking as a Skill and Failure Recovery Tactic for Planar Non-Prehensile Manipulation
Authors:
Anuj Pasricha,
Yi-Shiuan Tung,
Bradley Hayes,
Alessandro Roncone
Abstract:
In this work, we introduce PokeRRT, a novel motion planning algorithm that demonstrates poking as an effective non-prehensile manipulation skill to enable fast manipulation of objects and increase the size of a robot's reachable workspace. We showcase poking as a failure recovery tactic used synergistically with pick-and-place for resiliency in cases where pick-and-place initially fails or is unachievable. Our experiments demonstrate the efficiency of the proposed framework in planning object trajectories using poking manipulation in uncluttered and cluttered environments. In addition to quantitatively and qualitatively demonstrating the adaptability of PokeRRT to different scenarios in both simulation and real-world settings, our results show the advantages of poking over pushing and grasping in terms of success rate and task time.
Submitted 4 June, 2024; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Neural Waveshaping Synthesis
Authors:
Ben Hayes,
Charalampos Saitis,
György Fazekas
Abstract:
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics of a target timbre. Once trained, a NEWT can produce complex timbral evolutions by simple affine transformations of its input and output signals. We paired the NEWT with a differentiable noise synthesiser and reverb and found it capable of generating realistic musical instrument performances with only 260k total model parameters, conditioned on F0 and loudness features. We compared our method to state-of-the-art benchmarks with a multi-stimulus listening test and the Fréchet Audio Distance and found it performed competitively across the tested timbral domains. Our method significantly outperformed the benchmarks in terms of generation speed, and achieved real-time performance on a consumer CPU, both with and without FastNEWT, suggesting it is a viable basis for future creative sound design tools.
Submitted 27 July, 2021; v1 submitted 11 July, 2021;
originally announced July 2021.
-
Parallel Statistical Model Checking for Safety Verification in Smart Grids
Authors:
T. Mancini,
F. Mari,
I. Melatti,
I. Salvo,
E. Tronci,
J. K. Gruber,
B. Hayes,
M. Prodanovic,
L. Elmegaard
Abstract:
By using small computing devices deployed at user premises, Autonomous Demand Response (ADR) adapts users' electricity consumption to given time-dependent electricity tariffs. This allows end-users to save on their electricity bills and Distribution System Operators to optimise (through suitable time-dependent tariffs) management of the electric grid by avoiding demand peaks. Unfortunately, even with ADR, users' power consumption may deviate from the expected (minimum-cost) one, e.g., because ADR devices fail to correctly forecast energy needs at user premises. As a result, the aggregated power demand may present undesirable peaks. In this paper we address such a problem by presenting methods, and a software tool (APD-Analyser) implementing them, enabling Distribution System Operators to effectively verify that a given time-dependent electricity tariff achieves the desired goals even when end-users deviate from their expected behaviour. We show the feasibility of the proposed approach through a realistic scenario from a medium voltage Danish distribution network.
Submitted 20 June, 2021;
originally announced June 2021.
-
One-shot Policy Elicitation via Semantic Reward Manipulation
Authors:
Aaquib Tabrez,
Ryan Leonard,
Bradley Hayes
Abstract:
Synchronizing expectations and knowledge about the state of the world is an essential capability for effective collaboration. For robots to effectively collaborate with humans and other autonomous agents, it is critical that they be able to generate intelligible explanations to reconcile differences between their understanding of the world and that of their collaborators. In this work we present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a novel sequential optimization algorithm that uses semantic explanations derived from combinations of planning predicates to augment agents' reward functions, driving their policies to exhibit more optimal behavior. We provide an experimental validation of our algorithm's policy manipulation capabilities in two practically grounded applications and conclude with a performance analysis of SPEAR on domains of increasingly complex state space and predicate counts. We demonstrate that our method makes substantial improvements over the state-of-the-art in terms of runtime and addressable problem size, enabling an agent to leverage its own expertise to communicate actionable information to improve another's performance.
Submitted 5 January, 2021;
originally announced January 2021.
-
Exact $p$-values for global network alignments via combinatorial analysis of shared GO terms (Subtitle: REFANGO: Rigorous Evaluation of Functional Alignments of Networks using Gene Ontology)
Authors:
Wayne B. Hayes
Abstract:
Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions tend to perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no gold standard exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. We use combinatorics to precisely count the number of possible network alignments in which $k$ proteins share a particular GO term. When divided by the number of all possible network alignments, this provides an explicit, exact $p$-value for a network alignment with respect to a particular GO term. Just as with BLAST's $p$-values and bit-scores, this method is designed not to guide the formation of any particular alignment, but instead to provide an after-the-fact evaluation of a fixed, given alignment.
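The counting argument can be checked by brute force on toy inputs (a hypothetical sketch; the paper derives closed-form combinatorial counts rather than enumerating): treat a 1-to-1 alignment as an injection from the smaller network into the larger one, and compute the exact $p$-value as the fraction of all injections in which at least $k$ aligned pairs share the GO term.

```python
from itertools import permutations

# Hypothetical brute-force illustration of the exact p-value idea.
# Networks have n1 <= n2 proteins; an alignment is an injection from
# net1 into net2. An aligned pair (u, f(u)) "shares" GO term g when both
# proteins are annotated with g. The exact p-value of observing >= k
# shared pairs is the count of such alignments over all alignments.

def exact_pvalue(n1, n2, annot1, annot2, k):
    """annot1/annot2: sets of protein indices annotated with term g."""
    total = hits = 0
    for image in permutations(range(n2), n1):  # every injection net1 -> net2
        total += 1
        shared = sum(1 for u, v in enumerate(image)
                     if u in annot1 and v in annot2)
        if shared >= k:
            hits += 1
    return hits / total

# 3 proteins aligned into 4; proteins {0, 1} in net1 and {0} in net2 carry g.
# 12 of the 4*3*2 = 24 injections map an annotated protein onto protein 0.
print(exact_pvalue(3, 4, {0, 1}, {0}, 1))  # 0.5
```

Replacing this enumeration with the paper's closed-form counts is what makes the evaluation tractable for real PPI network sizes.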
Submitted 25 September, 2021; v1 submitted 9 October, 2020;
originally announced October 2020.
-
Automated Failure-Mode Clustering and Labeling for Informed Car-To-Driver Handover in Autonomous Vehicles
Authors:
Aaquib Tabrez,
Matthew B. Luebbers,
Bradley Hayes
Abstract:
The car-to-driver handover is a critically important component of safe autonomous vehicle operation when the vehicle is unable to safely proceed on its own. Current implementations of this handover in automobiles take the form of a generic alarm indicating an imminent transfer of control back to the human driver. However, certain levels of vehicle autonomy may allow the driver to engage in other, non-driving-related tasks prior to a handover, leading to substantial difficulty in quickly regaining situational awareness. This delay in re-orientation could potentially lead to life-threatening failures unless mitigating steps are taken. Explainable AI has been shown to improve fluency and teamwork in human-robot collaboration scenarios. Therefore, we hypothesize that by utilizing autonomous explanation, these car-to-driver handovers can be performed more safely and reliably. The rationale is that, by providing the driver with additional situational knowledge, they will more rapidly focus on the relevant parts of the driving environment. Towards this end, we propose an algorithmic failure-mode identification and explanation approach to enable informed handovers from vehicle to driver. Furthermore, we propose a set of human-subjects driving-simulator studies to determine the appropriate form of explanation during handovers, as well as validate our framework.
Submitted 9 May, 2020;
originally announced May 2020.
-
Proceedings of the AI-HRI Symposium at AAAI-FSS 2018
Authors:
Kalesha Bullard,
Nick DePalma,
Richard G. Freedman,
Bradley Hayes,
Luca Iocchi,
Katrin Lohan,
Ross Mead,
Emmanuel Senft,
Tom Williams
Abstract:
The goal of the Interactive Learning for Artificial Intelligence (AI) for Human-Robot Interaction (HRI) symposium is to bring together the large community of researchers working on interactive learning scenarios for interactive robotics. While current HRI research involves investigating ways for robots to effectively interact with people, HRI's overarching goal is to develop robots that are autonomous while intelligently modeling and learning from humans. These goals greatly overlap with some central goals of AI and interactive machine learning, such that HRI is an extremely challenging problem domain for interactive learning and will elicit fresh problem areas for robotics research. Present-day AI research still does not widely consider situations for interacting directly with humans and within human-populated environments, which present inherent uncertainty in dynamics, structure, and interaction. We believe that the HRI community already offers a rich set of principles and observations that can be used to structure new models of interaction. The human-aware AI initiative has primarily been approached through human-in-the-loop methods that use people's data and feedback to improve refinement and performance of the algorithms, learned functions, and personalization. We thus believe that HRI is an important component to furthering AI and robotics research.
Submitted 18 September, 2018;
originally announced September 2018.
-
Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8
Authors:
Adib Hassan,
Po-Chien Chung,
Wayne B. Hayes
Abstract:
Graphlets are small connected induced subgraphs of a larger graph $G$, and are now commonly used to quantify the local and global topology of networks. Methods exist to exhaustively enumerate all graphlets (and their orbits) in large networks as efficiently as possible using orbit counting equations. However, the number of graphlets in $G$ is exponential in both the number of nodes and the number of edges in $G$. Enumerating them all is already unacceptably expensive on existing large networks, and the problem will only get worse as networks continue to grow in size and density. Here we introduce an efficient method designed to aid statistical sampling of graphlets up to size $k=8$ from a large network. We define graphettes as the generalization of graphlets that allows for disconnected graphlets. Given a particular (undirected) graphette $g$, we introduce the idea of the canonical graphette $\mathcal K(g)$ as a representative member of the isomorphism group $Iso(g)$ of $g$. We compute the mapping $\mathcal K$, in the form of a lookup table, from all $2^{k(k-1)/2}$ undirected graphettes $g$ of size $k\le 8$ to their canonical representatives $\mathcal K(g)$, as well as the permutation that transforms $g$ into $\mathcal K(g)$. We also compute all automorphism orbits for each canonical graphette. Thus, given any $k\le 8$ nodes in a graph $G$, we can infer in constant time which graphette they induce, as well as which orbit each of the $k$ nodes belongs to. Sampling a large number $N$ of such $k$-sets of nodes yields an approximation of both the distribution of graphlets and orbits across $G$ and the orbit degree vector at each node.
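The canonical-representative idea can be sketched in a few lines. The snippet below is an illustrative brute force over node orderings, not the paper's method: the paper instead precomputes the mapping once, over all $2^{k(k-1)/2}$ graphettes, and stores it in a lookup table so that identification at query time is constant-time. All function names are hypothetical.

```python
from itertools import permutations

def adjacency_bits(edges, order):
    """Encode the upper-triangular adjacency matrix of the graph on the
    given nodes, under the node ordering `order`, as a single integer."""
    eset = {frozenset(e) for e in edges}
    code, bit = 0, 0
    k = len(order)
    for i in range(k):
        for j in range(i + 1, k):
            if frozenset((order[i], order[j])) in eset:
                code |= 1 << bit
            bit += 1
    return code

def canonical_graphette(nodes, edges):
    """A canonical representative: the smallest adjacency encoding over
    all k! node orderings -- brute force, but feasible for k <= 8."""
    return min(adjacency_bits(edges, p) for p in permutations(nodes))
```

Two isomorphic graphettes map to the same integer (e.g. the 3-path with edges 0-1, 1-2 and its relabeling with edges 0-2, 2-1), which is exactly the property the precomputed lookup table exploits.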
Submitted 14 August, 2017;
originally announced August 2017.
-
A Graph-based Model for GPU Caching Problems
Authors:
Lingda Li,
Ari B. Hayes,
Stephen A. Hackler,
Eddy Z. Zhang,
Mario Szegedy,
Shuaiwen Leon Song
Abstract:
Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns of GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling among different threads. Traditionally, in the field of parallel computing, graph partition models are used to model data communication and guide task scheduling. However, we discover that these previous methods are either inaccurate or expensive when applied to GPU programs. In this paper, we propose a novel task partition model that is accurate and gives rise to fast, high-quality task/data reorganization algorithms. We demonstrate the effectiveness of the proposed model through rigorous theoretical analysis of the algorithm bounds and extensive experiments. The experimental results show significant performance improvement across a representative set of GPU applications.
Submitted 6 May, 2016;
originally announced May 2016.