-
A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
Authors:
TRI LBM Team,
Jose Barreiros,
Andrew Beaulieu,
Aditya Bhat,
Rick Cory,
Eric Cousineau,
Hongkai Dai,
Ching-Hsin Fang,
Kunimatsu Hashimoto,
Muhammad Zubair Irshad,
Masha Itkina,
Naveen Kuppuswamy,
Kuan-Hui Lee,
Katherine Liu,
Dale McConachie,
Ian McMahon,
Haruki Nishimura,
Calder Phillips-Grafflin,
Charles Richter,
Paarth Shah,
Krishnan Srinivasan,
Blake Wulfe,
Chen Xu,
Mengchao Zhang,
Alex Alspach,
et al. (57 additional authors not shown)
Abstract:
Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, both limiting the pace of development and inhibiting a nuanced understanding of current capabilities. In this paper, we rigorously evaluate multitask robot manipulation policies, referred to as Large Behavior Models (LBMs), by extending the Diffusion Policy paradigm across a corpus of simulated and real-world robot data. We propose and validate an evaluation pipeline to analyze the capabilities of these models with statistical confidence. We compare against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We find that multitask pretraining makes the policies more successful and robust, and enables teaching complex new tasks more quickly using a fraction of the data required by single-task baselines. Moreover, performance predictably increases as pretraining scale and diversity grow. Project page: https://toyotaresearchinstitute.github.io/lbm1/
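To make the "statistical confidence" claim concrete, here is a minimal sketch of how per-task success rates from blind, randomized rollouts can be reported with binomial (Wilson) confidence intervals. The task names, trial counts, and helper function are illustrative assumptions, not the paper's actual evaluation pipeline.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial success rate."""
    if trials == 0:
        return (0.0, 1.0)
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical per-task rollout counts (successes, trials) for two policies.
results = {
    "multitask_lbm": {"stack_plates": (43, 50), "fold_towel": (38, 50)},
    "single_task":   {"stack_plates": (31, 50), "fold_towel": (26, 50)},
}

for policy, tasks in results.items():
    for task, (k, n) in tasks.items():
        lo, hi = wilson_interval(k, n)
        print(f"{policy:13s} {task:12s} {k}/{n}  95% CI [{lo:.2f}, {hi:.2f}]")
```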
Submitted 7 July, 2025;
originally announced July 2025.
-
Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation
Authors:
Jessica Yin,
Paarth Shah,
Naveen Kuppuswamy,
Andrew Beaulieu,
Avinash Uttamchandani,
Alejandro Castro,
James Pikul,
Russ Tedrake
Abstract:
Visuotactile sensors are a popular tactile sensing strategy because they provide high-fidelity estimates of local object geometry. However, existing algorithms for processing raw sensor inputs into useful intermediate signals, such as contact patches, struggle in high-deformation regimes. This is due to physical constraints imposed by sensor hardware and the small-deformation assumptions used by mechanics-based models. In this work, we propose a fusion algorithm for proximity and visuotactile point clouds for contact patch segmentation that is entirely independent of membrane mechanics. The algorithm exploits the synchronous, high-spatial-resolution proximity and visuotactile modalities enabled by an extremely deformable, selectively transmissive soft membrane, which uses visible light for visuotactile sensing and infrared light for proximity depth. We evaluate our contact patch algorithm in low (10%), medium (60%), and high (100%+) strain states and compare it against three baselines: proximity-only, tactile-only, and a first-principles mechanics model. Our approach outperforms all baselines, with an average RMSE on contact patch geometry of under 2.8 mm across all strain ranges. We demonstrate the contact patch algorithm in four applications: varied-stiffness membranes, torque- and shear-induced wrinkling, closed-loop control, and pose estimation.
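As a rough illustration of the fusion idea (not the paper's algorithm), the sketch below labels membrane points as in-contact where the visuotactile (membrane) depth and the proximity (object) depth coincide and the membrane has deformed away from its rest shape. All array shapes, tolerances, and the toy data are assumptions.

```python
import numpy as np

def segment_contact_patch(tactile_depth, proximity_depth, agree_tol_mm=2.0,
                          deform_tol_mm=1.0, rest_depth=None):
    """Label pixels as in-contact where the membrane (visuotactile) surface and
    the externally sensed object (proximity) surface coincide, and the membrane
    has deformed away from its rest shape.

    All depths are HxW arrays in millimetres; NaN marks invalid returns.
    """
    if rest_depth is None:
        rest_depth = np.full_like(tactile_depth, np.nanmax(tactile_depth))
    valid = ~np.isnan(tactile_depth) & ~np.isnan(proximity_depth)
    surfaces_agree = np.abs(tactile_depth - proximity_depth) < agree_tol_mm
    membrane_deformed = (rest_depth - tactile_depth) > deform_tol_mm
    return valid & surfaces_agree & membrane_deformed

# Toy example: a flat membrane indented by a 20x20-pixel object.
H, W = 100, 100
rest = np.full((H, W), 50.0)                 # undeformed membrane depth
tactile = rest.copy()
tactile[40:60, 40:60] = 35.0                 # membrane pushed in by the object
proximity = np.full((H, W), np.nan)
proximity[40:60, 40:60] = 35.5               # object surface sensed through the membrane

mask = segment_contact_patch(tactile, proximity, rest_depth=rest)
print("contact pixels:", int(mask.sum()))    # -> 400
```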
Submitted 19 May, 2025; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Punyo-1: Soft tactile-sensing upper-body robot for large object manipulation and physical human interaction
Authors:
Aimee Goncalves,
Naveen Kuppuswamy,
Andrew Beaulieu,
Avinash Uttamchandani,
Katherine M. Tsui,
Alex Alspach
Abstract:
The manipulation of large objects and safe operation in the vicinity of humans are key capabilities of a general-purpose domestic robotic assistant. We present the design of a soft, tactile-sensing humanoid upper-body robot and demonstrate whole-body, rich-contact manipulation strategies for handling large objects. We describe our hardware design philosophy for outfitting off-the-shelf hard robot arms and other components with soft tactile-sensing modules, including: (i) low-cost, cut-resistant coverings for the arms that localize contact pressure, (ii) paws based on TRI's Soft-bubble sensors for the end effectors, and (iii) compliant force/geometry sensors for coarse geometry sensing on the chest. We leverage the mechanical intelligence and tactile sensing of these modules to develop and demonstrate motion primitives for whole-body grasping. We evaluate the hardware's effectiveness in achieving grasps of varying strength on a variety of large domestic objects. Our results demonstrate the importance of exploiting softness and tactile sensing in contact-rich manipulation strategies, as well as a path forward for whole-body force-controlled interactions with the world. (A supplemental video is publicly available at https://youtu.be/G8ZYgPRV5LY.)
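The "motion primitives for whole-body grasping" could, for example, be gated on tactile pressure thresholds. The toy state machine below is purely illustrative and is not Punyo's controller; the pressure units, thresholds, and interfaces are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class TactileState:
    """Aggregated tactile readings (hypothetical units: kPa)."""
    arm_pressure: float    # max contact pressure on the arm coverings
    chest_pressure: float  # max contact pressure on the chest sensor
    paw_pressure: float    # max contact pressure on the Soft-bubble paws

def whole_body_grasp_step(state: str, tactile: TactileState,
                          engage_kpa: float = 5.0, squeeze_kpa: float = 20.0) -> str:
    """One tick of a toy pressure-gated grasp primitive:
    approach -> wrap (arms close on contact) -> squeeze (build force) -> hold."""
    if state == "approach" and tactile.chest_pressure > engage_kpa:
        return "wrap"        # object touched the chest: start closing the arms
    if state == "wrap" and min(tactile.arm_pressure, tactile.paw_pressure) > engage_kpa:
        return "squeeze"     # arms and paws both in contact: build grasp force
    if state == "squeeze" and tactile.arm_pressure > squeeze_kpa:
        return "hold"        # target contact pressure reached: maintain grasp
    return state

state = "approach"
for reading in [TactileState(0, 6, 0), TactileState(7, 8, 6), TactileState(22, 9, 7)]:
    state = whole_body_grasp_step(state, reading)
    print(state)   # wrap, squeeze, hold
```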
Submitted 30 March, 2022; v1 submitted 17 November, 2021;
originally announced November 2021.
-
Variable compliance and geometry regulation of Soft-Bubble grippers with active pressure control
Authors:
Sihah Joonhigh,
Naveen Kuppuswamy,
Andrew Beaulieu,
Alex Alspach,
Russ Tedrake
Abstract:
While compliant grippers have become increasingly commonplace in robot manipulation, finding the right stiffness and geometry for grasping the widest variety of objects remains a key challenge. Adjusting both stiffness and gripper geometry on the fly may provide the versatility needed to manipulate the large range of objects found in domestic environments. We present a system for actively controlling the geometry (inflation level) and compliance of Soft-bubble grippers: air-filled, highly compliant parallel-gripper fingers incorporating visuotactile sensing. The proposed system enables large, controlled changes in gripper finger geometry and grasp stiffness, as well as simple in-hand manipulation. We also show that, despite these changes, advanced perception capabilities such as dense geometry and shear-force measurement remain viable: we present a straightforward extension of our previously presented approach for measuring shear-induced displacements with the internal imaging sensor that accounts for pressure and geometry changes. We quantify the controlled variation in grasp-free geometry, grasp stiffness, and contact patch geometry resulting from pressure regulation, and we demonstrate new capabilities for the gripper in the home: grasping in constrained spaces, manipulating tools that require lower- and higher-stiffness grasps, and modulating the contact patch.
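One plausible way to realize lower- and higher-stiffness grasps in software is to invert an empirically measured pressure-to-stiffness calibration and regulate toward the resulting pressure setpoint. The sketch below assumes a hypothetical calibration table and a simple deadband valve command; none of the numbers or interfaces come from the paper.

```python
import bisect

# Hypothetical calibration table: bubble gauge pressure (kPa) -> grasp stiffness (N/mm).
PRESSURE_KPA  = [2.0, 4.0, 6.0, 8.0, 10.0]
STIFFNESS_NPM = [0.5, 1.1, 1.8, 2.6, 3.5]

def pressure_for_stiffness(target_stiffness: float) -> float:
    """Invert the (assumed monotonic) calibration table by linear interpolation."""
    s = min(max(target_stiffness, STIFFNESS_NPM[0]), STIFFNESS_NPM[-1])
    i = bisect.bisect_left(STIFFNESS_NPM, s)
    if i == 0:
        return PRESSURE_KPA[0]
    s0, s1 = STIFFNESS_NPM[i - 1], STIFFNESS_NPM[i]
    p0, p1 = PRESSURE_KPA[i - 1], PRESSURE_KPA[i]
    return p0 + (p1 - p0) * (s - s0) / (s1 - s0)

def valve_command(measured_kpa: float, setpoint_kpa: float, deadband_kpa: float = 0.2) -> str:
    """Toy regulator: inflate or vent toward the setpoint within a deadband."""
    if measured_kpa < setpoint_kpa - deadband_kpa:
        return "inflate"
    if measured_kpa > setpoint_kpa + deadband_kpa:
        return "vent"
    return "hold"

setpoint = pressure_for_stiffness(2.0)                     # stiffer grasp for a heavy tool
print(round(setpoint, 2), valve_command(5.0, setpoint))    # -> 6.5 inflate
```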
Submitted 15 March, 2021;
originally announced March 2021.
-
Monocular Depth Estimation for Soft Visuotactile Sensors
Authors:
Rares Ambrus,
Vitor Guizilini,
Naveen Kuppuswamy,
Andrew Beaulieu,
Adrien Gaidon,
Alex Alspach
Abstract:
Fluid-filled soft visuotactile sensors such as the Soft-bubbles alleviate key challenges for robust manipulation, as they enable reliable grasps along with the ability to obtain high-resolution sensory feedback on contact geometry and forces. Although they are simple in construction, their utility has been limited by the size constraints of the enclosed custom IR/depth imaging sensors used to directly measure surface deformations. To mitigate this limitation, we investigate the application of state-of-the-art monocular depth estimation to infer dense internal (tactile) depth maps directly from a single small internal IR imaging sensor. Through real-world experiments, we show that deep networks typically used for long-range depth estimation (1-100 m) can be effectively trained for precise predictions at a much shorter range (1-100 mm) inside a mostly textureless, deformable, fluid-filled sensor. We propose a simple supervised learning process to train an object-agnostic network, requiring fewer than 10 random poses in contact, collected in less than 10 seconds, for a small set of diverse objects (mug, wine glass, box, and fingers in our experiments). We show that our approach is sample-efficient, accurate, and generalizes across different objects and sensor configurations unseen at training time. Finally, we discuss the implications of our approach for the design of soft visuotactile sensors and grippers.
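A minimal sketch of the kind of supervised short-range training loop the abstract describes is given below, assuming a tiny stand-in network and synthetic IR/depth pairs in place of the actual monocular depth architecture and collected sensor data; the loss is a plain per-pixel L1 in millimetres.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Placeholder for a monocular depth backbone (e.g., an encoder-decoder);
    maps a single-channel IR image to a dense depth map in millimetres."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, ir):
        return self.net(ir)

model = TinyDepthNet()
optim = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(5):
    # Stand-ins for (IR image, ground-truth tactile depth in mm, validity mask).
    ir = torch.rand(4, 1, 64, 64)
    gt_mm = 20.0 + 80.0 * torch.rand(4, 1, 64, 64)   # 20-100 mm range
    valid = torch.ones_like(gt_mm, dtype=torch.bool)

    pred_mm = model(ir)
    loss = (pred_mm[valid] - gt_mm[valid]).abs().mean()  # L1 over valid pixels
    optim.zero_grad()
    loss.backward()
    optim.step()
    print(f"step {step}: L1 = {loss.item():.1f} mm")
```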
Submitted 5 January, 2021;
originally announced January 2021.