Website | Model | Dataset | Paper | Research Blog
We just released GR00T N1.6, an updated version of GR00T N1 with improved performance and new features. Check out the release blog post for more details.
To use the older version, N1.5, please check out the n1.5-release branch.
NVIDIA Isaac GR00T N1.6 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.
GR00T N1.6 is trained on a diverse mixture of robot data, including bimanual and semi-humanoid platforms and an expansive humanoid dataset. It is adaptable through post-training for specific embodiments, tasks, and environments.
The neural network architecture of GR00T N1.6 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:
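Complementing the schematic, the following is a rough sketch of how such a VLM + diffusion-head policy produces an action chunk. All module and function names here are hypothetical placeholders, not the repository's actual code:

```python
import torch

def predict_action_chunk(vlm, dit_head, images, instruction, state,
                         horizon=16, action_dim=32, num_denoise_steps=4):
    """Illustrative sketch only: `vlm` and `dit_head` are hypothetical modules."""
    # 1. Encode vision + language into conditioning tokens.
    vl_tokens = vlm.encode(images=images, text=instruction)      # (B, T, D)

    # 2. Start from Gaussian noise over the action chunk.
    actions = torch.randn(state.shape[0], horizon, action_dim)

    # 3. Iteratively denoise, conditioned on the VL tokens and proprioceptive state.
    for step in reversed(range(num_denoise_steps)):
        noise_pred = dit_head(actions, timestep=step, cond=vl_tokens, state=state)
        actions = actions - noise_pred  # simplified update; real diffusion schedulers differ

    return actions  # (B, horizon, action_dim) continuous action chunk
```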
Here is the general procedure to use GR00T N1.6:
- We assume the user has already collected a dataset of robot demonstrations in the form of (video, state, action) triplets for a specific task.
- The user will first convert the demonstration data into the LeRobot-compatible data schema (more info in `getting_started/data_preparation.md`), which is compatible with the upstream Huggingface LeRobot Dataset V2.
- Our repo provides convenient scripts to validate zero-shot performance of the pretrained model (see Policy API Guide and RoboCasa Zero-Shot).
- Our repo provides examples of different configurations for training with different robot embodiments (see `examples/` and the Fine-tuning Guide).
- Our repo provides convenient scripts for finetuning the pre-trained GR00T N1.6 model on the user's data and running inference; see `examples`.
- Our repo provides convenient scripts to run academic simulation benchmarks with finetuned checkpoints (see LIBERO, SimplerEnv, RoboCasa).
- The user will need to connect the `Gr00tPolicy` to the robot controller to execute actions on their target hardware.
GR00T N1.6 represents a significant upgrade over GR00T N1.5, with improvements to both the model architecture and the training data that lead to better performance in many respects.
Architectural changes:
- Base VLM: We use an internal NVIDIA Cosmos-Reason-2B VLM variant. The VLM supports flexible resolution and can encode images in their native aspect ratio without padding. The VLM is trained on both general vision-language tasks and embodied reasoning tasks such as next-action prediction.
- Uses a 2x larger DiT (32 layers vs. 16 layers in N1.5).
- Removes N1.5's post-VLM 4-layer transformer adapter. Instead, unfreezes top 4 layers of the VLM during pretraining.
- Predicts state-relative action chunks for most embodiments, rather than absolute joint angles or EEF positions.
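As a small illustration of the last point, the numpy snippet below shows how a state-relative action chunk relates to an absolute one. This is purely illustrative and not the repository's data-processing code; the numbers are made up:

```python
import numpy as np

# Hypothetical current joint state and an absolute 3-step target trajectory.
current_state = np.array([0.10, -0.20, 0.30])              # (action_dim,)
absolute_chunk = np.array([[0.12, -0.18, 0.31],
                           [0.15, -0.15, 0.33],
                           [0.20, -0.10, 0.36]])            # (horizon, action_dim)

# A state-relative parameterization predicts offsets from the current state...
relative_chunk = absolute_chunk - current_state

# ...which are added back to the freshly observed state at execution time.
executed_chunk = current_state + relative_chunk
assert np.allclose(executed_chunk, absolute_chunk)
```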
Beyond the N1.5 data mixture, the N1.6 pretraining data additionally includes several thousand hours of teleoperated data from:
- Bimanual YAM arms
- AGIBot Genie1
- Simulated Galaxea R1 Pro on the BEHAVIOR suite
- Whole-Body Locomanipulation with Unitree G1
Other code-level improvements:
- Faster dataloader with support for sharded loading.
- RTC and Async Policy Wrapper for inference (to be released soon).
- Simplified data processing pipeline with `processing_gr00t_n1d6.py`.
- Flexible training configuration.
GR00T N1.6 is intended for researchers and professionals in robotics. This repository provides tools to:
- Leverage a pre-trained foundation model for robot control
- Fine-tune on small, custom datasets
- Adapt the model to specific robotics tasks with minimal data
- Deploy the model for inference
The focus is on enabling customization of robot behaviors through finetuning.
GR00T relies on submodules for certain dependencies. Include them when cloning:
```bash
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
```
If you've already cloned without submodules, initialize them separately:
```bash
git submodule update --init --recursive
```
GR00T uses uv for fast, reproducible dependency management.
Requirement: uv v0.8.4+ is needed to parse `[tool.uv.extra-build-dependencies]` in `pyproject.toml` (required for building `flash-attn`).
After installing uv, create the environment and install GR00T:
```bash
uv sync --python 3.10
uv pip install -e .
```
Note: CUDA 12.4 is recommended and officially tested. However, CUDA 11.8 has also been verified to work. In such cases, make sure to install a compatible version of `flash-attn` manually (e.g., `flash-attn==2.8.2` was confirmed working with CUDA 11.8).
For a containerized setup that avoids system-level dependency conflicts, see our Docker Setup Guide.
For training and inference hardware recommendations (RTX PRO Servers, DGX, Jetson AGX Thor), see the Hardware Recommendation Guide.
We provide pre-trained base VLA model checkpoints. These checkpoints have been pre-trained on 10k+ hours of robot data and can be used for finetuning on downstream tasks.
| Model | Use Case | Description | Checkpoint Path | Branch |
|---|---|---|---|---|
| GR00T N1.5 | Finetuning | Base GR00T N1.5 model (3B parameters) | nvidia/GR00T-N1.5-3B | n1.5-release |
| GR00T N1.6 | Finetuning | Base GR00T N1.6 model (3B parameters) | nvidia/GR00T-N1.6-3B | main |
We also provide finetuned checkpoints for various robot platforms and benchmarks. These models are finetuned from the base models above and can be used directly for evaluation or as starting points for further finetuning.
| Model | Base Model | Description | Checkpoint Path | Example |
|---|---|---|---|---|
| GR00T-N1.6-bridge | nvidia/GR00T-N1.6-3B | Fine-tuned on Bridge dataset for WidowX robot on manipulation tasks | nvidia/GR00T-N1.6-bridge | SimplerEnv |
| GR00T-N1.6-fractal | nvidia/GR00T-N1.6-3B | Fine-tuned on Fractal dataset for Google robot on manipulation tasks | nvidia/GR00T-N1.6-fractal | SimplerEnv |
| GR00T-N1.6-BEHAVIOR1k | nvidia/GR00T-N1.6-3B | Fine-tuned on BEHAVIOR-1K for Galaxea R1 Pro robot on loco-manipulation tasks | nvidia/GR00T-N1.6-BEHAVIOR1k | BEHAVIOR |
| GR00T-N1.6-G1-PnPAppleToPlate | nvidia/GR00T-N1.6-3B | Fine-tuned for Unitree G1 loco-manipulation pick-and-place tasks | nvidia/GR00T-N1.6-G1-PnPAppleToPlate | G1 LocoManipulation |
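If you want to pre-download a checkpoint (for example, onto a machine without reliable internet access at runtime), the standard Hugging Face Hub client works for any of the checkpoint paths above; this uses the generic huggingface_hub API rather than a repo-specific helper:

```python
from huggingface_hub import snapshot_download

# Downloads the base N1.6 checkpoint into the local Hugging Face cache
# and returns the local directory path.
local_path = snapshot_download(repo_id="nvidia/GR00T-N1.6-3B")
print(local_path)
```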
To get started quickly, download a pre-trained checkpoint and start the policy server for any pretrained embodiment, e.g. the GR1 embodiment.
```bash
# On GPU server: Start the policy server
uv run python gr00t/eval/run_gr00t_server.py --embodiment-tag GR1 --model-path nvidia/GR00T-N1.6-3B
```
Then, refer to the robocasa-gr1-tabletop-tasks for more details on how to roll out the policy with the GR1 embodiment.
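Before running a full rollout, you can confirm the server is reachable from a Python client. The snippet below assumes the server listens on port 5555 (as in the server-client example later in this README); adjust the host and port to your setup:

```python
from gr00t.policy.server_client import PolicyClient

# Connect to the policy server started above (adjust host/port as needed).
policy = PolicyClient(host="localhost", port=5555)

if policy.ping():
    print("Policy server is reachable.")
else:
    print("Cannot reach the policy server; check the host, port, and server logs.")
```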
We provide accessible Jupyter notebooks and detailed documentation in the ./getting_started folder.
Please refer to the data preparation guide for more details.
After the data is prepared, the GR00T model can be used to generate output actions with the simple inference script below:
```bash
uv run python scripts/deployment/standalone_inference_script.py \
    --model-path nvidia/GR00T-N1.6-3B \
    --dataset-path demo_data/gr1.PickNPlace \
    --embodiment-tag GR1 \
    --traj-ids 0 1 2 \
    --inference-mode pytorch \
    --action-horizon 8
```
GR00T-N1.6-3B inference timing (4 denoising steps, single view):
| Device | Mode | Data Processing | Backbone | Action Head | E2E | Frequency |
|---|---|---|---|---|---|---|
| RTX 5090 | torch.compile | 2 ms | 18 ms | 16 ms | 37 ms | 27.3 Hz |
| H100 | torch.compile | 4 ms | 23 ms | 11 ms | 38 ms | 26.3 Hz |
| RTX 4090 | torch.compile | 2 ms | 25 ms | 17 ms | 44 ms | 22.8 Hz |
| Thor | torch.compile | 5 ms | 39 ms | 61 ms | 105 ms | 9.5 Hz |
For more details, including faster inference with TensorRT, please check our full inference guide.
GR00T provides several pre-registered embodiment tags with ready-to-use configurations:
- `LIBERO_PANDA`
- `OXE_GOOGLE`
- `OXE_WIDOWX`
- `UNITREE_G1`
- `BEHAVIOR_R1_PRO`
Example: To finetune Libero-Spatial on GR00T N1.6, follow the instructions in the Libero finetuning guide. We also provide simulation environment setup for evaluation linked with post-train checkpoints and benchmark numbers.
To finetune GR00T on your own robot data and configuration, follow the detailed tutorial available at getting_started/finetune_new_embodiment.md.
Ensure your input data follows the GR00T-flavored LeRobot v2 format, and specify your modality configuration via the `--modality-config-path` argument.
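For orientation, the sketch below illustrates the kind of information a modality configuration carries: which video, state, action, and language keys to load, plus the delta indices that define the action horizon. The field names here are hypothetical; consult getting_started/finetune_new_embodiment.md for the actual schema:

```python
# Hypothetical sketch of a modality configuration; the real schema may differ.
new_embodiment_modality = {
    "video": {"modality_keys": ["video.front_camera"], "delta_indices": [0]},
    "state": {"modality_keys": ["state.joint_positions"], "delta_indices": [0]},
    "action": {
        "modality_keys": ["action.joint_positions"],
        # Predict a 16-step action chunk starting from the current timestep.
        "delta_indices": list(range(16)),
    },
    "language": {"modality_keys": ["annotation.task_description"]},
}
```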
```bash
# Set number of GPUs
export NUM_GPUS=1

CUDA_VISIBLE_DEVICES=0 uv run python \
    gr00t/experiment/launch_finetune.py \
    --base-model-path nvidia/GR00T-N1.6-3B \
    --dataset-path <DATASET_PATH> \
    --embodiment-tag NEW_EMBODIMENT \
    --modality-config-path <MODALITY_CONFIG_PATH> \
    --num-gpus $NUM_GPUS \
    --output-dir <OUTPUT_PATH> \
    --save-total-limit 5 \
    --save-steps 2000 \
    --max-steps 2000 \
    --use-wandb \
    --global-batch-size 32 \
    --color-jitter-params brightness 0.3 contrast 0.4 saturation 0.5 hue 0.08 \
    --dataloader-num-workers 4
```
For more extensive finetuning configuration, use `gr00t/experiment/launch_train.py` instead to launch the training process.
For optimal results, maximize your batch size based on available hardware and train for a few thousand steps.
Fine-tuning Performance
- We recommend using 1 H100 node or L40 node for optimal finetuning performance
- Other hardware configurations (e.g., A6000) will also work but may require longer training time
- Optimal batch size depends on your hardware and which model components are being tuned
Users may observe some variance in post-training results across runs, even when using the same configuration, seed, and dropout settings. In our experiments, we have observed performance differences as large as 5-6% between runs. This variance may be attributed to non-deterministic operations in image augmentations or other stochastic components. When comparing results to reported benchmarks, please keep this inherent variance in mind.
We recommend a two-stage evaluation approach: open-loop evaluation followed by simulation evaluation to comprehensively assess model quality.
Open-loop evaluation provides an offline assessment by comparing the model's predicted actions against ground truth data from your dataset.
Execute the evaluation script with your newly trained model:
```bash
uv run python gr00t/eval/open_loop_eval.py \
    --dataset-path <DATASET_PATH> \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path <CHECKPOINT_PATH> \
    --traj-ids 0 \
    --action-horizon 16  # ensure this is within the delta_indices of the action's modality config
```
The evaluation generates a visualization saved at `/tmp/open_loop_eval/traj_{traj_id}.jpeg`, which includes:
- Ground truth actions vs. predicted actions
- Unnormalized mean squared error (MSE) metrics
These plots provide a quick indicator of the policy's accuracy on the training dataset distribution.
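The unnormalized MSE shown in the plots is the plain mean squared error computed in the original action units. The helper below is a generic numpy sketch of that metric, not the evaluation script's exact code:

```python
import numpy as np

def unnormalized_action_mse(pred_actions, gt_actions):
    """MSE between predicted and ground-truth action chunks in raw action units."""
    pred_actions = np.asarray(pred_actions, dtype=np.float64)  # (horizon, action_dim)
    gt_actions = np.asarray(gt_actions, dtype=np.float64)      # (horizon, action_dim)
    return float(np.mean((pred_actions - gt_actions) ** 2))
```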
After validating performance through open-loop evaluation, test your model in closed-loop environments.
After training your model, you'll use the Gr00tPolicy class to load and run inference. The policy expects observations in a specific format (nested dictionaries with video, state, and language modalities) and returns actions ready for execution.
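As an illustration of that format, an observation passed to the policy might look like the sketch below. The camera and state key names are hypothetical; query the modality configuration of your checkpoint (see the Policy API Guide) for the keys it actually expects:

```python
import numpy as np

# Hypothetical observation for a single-camera, joint-position embodiment.
obs = {
    "video.front_camera": np.zeros((1, 480, 640, 3), dtype=np.uint8),  # RGB frame(s)
    "state.joint_positions": np.zeros((1, 7), dtype=np.float32),       # proprioception
    "annotation.task_description": ["pick up the apple and place it on the plate"],
}
# action, info = policy.get_action(obs)
```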
Quick Start with Server-Client Architecture:
```bash
# On GPU server: Start the policy server
uv run python gr00t/eval/run_gr00t_server.py \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path <CHECKPOINT_PATH> \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```
```python
from gr00t.policy.server_client import PolicyClient

policy = PolicyClient(host="localhost", port=5555)  # Connect to the policy server
env = YourEnvironment()  # Create an environment
obs, info = env.reset()  # Reset the environment

if not policy.ping():  # Verify connection
    raise RuntimeError("Cannot connect to policy server!")

action, info = policy.get_action(obs)  # Run inference
obs, reward, done, truncated, info = env.step(action)  # Execute the action
```
Debugging with ReplayPolicy:
When developing a new environment integration or debugging your inference loop, you can use ReplayPolicy to replay recorded actions from an existing dataset. This helps verify that your environment setup, observation formatting, and action execution work correctly—without needing a trained model.
```bash
# Start server with ReplayPolicy (replays actions from dataset)
uv run python gr00t/eval/run_gr00t_server.py \
    --dataset-path <DATASET_PATH> \
    --embodiment-tag NEW_EMBODIMENT \
    --execution-horizon 8  # should match the executed action horizon in the environment
```
The server will replay actions from the first episode of the dataset. Use `policy.reset(options={"episode_index": N})` on the client to switch to a different episode.
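On the client side, switching episodes uses the same PolicyClient as before; the only new piece is the episode_index option described above. The snippet assumes the replay server runs on the default localhost:5555 and reuses the placeholder YourEnvironment from the earlier example:

```python
from gr00t.policy.server_client import PolicyClient

policy = PolicyClient(host="localhost", port=5555)  # Connect to the replay server
env = YourEnvironment()                              # Placeholder env, as above
obs, info = env.reset()

# Replay actions from episode 3 instead of the default first episode.
policy.reset(options={"episode_index": 3})
action, info = policy.get_action(obs)
obs, reward, done, truncated, info = env.step(action)
```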
For detailed documentation on:
- How to adapt the policy to your own environment
- Server-client architecture for remote inference
- Observation and action formats
- Querying modality configurations
- Batched inference
- Troubleshooting common errors
See the complete Policy API Guide.
We support evaluation on available public benchmarks and our internal benchmarks. Our evaluation framework uses a server-client architecture that communicates via RESTful API. Both the policy server and simulation environment client use the same IP (usually localhost) and port to run simulation evaluation.
For the policy server, we reuse the project root's uv environment (same as finetuning) to run run_gr00t_server. For simulation environment clients, we provide individual setup scripts to configure uv environments, as they typically conflict with each other when using a single shared environment.
You can use the verification script to verify that all dependencies and environments for simulation evaluation are properly configured.
Please refer to each benchmark link below for more details.
Zero-shot Evaluation (evaluate without finetuning):
- RoboCasa: Instructions
- RoboCasa GR1 Tabletop Tasks: Instructions
Finetuned Evaluation (test after task-specific finetuning):
- G1 LocoManipulation: Instructions
- LIBERO: Instructions
- SimplerEnv: Instructions
- BEHAVIOR: Instructions
- PointNav: Instructions
- SO-100: Instructions
For more details, see CONTRIBUTING.md
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@inproceedings{gr00tn1_2025,
archivePrefix = {arxiv},
eprint = {2503.14734},
title = {{GR00T} {N1}: An Open Foundation Model for Generalist Humanoid Robots},
author = {NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu},
month = {March},
year = {2025},
booktitle = {ArXiv Preprint},
}