Commit 6a477aa

Add Libero and RoboCasa benchmark and add variant agg for SimplerEnv (#357)

* libero
* Add libero eval.
* Add readme
* lint
* lint
* move to examples/Libero folder.
* update README
* Add RoboCasa Tabletop benchmark.
* Update Readme.
* address comments
* update readme with higher number
* update numbers and checkpoints and data configs.
* format
* update robocasa number
* update libero
* tyro
* lint
* libero long
* upload to hf
* nit
* add disclaimer
* update README after uploading 1k traj data.

Signed-off-by: youliangt <youliangt@nvidia.com>
Co-authored-by: youliangt <youliangt@nvidia.com>

1 parent b211007 commit 6a477aa

File tree

11 files changed: +882 -25 lines changed

examples/Libero/README.md

Lines changed: 131 additions & 0 deletions
# GR00T Libero Benchmarks

This directory contains fine-tuning and evaluation scripts for **GR00T N1.5** on the Libero benchmark suite.

## 🎯 Model Evaluation

Evaluation is performed using [`run_libero_eval.py`](https://github.com/NVIDIA/Isaac-GR00T/examples/Libero/eval/run_libero_eval.py).

<!-- Spatial: /mnt/amlfs-02/shared/checkpoints/xiaoweij/0827/libero-checkpoints-20K/checkpoint-20000/ -->
<!-- Goal: /mnt/amlfs-02/shared/checkpoints/xiaoweij/0911/libero-goal-checkpoints-20K/ https://wandb.ai/nv-gear/huggingface/runs/wibov9ph?nw=nwuserxiaoweij -->
<!-- Object: /mnt/amlfs-02/shared/checkpoints/xiaoweij/0904/libero-object-checkpoints-20K/ https://wandb.ai/nv-gear/huggingface/runs/38tmzwcw?nw=nwuserxiaoweij -->
<!-- Libero-90: /mnt/amlfs-02/shared/checkpoints/xiaoweij/0905/libero-90-checkpoints-60K/ https://wandb.ai/nv-gear/huggingface/runs/3wpxrsri?nw=nwuserxiaoweij -->
<!-- Libero-Long: /mnt/amlfs-02/shared/checkpoints/xiaoweij/0908/libero-10-checkpoints-60K/ https://wandb.ai/nv-gear/huggingface/runs/cyh7mdtx?nw=nwuserxiaoweij -->

### Eval Result and Training Config Table

| Task      | Success rate    | max_steps | grad_accum_steps | batch_size | Data config                                                | Checkpoint                                      |
|-----------|-----------------|-----------|------------------|------------|------------------------------------------------------------|-------------------------------------------------|
| Spatial   | 46/50 (92%)     | 20K       | 1                | 128        | examples.Libero.custom_data_config:LiberoDataConfig        | youliangtan/gr00t-n1.5-libero-spatial-posttrain |
| Goal      | 43/50 (86%)     | 20K       | 4                | 72         | examples.Libero.custom_data_config:LiberoDataConfigMeanStd | youliangtan/gr00t-n1.5-libero-goal-posttrain    |
| Object    | 46/50 (92%)     | 20K       | 1                | 128        | examples.Libero.custom_data_config:LiberoDataConfig        | youliangtan/gr00t-n1.5-libero-object-posttrain  |
| Libero-90 | 402/450 (89.3%) | 60K       | 1                | 128        | examples.Libero.custom_data_config:LiberoDataConfig        | youliangtan/gr00t-n1.5-libero-90-posttrain      |
| Long      | 38/50 (76%)     | 60K       | 1                | 128        | examples.Libero.custom_data_config:LiberoDataConfig        | youliangtan/gr00t-n1.5-libero-long-posttrain    |

> Note: The results above were obtained with minimal hyperparameter tuning and are intended primarily for demonstration. More comprehensive studies have fine-tuned GR00T on LIBERO and achieved substantially higher performance; see, for example, Table 3 in this [paper](https://arxiv.org/pdf/2508.21112).
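The success-rate percentages in the table follow directly from the raw counts; a quick Python sanity check (numbers copied from the table above):

```python
# Success counts and trial totals, copied from the results table.
results = {
    "Spatial": (46, 50),
    "Goal": (43, 50),
    "Object": (46, 50),
    "Libero-90": (402, 450),
    "Long": (38, 50),
}

for task, (successes, trials) in results.items():
    rate = 100 * successes / trials
    print(f"{task}: {successes}/{trials} = {rate:.1f}%")
```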
----

To evaluate, first start the inference server with our provided checkpoint:

```bash
python scripts/inference_service.py \
    --model_path youliangtan/gr00t-n1.5-libero-spatial-posttrain \
    --server \
    --data_config examples.Libero.custom_data_config:LiberoDataConfig \
    --denoising-steps 8 \
    --port 5555 \
    --embodiment-tag new_embodiment
```
> Note: the **Libero-Goal** checkpoint was trained with the data config `examples.Libero.custom_data_config:LiberoDataConfigMeanStd`, so it must be served with that config:

```bash
python scripts/inference_service.py \
    --model_path youliangtan/gr00t-n1.5-libero-goal-posttrain \
    --server \
    --data_config examples.Libero.custom_data_config:LiberoDataConfigMeanStd \
    --denoising-steps 8 \
    --port 5555 \
    --embodiment-tag new_embodiment
```

----
### Installation

Follow the [official Libero installation guide](https://lifelong-robot-learning.github.io/LIBERO/html/getting_started/installation.html).

### Troubleshooting

If you see:
```
ModuleNotFoundError: No module named 'robosuite.environments.manipulation.single_arm_env'
```

make sure you install the pinned robosuite version:
```bash
pip install robosuite==1.4.0
```
Then run the evaluation:
```bash
cd examples/Libero/eval
python run_libero_eval.py --task_suite_name spatial
```

----
## Reproduce Training Results

To reproduce the training results:

1. Download the datasets
2. Add the modality configuration files
3. Fine-tune the model
4. Evaluate the model (same as above)
## 📦 1. Dataset Preparation

### Dataset Downloads

Download LeRobot-compatible datasets directly from Hugging Face:

```bash
huggingface-cli download \
    --repo-type dataset IPEC-COMMUNITY/libero_spatial_no_noops_1.0.0_lerobot \
    --local-dir /tmp/libero_spatial/
```

> 🔄 Replace with the appropriate dataset name:
> - `IPEC-COMMUNITY/libero_goal_no_noops_1.0.0_lerobot` (for **goal**)
> - `IPEC-COMMUNITY/libero_object_no_noops_1.0.0_lerobot` (for **object**)
> - `IPEC-COMMUNITY/libero_90_no_noops_lerobot` (for **libero-90**)
> - `IPEC-COMMUNITY/libero_10_no_noops_lerobot` (for **libero-10**)
### Modality Configuration

After downloading the datasets, you need to add the appropriate modality configuration files to make them compatible with GR00T N1.5. These configuration files define the observation and action space mappings.

```bash
cp examples/Libero/modality.json /tmp/libero_spatial/meta/modality.json
```
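When preparing several datasets at once, the copy step can equally be scripted; a minimal sketch (the function name is ours, the paths are the same placeholders as in the `cp` command above):

```python
import shutil
from pathlib import Path


def install_modality_config(modality_json: str, dataset_root: str) -> Path:
    """Copy modality.json into the dataset's meta/ directory, as the cp above does."""
    dest = Path(dataset_root) / "meta" / "modality.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(modality_json, dest)
    return dest
```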
## 🚀 2. Model Fine-tuning

### Training Commands

The fine-tuning script supports multiple configurations:

```bash
python scripts/gr00t_finetune.py \
    --dataset-path /tmp/libero_spatial/ \
    --data_config examples.Libero.custom_data_config:LiberoDataConfig \
    --num-gpus 8 \
    --batch-size 128 \
    --output-dir /tmp/my_libero_spatial_checkpoint/ \
    --max-steps 60000 \
    --video-backend torchvision_av
```
> Note: replace the data config class and training settings with the corresponding values from the [table](#eval-result-and-training-config-table) above.

examples/Libero/__init__.py

Whitespace-only changes.
Lines changed: 120 additions & 0 deletions
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from gr00t.data.transform.base import ComposedModalityTransform, ModalityTransform
from gr00t.data.transform.concat import ConcatTransform
from gr00t.data.transform.state_action import StateActionToTensor, StateActionTransform
from gr00t.data.transform.video import (
    VideoColorJitter,
    VideoCrop,
    VideoResize,
    VideoToNumpy,
    VideoToTensor,
)
from gr00t.experiment.data_config import BaseDataConfig
from gr00t.model.transforms import GR00TTransform


class LiberoDataConfig(BaseDataConfig):
    video_keys = [
        "video.image",
        "video.wrist_image",
    ]
    state_keys = [
        "state.x",
        "state.y",
        "state.z",
        "state.roll",
        "state.pitch",
        "state.yaw",
        "state.gripper",
    ]
    action_keys = [
        "action.x",
        "action.y",
        "action.z",
        "action.roll",
        "action.pitch",
        "action.yaw",
        "action.gripper",
    ]
    language_keys = ["annotation.human.action.task_description"]
    observation_indices = [0]
    action_indices = list(range(16))

    def transform(self, action_norm: str = "min_max") -> ModalityTransform:
        if action_norm == "min_max":
            action_transform = StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={key: "min_max" for key in self.action_keys},
            )
        else:
            action_transform = StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={
                    "action.x": "mean_std",
                    "action.y": "mean_std",
                    "action.z": "mean_std",
                    "action.roll": "mean_std",
                    "action.pitch": "mean_std",
                    "action.yaw": "mean_std",
                    "action.gripper": "min_max",
                },
            )
        transforms = [
            # video transforms
            VideoToTensor(apply_to=self.video_keys),
            VideoCrop(apply_to=self.video_keys, scale=0.95),
            VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
            VideoColorJitter(
                apply_to=self.video_keys,
                brightness=0.3,
                contrast=0.4,
                saturation=0.5,
                hue=0.08,
            ),
            VideoToNumpy(apply_to=self.video_keys),
            # state transforms
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={key: "min_max" for key in self.state_keys},
            ),
            # action transforms
            StateActionToTensor(apply_to=self.action_keys),
            action_transform,
            # concat transforms
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
            # model-specific transform
            GR00TTransform(
                state_horizon=len(self.observation_indices),
                action_horizon=len(self.action_indices),
                max_state_dim=64,
                max_action_dim=32,
            ),
        ]
        return ComposedModalityTransform(transforms=transforms)


class LiberoDataConfigMeanStd(LiberoDataConfig):
    """Apply mean_std normalization to actions other than gripper."""

    def transform(self) -> ModalityTransform:
        return super().transform(action_norm="mean_std")
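The only difference between the two config classes is the action normalization mode. The sketch below illustrates the two schemes with plain NumPy; it assumes the common conventions (min-max rescaling to [-1, 1], z-score for mean-std) and is an illustration, not gr00t's actual `StateActionTransform` implementation:

```python
import numpy as np


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Rescale each column to [-1, 1] using its per-column min/max (assumed convention)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return 2 * (x - lo) / (hi - lo) - 1


def mean_std_normalize(x: np.ndarray) -> np.ndarray:
    """Z-score each column; in LiberoDataConfigMeanStd this is applied to all
    action dimensions except the gripper, which keeps min_max."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Toy actions: 3 timesteps x 2 dimensions.
actions = np.array([[0.1, -0.4], [0.5, 0.2], [0.9, 0.8]])
print(min_max_normalize(actions))
print(mean_std_normalize(actions))
```

Min-max keeps the gripper's open/close extremes at fixed endpoints, while mean-std centers the continuous pose deltas, which is why the gripper dimension stays `min_max` in both configs.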

examples/Libero/eval/__init__.py

Whitespace-only changes.
