Simulation Data Preparation

Overview

This repository supports both the LeRobotDataset v2.1 format and the RLDS format.

Environment Setup

  1. Install the LeRobot environment. This repository depends on LeRobotDataset v2.1. For the exact LeRobot version, see the commit linked below, then follow the official installation instructions:

https://github.com/huggingface/lerobot/commit/3354d919fc71130fe5b6b1d9997fdfc68fd6b42f

LeRobotDataset v3 was intentionally not adopted, in order to maintain flexibility. To stay aligned with this repository, pin the data component to a specific LeRobot commit, as in the commands below. With LeRobot, you can organize your datasets in whatever structure you need.

git clone https://github.com/huggingface/lerobot.git
cd lerobot
git checkout 55198de096f46a8e0447a8795129dd9ee84c088c
pip install -e .
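
Because LeRobot's dataset code changes between versions, it can help to confirm that the clone is actually on the pinned commit before converting or loading data. The following is a minimal sketch under that assumption; the function name check_lerobot_commit is hypothetical and is not part of LeRobot or this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LeRobot): check that a git checkout is at
# the expected commit. Accepts a full SHA or an unambiguous prefix.
# Usage: check_lerobot_commit <repo_dir> <expected_commit>
# Returns 0 on match, 1 on mismatch, 2 on bad input.
check_lerobot_commit() {
  local repo_dir="$1" expected="$2" actual
  if [ -z "$expected" ]; then
    echo "error: expected commit is empty" >&2
    return 2
  fi
  # Resolve HEAD to its full SHA; this fails outside a git repository.
  if ! actual="$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null)"; then
    echo "error: $repo_dir is not a git repository" >&2
    return 2
  fi
  case "$actual" in
    "$expected"*)
      echo "ok: $repo_dir is at $actual"
      return 0
      ;;
    *)
      echo "mismatch: $repo_dir is at $actual, expected $expected" >&2
      return 1
      ;;
  esac
}
```

For example, running check_lerobot_commit . 55198de096f46a8e0447a8795129dd9ee84c088c from inside the lerobot directory should print an "ok" line after the checkout step.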

Ready-to-Use LIBERO Data Example

Any dataset in LeRobotDataset v2.1 format can be used for training. The ready-to-use LIBERO datasets below have been uploaded to Hugging Face and can be downloaded directly; set --local-dir to your desired storage path:

# libero_10
huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_10_no_noops_lerobotv2.1/*" --local-dir ./datasets
# libero_spatial
huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_spatial_no_noops_lerobotv2.1/*" --local-dir ./datasets
# libero_object
huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_object_no_noops_lerobotv2.1/*" --local-dir ./datasets
# libero_goal
huggingface-cli download limxdynamics/FluxVLAData --repo-type dataset --include "libero_goal_no_noops_lerobotv2.1/*" --local-dir ./datasets
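
The four download commands differ only in the suite name, so they can be collapsed into a loop. This is a minimal sketch; the function name download_libero_suites, the DRY_RUN switch (which only prints the commands), and the ./datasets default are illustrative assumptions, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download the four LIBERO suites (LeRobotDataset v2.1)
# from the limxdynamics/FluxVLAData dataset repo on the Hugging Face Hub.
# Usage: download_libero_suites [local_dir]
# Set DRY_RUN=1 to print the commands instead of running them.
download_libero_suites() {
  local local_dir="${1:-./datasets}"   # assumed default target directory
  local suite
  for suite in libero_10 libero_spatial libero_object libero_goal; do
    local cmd=(huggingface-cli download limxdynamics/FluxVLAData
               --repo-type dataset
               --include "${suite}_no_noops_lerobotv2.1/*"
               --local-dir "$local_dir")
    if [ -n "${DRY_RUN:-}" ]; then
      echo "${cmd[*]}"                 # print only; nothing is downloaded
    else
      "${cmd[@]}" || return 1          # stop at the first failed download
    fi
  done
}
```

For example, DRY_RUN=1 download_libero_suites ./datasets prints the four commands; run it without DRY_RUN to perform the downloads.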

LIBERO LeRobotDataset v2.1 Data Format

The converted LIBERO dataset uses the LeRobotDataset v2.1 format and is stored in a Hugging Face Datasets-compatible directory structure:

<output_dir>/<repo_id>/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       └── ...
├── videos/
│   └── chunk-000/
│       └── observation.images.image/
│           ├── episode_000000.mp4
│           └── ...
└── meta/
    ├── episodes.jsonl
    ├── episodes_stats.jsonl
    ├── info.json
    └── tasks.jsonl
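
Episode files are grouped into chunk directories. In LeRobotDataset v2.x, meta/info.json stores the path templates (data_path, video_path) and a chunks_size, and an episode's chunk is its index divided by chunks_size. The sketch below reproduces that mapping for the layout above; the function name episode_paths is hypothetical, and the chunks_size default of 1000 is an assumption, so check the value in your own info.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the parquet and video paths (relative to
# <output_dir>/<repo_id>/) for one episode in a LeRobotDataset v2.1 layout.
# Usage: episode_paths <episode_index> [video_key] [chunks_size]
# Assumption: chunks_size defaults to 1000; the authoritative value is the
# "chunks_size" field in meta/info.json.
episode_paths() {
  local episode_index="$1"
  local video_key="${2:-observation.images.image}"
  local chunks_size="${3:-1000}"
  # 10# forces base 10 so zero-padded input such as 000123 is not read as octal.
  local index=$(( 10#$episode_index ))
  local chunk=$(( index / chunks_size ))
  printf 'data/chunk-%03d/episode_%06d.parquet\n' "$chunk" "$index"
  printf 'videos/chunk-%03d/%s/episode_%06d.mp4\n' "$chunk" "$video_key" "$index"
}
```

With the default chunks_size, episode 1234 maps to data/chunk-001/episode_001234.parquet and its video to videos/chunk-001/observation.images.image/episode_001234.mp4.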

Parquet schema of each episode file:

observation.state : fixed_size_list<element: float>[14]
action : fixed_size_list<element: float>[14]
observation.eepose : fixed_size_list<element: float>[14]
timestamp : float
frame_index : int64
episode_index : int64
index : int64
task_index : int64
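
In LeRobot v2.x datasets, the feature names, dtypes, and shapes are also recorded under "features" in meta/info.json, which is quicker to check than opening a parquet file. This is a minimal sketch assuming that layout (each feature maps to an object with "dtype" and "shape"); it relies only on python3's standard json module, and the function name list_features is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list feature names, dtypes, and shapes recorded in
# <dataset_dir>/meta/info.json (assumes LeRobot v2.x's "features" layout).
# Usage: list_features <dataset_dir>
list_features() {
  local info="$1/meta/info.json"
  if [ ! -f "$info" ]; then
    echo "error: $info not found" >&2
    return 1
  fi
  # Standard-library JSON parsing only; no pyarrow or lerobot import needed.
  python3 - "$info" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    features = json.load(f).get("features", {})
# One tab-separated line per feature: name, dtype, shape.
for name, spec in features.items():
    print(f"{name}\t{spec.get('dtype', '?')}\t{spec.get('shape', [])}")
EOF
}
```

Running list_features on a downloaded dataset directory prints one line per feature, which makes it easy to compare the recorded shapes against the schema above.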

TFRecord Format (LIBERO Example Only)

For the RLDS format, download the LIBERO data in TFRecord format directly from the following URL and use it with the RLDS dataset loader:

https://huggingface.co/datasets/openvla/modified_libero_rlds
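
The RLDS data can also be fetched with the same huggingface-cli tool used for the LeRobot datasets. This is a minimal sketch; the function name download_libero_rlds, the ./datasets/modified_libero_rlds default, and the DRY_RUN switch (print only) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download the openvla/modified_libero_rlds dataset repo
# (TFRecord files for the RLDS loader) from the Hugging Face Hub.
# Usage: download_libero_rlds [local_dir]
# Set DRY_RUN=1 to print the command instead of running it.
download_libero_rlds() {
  local local_dir="${1:-./datasets/modified_libero_rlds}"  # assumed default path
  local cmd=(huggingface-cli download openvla/modified_libero_rlds
             --repo-type dataset --local-dir "$local_dir")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"     # print only; nothing is downloaded
  else
    "${cmd[@]}"
  fi
}
```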