Real-Robot Data Preparation#

Overview#

Training data collected from physical robots is commonly stored in HDF5 format. This document describes how to convert such data into the LeRobot Dataset v2.1 format for use with FluxVLA training.

The data conversion script is available from the project repository; refer to the project README for usage details.


Data Format Requirements#

Input Data Format (HDF5)#

HDF5 files should follow the episode_*.hdf5 naming convention. The conversion script recursively searches the specified directory for all matching files.
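
For illustration, the discovery step can be sketched as follows (the script's actual traversal logic may differ):

from pathlib import Path

def find_episode_files(data_dir):
    # Recursively collect every HDF5 file matching the episode_*.hdf5 convention.
    return sorted(Path(data_dir).rglob("episode_*.hdf5"))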

Required Fields#

/observations/qpos - Robot Joint Positions#
  • Data type: float32 or float64

  • Shape: [num_frames, 14] or [num_frames, 16] (16-dimensional data is automatically converted to 14-dimensional)

  • Joint order: Left arm (7 joints) + Right arm (7 joints)

  • Format details:

    • 16-dimensional format: Each gripper's aperture is represented by the absolute positions of its two fingers (8 dimensions per arm)

    • 14-dimensional format: Each gripper's aperture is represented by a single relative position normalized to [0, 0.1]

    • Conversion formula: gripper_value = (left_finger - right_finger) * (0.1 / 0.07), applied per arm (see the sketch below)
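
A minimal sketch of the 16-to-14-dimensional conversion, assuming each arm's 8-dimensional block is laid out as 6 arm joints followed by its two finger positions (the exact index layout is an assumption for illustration):

import numpy as np

def qpos_16_to_14(qpos16):
    # qpos16: [num_frames, 16]; assumed per-arm layout: 6 joints + 2 finger positions.
    # Collapses each arm's two absolute finger positions into a single relative
    # gripper value normalized to [0, 0.1].
    out = np.empty((qpos16.shape[0], 14), dtype=np.float32)
    for arm in range(2):                        # 0 = left arm, 1 = right arm
        src, dst = arm * 8, arm * 7
        out[:, dst:dst + 6] = qpos16[:, src:src + 6]           # arm joints
        left_f = qpos16[:, src + 6]
        right_f = qpos16[:, src + 7]
        out[:, dst + 6] = (left_f - right_f) * (0.1 / 0.07)    # gripper aperture
    return out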

/observations/images/<camera_name> - Camera Images#
  • Supported cameras: head_cam, left_cam, right_cam

  • Format (one of the following; see the sketch below):

    • An uncompressed 4-dimensional numpy array [num_frames, height, width, channels] (uint8)

    • An array of JPEG-compressed byte strings with shape [num_frames] (automatically decoded to RGB)
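
Both encodings can be read with a sketch like the following (assuming h5py and OpenCV for JPEG decoding; the script's actual decoder may differ):

import h5py
import numpy as np
import cv2

def load_frames(h5_path, camera="head_cam"):
    with h5py.File(h5_path, "r") as f:
        ds = f[f"/observations/images/{camera}"]
        if ds.ndim == 4:
            # Uncompressed [num_frames, H, W, 3] uint8 array.
            return ds[:]
        # JPEG-compressed: one byte string per frame; decode each to RGB.
        frames = []
        for buf in ds[:]:
            bgr = cv2.imdecode(np.frombuffer(buf, dtype=np.uint8), cv2.IMREAD_COLOR)
            frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
        return np.stack(frames)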

Optional Fields#

/action - Robot Desired Joint Positions#
  • Data type: float32 or float64

  • Shape: [num_frames, 14] or [num_frames, 16] (16-dimensional data is automatically converted to 14-dimensional)

/observations/eepose - End-Effector Pose#
  • Data type: float32 or float64

  • Shape: [num_frames, 14]

  • Description: Contains position (x, y, z) and quaternion (qx, qy, qz, qw) for the left and right end-effectors (7 values per arm; see the sketch below)
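
Assuming the same left-then-right ordering as qpos, a single frame can be unpacked like this (illustrative only):

def split_eepose(eepose_frame):
    # eepose_frame: shape (14,) = left [x, y, z, qx, qy, qz, qw] + right [same].
    left, right = eepose_frame[:7], eepose_frame[7:]
    return {"left":  {"pos": left[:3],  "quat": left[3:]},
            "right": {"pos": right[:3], "quat": right[3:]}}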

/observations/images_depth/<camera_name>_depth - Depth Images#
  • Supported cameras: head_cam, left_cam, right_cam

  • Data type: uint16 (values in millimeters)

  • Shape: [num_frames, height, width]

  • Description: Requires add_infos = ["depth"] to be set in the DatasetConfig for processing (see the sketch below)
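
A hedged sketch of that configuration (the DatasetConfig below is an illustrative stand-in; only the add_infos field is documented, and the real class lives in the conversion script):

from dataclasses import dataclass, field

@dataclass
class DatasetConfig:
    # Illustrative stand-in for the conversion script's config class;
    # only add_infos is documented here, other fields are omitted.
    add_infos: list = field(default_factory=list)

config = DatasetConfig(add_infos=["depth"])  # enables depth-image conversion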


Output Data Format (LeRobot v2.1)#

The converted dataset adopts the LeRobot v2.1 format, stored in a HuggingFace Datasets-compatible directory structure:

<output_dir>/<repo_id>/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ train/
β”‚   β”‚   β”œβ”€β”€ episode_0.parquet
β”‚   β”‚   └── ...
β”‚   └── video/
β”‚       β”œβ”€β”€ episode_0/
β”‚       β”‚   β”œβ”€β”€ observation.images.head_cam.mp4
β”‚       β”‚   └── ...
β”‚       └── ...
β”œβ”€β”€ info.json
└── meta.json
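
Once converted, the dataset can be opened with the lerobot library. A minimal sketch, assuming a recent lerobot release in which LeRobotDataset accepts a local root directory (the exact import path can vary between lerobot versions):

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# root points at the local directory holding the converted dataset.
dataset = LeRobotDataset(repo_id="<repo_id>", root="<output_dir>/<repo_id>")
sample = dataset[0]
print(sample["observation.state"].shape)  # expected: torch.Size([14])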

Data Field Descriptions#

Each episode’s parquet file contains the following fields:

observation.state#
  • Type: float32

  • Shape: (14,)

  • Description: Robot joint states; field ordering is identical to the input qpos

observation.images.<camera_name>#
  • Type: VideoFrame object

  • Description: Camera image reference containing the video file path and frame timestamp (see the sketch after this list)

  • Supported cameras: head_cam, left_cam, right_cam

  • Video specifications:

    • Frame rate: 30 FPS

    • Image dimensions: (480, 640, 3)

    • Storage location: MP4 files in the video/ directory
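
Conceptually, a VideoFrame reference resolves to pixels by seeking to its timestamp in the referenced MP4. A rough sketch with OpenCV, using the fixed 30 FPS noted above:

import cv2

def read_video_frame(video_path, timestamp, fps=30.0):
    # Map the stored timestamp (seconds) to a frame index at the dataset's 30 FPS.
    cap = cv2.VideoCapture(str(video_path))
    cap.set(cv2.CAP_PROP_POS_FRAMES, round(timestamp * fps))
    ok, bgr = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"no frame at t={timestamp}s in {video_path}")
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # (480, 640, 3) uint8 RGB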

action (Optional)#
  • Type: float32

  • Shape: (14,)

  • Description: Robot actions; generated only when the input contains the /action field

observation.eepose (Optional)#
  • Type: float32

  • Shape: (14,)

  • Description: End-effector pose with field ordering identical to the input; generated only when the input contains /observations/eepose

observation.depth.<camera_name> (Optional)#
  • Type: uint16

  • Shape: (480, 640)

  • Description: Depth images; generated only when the input contains depth images and add_infos = ["depth"] is configured

task#
  • Type: string

  • Default value: "pick up the yellow banana and put it on the pink plate"

  • Description: Task label; customizable via the init_task parameter