ALOHA Real-Robot Training and Deployment#

This section uses the Agilex Cobot Magic robot as an example to explain how to train and deploy models with real-robot data.

Data Format Conversion#

The data collection scripts provided by Agilex Cobot Magic save data in HDF5 format. We first need to convert this data to LeRobot dataset v2.1 format for FluxVLA training.

We provide a conversion script that converts data from HDF5 to LeRobot v2.1 format in one step. The script is available in the project repository; refer to the project README for usage details.


HDF5 files should be named in the episode_*.hdf5 format. The script will recursively search for all matching files in the specified directory.

Before running the conversion, verify that the HDF5 files contain the following required fields (a short verification sketch follows the field descriptions):

/observations/qpos - Robot Joint Positions#

  • Data type: float32 or float64

  • Shape: [num_frames, 14] or [num_frames, 16] (16-dim is automatically converted to 14-dim)

  • Joint order: Left arm (7 joints) + Right arm (7 joints)

  • Format details:

    • 16-dim format: each gripper's open/close state is represented by the absolute positions of its two fingers (6 joint angles + 2 finger positions = 8 dims per arm)

    • 14-dim format: Gripper open/close is represented by a relative position normalized to [0, 0.1]

    • Conversion formula: gripper_value = (left_finger - right_finger) * (0.1 / 0.07)

/observations/images/<camera_name> - Camera Images#

  • Supported cameras: head_cam, left_cam, right_cam

  • Formats:

    • Uncompressed 4D numpy array [num_frames, height, width, channels] (uint8)

    • Or JPEG-compressed byte stream [num_frames] (automatically decoded to RGB)
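
A quick way to run this verification, and to see how the 16-dim to 14-dim gripper conversion works, is the sketch below. It assumes h5py and numpy are installed, and it assumes a per-arm layout of 6 joint angles followed by 2 finger positions in the 16-dim case; verify both assumptions against your own recordings before relying on it:

import h5py
import numpy as np

def check_episode(path):
    # Verify the required fields and illustrate the 16-dim -> 14-dim conversion.
    with h5py.File(path, "r") as f:
        qpos = np.asarray(f["/observations/qpos"])      # [num_frames, 14] or [num_frames, 16]
        cams = list(f["/observations/images"].keys())   # e.g. head_cam, left_cam, right_cam
        print(f"qpos shape: {qpos.shape}, cameras: {cams}")
    if qpos.shape[1] == 16:
        arms = []
        for arm in (qpos[:, :8], qpos[:, 8:]):           # left arm first, then right arm
            joints = arm[:, :6]                          # assumed: 6 joint angles per arm
            left_finger, right_finger = arm[:, 6], arm[:, 7]        # assumed: 2 finger positions
            gripper = (left_finger - right_finger) * (0.1 / 0.07)   # formula documented above
            arms.append(np.concatenate([joints, gripper[:, None]], axis=1))
        qpos = np.concatenate(arms, axis=1)              # [num_frames, 14]
    print(f"qpos after conversion: {qpos.shape}")

check_episode("episode_0.hdf5")  # hypothetical path; point at one of your recordings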

The converted dataset follows the LeRobot v2.1 format and is stored in a HuggingFace Datasets compatible directory structure:

<output_dir>/<repo_id>/
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ train/
β”‚   β”‚   β”œβ”€β”€ episode_0.parquet
β”‚   β”‚   └── ...
β”‚   └── video/
β”‚       β”œβ”€β”€ episode_0/
β”‚       β”‚   β”œβ”€β”€ observation.images.head_cam.mp4
β”‚       β”‚   └── ...
β”‚       └── ...
β”œβ”€β”€ info.json
└── meta.json
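
To spot-check the converted output, one of the generated Parquet files can be read directly. A minimal sketch, assuming pandas (with pyarrow) is installed; the exact column names depend on the conversion script, so treat the printout as informational:

import pandas as pd

# Adjust <output_dir>/<repo_id> to match the layout shown above.
df = pd.read_parquet("<output_dir>/<repo_id>/data/train/episode_0.parquet")
print(df.columns.tolist())   # state/action/index columns written by the converter
print(len(df), "frames in episode_0")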

Model Training#

This example fine-tunes the Pi0.5 model. First, prepare the Pi0.5 pre-trained model weights: download them from HuggingFace or from the project's download link.

Then modify the dataset path in configs/pi05/pi05_paligemma_aloha_full_finetune.py:

    ...
    freeze_vision_backbone=False,
    pretrained_name_or_path='./checkpoints/pi05_base/model.safetensors',  # Replace with the actual pre-trained weight path
    name_mapping={
        ...

    ...
    datasets=dict(
        type='ParquetDataset',
        data_root_path=[
            './datasets/RealRobot_AgileX_aloha_lerobot_v2/20250601_20250615_02_4090',
        ],  # Replace with the actual dataset path
        transforms=[
            ...

Finally, set the environment variables according to the number of machines:

Single-node multi-GPU: Adjust MLP_WORKER_GPU according to the actual number of GPUs, then launch training using scripts/train.sh:

export MLP_WORKER_GPU=8  # Adjust according to the actual number of GPUs
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
    configs/pi05/pi05_paligemma_aloha_full_finetune.py \
    work_dirs/pi05_paligemma_aloha_full_finetune # Replace with your desired output directory

Multi-node multi-GPU: Set the environment variables for each machine:

Node 0 (Master node):

export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/train.sh \
    configs/pi05/pi05_paligemma_aloha_full_finetune.py \
    work_dirs/pi05_paligemma_aloha_full_finetune

Node 1:

export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=1
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/train.sh \
    configs/pi05/pi05_paligemma_aloha_full_finetune.py \
    work_dirs/pi05_paligemma_aloha_full_finetune
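
For reference, the MLP_* variables describe the distributed topology in the usual way (GPUs per node, node count, node rank, master host and port). Below is a minimal sketch of how they relate to the standard distributed quantities, assuming scripts/train.sh consumes them in this manner:

import os

gpus_per_node = int(os.environ["MLP_WORKER_GPU"])   # GPUs on this machine
num_nodes     = int(os.environ["MLP_WORKER_NUM"])   # total number of machines
node_rank     = int(os.environ["MLP_ROLE_INDEX"])   # this machine's index (0 = master)
master_addr   = os.environ["MLP_WORKER_0_HOST"]     # rendezvous host (node 0)
master_port   = os.environ["MLP_WORKER_0_PORT"]     # rendezvous port
world_size    = gpus_per_node * num_nodes           # total number of training processes
print(f"world_size={world_size}, node_rank={node_rank}, master={master_addr}:{master_port}")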

Real-Robot Deployment#

1. Power On the Robot#

  • Turn on the power supply and power on all four robotic arms

  • Check that all four robotic arms have green indicator lights, indicating normal power-on

2. Software Environment Setup#

2.1 Create a Conda Environment#

conda create -n aloha_sdk3.0 python=3.8.19
conda activate aloha_sdk3.0

2.2 Install Piper SDK#

git clone https://github.com/agilexrobotics/piper_sdk.git
cd piper_sdk
pip3 install .

3. Launch the Robotic Arms#

Start the arm enable and control node (mode:=1 for follower mode, auto_enable:=true for automatic enabling):

conda activate aloha_sdk3.0
roslaunch piper start_ms_piper.launch mode:=1 auto_enable:=true

4. Launch the Cameras#

Start the three depth cameras:

roslaunch astra_camera multi_camera.launch

After startup, use rostopic list to verify that the following topics are being published (a small subscriber sketch for a programmatic check follows the tables):

| Camera | Color Topic | Depth Topic | Point Cloud Topic |
| --- | --- | --- | --- |
| Front (f) | /camera_f/color/image_raw | /camera_f/depth/image_raw | /camera_f/depth/points |
| Hand (h) | /camera_h/color/image_raw | /camera_h/depth/image_raw | /camera_h/depth/points |
| Left (l) | /camera_l/color/image_raw | /camera_l/depth/image_raw | /camera_l/depth/points |
| Right (r) | /camera_r/color/image_raw | /camera_r/depth/image_raw | /camera_r/depth/points |

Robotic arm joint state topics:

| Topic | Description |
| --- | --- |
| /master/joint_left | Left master arm joint state |
| /master/joint_right | Right master arm joint state |
| /puppet/joint_left | Left follower arm joint state |
| /puppet/joint_right | Right follower arm joint state |
| /puppet/end_pose_left | Left follower arm end-effector pose |
| /puppet/end_pose_right | Right follower arm end-effector pose |
| /left_arm/trajectory | Left arm trajectory command |
| /right_arm/trajectory | Right arm trajectory command |
| /left_arm/model_trajectory | Left arm model output trajectory |
| /right_arm/model_trajectory | Right arm model output trajectory |
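
Beyond rostopic list, a small subscriber can confirm that data is actually streaming. A minimal rospy sketch, assuming a ROS 1 environment with rospy and sensor_msgs available:

#!/usr/bin/env python3
import rospy
from sensor_msgs.msg import Image, JointState

def on_joints(msg):
    rospy.loginfo_throttle(1.0, "puppet left joints: %s" % list(msg.position))

def on_image(msg):
    rospy.loginfo_throttle(1.0, "front camera image: %dx%d (%s)" % (msg.width, msg.height, msg.encoding))

rospy.init_node("topic_check", anonymous=True)
rospy.Subscriber("/puppet/joint_left", JointState, on_joints)
rospy.Subscriber("/camera_f/color/image_raw", Image, on_image)
rospy.spin()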


5. Run FluxVLA Model Inference#

Launch via the IDE's "Run and Debug" feature, or run directly from the command line:

WANDB_MODE=disabled HF_ENDPOINT=https://hf-mirror.com \
python scripts/inference_real_robot.py \
    --config configs/gr00t/gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train.py \
    --ckpt-path ./work_dirs/work_dirs/419acaf40_gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train_bs256/checkpoints/step-037356-epoch-06-loss=0.0617.pt

The corresponding VSCode launch.json configuration:

{
    "name": "Inference gr00t eagle",
    "type": "debugpy",
    "request": "launch",
    "program": "scripts/inference_real_robot.py",
    "console": "integratedTerminal",
    "args": [
        "--config", "configs/gr00t/gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train.py",
        "--ckpt-path", "./work_dirs/work_dirs/419acaf40_gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train_bs256/checkpoints/step-037356-epoch-06-loss=0.0617.pt"
    ],
    "env": {
        "WANDB_MODE": "disabled",
        "HF_ENDPOINT": "https://hf-mirror.com"
    }
}

6. Reset the Robotic Arms#

When the task is complete or a reset is needed, send a reset command to return both arms to the zero position (end-effector grippers remain slightly open at 0.1):

Left arm reset:

rostopic pub /master/joint_left sensor_msgs/JointState "header:
  seq: 0
  stamp: {secs: 0, nsecs: 0}
  frame_id: ''
name: ['']
position: [0,0,0,0,0,0,0.1]
velocity: [0]
effort: [0]"

Right arm reset:

rostopic pub /master/joint_right sensor_msgs/JointState "header:
  seq: 0
  stamp: {secs: 0, nsecs: 0}
  frame_id: ''
name: ['']
position: [0,0,0,0,0,0,0.1]
velocity: [0]
effort: [0]"

The 7 values in position correspond to 6 joint angles + 1 gripper open/close value. Setting all to zero returns the arm to its initial pose, while 0.1 keeps the gripper slightly open.
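
The same reset can also be sent from Python. A minimal rospy sketch equivalent to the two rostopic commands above, assuming a ROS 1 environment:

#!/usr/bin/env python3
import rospy
from sensor_msgs.msg import JointState

rospy.init_node("arm_reset", anonymous=True)
pub_left = rospy.Publisher("/master/joint_left", JointState, queue_size=1)
pub_right = rospy.Publisher("/master/joint_right", JointState, queue_size=1)
rospy.sleep(1.0)  # give the publishers time to connect

msg = JointState()
msg.header.stamp = rospy.Time.now()
msg.position = [0, 0, 0, 0, 0, 0, 0.1]  # 6 joint angles at zero + gripper slightly open
pub_left.publish(msg)
pub_right.publish(msg)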