ALOHA Real-Robot Training and Deployment#
This section uses the Agilex Cobot Magic robot as an example to explain how to train and deploy models with real-robot data.
Data Format Conversion#
The data collection scripts provided by Agilex Cobot Magic save data in HDF5 format. We first need to convert this data to LeRobot dataset v2.1 format for FluxVLA training.
We provide a conversion script that converts HDF5 data to LeRobot v2.1 format in one step. The script is available in the project repository; refer to the project README for details.
HDF5 files should be named in the episode_*.hdf5 format. The script will recursively search for all matching files in the specified directory.
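The recursive search can be reproduced in a few lines; `find_episodes` below is an illustrative helper (not part of the conversion script itself):

```python
from pathlib import Path

def find_episodes(root: str) -> list:
    """Recursively collect episode_*.hdf5 files under root, sorted by path."""
    return sorted(Path(root).rglob("episode_*.hdf5"))
```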
Before running the conversion, verify that the HDF5 files contain the following required fields:
/observations/qpos - Robot Joint Positions#
- Data type: `float32` or `float64`
- Shape: `[num_frames, 14]` or `[num_frames, 16]` (16-dim is automatically converted to 14-dim)
- Joint order: Left arm (7 joints) + Right arm (7 joints)
Format details:
16-dim format: Gripper open/close is represented by absolute positions of two gripper fingers (8 dims total)
14-dim format: Gripper open/close is represented by a relative position normalized to [0, 0.1]
Conversion formula:
gripper_value = (left_finger - right_finger) * (0.1 / 0.07)
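The 16-to-14 conversion can be sketched in NumPy. Note that the index layout of the 16-dim vector used here (per arm: 6 joint angles followed by the two finger positions, left arm first) is an assumption; only the conversion formula itself comes from the documentation above:

```python
import numpy as np

def convert_qpos_16_to_14(qpos: np.ndarray) -> np.ndarray:
    """Collapse the two absolute finger positions per arm into one gripper value.

    Assumed 16-dim layout (not confirmed by the docs): 6 joints + 2 fingers
    per arm, left arm first. gripper_value = (left - right) * (0.1 / 0.07).
    """
    left_grip = (qpos[:, 6] - qpos[:, 7]) * (0.1 / 0.07)
    right_grip = (qpos[:, 14] - qpos[:, 15]) * (0.1 / 0.07)
    return np.concatenate(
        [qpos[:, 0:6], left_grip[:, None], qpos[:, 8:14], right_grip[:, None]],
        axis=1,
    )
```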
/observations/images/<camera_name> - Camera Images#
- Supported cameras: `head_cam`, `left_cam`, `right_cam`
- Formats:
  - Uncompressed 4D numpy array: `[num_frames, height, width, channels]` (uint8)
  - Or a JPEG-compressed byte stream: `[num_frames]` (automatically decoded to RGB)
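Once an episode has been read into numpy arrays (e.g. via h5py), these field requirements can be pre-checked with a small helper. This is an illustrative sketch, not part of the conversion script:

```python
import numpy as np

def validate_episode_arrays(qpos, images):
    """Sanity-check arrays from one HDF5 episode against the format above."""
    if qpos.dtype not in (np.float32, np.float64):
        raise ValueError(f"qpos dtype {qpos.dtype}, expected float32/float64")
    if qpos.ndim != 2 or qpos.shape[1] not in (14, 16):
        raise ValueError(f"qpos shape {qpos.shape}, expected [num_frames, 14|16]")
    for name, arr in images.items():
        if arr.ndim == 4 and arr.dtype == np.uint8:
            continue  # uncompressed [num_frames, H, W, C]
        if arr.ndim == 1:
            continue  # JPEG byte stream, one entry per frame
        raise ValueError(f"{name}: unsupported image array, shape {arr.shape}")
```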
The converted dataset follows the LeRobot v2.1 format and is stored in a HuggingFace Datasets compatible directory structure:
<output_dir>/<repo_id>/
├── data/
│   ├── train/
│   │   ├── episode_0.parquet
│   │   └── ...
│   └── video/
│       ├── episode_0/
│       │   ├── observation.images.head_cam.mp4
│       │   └── ...
│       └── ...
├── info.json
└── meta.json
Model Training#
We use fine-tuning the Pi0.5 model as an example. First, prepare the Pi0.5 pre-trained weights: download them from HuggingFace or via the project's download link.
Then modify the dataset path in configs/pi05/pi05_paligemma_aloha_full_finetune.py:
...
    freeze_vision_backbone=False,
    pretrained_name_or_path=  # noqa: E251
    './checkpoints/pi05_base/model.safetensors',  # Replace with the actual pre-trained weight path
    name_mapping={
        ...
...
datasets=dict(
    type='ParquetDataset',
    data_root_path=  # noqa: E251
    [
        './datasets/RealRobot_AgileX_aloha_lerobot_v2/20250601_20250615_02_4090',
    ],  # Replace with the actual dataset path
    transforms=[
        ...
Finally, set the environment variables according to the number of machines:
Single-node multi-GPU: Adjust MLP_WORKER_GPU according to the actual number of GPUs, then launch training using scripts/train.sh:
export MLP_WORKER_GPU=8 # Adjust according to the actual number of GPUs
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
configs/pi05/pi05_paligemma_aloha_full_finetune.py \
work_dirs/pi05_paligemma_aloha_full_finetune # Replace with your desired output directory
Multi-node multi-GPU: Set the environment variables for each machine:
Node 0 (Master node):
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
configs/pi05/pi05_paligemma_aloha_full_finetune.py \
work_dirs/pi05_paligemma_aloha_full_finetune
Node 1:
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=1
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
configs/pi05/pi05_paligemma_aloha_full_finetune.py \
work_dirs/pi05_paligemma_aloha_full_finetune
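The MLP_* variables follow the usual torchrun-style rendezvous arithmetic. Assuming scripts/train.sh maps them onto --nnodes/--node_rank/--nproc_per_node (an assumption, not verified against the script), ranks and world size work out as:

```python
def world_size(gpus_per_node: int, num_nodes: int) -> int:
    # MLP_WORKER_GPU * MLP_WORKER_NUM
    return gpus_per_node * num_nodes

def global_rank(node_index: int, gpus_per_node: int, local_rank: int) -> int:
    # MLP_ROLE_INDEX selects the node; local_rank selects the GPU on it
    return node_index * gpus_per_node + local_rank
```

In the two-node example above, `world_size(8, 2)` is 16 processes in total, and GPU 3 on node 1 gets global rank 11.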
Real-Robot Deployment#
1. Power On the Robot#
Turn on the power supply and power on all four robotic arms
Check that all four robotic arms have green indicator lights, indicating normal power-on
2. Software Environment Setup#
2.1 Create a Conda Environment#
conda create -n aloha_sdk3.0 python=3.8.19
conda activate aloha_sdk3.0
2.2 Install Piper SDK#
git clone https://github.com/agilexrobotics/piper_sdk.git
cd piper_sdk
pip3 install .
3. Launch the Robotic Arms#
Start the arm enable and control node (mode:=1 for follower mode, auto_enable:=true for automatic enabling):
conda activate aloha_sdk3.0
roslaunch piper start_ms_piper.launch mode:=1 auto_enable:=true
4. Launch the Cameras#
Start the three depth cameras:
roslaunch astra_camera multi_camera.launch
After startup, use rostopic list to verify that the following topics are being published:
| Camera | Color Topic | Depth Topic | Point Cloud Topic |
|---|---|---|---|
| Front (f) | | | |
| Hand (h) | | | |
| Left (l) | | | |
| Right (r) | | | |
Robotic arm joint state topics:
| Topic | Description |
|---|---|
| /master/joint_left | Left master arm joint state |
| /master/joint_right | Right master arm joint state |
| | Left follower arm joint state |
| | Right follower arm joint state |
| | Left follower arm end-effector pose |
| | Right follower arm end-effector pose |
| | Left arm trajectory command |
| | Right arm trajectory command |
| | Left arm model output trajectory |
| | Right arm model output trajectory |
5. Run FluxVLA Model Inference#
Launch via the IDE's "Run and Debug" feature, or run directly from the command line:
WANDB_MODE=disabled HF_ENDPOINT=https://hf-mirror.com \
python scripts/inference_real_robot.py \
--config configs/gr00t/gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train.py \
--ckpt-path ./work_dirs/work_dirs/419acaf40_gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train_bs256/checkpoints/step-037356-epoch-06-loss=0.0617.pt
The corresponding VSCode launch.json configuration:
{
    "name": "Inference gr00t eagle",
    "type": "debugpy",
    "request": "launch",
    "program": "scripts/inference_real_robot.py",
    "console": "integratedTerminal",
    "args": [
        "--config", "configs/gr00t/gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train.py",
        "--ckpt-path", "./work_dirs/work_dirs/419acaf40_gr00t_eagle_3b_aloha_fold_towel_3cam_4090_full_train_bs256/checkpoints/step-037356-epoch-06-loss=0.0617.pt"
    ],
    "env": {
        "WANDB_MODE": "disabled",
        "HF_ENDPOINT": "https://hf-mirror.com"
    }
}
6. Reset the Robotic Arms#
When the task is complete or a reset is needed, send a reset command to return both arms to the zero position (end-effector grippers remain slightly open at 0.1):
Left arm reset:
rostopic pub /master/joint_left sensor_msgs/JointState "header:
  seq: 0
  stamp: {secs: 0, nsecs: 0}
  frame_id: ''
name: ['']
position: [0,0,0,0,0,0,0.1]
velocity: [0]
effort: [0]"
Right arm reset:
rostopic pub /master/joint_right sensor_msgs/JointState "header:
  seq: 0
  stamp: {secs: 0, nsecs: 0}
  frame_id: ''
name: ['']
position: [0,0,0,0,0,0,0.1]
velocity: [0]
effort: [0]"
The 7 values in position correspond to 6 joint angles + 1 gripper open/close value. Setting all joints to zero returns the arm to its initial pose, while 0.1 keeps the gripper slightly open.
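The 7-element position list can be built programmatically when scripting resets; this small helper is illustrative and simply mirrors the commands above:

```python
def reset_positions(gripper: float = 0.1) -> list:
    """Six joint angles at zero plus one normalized gripper value."""
    if not 0.0 <= gripper <= 0.1:
        raise ValueError("gripper value is normalized to [0, 0.1]")
    return [0.0] * 6 + [gripper]
```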