Training on LIBERO Simulation Data#

LIBERO is a widely used robot manipulation benchmark containing task suites of varying difficulty. This page describes how to train and evaluate the GR00T-N1.5 and π₀ models on the LIBERO dataset within the FluxVLA framework, using libero_10 as the main example.

Data Preparation#

Download the LIBERO dataset (LeRobot v2.1 format):

python download_libero.py \
    --repo-id Aiming1998/libero_10_no_noops_lerobotv2.1 \
    --out /path/to/data/LIBERO_lerobot/libero_10_no_noops_lerobotv2.1

Other LIBERO task suites:

# libero_spatial
python download_libero.py --repo-id Aiming1998/libero_spatial_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_spatial_no_noops_lerobotv2.1
# libero_object
python download_libero.py --repo-id Aiming1998/libero_object_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_object_no_noops_lerobotv2.1
# libero_goal
python download_libero.py --repo-id Aiming1998/libero_goal_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_goal_no_noops_lerobotv2.1

Directory structure after download:

libero_10_no_noops_lerobotv2.1/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       └── ...
├── videos/
│   └── chunk-000/
│       ├── observation.images.image/
│       │   ├── episode_000000.mp4
│       │   └── ...
│       └── observation.images.wrist_image/
│           ├── episode_000000.mp4
│           └── ...
└── meta/
    ├── episodes.jsonl
    ├── episodes_stats.jsonl
    ├── info.json
    └── tasks.jsonl

The LIBERO data contains two camera views (image, the main view, and wrist_image, the wrist view). Each frame contains an 8-dimensional state and a 7-dimensional action (end-effector position + rotation + gripper).
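
To sanity-check a downloaded episode, you can open one Parquet file directly. A minimal sketch, assuming pandas and pyarrow are installed and using the download path from the step above:

# Quick inspection of one episode (path matches the download step above).
import pandas as pd

df = pd.read_parquet(
    '/path/to/data/LIBERO_lerobot/libero_10_no_noops_lerobotv2.1/'
    'data/chunk-000/episode_000000.parquet')

print(df.columns.tolist())               # e.g. 'observation.state', 'action', ...
print(len(df['observation.state'][0]))   # per-frame state dimension
print(len(df['action'][0]))              # per-frame action dimension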

GR00T-N1.5 LIBERO Training#

Training uses the ready-made config file configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py; the key settings are explained below.

1. Model Configuration#

model = dict(
    type='LlavaVLA',
    pretrained_name_or_path='/path/to/models/GR00T-N1.5-3B',
    vlm_backbone=dict(
        type='EagleBackbone',
        vlm_path='FluxVLA/models/third_party_models/eagle2_hg_model'),
    vla_head=dict(
        type='FlowMatchingHead',
        state_dim=8,              # state input dimension
        hidden_size=1024,
        input_embedding_dim=1536,
        num_layers=1,
        num_heads=4,
        num_inference_timesteps=4,
        traj_length=10,           # predict 10 future action steps
        action_dim=7),            # LIBERO action dimension
    freeze_vlm_backbone=False,
    name_mapping={
        'vlm_backbone.vlm': 'backbone.eagle_model',
        'vla_head': 'action_head'
    },
    freeze_projector=False)

For LIBERO, state_dim=8 (6-dim end-effector pose + 2-dim gripper state) and action_dim=7 (6-dim end-effector delta + 1-dim gripper).
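
To make the layout concrete, a minimal sketch of how such vectors compose (placeholder values; the ordering follows the convention stated above):

import numpy as np

eef_pose = np.zeros(6)       # end-effector position (3) + rotation (3)
gripper = np.zeros(2)        # 2-dim gripper state
state = np.concatenate([eef_pose, gripper])        # state_dim = 8

eef_delta = np.zeros(6)      # 6-dim end-effector delta
gripper_cmd = np.zeros(1)    # 1-dim gripper command
action = np.concatenate([eef_delta, gripper_cmd])  # action_dim = 7

assert state.shape == (8,) and action.shape == (7,)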

2. Data Configuration#

train_dataloader = dict(
    batch_size=128,
    per_device_batch_size=8,
    per_device_num_workers=4,
    dataset=dict(
        type='DistributedRepeatingDataset',
        name_mappings={
            'observation.state': ['proprio'],
            'action': ['action']
        },
        statistic_keys=['observation.state', 'timestamp', 'action'],
        statistic_name='libero_10_no_noops',
        datasets=dict(
            type='ParquetDataset',
            data_root_path='/path/to/data/LIBERO_lerobot/libero_10_no_noops_lerobotv2.1',
            transforms=[
                dict(
                    type='ProcessParquetInputs',
                    embodiment_id=31,
                    parquet_keys=[
                        'observation.state', 'timestamp', 'actions',
                        'info', 'stats', 'action_masks'
                    ],
                    video_keys=[
                        'observation.images.image',
                        'observation.images.wrist_image',
                    ],
                    name_mappings={
                        'observation.state': ['states'],
                        'actions': ['actions']
                    }),
                dict(type='ParquetPrompter'),
                dict(
                    type='ProcessPromptsWithImage',
                    max_len=600,
                    num_images=2,
                    tokenizer=dict(
                        type='PretrainedTokenizer',
                        model_path='/path/to/models/eagle2_hg_model',
                    )),
                dict(type='ResizeImages', height=224, width=224),
                dict(
                    type='NormalizeImages',
                    means=[[123.515625, 116.04492188, 103.59375],
                           [123.515625, 116.04492188, 103.59375]],
                    stds=[[58.27148438, 57.02636719, 57.27539062],
                          [58.27148438, 57.02636719, 57.27539062]]),
                dict(
                    type='NormalizeStatesAndActions',
                    action_dim=14,
                    state_key='proprio',
                    action_key='action',
                    use_quantiles=False)
            ],
            action_window_size=10,
            action_key='action',
            use_delta=False,
            statistic_name='libero_10_no_noops',
            window_start_idx=0)))

The data transform pipeline runs the following steps in order (a sketch of the pattern follows the list):

  1. ProcessParquetInputs — parses states, actions, and images from the Parquet files

  2. ParquetPrompter — extracts the task description from the data as the language prompt

  3. ProcessPromptsWithImage — encodes the text and images into model input tokens

  4. ResizeImages — resizes all images to a uniform 224×224

  5. NormalizeImages — normalizes images with the Eagle model's standard parameters

  6. NormalizeStatesAndActions — normalizes states and actions
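
Conceptually, each transform consumes and returns a sample dict, and the dataset applies them in order. A minimal sketch of this pattern; the functions below are illustrative stand-ins, not the actual FluxVLA transform classes:

def apply_transforms(sample, transforms):
    # Run each transform in order; every step reads/writes keys on the dict.
    for transform in transforms:
        sample = transform(sample)
    return sample

def resize_images(sample):
    # Placeholder for ResizeImages: rescale sample['images'] to 224x224.
    return sample

def normalize_states_and_actions(sample):
    # Placeholder for NormalizeStatesAndActions: (x - mean) / std per key.
    return sample

sample = apply_transforms(
    {'images': [], 'proprio': None, 'action': None},
    [resize_images, normalize_states_and_actions])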

3. Training Configuration#

runner = dict(
    type='FSDPTrainRunner',
    max_epochs=24,
    learning_rate=2e-5,
    weight_decay=0.0,
    max_grad_norm=1.0,
    sampler=None,
    tokenizer=dict(
        type='PretrainedTokenizer',
        model_path='/path/to/models/eagle2_hg_model',
    ),
    collator=dict(
        type='DictCollator',
        keys=[
            'states', 'observation.eepose', 'timestamp', 'images',
            'img_masks', 'lang_tokens', 'lang_masks', 'actions',
            'action_masks', 'embodiment_ids'
        ],
        meta_keys=['task_description', 'prompt', 'info', 'stats']),
    metric=dict(
        type='VLAMetric',
        active_trackers=('jsonl', 'wandb'),
        run_dir='work_dirs',
        wandb_project='FluxVLA',
        wandb_entity='limx',
        grad_accumulation_steps=1,
        window_size=1),
    lr_scheduler_type='constant',
    warmup_ratio=0.0,
    enable_gradient_checkpointing=True,
    enable_mixed_precision_training=True,
    mixed_precision_dtype='bf16',
    sharding_strategy='full-shard',
    change_key_name=False)

Training uses FSDP (Fully Sharded Data Parallel) for distributed training, BF16 mixed precision, and a constant learning rate of 2e-5. Because the LIBERO dataset is relatively small, 24 epochs are needed for the model to converge fully.
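
As a quick sanity check on the batch arithmetic, a sketch under the assumption that the global batch_size is reached by accumulating per-device micro-batches across 8 GPUs (how the runner actually splits this is repo-specific):

batch_size = 128                  # global batch size from the config
per_device_batch_size = 8
num_gpus = 8                      # single node, 8 GPUs (launch example below)
samples_per_micro_step = per_device_batch_size * num_gpus  # 64
accum_steps = batch_size // samples_per_micro_step         # 2, if accumulated
print(samples_per_micro_step, accum_steps)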

4. Evaluation Configuration#

eval = dict(
    type='LiberoEvalRunner',
    task_suite_name='libero_10',
    model_family='pi0',
    eval_chunk_size=10,
    resize_size=224,
    num_trials_per_task=50,
    num_steps_wait=10,
    seed=7,
    dataset=dict(
        type='LiberoParquetEvalDataset',
        transforms=[
            dict(
                type='ProcessLiberoEvalInputs',
                embodiment_id=31,
                img_keys=['agentview_image', 'robot0_eye_in_hand_image']),
            dict(
                type='TransformImage',
                image_resize_strategy='resize-naive',
                input_sizes=[[3, 224, 224], [3, 224, 224]],
                means=[[123.515625, 116.04492188, 103.59375],
                       [123.515625, 116.04492188, 103.59375]],
                stds=[[58.27148438, 57.02636719, 57.27539062],
                      [58.27148438, 57.02636719, 57.27539062]]),
            dict(
                type='ProcessPromptsWithImage',
                max_len=600,
                num_images=2,
                tokenizer=dict(
                    type='PretrainedTokenizer',
                    model_path='/path/to/models/eagle2_hg_model',
                )),
            dict(
                type='LiberoProprioFromInputs',
                use_quantiles=False,
                pos_key='robot0_eef_pos',
                quat_key='robot0_eef_quat',
                gripper_key='robot0_gripper_qpos',
                out_key='states'),
        ]),
    denormalize_action=dict(
        type='DenormalizeLiberoAction',
        use_quantiles=False,
    ))

Evaluation runs in the LIBERO simulation environment, executing 50 trials for each libero_10 task and reporting the success rate.
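
With use_quantiles=False, DenormalizeLiberoAction maps the model's normalized outputs back to raw action space using the dataset statistics. The sketch below shows the assumed inverse of mean/std normalization (an illustrative helper, not the FluxVLA API):

import numpy as np

def denormalize_action(norm_action, mean, std):
    # Inverse of the (x - mean) / std normalization applied in training
    # (assumes use_quantiles=False selects mean/std statistics).
    return norm_action * std + mean

# e.g. with stats loaded from dataset_statistics.json (see Training Outputs):
mean, std = np.zeros(7), np.ones(7)
raw = denormalize_action(np.zeros((10, 7)), mean, std)  # 10-step action chunk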

5. Launching Training#

cd /path/to/fluxvla

# Set environment variables (single node, 8 GPUs)
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/train.sh \
    configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
    work_dirs/gr00t_eagle_3b_libero_10_full_train

On the Volcengine MLP platform, the environment variables are injected automatically by the platform, so you can run the script directly:

bash scripts/train.sh \
    configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
    work_dirs/gr00t_eagle_3b_libero_10_full_train

6. Evaluation#

After training completes, evaluate with eval.sh:

export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
    work_dirs/gr00t_eagle_3b_libero_10_full_train/checkpoint_epoch_24.pt

You can also append the --eval-after-train flag at training time to evaluate automatically:

bash scripts/train.sh \
    configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
    work_dirs/gr00t_eagle_3b_libero_10_full_train \
    --eval-after-train

7. Training Outputs#

After training completes, the work_dirs/gr00t_eagle_3b_libero_10_full_train/ directory contains:

  • checkpoint_*.pt — model checkpoints

  • dataset_statistics.json — dataset statistics, used at evaluation and inference time (see the loading sketch after this list)

  • config.yaml / config.json — backup of the training configuration

  • *.jsonl — training logs (loss values, etc.)
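
For instance, a downstream script can read the saved statistics back; the JSON schema is repo-specific, so the keys printed below are whatever the training run wrote:

import json

with open('work_dirs/gr00t_eagle_3b_libero_10_full_train/'
          'dataset_statistics.json') as f:
    stats = json.load(f)

# Typically holds per-key mean/std (and quantiles) consumed by the
# Normalize/Denormalize transforms; inspect the actual layout:
print(list(stats.keys()))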

π₀ LIBERO Training#

1. Choosing a Training Strategy#

π₀.₅ supports two finetuning strategies:

| Strategy | Full Finetune | LoRA Finetune |
| --- | --- | --- |
| Config file | pi05_paligemma_libero_10_full_finetune_pytorch.py | pi0_paligemma_libero_10_lora_finetune.py |
| Training runner | FSDPTrainRunner | DDPTrainRunner |
| Backbone freezing | not frozen (freeze_vlm_backbone=False) | VLM and expert frozen |
| GPU memory | higher | lower |
| Best for | large-scale data, best final performance | small datasets, limited memory, quick experiments |

2. Full Finetune Configuration Details#

Full Finetune uses the config file configs/pi05/pi05_paligemma_libero_10_full_finetune_pytorch.py.

2.1 Model Configuration#

model = dict(
    type='PI05FlowMatching',
    vlm_backbone=dict(
        type='PaliGemma',
        vlm_backbone_id='paligemma_3b_pt_224',
        vlm_config=dict(...)),       # detailed PaliGemma config (usually unchanged)
    proj_width=1024,                 # projection width
    n_action_steps=10,               # predict 10 future action steps
    state_proj=dict(type='LinearProjector', in_dim=8, out_dim=1024),
    action_in_proj=dict(type='LinearProjector', in_dim=7, out_dim=1024),
    action_out_proj=dict(type='LinearProjector', in_dim=1024, out_dim=7),
    action_time_mlp_in=dict(type='LinearProjector', in_dim=2048, out_dim=1024),
    action_time_mlp_out=dict(type='LinearProjector', in_dim=1024, out_dim=1024),
    llm_expert=dict(
        type='GemmaLLMBackbone',
        llm_backbone_id='gemma-2b_causal',
        llm_family='gemma',
        llm_config=dict(...)),       # detailed Gemma expert config (usually unchanged)
    freeze_vlm_backbone=False,
    pretrained_name_or_path=
        '/path/to/cache/openpi/openpi-assets/checkpoints/pi0_libero_pytorch/model.safetensors',
    name_mapping={
        'vlm_backbone.vlm.model.language_model':
            'model.paligemma_with_expert.paligemma.model.language_model',
        'vlm_backbone.vlm.model.vision_tower':
            'model.paligemma_with_expert.paligemma.model.vision_tower',
        'vlm_backbone.vlm.model.multi_modal_projector':
            'model.paligemma_with_expert.paligemma.model.multi_modal_projector',
        'action_time_mlp_in.projector': 'model.action_time_mlp_in',
        'action_time_mlp_out.projector': 'model.action_time_mlp_out',
        'llm_expert.llm.model':
            'model.paligemma_with_expert.gemma_expert.model',
        'state_proj.projector': 'model.state_proj',
        'action_in_proj.projector': 'model.action_in_proj',
        'action_out_proj.projector': 'model.action_out_proj'
    })

Key dimension parameters (a dimension sketch follows the table):

| Parameter | LIBERO | Description |
| --- | --- | --- |
| state_proj.in_dim | 8 | state input dimension (6-dim end-effector pose + 2-dim gripper) |
| action_in_proj.in_dim | 7 | action input dimension |
| action_out_proj.out_dim | 7 | action output dimension |
| n_action_steps | 10 | length of the action sequence predicted per inference call |
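
A minimal sketch of how these dimensions connect, using plain torch.nn.Linear as a stand-in for LinearProjector:

import torch
import torch.nn as nn

proj_width = 1024
state_proj = nn.Linear(8, proj_width)       # state_proj: in_dim=8
action_in_proj = nn.Linear(7, proj_width)   # action_in_proj: in_dim=7
action_out_proj = nn.Linear(proj_width, 7)  # action_out_proj: out_dim=7

n_action_steps = 10
noisy_actions = torch.zeros(1, n_action_steps, 7)   # (batch, steps, action)
hidden = action_in_proj(noisy_actions)              # (1, 10, 1024)
actions = action_out_proj(hidden)                   # (1, 10, 7)
assert actions.shape == (1, n_action_steps, 7)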

2.2 Data Configuration#

train_dataloader = dict(
    batch_size=128,
    per_device_batch_size=8,
    per_device_num_workers=4,
    dataset=dict(
        type='DistributedRepeatingDataset',
        name_mappings={
            'observation.state': ['proprio'],
            'action': ['action']
        },
        statistic_keys=['observation.state', 'timestamp', 'action'],
        statistic_name='libero_10_no_noops',
        datasets=dict(
            type='ParquetDataset',
            data_root_path='/path/to/data/LIBERO_lerobot/libero_10_no_noops_lerobotv2.1',
            transforms=[
                dict(
                    type='ProcessParquetInputs',
                    parquet_keys=[
                        'observation.state', 'timestamp', 'actions',
                        'info', 'stats', 'action_masks'
                    ],
                    video_keys=[
                        'observation.images.image',
                        'observation.images.wrist_image',
                    ],
                    name_mappings={
                        'observation.state': ['states'],
                        'actions': ['actions']
                    }),
                dict(type='ParquetPrompter'),
                dict(
                    type='ProcessPrompts',
                    tokenizer=dict(
                        type='PretrainedTokenizer',
                        model_path='/path/to/checkpoints/pi0',
                    )),
                dict(type='ResizeImages', height=224, width=224),
                dict(
                    type='NormalizeImages',
                    means=[[123.515625, 116.04492188, 103.59375],
                           [123.515625, 116.04492188, 103.59375]],
                    stds=[[58.27148438, 57.02636719, 57.27539062],
                          [58.27148438, 57.02636719, 57.27539062]]),
                dict(
                    type='NormalizeStatesAndActions',
                    action_dim=14,
                    state_key='proprio',
                    action_key='action',
                    use_quantiles=False)
            ],
            action_window_size=10,
            action_key='action',
            use_delta=False,
            statistic_name='libero_10_no_noops',
            window_start_idx=0)))

Note

π₀ uses ProcessPrompts rather than GR00T's ProcessPromptsWithImage for text processing. The two also use different tokenizer paths: π₀ uses the huggingface/pi0 tokenizer.
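
Assuming PretrainedTokenizer wraps a HuggingFace AutoTokenizer (an assumption, not confirmed by the configs), the difference reduces to which tokenizer directory is loaded:

# Sketch only: assumes PretrainedTokenizer wraps AutoTokenizer; paths are
# the placeholders used in the configs above. Custom models may require
# trust_remote_code=True.
from transformers import AutoTokenizer

gr00t_tokenizer = AutoTokenizer.from_pretrained('/path/to/models/eagle2_hg_model')
pi0_tokenizer = AutoTokenizer.from_pretrained('/path/to/checkpoints/pi0')

prompt = 'put the bowl on the stove'
print(gr00t_tokenizer(prompt)['input_ids'])
print(pi0_tokenizer(prompt)['input_ids'])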

2.3 Training Configuration#

runner = dict(
    type='FSDPTrainRunner',
    max_epochs=24,
    learning_rate=2e-5,
    weight_decay=0.0,
    max_grad_norm=1.0,
    collator=dict(
        type='DictCollator',
        keys=[
            'states', 'observation.eepose', 'timestamp', 'images',
            'img_masks', 'lang_tokens', 'lang_masks', 'actions',
            'action_masks'
        ],
        meta_keys=['task_description', 'prompt', 'info', 'stats']),
    sampler=None,
    metric=dict(
        type='VLAMetric',
        active_trackers=('jsonl', 'wandb'),
        run_dir='work_dirs',
        wandb_project='FluxVLA',
        wandb_entity='limx',
        grad_accumulation_steps=1,
        window_size=1),
    lr_scheduler_type='constant',
    warmup_ratio=0.0,
    enable_gradient_checkpointing=True,
    enable_mixed_precision_training=True,
    mixed_precision_dtype='bf16',
    sharding_strategy='full-shard',
    change_key_name=False)

2.4 Evaluation Configuration#

eval = dict(
    type='LiberoEvalRunner',
    task_suite_name='libero_10',
    model_family='pi0',
    eval_chunk_size=10,
    resize_size=224,
    num_trials_per_task=50,
    num_steps_wait=10,
    seed=7,
    dataset=dict(
        type='LiberoParquetEvalDataset',
        transforms=[
            dict(
                type='ProcessLiberoEvalInputs',
                img_keys=['agentview_image', 'robot0_eye_in_hand_image']),
            dict(
                type='TransformImage',
                image_resize_strategy='resize-naive',
                input_sizes=[[3, 224, 224], [3, 224, 224]],
                means=[[123.515625, 116.04492188, 103.59375],
                       [123.515625, 116.04492188, 103.59375]],
                stds=[[58.27148438, 57.02636719, 57.27539062],
                      [58.27148438, 57.02636719, 57.27539062]]),
            dict(
                type='LiberoPromptFromInputs',
                tokenizer=dict(
                    type='PretrainedTokenizer',
                    model_path='/path/to/checkpoints/pi0',
                )),
            dict(
                type='LiberoProprioFromInputs',
                use_quantiles=False,
                pos_key='robot0_eef_pos',
                quat_key='robot0_eef_quat',
                gripper_key='robot0_gripper_qpos',
                out_key='states'),
        ]),
    denormalize_action=dict(
        type='DenormalizeLiberoAction',
        use_quantiles=False))

3. LoRA Finetune Configuration Details#

LoRA Finetune uses the config file configs/pi0/pi0_paligemma_libero_10_lora_finetune.py.

Key differences between LoRA mode and Full Finetune:

model = dict(
    type='PI0FlowMatching',
    # ... base config same as above ...
    freeze_vlm_backbone=True,        # freeze the vision-language backbone
    freeze_llm_expert=True,          # freeze the action expert
    use_lora=True,                   # enable LoRA
    lora_rank=32,                    # LoRA rank
    lora_dropout=0.0,
    lora_target_modules=[            # modules LoRA is applied to
        'q_proj', 'v_proj', 'k_proj', 'o_proj',
        'state_proj.projector',
        'action_in_proj.projector',
        'action_out_proj.projector'
    ])
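
For intuition: LoRA freezes the pretrained weight W and learns a low-rank update, so a wrapped layer computes W x + B A x with rank r = 32. A self-contained sketch of the idea (not the FluxVLA implementation):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: W x + B A x."""

    def __init__(self, base, rank=32, dropout=0.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pretrained weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # update starts at zero
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(self.dropout(x)))

# e.g. wrapping a q_proj-sized attention projection:
layer = LoRALinear(nn.Linear(2048, 2048), rank=32, dropout=0.0)
out = layer(torch.zeros(1, 2048))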

The training runner uses DDP instead of FSDP:

runner = dict(
    type='DDPTrainRunner',           # LoRA uses DDP
    learning_rate=0.0005,            # LoRA typically uses a higher learning rate
    max_epochs=18,
    save_epoch_interval=1,
    # ... rest identical to Full Finetune ...
)

4. Launching Training#

cd /path/to/fluxvla

# Set environment variables (single node, 8 GPUs)
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

# Full Finetune
bash scripts/train.sh \
    configs/pi05/pi05_paligemma_libero_10_full_finetune_pytorch.py \
    work_dirs/pi05_paligemma_libero_10_full_finetune_pytorch

# Or LoRA Finetune
bash scripts/train.sh \
    configs/pi0/pi0_paligemma_libero_10_lora_finetune.py \
    work_dirs/pi0_paligemma_libero_10_lora_finetune

5. Evaluation#

export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero_10_full_finetune_pytorch.py \
    work_dirs/pi05_paligemma_libero_10_full_finetune_pytorch/checkpoint_epoch_24.pt