Training on LIBERO Simulation Data
LIBERO is a widely used robot-manipulation benchmark comprising task suites of varying difficulty. This page describes how to train and evaluate the GR00T-N1.5 and π₀ models on LIBERO data within the FluxVLA framework, using libero_10 as the main example.
Data Preparation
Download the LIBERO dataset (LeRobot v2.1 format):
python download_libero.py \
--repo-id Aiming1998/libero_10_no_noops_lerobotv2.1 \
--out /path/to/data/LIBERO_lerobot/libero_10_no_noops_lerobotv2.1
Other LIBERO task suites:
# libero_spatial
python download_libero.py --repo-id Aiming1998/libero_spatial_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_spatial_no_noops_lerobotv2.1
# libero_object
python download_libero.py --repo-id Aiming1998/libero_object_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_object_no_noops_lerobotv2.1
# libero_goal
python download_libero.py --repo-id Aiming1998/libero_goal_no_noops_lerobotv2.1 --out ./LIBERO_lerobot/libero_goal_no_noops_lerobotv2.1
Directory structure after download:
libero_10_no_noops_lerobotv2.1/
├── data/
│ └── chunk-000/
│ ├── episode_000000.parquet
│ └── ...
├── videos/
│ └── chunk-000/
│ ├── observation.images.image/
│ │ ├── episode_000000.mp4
│ │ └── ...
│ └── observation.images.wrist_image/
│ ├── episode_000000.mp4
│ └── ...
└── meta/
├── episodes.jsonl
├── episodes_stats.jsonl
├── info.json
└── tasks.jsonl
The LIBERO data contains two camera views (image, the main view, plus wrist_image, the wrist view); each frame carries a 14-dim state and a 7-dim action (end-effector position + rotation + gripper).
GR00T-N1.5 LIBERO Training
Use the ready-made config file configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py; the key settings are explained below.
1. Model Configuration
model = dict(
type='LlavaVLA',
pretrained_name_or_path='/path/to/models/GR00T-N1.5-3B',
vlm_backbone=dict(
type='EagleBackbone',
vlm_path='FluxVLA/models/third_party_models/eagle2_hg_model'),
vla_head=dict(
type='FlowMatchingHead',
state_dim=8, # state input dimension
hidden_size=1024,
input_embedding_dim=1536,
num_layers=1,
num_heads=4,
num_inference_timesteps=4,
traj_length=10, # predict the next 10 action steps
action_dim=7), # LIBERO action dimension
freeze_vlm_backbone=False,
name_mapping={
'vlm_backbone.vlm': 'backbone.eagle_model',
'vla_head': 'action_head'
},
freeze_projector=False)
In the LIBERO setting, state_dim=8 (6-dim end-effector pose + 2-dim gripper state) and action_dim=7 (6-dim end-effector delta + 1-dim gripper).
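The shapes above can be illustrated with a quick sketch (placeholder zeros; the actual field order follows the dataset's observation.state layout):

```python
import numpy as np

# 8-dim proprioceptive state: 6-DoF end-effector pose + 2-dim gripper joints
eef_pos = np.zeros(3)       # x, y, z
eef_rot = np.zeros(3)       # rotation (3-dim representation)
gripper_qpos = np.zeros(2)  # two gripper finger joints
state = np.concatenate([eef_pos, eef_rot, gripper_qpos])

# 7-dim action: 6-dim end-effector delta + 1-dim gripper command,
# predicted as a chunk of traj_length=10 steps
action_chunk = np.zeros((10, 7))

print(state.shape, action_chunk.shape)  # (8,) (10, 7)
```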
2. Data Configuration
train_dataloader = dict(
batch_size=128,
per_device_batch_size=8,
per_device_num_workers=4,
dataset=dict(
type='DistributedRepeatingDataset',
name_mappings={
'observation.state': ['proprio'],
'action': ['action']
},
statistic_keys=['observation.state', 'timestamp', 'action'],
statistic_name='libero_10_no_noops',
datasets=dict(
type='ParquetDataset',
data_root_path='/path/to/data/LIBERO_lerobot/libero_10_no_noops_1.0.0_lerobot',
transforms=[
dict(
type='ProcessParquetInputs',
embodiment_id=31,
parquet_keys=[
'observation.state', 'timestamp', 'actions',
'info', 'stats', 'action_masks'
],
video_keys=[
'observation.images.image',
'observation.images.wrist_image',
],
name_mappings={
'observation.state': ['states'],
'actions': ['actions']
}),
dict(type='ParquetPrompter'),
dict(
type='ProcessPromptsWithImage',
max_len=600,
num_images=2,
tokenizer=dict(
type='PretrainedTokenizer',
model_path='/path/to/models/eagle2_hg_model',
)),
dict(type='ResizeImages', height=224, width=224),
dict(
type='NormalizeImages',
means=[[123.515625, 116.04492188, 103.59375],
[123.515625, 116.04492188, 103.59375]],
stds=[[58.27148438, 57.02636719, 57.27539062],
[58.27148438, 57.02636719, 57.27539062]]),
dict(
type='NormalizeStatesAndActions',
action_dim=14,
state_key='proprio',
action_key='action',
use_quantiles=False)
],
action_window_size=10,
action_key='action',
use_delta=False,
statistic_name='libero_10_no_noops',
window_start_idx=0)))
The data-transform pipeline runs the following steps in order:
- ProcessParquetInputs — parse states, actions, and images from the Parquet files
- ParquetPrompter — extract the task description from the data as the language prompt
- ProcessPromptsWithImage — encode the text and images into model input tokens
- ResizeImages — resize images to a uniform 224×224
- NormalizeImages — normalize with the Eagle model's standard parameters
- NormalizeStatesAndActions — normalize the states and actions
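Conceptually, the pipeline just threads a sample dict through each transform in order. A minimal sketch (the transform class here is a stand-in, not the FluxVLA implementation):

```python
class ResizeImages:
    """Stand-in transform: records the target size on the sample dict."""
    def __init__(self, height, width):
        self.height, self.width = height, width

    def __call__(self, sample):
        sample["image_size"] = (self.height, self.width)
        return sample

def apply_transforms(sample, transforms):
    # Each transform takes the sample dict and returns the (modified) dict.
    for t in transforms:
        sample = t(sample)
    return sample

sample = apply_transforms({}, [ResizeImages(height=224, width=224)])
print(sample)  # {'image_size': (224, 224)}
```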
3. Training Configuration
runner = dict(
type='FSDPTrainRunner',
max_epochs=24,
learning_rate=2e-5,
weight_decay=0.0,
max_grad_norm=1.0,
sampler=None,
tokenizer=dict(
type='PretrainedTokenizer',
model_path='/path/to/models/eagle2_hg_model',
),
collator=dict(
type='DictCollator',
keys=[
'states', 'observation.eepose', 'timestamp', 'images',
'img_masks', 'lang_tokens', 'lang_masks', 'actions',
'action_masks', 'embodiment_ids'
],
meta_keys=['task_description', 'prompt', 'info', 'stats']),
metric=dict(
type='VLAMetric',
active_trackers=('jsonl', 'wandb'),
run_dir='work_dirs',
wandb_project='FluxVLA',
wandb_entity='limx',
grad_accumulation_steps=1,
window_size=1),
lr_scheduler_type='constant',
warmup_ratio=0.0,
enable_gradient_checkpointing=True,
enable_mixed_precision_training=True,
mixed_precision_dtype='bf16',
sharding_strategy='full-shard',
change_key_name=False)
Training runs distributed with FSDP (Fully Sharded Data Parallel), BF16 mixed precision, and a constant learning rate of 2e-5. Because the LIBERO dataset is relatively small, 24 epochs are needed to converge fully.
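The relationship between the top-level batch_size=128 and the per-device settings is simple arithmetic (how FluxVLA reconciles the two values internally is framework-specific; this sketch only shows the math):

```python
def effective_batch(per_device_batch, num_gpus, accum_steps=1):
    """Global samples consumed per optimizer step."""
    return per_device_batch * num_gpus * accum_steps

# Single node, 8 GPUs, per_device_batch_size=8:
print(effective_batch(8, 8))                 # 64
# Reaching batch_size=128 would take 16 GPUs, or 2 accumulation steps:
print(effective_batch(8, 8, accum_steps=2))  # 128
```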
4. Evaluation Configuration
eval = dict(
type='LiberoEvalRunner',
task_suite_name='libero_10',
model_family='pi0',
eval_chunk_size=10,
resize_size=224,
num_trials_per_task=50,
num_steps_wait=10,
seed=7,
dataset=dict(
type='LiberoParquetEvalDataset',
transforms=[
dict(
type='ProcessLiberoEvalInputs',
embodiment_id=31,
img_keys=['agentview_image', 'robot0_eye_in_hand_image']),
dict(
type='TransformImage',
image_resize_strategy='resize-naive',
input_sizes=[[3, 224, 224], [3, 224, 224]],
means=[[123.515625, 116.04492188, 103.59375],
[123.515625, 116.04492188, 103.59375]],
stds=[[58.27148438, 57.02636719, 57.27539062],
[58.27148438, 57.02636719, 57.27539062]]),
dict(
type='ProcessPromptsWithImage',
max_len=600,
num_images=2,
tokenizer=dict(
type='PretrainedTokenizer',
model_path='/path/to/models/eagle2_hg_model',
)),
dict(
type='LiberoProprioFromInputs',
use_quantiles=False,
pos_key='robot0_eef_pos',
quat_key='robot0_eef_quat',
gripper_key='robot0_gripper_qpos',
out_key='states'),
]),
denormalize_action=dict(
type='DenormalizeLiberoAction',
use_quantiles=False,
))
Evaluation runs in the LIBERO simulation environment, executing 50 trials for each libero_10 task and reporting the success rate.
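The reported number is the plain trial-level success rate; a sketch of the aggregation (synthetic results, not actual benchmark numbers):

```python
def success_rate(results):
    """results: {task_name: list of per-trial success booleans}."""
    total = sum(len(trials) for trials in results.values())
    succ = sum(sum(trials) for trials in results.values())
    return succ / total

# e.g. 2 tasks x 50 trials each (made-up outcomes)
demo = {"task_a": [True] * 40 + [False] * 10,
        "task_b": [True] * 45 + [False] * 5}
print(success_rate(demo))  # 0.85
```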
5. Launching Training
cd /path/to/fluxvla
# Set environment variables (single node, 8 GPUs)
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
work_dirs/gr00t_eagle_3b_libero_10_full_train
On the Volcengine MLP platform these environment variables are injected automatically, so you can run the command directly:
bash scripts/train.sh \
configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
work_dirs/gr00t_eagle_3b_libero_10_full_train
6. Evaluation
After training completes, run eval.sh to evaluate:
export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/eval.sh \
configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
work_dirs/gr00t_eagle_3b_libero_10_full_train/checkpoint_epoch_24.pt
Alternatively, append --eval-after-train to the training command to evaluate automatically once training finishes:
bash scripts/train.sh \
configs/gr00t/gr00t_eagle_3b_libero_10_full_train.py \
work_dirs/gr00t_eagle_3b_libero_10_full_train \
--eval-after-train
7. Training Outputs
After training, the work_dirs/gr00t_eagle_3b_libero_10_full_train/ directory contains:
- checkpoint_*.pt — model checkpoints
- dataset_statistics.json — dataset statistics (used during evaluation and inference)
- config.yaml / config.json — backup of the training configuration
- *.jsonl — training logs (loss curves, etc.)
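The *.jsonl logs are line-delimited JSON and easy to inspect; a sketch (the loss field name is an assumption about VLAMetric's output format):

```python
import json

def read_losses(lines):
    """Extract the `loss` field from JSONL log records that carry one."""
    losses = []
    for line in lines:
        rec = json.loads(line)
        if "loss" in rec:
            losses.append(rec["loss"])
    return losses

# Synthetic log lines for illustration
demo = ['{"step": 1, "loss": 0.92}', '{"step": 2, "loss": 0.85}']
print(read_losses(demo))  # [0.92, 0.85]
```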
π₀ LIBERO Training
1. Choosing a Training Strategy
π₀.₅ supports two finetuning strategies:
| Strategy | Full Finetune | LoRA Finetune |
|---|---|---|
| Config file | pi05_paligemma_libero_10_full_finetune_pytorch.py | pi0_paligemma_libero_10_lora_finetune.py |
| Training runner | FSDPTrainRunner | DDPTrainRunner |
| Backbone freezing | Not frozen (freeze_vlm_backbone=False) | VLM and expert frozen |
| GPU memory | Higher | Lower |
| Best suited for | Large-scale data, best final performance | Small datasets, limited memory, quick experiments |
2. Full Finetune Configuration in Detail
Use the config file configs/pi05/pi05_paligemma_libero_10_full_finetune_pytorch.py.
2.1 Model Configuration
model = dict(
type='PI05FlowMatching',
vlm_backbone=dict(
type='PaliGemma',
vlm_backbone_id='paligemma_3b_pt_224',
vlm_config=dict(...)), # detailed PaliGemma config (usually no changes needed)
proj_width=1024, # projection width
n_action_steps=10, # predict the next 10 action steps
state_proj=dict(type='LinearProjector', in_dim=8, out_dim=1024),
action_in_proj=dict(type='LinearProjector', in_dim=7, out_dim=1024),
action_out_proj=dict(type='LinearProjector', in_dim=1024, out_dim=7),
action_time_mlp_in=dict(type='LinearProjector', in_dim=2048, out_dim=1024),
action_time_mlp_out=dict(type='LinearProjector', in_dim=1024, out_dim=1024),
llm_expert=dict(
type='GemmaLLMBackbone',
llm_backbone_id='gemma-2b_causal',
llm_family='gemma',
llm_config=dict(...)), # detailed Gemma expert config (usually no changes needed)
freeze_vlm_backbone=False,
pretrained_name_or_path=
'/path/to/cache/openpi/openpi-assets/checkpoints/pi0_libero_pytorch/model.safetensors',
name_mapping={
'vlm_backbone.vlm.model.language_model':
'model.paligemma_with_expert.paligemma.model.language_model',
'vlm_backbone.vlm.model.vision_tower':
'model.paligemma_with_expert.paligemma.model.vision_tower',
'vlm_backbone.vlm.model.multi_modal_projector':
'model.paligemma_with_expert.paligemma.model.multi_modal_projector',
'action_time_mlp_in.projector': 'model.action_time_mlp_in',
'action_time_mlp_out.projector': 'model.action_time_mlp_out',
'llm_expert.llm.model':
'model.paligemma_with_expert.gemma_expert.model',
'state_proj.projector': 'model.state_proj',
'action_in_proj.projector': 'model.action_in_proj',
'action_out_proj.projector': 'model.action_out_proj'
})
Key dimension parameters:
| Parameter | LIBERO | Description |
|---|---|---|
| state_proj in_dim | 8 | State input dimension (6-dim end-effector pose + 2-dim gripper) |
| action_in_proj in_dim | 7 | Action input dimension |
| action_out_proj out_dim | 7 | Action output dimension |
| n_action_steps | 10 | Length of the action sequence predicted per inference call |
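These dimensions chain together as state → proj_width and action ↔ proj_width; a shape check with numpy stand-ins for the LinearProjector modules (random weights, shapes only):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Weight-only stand-in for a LinearProjector."""
    W = rng.standard_normal((in_dim, out_dim))
    return lambda x: x @ W

state_proj = linear(8, 1024)       # state      -> proj_width
action_in_proj = linear(7, 1024)   # action     -> proj_width
action_out_proj = linear(1024, 7)  # proj_width -> action

state = np.zeros((1, 8))
actions = np.zeros((1, 10, 7))     # n_action_steps=10 chunk
h = action_in_proj(actions)
print(state_proj(state).shape, h.shape, action_out_proj(h).shape)
# (1, 1024) (1, 10, 1024) (1, 10, 7)
```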
2.2 Data Configuration
train_dataloader = dict(
batch_size=128,
per_device_batch_size=8,
per_device_num_workers=4,
dataset=dict(
type='DistributedRepeatingDataset',
name_mappings={
'observation.state': ['proprio'],
'action': ['action']
},
statistic_keys=['observation.state', 'timestamp', 'action'],
statistic_name='libero_10_no_noops',
datasets=dict(
type='ParquetDataset',
data_root_path='/path/to/data/LIBERO_lerobot/libero_10_no_noops_1.0.0_lerobot',
transforms=[
dict(
type='ProcessParquetInputs',
parquet_keys=[
'observation.state', 'timestamp', 'actions',
'info', 'stats', 'action_masks'
],
video_keys=[
'observation.images.image',
'observation.images.wrist_image',
],
name_mappings={
'observation.state': ['states'],
'actions': ['actions']
}),
dict(type='ParquetPrompter'),
dict(
type='ProcessPrompts',
tokenizer=dict(
type='PretrainedTokenizer',
model_path='/path/to/checkpoints/pi0',
)),
dict(type='ResizeImages', height=224, width=224),
dict(
type='NormalizeImages',
means=[[123.515625, 116.04492188, 103.59375],
[123.515625, 116.04492188, 103.59375]],
stds=[[58.27148438, 57.02636719, 57.27539062],
[58.27148438, 57.02636719, 57.27539062]]),
dict(
type='NormalizeStatesAndActions',
action_dim=14,
state_key='proprio',
action_key='action',
use_quantiles=False)
],
action_window_size=10,
action_key='action',
use_delta=False,
statistic_name='libero_10_no_noops',
window_start_idx=0)))
Note
π₀ uses ProcessPrompts for text processing rather than GR00T's ProcessPromptsWithImage, and the tokenizer paths also differ: π₀ uses the huggingface/pi0 tokenizer.
2.3 Training Configuration
runner = dict(
type='FSDPTrainRunner',
max_epochs=24,
learning_rate=2e-5,
weight_decay=0.0,
max_grad_norm=1.0,
collator=dict(
type='DictCollator',
keys=[
'states', 'observation.eepose', 'timestamp', 'images',
'img_masks', 'lang_tokens', 'lang_masks', 'actions',
'action_masks'
],
meta_keys=['task_description', 'prompt', 'info', 'stats']),
sampler=None,
metric=dict(
type='VLAMetric',
active_trackers=('jsonl', 'wandb'),
run_dir='work_dirs',
wandb_project='FluxVLA',
wandb_entity='limx',
grad_accumulation_steps=1,
window_size=1),
lr_scheduler_type='constant',
warmup_ratio=0.0,
enable_gradient_checkpointing=True,
enable_mixed_precision_training=True,
mixed_precision_dtype='bf16',
sharding_strategy='full-shard',
change_key_name=False)
2.4 Evaluation Configuration
eval = dict(
type='LiberoEvalRunner',
task_suite_name='libero_10',
model_family='pi0',
eval_chunk_size=10,
resize_size=224,
num_trials_per_task=50,
num_steps_wait=10,
seed=7,
dataset=dict(
type='LiberoParquetEvalDataset',
transforms=[
dict(
type='ProcessLiberoEvalInputs',
img_keys=['agentview_image', 'robot0_eye_in_hand_image']),
dict(
type='TransformImage',
image_resize_strategy='resize-naive',
input_sizes=[[3, 224, 224], [3, 224, 224]],
means=[[123.515625, 116.04492188, 103.59375],
[123.515625, 116.04492188, 103.59375]],
stds=[[58.27148438, 57.02636719, 57.27539062],
[58.27148438, 57.02636719, 57.27539062]]),
dict(
type='LiberoPromptFromInputs',
tokenizer=dict(
type='PretrainedTokenizer',
model_path='/path/to/checkpoints/pi0',
)),
dict(
type='LiberoProprioFromInputs',
use_quantiles=False,
pos_key='robot0_eef_pos',
quat_key='robot0_eef_quat',
gripper_key='robot0_gripper_qpos',
out_key='states'),
]),
denormalize_action=dict(
type='DenormalizeLiberoAction',
use_quantiles=False))
3. LoRA Finetune Configuration in Detail
Use the config file configs/pi0/pi0_paligemma_libero_10_lora_finetune.py.
Key differences between LoRA mode and Full Finetune:
model = dict(
type='PI0FlowMatching',
# ... base configuration as above ...
freeze_vlm_backbone=True, # freeze the vision-language backbone
freeze_llm_expert=True, # freeze the action expert
use_lora=True, # enable LoRA
lora_rank=32, # LoRA rank
lora_dropout=0.0,
lora_target_modules=[ # modules LoRA is applied to
'q_proj', 'v_proj', 'k_proj', 'o_proj',
'state_proj.projector',
'action_in_proj.projector',
'action_out_proj.projector'
])
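For reference, LoRA leaves the base weight W frozen and trains a low-rank update, using W + B·A in place of W (the usual α/r scaling is omitted here); a numpy sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in = d_out = 64
r = 32  # lora_rank

W = rng.standard_normal((d_out, d_in))    # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01 # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init

x = rng.standard_normal(d_in)
base = W @ x
adapted = (W + B @ A) @ x
# B starts at zero, so the adapted layer initially matches the base layer.
print(np.allclose(base, adapted))  # True
```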
The training runner uses DDP instead of FSDP:
runner = dict(
type='DDPTrainRunner', # LoRA uses DDP
learning_rate=0.0005, # LoRA typically uses a higher learning rate
max_epochs=18,
save_epoch_interval=1,
# ... remainder as in Full Finetune ...
)
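The lower memory footprint follows from how few parameters LoRA trains; a rough count for a single d×d projection (illustrative d, not the actual Gemma hidden size):

```python
def lora_params(d_in, d_out, rank):
    # A: rank x d_in, B: d_out x rank
    return rank * d_in + d_out * rank

d = 2048
full = d * d                  # weights in the frozen d x d projection
lora = lora_params(d, d, 32)  # trainable weights added by rank-32 LoRA
print(lora / full)  # 0.03125 -> ~3% of the projection's parameters
```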
4. Launching Training
cd /path/to/fluxvla
# Set environment variables (single node, 8 GPUs)
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
# Full Finetune
bash scripts/train.sh \
configs/pi0/pi0_paligemma_libero_10_full_finetune_pytorch.py \
work_dirs/pi0_paligemma_libero_10_full_finetune_pytorch
# or LoRA Finetune
bash scripts/train.sh \
configs/pi0/pi0_paligemma_libero_10_lora_finetune.py \
work_dirs/pi0_paligemma_libero_10_lora_finetune
5. Evaluation
export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/eval.sh \
configs/pi0/pi0_paligemma_libero_10_full_finetune_pytorch.py \
work_dirs/pi0_paligemma_libero_10_full_finetune_pytorch/checkpoint_epoch_24.pt