Inference/Evaluation Configuration

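The inference configuration defines the task instructions, data-processing pipeline, initial robot pose, and hardware communication used in real-robot deployment. It is specified through the inference dict, whose core fields are: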

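  • type: the inference runner type, e.g. AlohaInferenceRunner

  • task_descriptions: a dictionary of task descriptions, mapping a task ID to its natural-language instruction

  • seed: the random seed

  • dataset: the inference dataset configuration (image key names and transforms)

  • denormalize_action: the action denormalization configuration

  • action_chunk: the action chunk size

  • prepare_pose: the robot's preparation (initial) pose

  • operator: the robot operator configuration (ROS topics)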
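
Every component above is selected by a type key. This suggests a registry-style config system, as in MMEngine-like frameworks: the string names a registered class, and the remaining keys become its constructor arguments, with nested sub-configs built first. The following is a minimal, self-contained sketch of that pattern under this assumption. REGISTRY, register, build, ToyRunner and ToyOperator are hypothetical names, not this framework's API.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps the string in a config's `type` key to a class.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Class decorator that registers `cls` under its own class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg: Any) -> Any:
    """Recursively turn dicts carrying a `type` key into objects.

    Nested lists and dicts are built first, so sub-configs such as
    `dataset` or `operator` reach the parent as ready-made objects.
    """
    if isinstance(cfg, list):
        return [build(item) for item in cfg]
    if isinstance(cfg, dict):
        kwargs = {k: build(v) for k, v in cfg.items() if k != 'type'}
        if 'type' not in cfg:
            return kwargs  # plain dict, e.g. task_descriptions
        return REGISTRY[cfg['type']](**kwargs)
    return cfg  # scalars and strings pass through unchanged


@register
class ToyOperator:
    def __init__(self, img_front_topic: str):
        self.img_front_topic = img_front_topic


@register
class ToyRunner:
    def __init__(self, seed: int, action_chunk: int, operator: ToyOperator):
        self.seed = seed
        self.action_chunk = action_chunk
        self.operator = operator


runner = build(dict(
    type='ToyRunner',
    seed=7,
    action_chunk=50,
    operator=dict(type='ToyOperator',
                  img_front_topic='/camera_h/color/image_raw'),
))
print(type(runner.operator).__name__, runner.action_chunk)  # ToyOperator 50
```

Building children before parents keeps each class constructor free of config-parsing logic: it receives concrete objects, not dicts.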

The following is a complete inference configuration example:

inference = dict(
    type='AlohaInferenceRunner',
    task_descriptions={
        '1': ('Fold the white towel in half, then fold it again, '
              'and make final adjustments to ensure the edges are '
              'neatly aligned.')
    },
    seed=7,
    dataset=dict(
        type='PrivateInferenceDataset',
        img_keys=[
            'cam_high',
            'cam_left_wrist',
            'cam_right_wrist'
        ],
        transforms=[
            dict(
                type='NormalizeStatesAndActions',
                state_dim=32,
                state_key='proprio',
                action_key='action',
                norm_type='min_max'),
            dict(type='PreparePromptWithState'),
            dict(
                type='ProcessPrompts',
                tokenizer=dict(
                    type='PretrainedTokenizer',
                    model_path='/path/to/checkpoints/paligemma-3b-pt-224',
                )),
            dict(type='ResizeImages', height=224, width=224),
            dict(type='SimpleNormalizeImages'),
        ]),
    denormalize_action=dict(
        type='DenormalizePrivateAction',
        norm_type='min_max',
        action_dim=14,
    ),
    action_chunk=50,
    prepare_pose=[
        [-0.19779752, 1.07020684, -0.61802348,
         -1.30887565, 1.1520192, 2.10289164, 0.092],
        [0.34008822, 0.95214585, -0.56617991,
         1.13862221, 0.82892144, -1.80234897, 0.06909]
    ],
    operator=dict(
        type='AlohaOperator',
        img_front_topic='/camera_h/color/image_raw',
        img_left_topic='/camera_l/color/image_raw',
        img_right_topic='/camera_r/color/image_raw',
        img_front_depth_topic='/camera_h/depth/image_raw',
        img_left_depth_topic='/camera_l/depth/image_raw',
        img_right_depth_topic='/camera_r/depth/image_raw',
        puppet_arm_left_cmd_topic='/master/joint_left',
        puppet_arm_right_cmd_topic='/master/joint_right',
        puppet_arm_left_topic='/puppet/joint_left',
        puppet_arm_right_topic='/puppet/joint_right',
        robot_base_topic='/odom_raw',
        robot_base_cmd_topic='/cmd_vel',
    ))
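
NormalizeStatesAndActions and DenormalizePrivateAction both use norm_type='min_max'. As a rough illustration of what such a pair typically does, the sketch below assumes per-dimension min/max statistics, a target range of [-1, 1], and zero padding of the state up to state_dim=32. It then denormalizes the first action_dim=14 dimensions of a predicted chunk of action_chunk=50 steps. The function names and the statistics are hypothetical; the framework's actual formula and the source of its statistics may differ.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for constant dimensions


def minmax_normalize(x, lo, hi):
    """Map values in [lo, hi] to [-1, 1] per dimension (assumed convention)."""
    return 2.0 * (x - lo) / (hi - lo + EPS) - 1.0


def minmax_denormalize(y, lo, hi):
    """Exact inverse of minmax_normalize: map [-1, 1] back to [lo, hi]."""
    return (y + 1.0) / 2.0 * (hi - lo + EPS) + lo


def pad_to_dim(x, dim):
    """Zero-pad the last axis up to `dim` (e.g. 14 joint values -> 32)."""
    pad = dim - x.shape[-1]
    return np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, pad)])


rng = np.random.default_rng(7)
action_dim, state_dim, action_chunk = 14, 32, 50
# Hypothetical per-dimension statistics; real ones come from the training set.
lo = np.full(action_dim, -2.0)
hi = np.full(action_dim, 2.0)

state = rng.uniform(lo, hi)  # raw 14-dim joint state (both arms)
model_input = pad_to_dim(minmax_normalize(state, lo, hi), state_dim)
print(model_input.shape)  # (32,)

# Pretend the policy returned a padded chunk of normalized actions.
pred = rng.uniform(-1.0, 1.0, size=(action_chunk, state_dim))
actions = minmax_denormalize(pred[:, :action_dim], lo, hi)
print(actions.shape)  # (50, 14)
```

Only the first action_dim columns are denormalized because, under the padding assumption, the remaining columns carry no robot command.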

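Besides real-robot inference, models can also be evaluated in simulation. The evaluation configuration is specified through the eval dict, whose core fields are: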

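  • type: the evaluation runner type, e.g. LiberoEvalRunner

  • task_suite_name: the task suite name (e.g. libero_10)

  • model_family: the model family identifier

  • Evaluation parameters: eval_chunk_size, resize_size, num_trials_per_task, num_steps_wait, seed

  • dataset: the evaluation dataset configuration

  • denormalize_action: the action denormalization configuration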
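
The evaluation parameters suggest the usual LIBERO evaluation loop. The sketch below assumes the following:

  • each task is attempted num_trials_per_task times

  • the first num_steps_wait steps send a no-op action while the simulated objects settle

  • the policy is re-queried after eval_chunk_size actions from each predicted chunk have been executed

  • seed controls per-trial variation

ToyEnv, toy_policy, NOOP and max_steps are hypothetical stand-ins for the LIBERO environment, the model, the dummy action and the per-task step limit. LiberoEvalRunner's actual logic may differ.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for one LIBERO task: done after goal_steps steps."""

    def __init__(self, goal_steps):
        self.goal_steps = goal_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation: just the step counter

    def step(self, action):
        self.t += 1
        return self.t, self.t >= self.goal_steps  # (obs, done)


NOOP = [0.0] * 6 + [-1.0]  # hypothetical "do nothing" 7-dim action


def toy_policy(obs):
    """Hypothetical policy: always predicts a chunk of 50 zero actions."""
    return [[0.0] * 7 for _ in range(50)]


def run_episode(env, policy, num_steps_wait, eval_chunk_size, max_steps):
    obs = env.reset()
    for _ in range(num_steps_wait):  # let the scene settle; ignore `done`
        obs, _ = env.step(NOOP)
    steps = 0
    while steps < max_steps:
        # Query once, then execute only the first eval_chunk_size actions.
        for action in policy(obs)[:eval_chunk_size]:
            obs, done = env.step(action)
            steps += 1
            if done:
                return True
            if steps >= max_steps:
                break
    return False


def evaluate(task_goals, num_trials_per_task=50, num_steps_wait=10,
             eval_chunk_size=10, seed=7, max_steps=200):
    rng = random.Random(seed)  # seeds the per-trial variation
    successes = total = 0
    for goal in task_goals:
        for _ in range(num_trials_per_task):
            env = ToyEnv(goal + rng.randint(0, 20))
            successes += run_episode(env, toy_policy, num_steps_wait,
                                     eval_chunk_size, max_steps)
            total += 1
    return successes / total


# Tasks needing 30-50 and 80-100 steps always finish; 300+ never does,
# so the success rate is 2/3.
print(evaluate([30, 80, 300]))
```

Executing only part of each chunk before re-querying trades extra inference calls for more closed-loop feedback.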

The following is an example LIBERO simulation evaluation configuration:

eval = dict(
    type='LiberoEvalRunner',
    task_suite_name='libero_10',
    model_family='pi0',
    eval_chunk_size=10,
    resize_size=224,
    num_trials_per_task=50,
    num_steps_wait=10,
    seed=7,
    dataset=dict(
        type='LiberoParquetEvalDataset',
        transforms=[
            dict(
                type='ProcessLiberoEvalInputs',
                img_keys=[
                    'agentview_image',
                    'robot0_eye_in_hand_image'
                ],
            ),
            dict(
                type='TransformImage',
                image_resize_strategy='resize-naive',
                input_sizes=[
                    [3, 224, 224],
                    [3, 224, 224]
                ],
                means=[
                    [123.515625, 116.04492188, 103.59375],
                    [123.515625, 116.04492188, 103.59375]
                ],
                stds=[
                    [58.27148438, 57.02636719, 57.27539062],
                    [58.27148438, 57.02636719, 57.27539062]
                ],
            ),
            dict(
                type='LiberoPromptFromInputs',
                use_conversation=False,
                tokenizer=dict(
                    type='PaligemmaTokenizer'
                )),
            dict(
                type='LiberoProprioFromInputs',
                norm_type='mean_std',
                pos_key='robot0_eef_pos',
                quat_key='robot0_eef_quat',
                gripper_key='robot0_gripper_qpos',
                state_dim=32,
                out_key='states'),
        ]),
    denormalize_action=dict(
        type='DenormalizeLiberoAction',
        norm_type='mean_std',
        action_dim=7,
    ),
)
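
Two parts of the LIBERO pipeline are plain arithmetic. TransformImage carries per-channel means and stds on the 0-255 pixel scale, and DenormalizeLiberoAction uses norm_type='mean_std' with action_dim=7. The sketch below assumes the usual formulas: (x - mean) / std for images and a_norm * std + mean for actions. It omits the resize step. The function names and the action statistics are hypothetical.

```python
import numpy as np

# Per-channel statistics copied from the TransformImage entry above (0-255 scale).
MEANS = np.array([123.515625, 116.04492188, 103.59375]).reshape(3, 1, 1)
STDS = np.array([58.27148438, 57.02636719, 57.27539062]).reshape(3, 1, 1)


def normalize_image(img_chw):
    """Per-channel (x - mean) / std on a uint8 CHW image (assumed semantics)."""
    out = (img_chw.astype(np.float32) - MEANS) / STDS
    return out.astype(np.float32)


def denormalize_mean_std(a_norm, mean, std):
    """Invert mean_std normalization per dimension: a = a_norm * std + mean."""
    return a_norm * std + mean


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(3, 224, 224), dtype=np.uint8)  # stand-in frame
x = normalize_image(img)
print(x.shape, x.dtype)  # (3, 224, 224) float32

# Hypothetical action statistics; real values would come from the training data.
act_mean = np.zeros(7)
act_std = np.full(7, 0.5)
pred = rng.standard_normal((10, 7))  # normalized actions: 10 steps x action_dim=7
actions = denormalize_mean_std(pred, act_mean, act_std)
print(actions.shape)  # (10, 7)
```

The image statistics are repeated once per camera in the config, presumably because each entry of img_keys gets its own input size, mean and std.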