Inference / Evaluation Configuration

The inference configuration, defined via the inference dict, specifies the task instructions, the data processing pipeline, the robot's preparation pose, and the hardware communication settings for real-robot deployment. The core fields include:

  • type: Inference runner type, e.g., AlohaInferenceRunner

  • task_descriptions: Task description dictionary

  • seed: Random seed

  • dataset: Inference dataset configuration (image key names and transforms)

  • denormalize_action: Action denormalization configuration

  • action_chunk: Action chunk size

  • prepare_pose: Robot preparation pose

  • operator: Robot operator configuration (ROS topics)
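The action_chunk field sets how many consecutive actions are executed per model query: the policy predicts a chunk of actions from one observation, and the runner plays the chunk back open-loop before querying again. A minimal sketch of that pattern (the policy and robot-I/O callables here are hypothetical placeholders, not part of the framework):

```python
def run_chunked(policy, get_obs, send_action, num_steps, action_chunk=50):
    """Execute num_steps robot steps, querying the policy once per chunk.

    policy(obs) is assumed to return a sequence of at least
    action_chunk action vectors; returns the number of policy queries.
    """
    queries = 0
    step = 0
    while step < num_steps:
        actions = policy(get_obs())  # one inference call per chunk
        queries += 1
        for action in actions[:action_chunk]:
            send_action(action)
            step += 1
            if step >= num_steps:
                break
    return queries

# With action_chunk=50, a 200-step episode needs only 4 policy queries.
```

Larger chunks reduce inference latency overhead but react more slowly to disturbances, which is why the example below uses a moderate chunk size of 50.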

Below is a complete inference configuration example:

inference = dict(
    type='AlohaInferenceRunner',
    task_descriptions={
        '1': ('Fold the white towel in half, then fold it again, '
              'and make final adjustments to ensure the edges are '
              'neatly aligned.')
    },
    seed=7,
    dataset=dict(
        type='PrivateInferenceDataset',
        img_keys=[
            'cam_high',
            'cam_left_wrist',
            'cam_right_wrist'
        ],
        transforms=[
            dict(
                type='NormalizeStatesAndActions',
                state_dim=32,
                state_key='proprio',
                action_key='action',
                norm_type='min_max'),
            dict(type='PreparePromptWithState'),
            dict(
                type='ProcessPrompts',
                tokenizer=dict(
                    type='PretrainedTokenizer',
                    model_path='/path/to/checkpoints/paligemma-3b-pt-224',
                )),
            dict(type='ResizeImages', height=224, width=224),
            dict(type='SimpleNormalizeImages'),
        ]),
    denormalize_action=dict(
        type='DenormalizePrivateAction',
        norm_type='min_max',
        action_dim=14,
    ),
    action_chunk=50,
    prepare_pose=[
        [-0.19779752, 1.07020684, -0.61802348,
         -1.30887565, 1.1520192, 2.10289164, 0.092],
        [0.34008822, 0.95214585, -0.56617991,
         1.13862221, 0.82892144, -1.80234897, 0.06909]
    ],
    operator=dict(
        type='AlohaOperator',
        img_front_topic='/camera_h/color/image_raw',
        img_left_topic='/camera_l/color/image_raw',
        img_right_topic='/camera_r/color/image_raw',
        img_front_depth_topic='/camera_h/depth/image_raw',
        img_left_depth_topic='/camera_l/depth/image_raw',
        img_right_depth_topic='/camera_r/depth/image_raw',
        puppet_arm_left_cmd_topic='/master/joint_left',
        puppet_arm_right_cmd_topic='/master/joint_right',
        puppet_arm_left_topic='/puppet/joint_left',
        puppet_arm_right_topic='/puppet/joint_right',
        robot_base_topic='/odom_raw',
        robot_base_cmd_topic='/cmd_vel',
    ))
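The denormalize_action block maps the network's normalized outputs back to joint-space commands before they are sent to the robot. A plausible sketch of min_max denormalization, assuming the model emits actions in [-1, 1] and per-dimension min/max statistics were recorded during training (the helper function and statistics are illustrative, not the framework's actual implementation):

```python
import numpy as np

def denormalize_min_max(action, stats_min, stats_max):
    """Map a [-1, 1]-normalized action back to its original per-dimension range."""
    action = np.asarray(action, dtype=np.float64)
    stats_min = np.asarray(stats_min, dtype=np.float64)
    stats_max = np.asarray(stats_max, dtype=np.float64)
    # Inverse of: a_norm = 2 * (a - min) / (max - min) - 1
    return (action + 1.0) / 2.0 * (stats_max - stats_min) + stats_min

# Illustrative 2-dim slice of a 14-dim bimanual action:
# -1 maps to the recorded minimum, +1 to the recorded maximum.
joints = denormalize_min_max([-1.0, 1.0], [0.0, -3.14], [0.08, 3.14])
```

With action_dim=14, the statistics would cover both 7-DoF arms (6 joints plus gripper each), matching the two rows of prepare_pose above.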

In addition to real-robot inference, we support model evaluation in simulation environments. Evaluation is configured via the eval dict. The core fields include:

  • type: Evaluation runner type, e.g., LiberoEvalRunner

  • task_suite_name: Task suite name (e.g., libero_10)

  • model_family: Model family identifier

  • Evaluation parameters: eval_chunk_size, resize_size, num_trials_per_task, num_steps_wait, seed

  • dataset: Evaluation dataset configuration

  • denormalize_action: Action denormalization configuration

Below is a LIBERO simulation evaluation configuration example:

eval = dict(
    type='LiberoEvalRunner',
    task_suite_name='libero_10',
    model_family='pi0',
    eval_chunk_size=10,
    resize_size=224,
    num_trials_per_task=50,
    num_steps_wait=10,
    seed=7,
    dataset=dict(
        type='LiberoParquetEvalDataset',
        transforms=[
            dict(
                type='ProcessLiberoEvalInputs',
                img_keys=[
                    'agentview_image',
                    'robot0_eye_in_hand_image'
                ],
            ),
            dict(
                type='TransformImage',
                image_resize_strategy='resize-naive',
                input_sizes=[
                    [3, 224, 224],
                    [3, 224, 224]
                ],
                means=[
                    [123.515625, 116.04492188, 103.59375],
                    [123.515625, 116.04492188, 103.59375]
                ],
                stds=[
                    [58.27148438, 57.02636719, 57.27539062],
                    [58.27148438, 57.02636719, 57.27539062]
                ],
            ),
            dict(
                type='LiberoPromptFromInputs',
                use_conversation=False,
                tokenizer=dict(
                    type='PaligemmaTokenizer'
                )),
            dict(
                type='LiberoProprioFromInputs',
                norm_type='mean_std',
                pos_key='robot0_eef_pos',
                quat_key='robot0_eef_quat',
                gripper_key='robot0_gripper_qpos',
                state_dim=32,
                out_key='states'),
        ]),
    denormalize_action=dict(
        type='DenormalizeLiberoAction',
        norm_type='mean_std',
        action_dim=7,
    ),
)
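Here the denormalize_action block uses mean_std rather than min_max, so denormalization would invert the training-time z-score standardization of the 7-dim LIBERO action (end-effector delta pose plus gripper). A minimal sketch under that assumption (the function name and statistics are illustrative):

```python
import numpy as np

def denormalize_mean_std(action, mean, std):
    """Invert z-score normalization: a = a_norm * std + mean."""
    return np.asarray(action, dtype=np.float64) * np.asarray(std) + np.asarray(mean)

# Illustrative 3-dim slice of a 7-dim action: a normalized output of 0
# recovers the mean, +/-1 recovers one standard deviation from it.
a = denormalize_mean_std([0.0, 1.0, -1.0], [0.5, 0.0, 1.0], [1.0, 2.0, 0.5])
```

The choice must match the normalization used by the corresponding training transform (here, LiberoProprioFromInputs also uses mean_std), otherwise the evaluated actions are scaled incorrectly.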