Inference / Evaluation Configuration
Inference configuration defines the task instructions, data processing pipeline, robot initial pose, and hardware communication settings for real-robot deployment. It is configured via the `inference` dict. The core fields include:
- `type`: Inference runner type, e.g., `AlohaInferenceRunner`
- `task_descriptions`: Task description dictionary
- `seed`: Random seed
- `dataset`: Inference dataset configuration (image key names and transforms)
- `denormalize_action`: Action denormalization configuration
- `action_chunk`: Action chunk size
- `prepare_pose`: Robot preparation pose
- `operator`: Robot operator configuration (ROS topics)
Below is a complete inference configuration example:
```python
inference = dict(
    type='AlohaInferenceRunner',
    task_descriptions={
        '1': ('Fold the white towel in half, then fold it again, '
              'and make final adjustments to ensure the edges are '
              'neatly aligned.')
    },
    seed=7,
    dataset=dict(
        type='PrivateInferenceDataset',
        img_keys=[
            'cam_high',
            'cam_left_wrist',
            'cam_right_wrist'
        ],
        transforms=[
            dict(
                type='NormalizeStatesAndActions',
                state_dim=32,
                state_key='proprio',
                action_key='action',
                norm_type='min_max'),
            dict(type='PreparePromptWithState'),
            dict(
                type='ProcessPrompts',
                tokenizer=dict(
                    type='PretrainedTokenizer',
                    model_path='/path/to/checkpoints/paligemma-3b-pt-224',
                )),
            dict(type='ResizeImages', height=224, width=224),
            dict(type='SimpleNormalizeImages'),
        ]),
    denormalize_action=dict(
        type='DenormalizePrivateAction',
        norm_type='min_max',
        action_dim=14,
    ),
    action_chunk=50,
    prepare_pose=[
        [-0.19779752, 1.07020684, -0.61802348,
         -1.30887565, 1.1520192, 2.10289164, 0.092],
        [0.34008822, 0.95214585, -0.56617991,
         1.13862221, 0.82892144, -1.80234897, 0.06909]
    ],
    operator=dict(
        type='AlohaOperator',
        img_front_topic='/camera_h/color/image_raw',
        img_left_topic='/camera_l/color/image_raw',
        img_right_topic='/camera_r/color/image_raw',
        img_front_depth_topic='/camera_h/depth/image_raw',
        img_left_depth_topic='/camera_l/depth/image_raw',
        img_right_depth_topic='/camera_r/depth/image_raw',
        puppet_arm_left_cmd_topic='/master/joint_left',
        puppet_arm_right_cmd_topic='/master/joint_right',
        puppet_arm_left_topic='/puppet/joint_left',
        puppet_arm_right_topic='/puppet/joint_right',
        robot_base_topic='/odom_raw',
        robot_base_cmd_topic='/cmd_vel',
    ))
```
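The `denormalize_action` block maps the model's normalized outputs back into real joint-command ranges. As a minimal sketch of `min_max` denormalization, assuming actions are normalized into `[-1, 1]` and per-dimension `min`/`max` statistics are available (the function and variable names below are illustrative, not the framework's actual API):

```python
import numpy as np

def denormalize_min_max(action, stat_min, stat_max):
    """Map actions from [-1, 1] back to the original per-dimension range."""
    action = np.asarray(action, dtype=np.float64)
    return 0.5 * (action + 1.0) * (stat_max - stat_min) + stat_min

# Example with 2 dimensions for brevity (the config above uses action_dim=14).
stat_min = np.array([-1.5, 0.0])   # hypothetical per-joint minima
stat_max = np.array([1.5, 0.08])   # hypothetical per-joint maxima
a = denormalize_min_max(np.array([0.0, 1.0]), stat_min, stat_max)
# a == [0.0, 0.08]: midpoint of the first range, maximum of the second
```

The inverse of this mapping is what `NormalizeStatesAndActions` with `norm_type='min_max'` applies on the input side, so the two must share the same statistics.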
In addition to real-robot inference, we also support model evaluation in simulation environments. Evaluation configuration is defined via the `eval` dict. The core fields include:
- `type`: Evaluation runner type, e.g., `LiberoEvalRunner`
- `task_suite_name`: Task suite name (e.g., `libero_10`)
- `model_family`: Model family identifier
- Evaluation parameters: `eval_chunk_size`, `num_trials_per_task`, `num_steps_wait`, `seed`
- `dataset`: Evaluation dataset configuration
- `denormalize_action`: Action denormalization configuration
Below is a LIBERO simulation evaluation configuration example:
```python
eval = dict(
    type='LiberoEvalRunner',
    task_suite_name='libero_10',
    model_family='pi0',
    eval_chunk_size=10,
    resize_size=224,
    num_trials_per_task=50,
    num_steps_wait=10,
    seed=7,
    dataset=dict(
        type='LiberoParquetEvalDataset',
        transforms=[
            dict(
                type='ProcessLiberoEvalInputs',
                img_keys=[
                    'agentview_image',
                    'robot0_eye_in_hand_image'
                ],
            ),
            dict(
                type='TransformImage',
                image_resize_strategy='resize-naive',
                input_sizes=[
                    [3, 224, 224],
                    [3, 224, 224]
                ],
                means=[
                    [123.515625, 116.04492188, 103.59375],
                    [123.515625, 116.04492188, 103.59375]
                ],
                stds=[
                    [58.27148438, 57.02636719, 57.27539062],
                    [58.27148438, 57.02636719, 57.27539062]
                ],
            ),
            dict(
                type='LiberoPromptFromInputs',
                use_conversation=False,
                tokenizer=dict(
                    type='PaligemmaTokenizer'
                )),
            dict(
                type='LiberoProprioFromInputs',
                norm_type='mean_std',
                pos_key='robot0_eef_pos',
                quat_key='robot0_eef_quat',
                gripper_key='robot0_gripper_qpos',
                state_dim=32,
                out_key='states'),
        ]),
    denormalize_action=dict(
        type='DenormalizeLiberoAction',
        norm_type='mean_std',
        action_dim=7,
    ),
)
```
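To make the roles of `num_trials_per_task`, `num_steps_wait`, and `eval_chunk_size` concrete, here is a hedged sketch of a typical LIBERO-style evaluation loop. The `env` and `policy` objects and their methods are placeholders, not this framework's classes; the assumed behavior is that each task is attempted `num_trials_per_task` times, the first `num_steps_wait` steps send no-op actions so simulated objects can settle, and the policy is queried for `eval_chunk_size` actions at a time:

```python
def evaluate_task(env, policy, num_trials_per_task=50, num_steps_wait=10,
                  eval_chunk_size=10, max_steps=500, noop_action=None):
    """Sketch of a chunked evaluation loop; returns the task success rate."""
    successes = 0
    for _ in range(num_trials_per_task):
        obs = env.reset()
        chunk, t = [], 0
        while t < max_steps:
            if t < num_steps_wait:
                # Send a no-op so simulated objects settle before acting.
                obs, done = env.step(noop_action)
            else:
                if not chunk:
                    # Query the policy once per chunk of actions.
                    chunk = list(policy.predict(obs))[:eval_chunk_size]
                obs, done = env.step(chunk.pop(0))
            if done:
                successes += 1
                break
            t += 1
    return successes / num_trials_per_task
```

Querying the policy once per chunk (rather than every step) amortizes model latency, which is why `eval_chunk_size` appears alongside the per-trial step parameters.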