Model Evaluation#

scripts/eval.sh Usage Guide

Overview#

scripts/eval.sh is a shell script for launching distributed model evaluation. It wraps the torchrun command to facilitate the evaluation of trained models in multi-node, multi-GPU environments.
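The wrapper's contents are not reproduced in this guide. The sketch below illustrates how such a script typically maps the `MLP_*` variables onto `torchrun` flags and forwards the positional arguments to `eval.py`. The command is built and printed rather than executed, and the exact flag mapping is an assumption, not a copy of the real `scripts/eval.sh`:

```shell
# Sketch of the torchrun invocation a wrapper like scripts/eval.sh would run.
# Printing instead of executing keeps the sketch self-contained.
set -euo pipefail

# Stand-ins for "$1", "$2" (the real script takes these from its arguments)
CONFIG=configs/pi05/pi05_paligemma_libero10_full_finetune.py
CKPT_PATH=work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt

# Distributed settings normally come from the MLP_* environment variables;
# the defaults here are only for illustration.
: "${MLP_WORKER_GPU:=8}"
: "${MLP_WORKER_NUM:=1}"
: "${MLP_ROLE_INDEX:=0}"
: "${MLP_WORKER_0_HOST:=localhost}"
: "${MLP_WORKER_0_PORT:=29500}"

cmd=(torchrun
    --nproc_per_node="$MLP_WORKER_GPU"
    --nnodes="$MLP_WORKER_NUM"
    --node_rank="$MLP_ROLE_INDEX"
    --master_addr="$MLP_WORKER_0_HOST"
    --master_port="$MLP_WORKER_0_PORT"
    eval.py --config "$CONFIG" --ckpt-path "$CKPT_PATH")

echo "${cmd[@]}"
```

Any arguments beyond the first two would typically be appended to the `eval.py` portion of this command via `"${@:3}"`.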

Basic Usage#

```bash
bash scripts/eval.sh [CONFIG] [CKPT_PATH] [additional arguments...]
```

Script Parameters#

Positional Arguments#

| Argument | Position | Description |
|---|---|---|
| `CONFIG` | `$1` | Path to the evaluation configuration file (typically the same as the training configuration) |
| `CKPT_PATH` | `$2` | Path to the model checkpoint file |
| Additional arguments | `$3+` | Additional arguments passed directly to `eval.py` |

Environment Variables (Required for Distributed Evaluation)#

These environment variables configure the distributed parameters for torchrun:

| Environment Variable | Description | Example |
|---|---|---|
| `MLP_WORKER_GPU` | Number of GPUs per node | `8` |
| `MLP_WORKER_NUM` | Total number of nodes participating in evaluation | `1` |
| `MLP_ROLE_INDEX` | Rank index of the current node (starting from 0) | `0` |
| `MLP_WORKER_0_HOST` | IP address or hostname of the master node (rank 0) | `{MASTER_NODE_IP}` |
| `MLP_WORKER_0_PORT` | Communication port of the master node | `29500` |

Additional Arguments Supported by eval.py#

The following arguments can be passed as additional arguments to eval.py:

| Argument | Type | Description |
|---|---|---|
| `--config` | path | Configuration file path (passed automatically by the script) |
| `--ckpt-path` | path | Checkpoint file path (passed automatically by the script) |
| `--cfg-options` | key=value pairs | Override configuration file settings in the format `xxx=yyy` |
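To make the `xxx=yyy` format concrete, the bash sketch below splits a single override into its dotted key path and value. This only illustrates the format; `eval.py`'s actual parser (and any type conversion it performs) is not shown here and may behave differently:

```shell
# Sketch: split a --cfg-options style override into key path and value.
override="eval.env.task_suite_name=libero_goal"

key=${override%%=*}      # part before the first '=': eval.env.task_suite_name
value=${override#*=}     # part after the first '=':  libero_goal

# Break the dotted key into its path components
IFS='.' read -r -a path <<< "$key"

echo "key=$key value=$value depth=${#path[@]}"
```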

Usage Examples#

1. Single-Node Single-GPU Evaluation#

```bash
# Set environment variables
export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

# Launch evaluation
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

2. Single-Node Multi-GPU Evaluation (8 GPUs)#

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

3. Volcengine Cloud Platform Evaluation#

On the Volcengine MLP platform, the environment variables MLP_WORKER_GPU, MLP_WORKER_NUM, MLP_ROLE_INDEX, MLP_WORKER_0_HOST, and MLP_WORKER_0_PORT are automatically injected by the platform. No manual configuration is required; simply execute the command directly.

```bash
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
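To confirm the injection worked before launching, a small sanity check along these lines can help (`check_mlp_env` is a hypothetical helper written for this guide, not part of the repository):

```shell
# Sketch: verify that all platform-injected MLP_* variables are present
# before calling scripts/eval.sh. Returns non-zero if any are missing.
check_mlp_env() {
    local missing=0
    local v
    for v in MLP_WORKER_GPU MLP_WORKER_NUM MLP_ROLE_INDEX \
             MLP_WORKER_0_HOST MLP_WORKER_0_PORT; do
        if [ -z "${!v:-}" ]; then
            echo "missing: $v"
            missing=1
        fi
    done
    return $missing
}
```

Usage would then be `check_mlp_env && bash scripts/eval.sh ...`, so the script only launches when all five variables are set.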

4. Multi-Node Distributed Evaluation (Manual Configuration)#

If not running on the Volcengine platform, environment variables must be configured manually:

Node 0 (Master Node):

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

Node 1:

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=1
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

5. Overriding Configuration with Additional Arguments#

```bash
# Modify the evaluation environment
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt \
    --cfg-options eval.env.task_suite_name=libero_goal
```

6. Evaluating the Latest Checkpoint#

```bash
# Use the latest-checkpoint.pt symlink (if available)
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/latest-checkpoint.pt
```

Output Artifacts#

Upon evaluation completion, the following outputs are generated in the working directory:

- `rollouts/` - Video recordings of the evaluation process (if `save_rollout_videos` is enabled)
  - Video file naming format: `{date}--episode={n}--success={True/False}--task={task_name}.mp4`
- Evaluation metric logs - success rates, completion rates, and related statistics
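Assuming the naming format above, the success flag can be recovered from a rollout filename with standard shell tools, e.g.:

```shell
# Sketch: extract the success flag from a rollout video filename following
# the {date}--episode={n}--success={True/False}--task={task_name}.mp4 pattern.
fname="2024-01-01--episode=3--success=True--task=put_the_bowl_on_the_stove.mp4"

success=$(sed -n 's/.*--success=\([^-]*\)--.*/\1/p' <<< "$fname")
echo "$success"
```

Counting `success=True` occurrences across the filenames in `rollouts/` then gives a rough success tally for a run.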

Important Notes#

  1. Checkpoint Path: Ensure that CKPT_PATH points to a valid checkpoint file, typically located within the training work_dir directory.

  2. Configuration Consistency: The configuration file used for evaluation should be consistent with the one used during training, or at minimum, the model architecture-related configurations must be identical.

  3. Environment Variables Must Be Set: All MLP_* environment variables must be correctly configured before executing the script; otherwise, torchrun will fail to initialize distributed evaluation.

  4. GPU Memory: Although GPU memory consumption during evaluation is generally lower than during training, sufficient GPU memory must still be available.

  5. Evaluation Environment Dependencies: Certain evaluation tasks (e.g., LIBERO) may require additional environment dependencies. Ensure that all requisite packages are properly installed.

  6. Port Conflicts: Ensure that the port specified by MLP_WORKER_0_PORT is not occupied and is accessible through the firewall.

  7. NCCL Backend: NCCL is used as the default distributed backend. Ensure compatibility between CUDA and NCCL versions.
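For note 6, a quick local check of the master port can be done with bash's `/dev/tcp` pseudo-device (a sketch only; tools such as `ss -tlnp` or `nc -z` give more authoritative answers, and this check only covers the local host, not firewall reachability from other nodes):

```shell
# Sketch: probe whether something is already listening on the master port.
# A successful connect means the port is occupied; a refused connection
# suggests it is free for torchrun to bind.
port=${MLP_WORKER_0_PORT:-29500}
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    status="in use"
else
    status="free"
fi
echo "port $port is $status"
```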