Model Evaluation#

scripts/eval.sh Usage Guide

Overview#

scripts/eval.sh is a shell script for launching distributed model evaluation. It wraps the torchrun command to facilitate the evaluation of trained models in multi-node, multi-GPU environments.
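The wrapper's contents are not reproduced in this guide. The sketch below illustrates how such a script typically maps the `MLP_*` variables onto `torchrun` flags and forwards the positional arguments to `eval.py`. The command is built and printed rather than executed, and the exact flag mapping is an assumption, not a copy of the real `scripts/eval.sh`:

```shell
# Sketch of the torchrun invocation a wrapper like scripts/eval.sh would run.
# Printing instead of executing keeps the sketch self-contained.
set -euo pipefail

# Stand-ins for "$1", "$2" (the real script takes these from its arguments)
CONFIG=configs/pi05/pi05_paligemma_libero10_full_finetune.py
CKPT_PATH=work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt

# Distributed settings normally come from the MLP_* environment variables;
# the defaults here are only for illustration.
: "${MLP_WORKER_GPU:=8}"
: "${MLP_WORKER_NUM:=1}"
: "${MLP_ROLE_INDEX:=0}"
: "${MLP_WORKER_0_HOST:=localhost}"
: "${MLP_WORKER_0_PORT:=29500}"

cmd=(torchrun
    --nproc_per_node="$MLP_WORKER_GPU"
    --nnodes="$MLP_WORKER_NUM"
    --node_rank="$MLP_ROLE_INDEX"
    --master_addr="$MLP_WORKER_0_HOST"
    --master_port="$MLP_WORKER_0_PORT"
    eval.py --config "$CONFIG" --ckpt-path "$CKPT_PATH")

echo "${cmd[@]}"
```

Any arguments beyond the first two would typically be appended to the `eval.py` portion of this command via `"${@:3}"`.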

Basic Usage#

```bash
bash scripts/eval.sh [CONFIG] [CKPT_PATH] [additional arguments...]
```

Script Parameters#

Positional Arguments#

| Argument | Position | Description |
|---|---|---|
| `CONFIG` | `$1` | Path to the evaluation configuration file (typically the same as the training configuration) |
| `CKPT_PATH` | `$2` | Path to the model checkpoint file |
| Additional arguments | `$3+` | Additional arguments passed directly to `eval.py` |

Environment Variables (Required for Distributed Evaluation)#

These environment variables configure the distributed parameters for torchrun:

| Environment Variable | Description | Example |
|---|---|---|
| `MLP_WORKER_GPU` | Number of GPUs per node | `8` |
| `MLP_WORKER_NUM` | Total number of nodes participating in evaluation | `1` |
| `MLP_ROLE_INDEX` | Rank index of the current node (starting from 0) | `0` |
| `MLP_WORKER_0_HOST` | IP address or hostname of the master node (rank 0) | `{MASTER_NODE_IP}` |
| `MLP_WORKER_0_PORT` | Communication port of the master node | `29500` |

Additional Arguments Supported by eval.py#

The following arguments can be passed as additional arguments to eval.py:

| Argument | Type | Description |
|---|---|---|
| `--config` | path | Configuration file path (passed automatically by the script) |
| `--ckpt-path` | path | Checkpoint file path (passed automatically by the script) |
| `--cfg-options` | key=value pairs | Override configuration file settings in the format `xxx=yyy` |
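To make the `xxx=yyy` format concrete, the bash sketch below splits a single override into its dotted key path and value. This only illustrates the format; `eval.py`'s actual parser (and any type conversion it performs) is not shown here and may behave differently:

```shell
# Sketch: split a --cfg-options style override into key path and value.
override="eval.env.task_suite_name=libero_goal"

key=${override%%=*}      # part before the first '=': eval.env.task_suite_name
value=${override#*=}     # part after the first '=':  libero_goal

# Break the dotted key into its path components
IFS='.' read -r -a path <<< "$key"

echo "key=$key value=$value depth=${#path[@]}"
```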

Usage Examples#

1. Single-Node Single-GPU Evaluation#

```bash
# Set environment variables
export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

# Launch evaluation
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

2. Single-Node Multi-GPU Evaluation (8 GPUs)#

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

3. Volcengine Cloud Platform Evaluation#

On the Volcengine MLP platform, the environment variables MLP_WORKER_GPU, MLP_WORKER_NUM, MLP_ROLE_INDEX, MLP_WORKER_0_HOST, and MLP_WORKER_0_PORT are automatically injected by the platform. No manual configuration is required; simply execute the command directly.

```bash
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
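To confirm the injection worked before launching, a small sanity check along these lines can help (`check_mlp_env` is a hypothetical helper written for this guide, not part of the repository):

```shell
# Sketch: verify that all platform-injected MLP_* variables are present
# before calling scripts/eval.sh. Returns non-zero if any are missing.
check_mlp_env() {
    local missing=0
    local v
    for v in MLP_WORKER_GPU MLP_WORKER_NUM MLP_ROLE_INDEX \
             MLP_WORKER_0_HOST MLP_WORKER_0_PORT; do
        if [ -z "${!v:-}" ]; then
            echo "missing: $v"
            missing=1
        fi
    done
    return $missing
}
```

Usage would then be `check_mlp_env && bash scripts/eval.sh ...`, so the script only launches when all five variables are set.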

4. Multi-Node Distributed Evaluation (Manual Configuration)#

If not running on the Volcengine platform, environment variables must be configured manually:

Node 0 (Master Node):

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

Node 1:

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=1
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

5. Overriding Configuration with Additional Arguments#

```bash
# Modify the evaluation environment
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt \
    --cfg-options eval.env.task_suite_name=libero_goal
```

6. Evaluating the Latest Checkpoint#

```bash
# Use the latest-checkpoint.pt symlink (if available)
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/latest-checkpoint.pt
```

Output Artifacts#

Upon evaluation completion, the following outputs are generated in the working directory:

- `rollouts/` - Video recordings of the evaluation process (if `save_rollout_videos` is enabled)
  - Video file naming format: `{date}--episode={n}--success={True/False}--task={task_name}.mp4`
- Evaluation metric logs - success rates, completion rates, and related statistics
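Assuming the naming format above, the success flag can be recovered from a rollout filename with standard shell tools, e.g.:

```shell
# Sketch: extract the success flag from a rollout video filename following
# the {date}--episode={n}--success={True/False}--task={task_name}.mp4 pattern.
fname="2024-01-01--episode=3--success=True--task=put_the_bowl_on_the_stove.mp4"

success=$(sed -n 's/.*--success=\([^-]*\)--.*/\1/p' <<< "$fname")
echo "$success"
```

Counting `success=True` occurrences across the filenames in `rollouts/` then gives a rough success tally for a run.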

Important Notes#

  1. Checkpoint Path: Ensure that CKPT_PATH points to a valid checkpoint file, typically located within the training work_dir directory.

  2. Configuration Consistency: The configuration file used for evaluation should be consistent with the one used during training, or at minimum, the model architecture-related configurations must be identical.

  3. Environment Variables Must Be Set: All MLP_* environment variables must be correctly configured before executing the script; otherwise, torchrun will fail to initialize distributed evaluation.

  4. GPU Memory: Although GPU memory consumption during evaluation is generally lower than during training, sufficient GPU memory must still be available.

  5. Evaluation Environment Dependencies: Certain evaluation tasks (e.g., LIBERO) may require additional environment dependencies. Ensure that all requisite packages are properly installed.

  6. Port Conflicts: Ensure that the port specified by MLP_WORKER_0_PORT is not occupied and is accessible through the firewall.

  7. NCCL Backend: NCCL is used as the default distributed backend. Ensure compatibility between CUDA and NCCL versions.
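For note 6, a quick local check of the master port can be done with bash's `/dev/tcp` pseudo-device (a sketch only; tools such as `ss -tlnp` or `nc -z` give more authoritative answers, and this check only covers the local host, not firewall reachability from other nodes):

```shell
# Sketch: probe whether something is already listening on the master port.
# A successful connect means the port is occupied; a refused connection
# suggests it is free for torchrun to bind.
port=${MLP_WORKER_0_PORT:-29500}
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    status="in use"
else
    status="free"
fi
echo "port $port is $status"
```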