# Model Evaluation

*A usage guide for `scripts/eval.sh`*

## Overview

`scripts/eval.sh` is a shell script for launching distributed model evaluation. It wraps the `torchrun` command to evaluate trained models in multi-node, multi-GPU environments.
## Basic Usage

```bash
bash scripts/eval.sh [CONFIG] [CKPT_PATH] [additional arguments...]
```
## Script Parameters

### Positional Arguments

| Argument | Position | Description |
|---|---|---|
| `CONFIG` | `$1` | Path to the evaluation configuration file (typically the same as the training configuration) |
| `CKPT_PATH` | `$2` | Path to the model checkpoint file |
| Additional arguments | `$3+` | Additional arguments passed directly to `eval.py` |
### Environment Variables (Required for Distributed Evaluation)

These environment variables configure the distributed parameters for `torchrun`:

| Environment Variable | Description | Example |
|---|---|---|
| `MLP_WORKER_GPU` | Number of GPUs per node | `8` |
| `MLP_WORKER_NUM` | Total number of nodes participating in evaluation | `2` |
| `MLP_ROLE_INDEX` | Rank index of the current node (starting from 0) | `0` |
| `MLP_WORKER_0_HOST` | IP address or hostname of the master node (rank 0) | `localhost` |
| `MLP_WORKER_0_PORT` | Communication port of the master node | `29500` |
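As a rough illustration, these variables map onto standard `torchrun` flags. The snippet below is a hypothetical sketch, not the actual contents of `scripts/eval.sh`; it composes and echoes the launcher command with single-node example values instead of executing it:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of how scripts/eval.sh might map MLP_* variables
# onto standard torchrun flags -- the real script may differ in detail.
# Example values for a single-node, single-GPU run:
MLP_WORKER_GPU=1
MLP_WORKER_NUM=1
MLP_ROLE_INDEX=0
MLP_WORKER_0_HOST=localhost
MLP_WORKER_0_PORT=29500

# Compose the launcher invocation (echoed here instead of executed):
CMD="torchrun --nproc_per_node=${MLP_WORKER_GPU} --nnodes=${MLP_WORKER_NUM} --node_rank=${MLP_ROLE_INDEX} --master_addr=${MLP_WORKER_0_HOST} --master_port=${MLP_WORKER_0_PORT} eval.py"
echo "$CMD"
```

The flag names (`--nproc_per_node`, `--nnodes`, `--node_rank`, `--master_addr`, `--master_port`) are standard `torchrun` options; only the mapping to `MLP_*` variables is assumed here.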
## Additional Arguments Supported by `eval.py`

The following arguments can be passed as additional arguments to `eval.py`:

| Argument | Type | Description |
|---|---|---|
| `CONFIG` | path | Configuration file path (automatically passed by the script) |
| `CKPT_PATH` | path | Checkpoint file path (automatically passed by the script) |
| `--cfg-options` | `key=value` pairs | Override configuration file settings in the `key=value` format |
## Usage Examples

### 1. Single-Node Single-GPU Evaluation

```bash
# Set environment variables
export MLP_WORKER_GPU=1
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

# Launch evaluation
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
### 2. Single-Node Multi-GPU Evaluation (8 GPUs)

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
### 3. Volcengine Cloud Platform Evaluation

On the Volcengine MLP platform, the environment variables `MLP_WORKER_GPU`, `MLP_WORKER_NUM`, `MLP_ROLE_INDEX`, `MLP_WORKER_0_HOST`, and `MLP_WORKER_0_PORT` are injected automatically by the platform. No manual configuration is required; simply execute the command directly:

```bash
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
### 4. Multi-Node Distributed Evaluation (Manual Configuration)

If not running on the Volcengine platform, the environment variables must be configured manually.

**Node 0 (Master Node):**

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```

**Node 1:**

```bash
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=2
export MLP_ROLE_INDEX=1
export MLP_WORKER_0_HOST={MASTER_NODE_IP}
export MLP_WORKER_0_PORT=29500

bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```
### 5. Overriding Configuration with Additional Arguments

```bash
# Modify the evaluation environment
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt \
    --cfg-options eval.env.task_suite_name=libero_goal
```
### 6. Evaluating the Latest Checkpoint

```bash
# Use the latest-checkpoint.pt symlink (if available)
bash scripts/eval.sh \
    configs/pi05/pi05_paligemma_libero10_full_finetune.py \
    work_dirs/pi05_paligemma_libero10_full_finetune/latest-checkpoint.pt
```
## Output Artifacts

Upon evaluation completion, the following outputs are generated in the working directory:

- `rollouts/` - Video recordings of the evaluation process (if `save_rollout_videos` is enabled). Video file naming format:

  ```
  {date}--episode={n}--success={True/False}--task={task_name}.mp4
  ```

- Evaluation metric logs - containing statistical information such as success rates and completion rates
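Because the outcome is encoded in the filename, successful rollouts can be filtered with ordinary shell tools. A self-contained sketch follows; the sample filenames and the task name `demo` are made up to match the scheme above:

```shell
#!/usr/bin/env bash
# Create a throwaway rollouts directory with sample filenames that follow
# the {date}--episode={n}--success={...}--task={...}.mp4 scheme.
ROLLOUTS=$(mktemp -d)
touch "${ROLLOUTS}/2024-01-01--episode=0--success=True--task=demo.mp4"
touch "${ROLLOUTS}/2024-01-01--episode=1--success=False--task=demo.mp4"
touch "${ROLLOUTS}/2024-01-01--episode=2--success=True--task=demo.mp4"

# Count videos whose filename marks the episode as successful.
SUCCESSES=$(ls "${ROLLOUTS}" | grep -c "success=True")
echo "successful rollouts: ${SUCCESSES}"
rm -rf "${ROLLOUTS}"
```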
## Important Notes

- **Checkpoint Path**: Ensure that `CKPT_PATH` points to a valid checkpoint file, typically located within the training `work_dir` directory.
- **Configuration Consistency**: The configuration file used for evaluation should be consistent with the one used during training, or at minimum the model architecture-related settings must be identical.
- **Environment Variables Must Be Set**: All `MLP_*` environment variables must be correctly configured before executing the script; otherwise, `torchrun` will fail to initialize distributed evaluation.
- **GPU Memory**: Although GPU memory consumption during evaluation is generally lower than during training, sufficient GPU memory must still be available.
- **Evaluation Environment Dependencies**: Certain evaluation tasks (e.g., LIBERO) may require additional environment dependencies. Ensure that all requisite packages are properly installed.
- **Port Conflicts**: Ensure that the port specified by `MLP_WORKER_0_PORT` is not occupied and is accessible through the firewall.
- **NCCL Backend**: NCCL is used as the default distributed backend. Ensure compatibility between CUDA and NCCL versions.
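Before launching, the rendezvous port can be checked with standard tools. A minimal sketch, assuming a Linux host; it uses the `ss` utility from iproute2 and reports "unknown" if that tool is absent:

```shell
#!/usr/bin/env bash
# Check whether the rendezvous port is already bound on this host.
PORT=${MLP_WORKER_0_PORT:-29500}
if ! command -v ss >/dev/null 2>&1; then
  STATUS="unknown"
elif ss -ltn 2>/dev/null | grep -q ":${PORT} "; then
  STATUS="busy"
else
  STATUS="free"
fi
echo "Port ${PORT} is ${STATUS}"
```

If the port is busy, export a different `MLP_WORKER_0_PORT` on every node before re-running the script.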