# Training and Evaluation Script Interfaces

This page covers two frequently used entry points: `scripts/train.sh` and `scripts/eval.sh`.

## Purpose

- `scripts/train.sh`: Launches distributed training, wrapping `torchrun`.
- `scripts/eval.sh`: Launches distributed evaluation, wrapping `torchrun`.
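The internals of the wrappers are not reproduced on this page, but the pattern is straightforward: consume the positional arguments, map the `MLP_*` environment variables (documented under Distributed Environment Variables below) onto `torchrun`'s distributed flags, and forward everything else. A minimal sketch, assuming a `train.py` entry module and a `--work-dir` option (both hypothetical names, not the repository's actual internals):

```bash
#!/usr/bin/env bash
# Sketch of the wrapper pattern only -- not the repository's actual script.
# "train.py" and "--work-dir" are hypothetical names.
CONFIG=$1
WORK_DIR=$2
shift 2   # everything left in "$@" is passed through verbatim

torchrun \
  --nproc_per_node="${MLP_WORKER_GPU:-8}" \
  --nnodes="${MLP_WORKER_NUM:-1}" \
  --node_rank="${MLP_ROLE_INDEX:-0}" \
  --master_addr="${MLP_WORKER_0_HOST:-localhost}" \
  --master_port="${MLP_WORKER_0_PORT:-29500}" \
  train.py "$CONFIG" --work-dir "$WORK_DIR" "$@"
```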
## Training Entry Point (`scripts/train.sh`)

### Command Format

```bash
bash scripts/train.sh [CONFIG] [WORK_DIR] [EXTRA_ARGS...]
```
### Positional Arguments

| Argument | Position | Default | Description |
|---|---|---|---|
| `CONFIG` | 1 | | Training configuration file path |
| `WORK_DIR` | 2 | | Log and checkpoint output directory |
| Additional arguments | 3+ | None | Passed through to the wrapped training command |
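Concretely, anything after the second positional argument is forwarded without modification. In the invocation below, `--some-flag value` is a placeholder standing in for whatever pass-through options the training script accepts:

```bash
# CONFIG and WORK_DIR come first; "--some-flag value" is a placeholder,
# not an actual option of this repository.
bash scripts/train.sh \
  configs/pi05/pi05_paligemma_libero10_full_finetune.py \
  work_dirs/pi05_paligemma_libero10_full_finetune \
  --some-flag value
```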
Distributed Environment Variables#
Environment Variable |
Description |
|---|---|
|
Number of GPUs per node |
|
Total number of nodes |
|
Rank of the current node |
|
Master node address |
|
Master node port |
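These variables are what make multi-node launches work: assuming the usual `torchrun` rendezvous convention, every node runs the same command and only `MLP_ROLE_INDEX` differs per node. An illustrative two-node setup (addresses and counts are example values):

```bash
# Node 0 (hosts the rendezvous) -- example values only
export MLP_WORKER_GPU=8            # 8 GPUs on each node
export MLP_WORKER_NUM=2            # 2 nodes in total
export MLP_ROLE_INDEX=0            # this node's rank
export MLP_WORKER_0_HOST=10.0.0.1  # reachable address of node 0
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh [CONFIG] [WORK_DIR]

# Node 1: identical environment and command, except the rank
export MLP_ROLE_INDEX=1
```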
Common Additional Arguments#
Argument |
Type |
Description |
|---|---|---|
|
|
Override configuration items |
|
|
Automatically evaluate after training |
|
path |
Resume from a checkpoint |
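As an illustration of how these options are supplied after the positional arguments (the flag spelling `--resume` below is hypothetical; this page does not fix the actual option names):

```bash
# Resume from an earlier checkpoint. "--resume" is a hypothetical
# spelling of the resume option, and the checkpoint path is illustrative.
bash scripts/train.sh \
  configs/pi05/pi05_paligemma_libero10_full_finetune.py \
  work_dirs/pi05_paligemma_libero10_full_finetune \
  --resume work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_5000.pt
```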
## Evaluation Entry Point (`scripts/eval.sh`)

### Command Format

```bash
bash scripts/eval.sh [CONFIG] [CKPT_PATH] [EXTRA_ARGS...]
```
### Positional Arguments

| Argument | Position | Description |
|---|---|---|
| `CONFIG` | 1 | Evaluation configuration file path |
| `CKPT_PATH` | 2 | Checkpoint file path |
| Additional arguments | 3+ | Passed through to the wrapped evaluation command |
Common Additional Arguments#
Argument |
Type |
Description |
|---|---|---|
|
|
Override evaluation configuration items |
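Overrides are supplied the same way as for training, after the two positional arguments. Both the flag and the key below are hypothetical placeholders showing only where such arguments go on the command line:

```bash
# "--cfg-options" and "eval.batch_size" are hypothetical names; check the
# evaluation script's help output for the actual override syntax.
bash scripts/eval.sh \
  configs/pi05/pi05_paligemma_libero10_full_finetune.py \
  work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt \
  --cfg-options eval.batch_size=16
```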
## Minimal Runnable Example

```bash
# Train
export MLP_WORKER_GPU=8
export MLP_WORKER_NUM=1
export MLP_ROLE_INDEX=0
export MLP_WORKER_0_HOST=localhost
export MLP_WORKER_0_PORT=29500
bash scripts/train.sh \
  configs/pi05/pi05_paligemma_libero10_full_finetune.py \
  work_dirs/pi05_paligemma_libero10_full_finetune

# Evaluate
bash scripts/eval.sh \
  configs/pi05/pi05_paligemma_libero10_full_finetune.py \
  work_dirs/pi05_paligemma_libero10_full_finetune/checkpoint_step_10000.pt
```