# Training Configuration
Training configuration defines the optimization strategy, distributed training settings, and logging behavior during training. It is configured via the `runner` dict. The core fields include:
- `type`: the training runner type, e.g., `FSDPTrainRunner`
- Basic training parameters: `max_epochs`, `learning_rate`, `weight_decay`, `max_grad_norm`
- `collator`: the data collator that defines how samples are assembled into batches (illustrated in the sketch after this list)
- `tokenizer`: tokenizer configuration
- `metric`: metric logger configuration
- Learning rate scheduling: `lr_scheduler_type`, `warmup_ratio` (see the schedule sketch after the example below)
- Memory optimization: `enable_gradient_checkpointing`, `enable_mixed_precision_training`, `mixed_precision_dtype`, `sharding_strategy` (see the FSDP sketch at the end of this section)
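To make the collator's role concrete, here is a minimal sketch of what a `DictCollator`-style collate step typically does: tensor-like fields are stacked along a new batch dimension, while metadata fields are passed through as plain lists. The function name and exact stacking behavior are illustrative assumptions, not the library's actual `DictCollator` implementation:

```python
import torch

def dict_collate(samples, keys, meta_keys):
    """Assemble a list of per-sample dicts into a single batch dict.

    Hypothetical sketch of DictCollator-style behavior; the real
    implementation may differ (e.g., padding variable-length fields).
    """
    batch = {}
    for k in keys:
        # Stack tensor-like fields along a new leading batch dimension.
        batch[k] = torch.stack([torch.as_tensor(s[k]) for s in samples])
    for k in meta_keys:
        # Metadata (strings, dicts, etc.) is collected as a plain list.
        batch[k] = [s[k] for s in samples]
    return batch
```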
Below is a complete training configuration example:
```python
runner = dict(
    type='FSDPTrainRunner',
    max_epochs=24,
    learning_rate=5e-5,
    weight_decay=0.0,
    max_grad_norm=1.0,
    collator=dict(
        type='DictCollator',
        keys=[
            'states',
            'observation.eepose',
            'timestamp',
            'images',
            'img_masks',
            'lang_tokens',
            'lang_masks',
            'actions',
            'action_masks',
        ],
        meta_keys=[
            'task_description',
            'prompt',
            'info',
            'stats',
        ]),
    sampler=None,
    tokenizer=dict(type='PaligemmaTokenizer'),
    metric=dict(
        type='VLAMetric',
        active_trackers=('jsonl', 'wandb'),
        run_dir='work_dirs',
        wandb_project='limvla',
        wandb_entity='limx',
        grad_accumulation_steps=1,
        window_size=1),
    lr_scheduler_type='linear-warmup+cosine-decay',
    warmup_ratio=0.03,
    enable_gradient_checkpointing=True,
    enable_mixed_precision_training=True,
    mixed_precision_dtype='bf16',
    sharding_strategy='full-shard',
    change_key_name=False)
```
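The `lr_scheduler_type='linear-warmup+cosine-decay'` setting conventionally means the learning rate ramps up linearly over the first `warmup_ratio` fraction of training steps and then follows a cosine decay. The helper below is a generic sketch of that shape, not the runner's actual scheduler code:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.03, min_lr=0.0):
    """Generic linear-warmup + cosine-decay schedule (illustrative sketch)."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup window.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `warmup_ratio=0.03`, roughly the first 3% of training steps are spent warming up before the cosine decay begins.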
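The memory-optimization flags map onto standard PyTorch FSDP concepts. Assuming the runner wraps the model with `torch.distributed.fsdp` (a plausible sketch, not `FSDPTrainRunner`'s actual code), `mixed_precision_dtype='bf16'` and `sharding_strategy='full-shard'` would translate roughly to:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Assumes torch.distributed is already initialized (e.g., via torchrun).
model = nn.Linear(512, 512)  # stand-in for the actual policy network

wrapped = FSDP(
    model,
    # sharding_strategy='full-shard' -> shard parameters, gradients, and
    # optimizer state across all ranks (ZeRO-3-style).
    sharding_strategy=ShardingStrategy.FULL_SHARD,
    # mixed_precision_dtype='bf16' -> compute and communicate in bfloat16.
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
)
```

`enable_gradient_checkpointing=True` typically corresponds to activation checkpointing, which recomputes activations during the backward pass to trade extra compute for lower activation memory.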