Model Interfaces

This page summarizes FluxVLA's model interfaces. It focuses on how to use existing model components from a configuration file.

Purpose

Model interfaces define and assemble the core submodules of a VLA model. Each entry below names a submodule, followed by its registry in parentheses:

  • Top-level VLA (VLAS)

  • VLM Backbone (VLM_BACKBONES)

  • Vision Backbone (VISION_BACKBONES)

  • LLM Backbone (LLM_BACKBONES)

  • Projector (PROJECTORS)

  • VLA Head (HEADS)
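Every submodule above is selected by a type name. To see which registered components a given configuration pulls in, you can walk its nested dicts and collect each type field. The following is a minimal sketch. The helper name collect_types is hypothetical and not part of FluxVLA. The example config is trimmed to the type names used elsewhere on this page.

```python
from typing import Any, List, Tuple


def collect_types(cfg: Any, path: str = 'model') -> List[Tuple[str, str]]:
    """Return (config path, type name) for every nested dict with a 'type' key.

    Hypothetical helper for inspecting configs; not a FluxVLA API.
    """
    found: List[Tuple[str, str]] = []
    if isinstance(cfg, dict):
        if 'type' in cfg:
            found.append((path, cfg['type']))
        # Recurse into every value; non-dict/list values contribute nothing.
        for key, value in cfg.items():
            found.extend(collect_types(value, f'{path}.{key}'))
    elif isinstance(cfg, (list, tuple)):
        for i, item in enumerate(cfg):
            found.extend(collect_types(item, f'{path}[{i}]'))
    return found


model = dict(
    type='LlavaVLA',
    vlm_backbone=dict(type='EagleBackbone'),
    vla_head=dict(type='FlowMatchingHead', action_dim=32),
)
for path, type_name in collect_types(model):
    print(f'{path}: {type_name}')
# model: LlavaVLA
# model.vlm_backbone: EagleBackbone
# model.vla_head: FlowMatchingHead
```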

Core Parameters (Configuration Level)

Commonly used fields (set in the model or inference_model dict):

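Field                                   Description
--------------------------------------  ----------------------------------------------------------------
type                                    Top-level VLA type name (registered in VLAS)
pretrained_name_or_path                 Path to pretrained weights, or a pretrained model name
vlm_backbone                            VLM backbone config (type, vlm_path, etc.)
vla_head                                Action head config (e.g., state_dim, action_dim, ori_action_dim)
name_mapping                            Key mapping applied when loading pretrained weights
freeze_vlm_backbone / freeze_projector  Freezing strategy: whether to freeze the VLM backbone / projector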
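This page does not describe the exact format of name_mapping. One common way such a mapping is applied is to rename checkpoint keys by prefix before loading them into the model. The sketch below assumes that name_mapping maps an old key prefix to a new key prefix. The function name remap_keys and the key names in the example are hypothetical and are used only for illustration.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar('V')


def remap_keys(state_dict: Mapping[str, V],
               name_mapping: Mapping[str, str]) -> Dict[str, V]:
    """Rename keys by prefix (assumed semantics of name_mapping; hypothetical helper)."""
    # Check longer prefixes first so the most specific rule wins.
    rules = sorted(name_mapping.items(), key=lambda kv: len(kv[0]), reverse=True)
    remapped: Dict[str, V] = {}
    for key, value in state_dict.items():
        new_key = key  # keys that match no rule are kept as-is
        for old_prefix, new_prefix in rules:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


# Hypothetical checkpoint keys and mapping, for illustration only.
checkpoint = {'backbone.eagle.layer0.weight': 1.0, 'action_head.proj.bias': 2.0}
mapping = {'backbone.eagle.': 'vlm_backbone.', 'action_head.': 'vla_head.'}
print(remap_keys(checkpoint, mapping))
# {'vlm_backbone.layer0.weight': 1.0, 'vla_head.proj.bias': 2.0}
```

Sorting the rules by prefix length means a specific rule such as 'a.b.' takes precedence over a broader one such as 'a.' when both match.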

Minimal Example

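model = dict(
    type='LlavaVLA',  # top-level VLA, registered in VLAS
    pretrained_name_or_path='./checkpoints/GR00T-N1.5-3B',
    vlm_backbone=dict(
        type='EagleBackbone',  # VLM backbone, registered in VLM_BACKBONES
        vlm_path='fluxvla/models/third_party_models/eagle2_hg_model'),
    vla_head=dict(
        type='FlowMatchingHead',  # action head, registered in HEADS
        state_dim=64,
        action_dim=32,
        ori_action_dim=14),
    freeze_vlm_backbone=False,  # train the VLM backbone as well
    freeze_projector=False)  # train the projector as well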
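The example sets action_dim=32 but ori_action_dim=14, and this page does not state how the two relate. A common convention, and only an assumption here, is that ori_action_dim is the robot's native action size. Under that convention, actions are zero-padded up to action_dim (and states up to state_dim), so that one head can serve robots with different dimensions. The helpers pad_to_dim and unpad below are hypothetical and only sketch that assumed convention.

```python
from typing import List, Sequence


def pad_to_dim(values: Sequence[float], target_dim: int) -> List[float]:
    """Zero-pad a vector to target_dim (assumed: ori_action_dim -> action_dim)."""
    if len(values) > target_dim:
        raise ValueError(f'got {len(values)} values, but target_dim is {target_dim}')
    return list(values) + [0.0] * (target_dim - len(values))


def unpad(values: Sequence[float], ori_dim: int) -> List[float]:
    """Keep only the first ori_dim entries of a padded vector."""
    return list(values[:ori_dim])


ori_action_dim, action_dim = 14, 32            # values from the example config
robot_action = [0.1] * ori_action_dim          # e.g. a 14-dim robot command
padded = pad_to_dim(robot_action, action_dim)  # 32 entries, last 18 are zeros
print(len(padded), unpad(padded, ori_action_dim) == robot_action)  # 32 True
```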

Interface Organization

As described in the tutorials, models are built with the registry + type-reference pattern:

  1. Components are registered via decorators.

  2. Configuration specifies components through the type field.

  3. The Runner constructs models from the configuration during training/inference.
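The three steps above can be illustrated with a self-contained registry. This is a minimal sketch of the general pattern, not FluxVLA's actual Registry class. The names Registry, register_module, build, DummyVLA and DummyHead are hypothetical stand-ins. In the sketch, the top-level class builds its own sub-config from a second registry, which mirrors how vla_head and the other sub-configs name types from separate registries.

```python
import copy
from typing import Any, Callable, Dict


class Registry:
    """Minimal name -> class registry (illustrative, not FluxVLA's implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        # Step 1: a decorator records the class under its own name.
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f'{cls.__name__} is already registered in {self.name}')
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        cfg = copy.deepcopy(cfg)     # do not mutate the caller's config
        type_name = cfg.pop('type')  # Step 2: the type field selects the class
        if type_name not in self._modules:
            raise KeyError(f'{type_name} is not registered in {self.name}')
        return self._modules[type_name](**cfg)  # remaining fields become kwargs


VLAS = Registry('VLAS')
HEADS = Registry('HEADS')


@HEADS.register_module()
class DummyHead:
    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim


@VLAS.register_module()
class DummyVLA:
    def __init__(self, vla_head: Dict[str, Any], freeze_projector: bool = False) -> None:
        # Each sub-config is resolved against its own registry.
        self.vla_head = HEADS.build(vla_head)
        self.freeze_projector = freeze_projector


# Step 3: a runner would do something like this with the loaded config.
cfg = dict(type='DummyVLA', vla_head=dict(type='DummyHead', action_dim=32))
model = VLAS.build(cfg)
print(type(model.vla_head).__name__, model.vla_head.action_dim)  # DummyHead 32
```

Deep-copying the config before popping type means the same config dict can be built more than once.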