Model Interfaces

This page summarizes FluxVLA's model interfaces, with a focus on how to reference existing model components in a configuration.

Purpose

Model interfaces define and assemble the core submodules of a VLA model. Each kind of submodule is listed below with its registry name in parentheses:

Top-level VLA (VLAS)
VLM Backbone (VLM_BACKBONES)
Vision Backbone (VISION_BACKBONES)
LLM Backbone (LLM_BACKBONES)
Projector (PROJECTORS)
VLA Head (HEADS)

Core Parameters (Configuration Level)

Common fields (located under `model` or `inference_model`):

Field	Description
`type`	Top-level VLA type name
`pretrained_name_or_path`	Pretrained weight path or model name
`vlm_backbone`	VLM configuration (including `type`, `vlm_path`, etc.)
`vla_head`	Action head configuration (e.g., `state_dim`, `action_dim`, `ori_action_dim`)
`name_mapping`	Pretrained weight key mapping
`freeze_vlm_backbone` / `freeze_projector`	Freezing strategy: boolean flags that control whether the VLM backbone and the projector are frozen
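
The table does not spell out how `name_mapping` is interpreted. As a rough sketch only, assuming it maps key prefixes found in a pretrained checkpoint to the prefixes the current model expects, the remapping step could look like this. The helper `remap_keys` and the example keys are hypothetical and are not part of FluxVLA.

```python
def remap_keys(state_dict, name_mapping):
    """Return a copy of state_dict with key prefixes renamed.

    Assumption: name_mapping is {old_prefix: new_prefix}. The first matching
    prefix wins, and keys that match no prefix are kept unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in name_mapping.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


# Hypothetical checkpoint keys, with plain numbers standing in for tensors.
checkpoint = {'backbone.layer0.weight': 1, 'action_head.fc.bias': 2}
mapping = {'backbone.': 'vlm_backbone.', 'action_head.': 'vla_head.'}
remapped = remap_keys(checkpoint, mapping)
print(sorted(remapped))
```

Check the actual key format against the model implementation before relying on this interpretation.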

Minimal Example

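model = dict(
    # Top-level VLA type, resolved by name.
    type='LlavaVLA',
    # Pretrained weight path or model name.
    pretrained_name_or_path='./checkpoints/GR00T-N1.5-3B',
    # VLM backbone configuration.
    vlm_backbone=dict(
        type='EagleBackbone',
        vlm_path='fluxvla/models/third_party_models/eagle2_hg_model'),
    # Action head configuration.
    vla_head=dict(
        type='FlowMatchingHead',
        state_dim=64,
        action_dim=32,
        ori_action_dim=14),
    # Freezing flags; False means the component is not frozen.
    freeze_vlm_backbone=False,
    freeze_projector=False)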
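
Because every component is selected by a `type` key, a quick way to see which components a configuration refers to is to walk the nested dicts and collect those values. The helper `collect_types` below is a hypothetical convenience for illustration, not a FluxVLA function, and the config is a trimmed copy of the example above.

```python
def collect_types(cfg):
    """Return every 'type' value in a nested config dict, depth first."""
    found = []
    if isinstance(cfg, dict):
        if 'type' in cfg:
            found.append(cfg['type'])
        # Recurse into the values; non-dict values contribute nothing.
        for value in cfg.values():
            found.extend(collect_types(value))
    return found


model = dict(
    type='LlavaVLA',
    vlm_backbone=dict(type='EagleBackbone'),
    vla_head=dict(type='FlowMatchingHead', action_dim=32))

print(collect_types(model))
```

Each name in the resulting list has to be registered for the configuration to build.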

Interface Organization

As described in the tutorials, model construction follows the Registry + type reference pattern:

Components are registered via decorators.
The configuration selects each component through its `type` field.
The Runner builds the model from the configuration at training or inference time.
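
The three steps above can be sketched with a toy registry. This is a minimal illustration of the general pattern, not FluxVLA's implementation: the `Registry` class, its `register_module` and `build` methods, and the `ToyHead` and `ToyVLA` classes are all hypothetical stand-ins, and only the registry names `VLAS` and `HEADS` come from this page.

```python
class Registry:
    """Minimal name-to-class registry (illustrative, not FluxVLA's own)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, cls):
        """Class decorator that records cls under its class name."""
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        """Instantiate the class named by cfg['type'] with the other keys."""
        kwargs = dict(cfg)  # copy so the caller's config is left untouched
        type_name = kwargs.pop('type')
        if type_name not in self._modules:
            raise KeyError(f'{type_name!r} is not registered in {self.name}')
        return self._modules[type_name](**kwargs)


HEADS = Registry('HEADS')
VLAS = Registry('VLAS')


@HEADS.register_module
class ToyHead:
    def __init__(self, action_dim):
        self.action_dim = action_dim


@VLAS.register_module
class ToyVLA:
    def __init__(self, vla_head):
        # A nested config is built through the registry for that slot.
        self.vla_head = HEADS.build(vla_head)


# Stands in for what a runner would do with the `model` config.
model_cfg = dict(type='ToyVLA', vla_head=dict(type='ToyHead', action_dim=32))
model = VLAS.build(model_cfg)
print(type(model).__name__, model.vla_head.action_dim)
```

Keeping one registry per component kind means a typo in `type` fails early with a clear error instead of silently building the wrong class.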