Model Interfaces
This page consolidates the FluxVLA model-related interfaces, focusing on how to use existing model components within the configuration.
Purpose
Model interfaces are used to define and assemble the core submodules of VLA:
Top-level VLA (VLAS)
VLM Backbone (VLM_BACKBONES)
Vision Backbone (VISION_BACKBONES)
LLM Backbone (LLM_BACKBONES)
Projector (PROJECTORS)
VLA Head (HEADS)
Core Parameters (Configuration Level)
Common fields (located under model or inference_model):
| Field | Description |
|---|---|
| type | Top-level VLA type name |
| pretrained_name_or_path | Pretrained weight path or model name |
| vlm_backbone | VLM configuration (including …) |
| vla_head | Action head configuration (e.g., …) |
| | Pretrained weight key mapping |
| freeze_vlm_backbone, freeze_projector | Freezing strategy |
Minimal Example
model = dict(
    type='LlavaVLA',
    pretrained_name_or_path='./checkpoints/GR00T-N1.5-3B',
    vlm_backbone=dict(
        type='EagleBackbone',
        vlm_path='fluxvla/models/third_party_models/eagle2_hg_model'),
    vla_head=dict(
        type='FlowMatchingHead',
        state_dim=64,
        action_dim=32,
        ori_action_dim=14),
    freeze_vlm_backbone=False,
    freeze_projector=False)
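Because the configuration is a plain nested dict, a variant (for example one with the VLM backbone frozen) can be derived from a base config instead of being rewritten. The following is a minimal sketch in plain Python, not FluxVLA tooling: the trimmed `base_model` dict and the `frozen_model` name are illustrative assumptions, and only the field names come from the example above.

```python
import copy

# Trimmed-down base config; field names follow the minimal example above.
base_model = dict(
    type='LlavaVLA',
    vla_head=dict(type='FlowMatchingHead', action_dim=32),
    freeze_vlm_backbone=False,
    freeze_projector=False)

# Deep copy so nested dicts (e.g. vla_head) are not shared with the base.
frozen_model = copy.deepcopy(base_model)  # hypothetical variant name
frozen_model.update(freeze_vlm_backbone=True)
```

A shallow copy would leave `vla_head` shared between the two configs, so later edits to one would silently affect the other.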
Interface Organization
As described in the tutorials, model construction follows the Registry + type reference pattern:
Components are registered via decorators.
Configuration specifies components through the type field.
The Runner constructs models from the configuration during training/inference.
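The three steps above can be sketched with a toy registry. This is an illustration of the general Registry + `type` pattern, not FluxVLA's implementation: the `Registry` class, its `register_module` and `build` methods, and the `ToyHead` and `ToyVLA` classes are all hypothetical. Only the registry names `VLAS` and `HEADS` and the `type` field come from this page.

```python
from typing import Any, Callable, Dict


class Registry:
    """Hypothetical minimal registry mapping class names to classes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        """Return a decorator that registers a class under its own name."""
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f'{cls.__name__} already in {self.name}')
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate the class named by cfg['type'] with the other keys."""
        args = dict(cfg)  # shallow copy so the caller's config keeps 'type'
        cls_name = args.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not registered in {self.name}')
        return self._modules[cls_name](**args)


HEADS = Registry('heads')
VLAS = Registry('vlas')


@HEADS.register_module()
class ToyHead:  # stand-in for a real action head
    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim


@VLAS.register_module()
class ToyVLA:  # stand-in for a real top-level VLA
    def __init__(self, vla_head: Dict[str, Any],
                 freeze_projector: bool = False) -> None:
        # Nested configs are resolved through their own registry.
        self.vla_head = HEADS.build(vla_head)
        self.freeze_projector = freeze_projector


# What a runner would do, in miniature: build the model from its config.
model_cfg = dict(type='ToyVLA', vla_head=dict(type='ToyHead', action_dim=32))
toy_model = VLAS.build(model_cfg)
```

In this sketch, swapping a component only requires changing the `type` string (and its arguments) in the config. The code that builds the model stays unchanged.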