FluxVLA Engine Documentation

Welcome to FluxVLA!

FluxVLA Engine is a full-stack, end-to-end engineering platform for deploying embodied-intelligence applications. It is built on four core design principles: unified configuration, standardized interfaces, decoupled modules, and deployability. On that basis it closes the engineering loop from data to deployment on real devices. Its goal is to give industry, academia, and research institutions a shared, standardized foundation and to lower the engineering barrier to vision-language-action (VLA) research and development.

Core Modules

All-in-One Configuration: a single configuration surface for training, evaluation, and deployment, so switching between them takes one step.

Modular Training: composable, block-style VLA assembly that scales out to multi-node, multi-GPU training with FSDP or DDP.

Flash Fused Kernels: fused GPU kernels and graph-friendly inference paths for high-throughput, low-latency robot foundation models.

Guided Trajectory Smoothing: constrains the denoising process with known action prefixes, reducing inconsistencies between adjacent action chunks for smoother real-time control.
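The Guided Trajectory Smoothing idea can be illustrated with a toy sampler. This is a minimal sketch and not FluxVLA's implementation. It makes two assumptions:

- The action head is rectified-flow style, following the path x_t = (1 - t) * x0 + t * x1.
- The guidance is a hard, inpainting-style constraint: the first actions of each new chunk are pinned to actions already committed from the previous chunk.

`toy_denoiser` and `guided_sample` are hypothetical names. The toy denoiser ignores its input, so the example shows only where the prefix constraint enters the denoising loop.

```python
import random


def toy_denoiser(x_t, t, plan):
    """Hypothetical stand-in for a learned action head: predicts the clean chunk.

    A real model would condition on observations, x_t, and t; this toy
    always returns the same plan.
    """
    return list(plan)


def guided_sample(plan, prefix, steps=10, seed=0):
    """Euler sampler along the rectified-flow path x_t = (1 - t) * x0 + t * x1,
    with the first len(prefix) actions pinned to already-committed values."""
    if len(prefix) > len(plan):
        raise ValueError("prefix cannot be longer than the action chunk")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in plan]  # x0: one noise value per action
    x = list(noise)
    dt = 1.0 / steps
    for step in range(steps):
        t = step / steps
        x1_hat = toy_denoiser(x, t, plan)
        # Velocity implied by the predicted clean chunk: (x1_hat - x_t) / (1 - t).
        v = [(target - cur) / (1.0 - t) for target, cur in zip(x1_hat, x)]
        x = [cur + dt * vel for cur, vel in zip(x, v)]
        # Guidance: put each prefix action back on its noise-to-prefix path.
        t_next = (step + 1) / steps
        for i, known in enumerate(prefix):
            x[i] = (1.0 - t_next) * noise[i] + t_next * known
    return x
```

The prefix is re-imposed after every Euler step, so the sampled chunk starts exactly at the committed actions. It does not jump to wherever an unconstrained sample would begin. A real, context-aware denoiser would also bend the remaining actions toward the prefix. Softer, weighted forms of this guidance relax exact matching over an overlap region instead of overwriting it.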

Demo

1-Hour Towel Folding

The towel-folding task has four stages: retrieving the towel, flattening it from a crumpled state, folding it correctly, and placing it. The recording plays at 15× speed and covers one continuous hour of operation with zero failures.

Classroom

Architecture Overview

Deep dive into FluxVLA's layered design and execution pipeline — how models, data, and engines work together.

Config Deep-Dive

Master the four config modules — model, data, training, and inference — to flexibly compose experiments.

Modular building blocks for custom VLA model assembly

Add Custom Models

A step-by-step guide to registering your own VLA model and integrating it into the FluxVLA framework.

Inference Deployment

End-to-end real-robot deployment for Aloha, Tron2, and UR3 — from model export to on-device inference.
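Two parts of these docs rest on a common pattern: the Add Custom Models guide and the BaseVLA spine described under Highlights. The pattern combines two pieces:

- A shared base class that fixes the order of the forward pass.
- A name-to-class registry that lets a config select a model.

The sketch below is a hypothetical, framework-free illustration of that pattern. `BaseVLASketch`, `register`, `MODEL_REGISTRY`, the method names, and both toy models are assumptions, not FluxVLA's actual interface.

```python
from abc import ABC, abstractmethod

# Hypothetical registry: maps a config string to a model class.
MODEL_REGISTRY = {}


def register(name):
    """Class decorator that records a model class under a config name."""
    def wrap(cls):
        if name in MODEL_REGISTRY:
            raise KeyError(f"model {name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


class BaseVLASketch(ABC):
    """Shared spine: vision encoder -> projector -> language model -> action head."""

    @abstractmethod
    def encode_image(self, image): ...

    @abstractmethod
    def project(self, features): ...

    @abstractmethod
    def encode_language(self, visual_tokens, instruction): ...

    @abstractmethod
    def predict_actions(self, hidden): ...

    def forward(self, image, instruction):
        # The stage order is fixed here, so training code only ever calls forward().
        visual_tokens = self.project(self.encode_image(image))  # into the LLM embedding space
        hidden = self.encode_language(visual_tokens, instruction)
        return self.predict_actions(hidden)


@register("toy_mean")
class ToyMeanVLA(BaseVLASketch):
    """Toy model: every stage is a trivial arithmetic stand-in."""

    def encode_image(self, image):
        return [sum(image) / len(image)]

    def project(self, features):
        return [2.0 * f for f in features]

    def encode_language(self, visual_tokens, instruction):
        # Stand-in for an LLM: combine visual tokens with the instruction length.
        return sum(visual_tokens) + len(instruction.split())

    def predict_actions(self, hidden):
        return [hidden] * 3  # a 3-step action chunk


@register("toy_clipped")
class ToyClippedVLA(ToyMeanVLA):
    """Same encoders, different action head: actions are clipped to [-1, 1]."""

    def predict_actions(self, hidden):
        return [max(-1.0, min(1.0, hidden))] * 3


def build_and_run(name, image, instruction):
    """Look up a model by name, instantiate it, and run one forward pass."""
    model = MODEL_REGISTRY[name]()
    return model.forward(image, instruction)
```

For example, `build_and_run("toy_mean", [0.5, 1.5], "fold the towel")` returns `[5.0, 5.0, 5.0]`. Changing only the name to `"toy_clipped"` returns `[1.0, 1.0, 1.0]`, and `build_and_run` itself is untouched.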

Project Overview

VLA Models: OpenVLA, LlavaVLA, GR00T, Pi0, Pi0.5

Backbones: LLaMA / Gemma / Qwen, DinoSigLIP, PaliGemma, QwenVL

Data: Parquet, RLDS, Multi-Dataset Mix

Training: FSDP / DDP, LoRA, AMP, Checkpoint Resume, Auto Post-Eval

Eval & Deploy: Multi-GPU Eval, LIBERO Benchmark, Real-Robot Inference, RTC Guidance
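The Project Overview lists the options available at each layer. The All-in-One Configuration and Config Deep-Dive describe selecting among them from one config with four modules: model, data, training, and inference. A minimal sketch of that idea, assuming a plain Python dict, is shown below. The keys, values, `MODE_SECTIONS` mapping, and `resolve` helper are hypothetical and do not reflect FluxVLA's actual config schema.

```python
import copy

# Hypothetical single config: one entry per module, each picking options
# from the Project Overview. Not FluxVLA's real schema.
CONFIG = {
    "model": {"vla": "Pi0", "backbone": "PaliGemma"},
    "data": {"format": "parquet", "batch_size": 32},
    "train": {"strategy": "fsdp", "lora": False, "amp": True},
    "inference": {"chunk_size": 50, "rtc_guidance": True},
}

# Assumed mapping from run mode to the config modules it reads.
MODE_SECTIONS = {
    "train": ("model", "data", "train"),
    "eval": ("model", "data", "inference"),
    "deploy": ("model", "inference"),
}


def resolve(config, mode, overrides=None):
    """Return a copy of the modules `mode` needs, with "section.key" overrides applied."""
    if mode not in MODE_SECTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    resolved = {name: copy.deepcopy(config[name]) for name in MODE_SECTIONS[mode]}
    for dotted, value in (overrides or {}).items():
        section, key = dotted.split(".", 1)
        if section not in resolved:
            raise KeyError(f"mode {mode!r} does not use section {section!r}")
        resolved[section][key] = value
    return resolved
```

Here, `resolve(CONFIG, "deploy")` keeps only the model and inference modules. `resolve(CONFIG, "train", {"train.strategy": "ddp"})` switches the distributed strategy without editing the base config. In this sketch, each mode is derived from one source of truth, so a single argument is enough to move between training, evaluation, and deployment.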

Highlights

What makes FluxVLA unique:

One modular VLA spine: every model inherits from BaseVLA, which is built from a vision encoder, a language encoder, a projector into the LLM embedding space, and an action head. You can therefore swap between OpenVLA, LlavaVLA, GR00T, Pi0, and Pi0.5 without rewriting the training pipeline.

Backbone breadth:

LLMs: LLaMA, Gemma, and Qwen families.
Vision: DinoSigLIP (DINO + SigLIP).
VLMs: PaliGemma and QwenVL.

Dataset breadth: first-class Parquet and RLDS pipelines plus multi-dataset mixed training for heterogeneous data.
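One common way to mix heterogeneous datasets is to pick the source dataset at random for each sample, in proportion to a per-dataset weight. A minimal sketch of that strategy follows, assuming small in-memory datasets. `mixed_stream`, the dataset names, and the weights are hypothetical, and FluxVLA's own mixing logic may differ.

```python
import itertools
import random


def mixed_stream(datasets, weights, seed=0):
    """Yield (source_name, sample) pairs forever, picking the source by weight.

    Each dataset is cycled on its own, so a small dataset repeats instead of
    running out while larger ones are still being read.
    """
    names = list(datasets)
    if set(weights) != set(names):
        raise ValueError("weights must name exactly the same datasets")
    if any(len(datasets[name]) == 0 for name in names):
        raise ValueError("every dataset must contain at least one sample")
    rng = random.Random(seed)
    iterators = {name: itertools.cycle(datasets[name]) for name in names}
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, next(iterators[name])
```

With weights of 3:1, about 75% of samples come from the first dataset regardless of the two datasets' sizes. This keeps a small dataset from being drowned out by a large one.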

What makes FluxVLA complete at scale:

Distributed training: FSDP and DDP for large-scale runs.

Practical training stack: LoRA, AMP, checkpoint resumption, and automatic post-training evaluation.

From benchmarks to real robots: multi-GPU evaluation, the LIBERO benchmark (including on GPUs without ray-tracing hardware, such as the A100), real-robot inference scripts, and an inference mode that skips loading the full pretrained weights to save memory.
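Of the training features listed above, LoRA is the easiest to show in a few lines. This is a pure-Python sketch of the standard LoRA formulation, not FluxVLA's code. A frozen weight W gets a trainable low-rank update scaled by alpha / r, giving y = W x + (alpha / r) B A x. A is randomly initialized and B starts at zero, so training begins from the pretrained behavior. `LoRALinear` and `matvec` are illustrative names.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    A maps d_in -> r and B maps r -> d_out. B starts at zero, so the layer
    initially reproduces the frozen base layer exactly.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        # Only A and B are trained; W stays frozen.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Only A and B are trained. For a 4096 × 4096 projection with rank 8, that is 2 × 4096 × 8 = 65,536 trainable parameters instead of 16,777,216.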

What makes FluxVLA flexible and easy to use:

Clear project layout: fluxvla/ holds models (VLAs, backbones, heads, projectors), datasets, transforms, tokenizers, engines, optimizers, and collators; configs/ is organized by model family (openvla, llava, gr00t, pi0, pi05); scripts/ provides the entry points for training, evaluation, and real-robot inference.

End-to-end data and training flow: load Parquet or RLDS → transform and collate → forward → action loss → backprop, with pluggable FSDP/DDP runners, standard optimizers, and logging and checkpointing.

Proven tooling: PyTorch 2.6, Hugging Face Transformers 4.53.x, Flash Attention 2.5.x, TensorFlow (for RLDS), and LIBERO. The stack is suited to manipulation, multi-task learning, transfer learning, and iterative VLA research.

Forward-looking roadmap: more vision and VLM backbones and VLA methods, training on VLM or chain-of-thought data, Isaac Sim integration, and richer logging.
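The data and training flow listed above runs load → transform and collate → forward → action loss → backprop. It can be traced end to end in a toy setting. This sketch assumes a one-dimensional linear policy trained with a mean-squared-error action loss and hand-derived gradients. In FluxVLA, the forward pass would be a full VLA model, PyTorch would compute the gradients, and an FSDP or DDP runner would drive the loop. `load_records`, `transform`, `collate`, and `train` are hypothetical names.

```python
def load_records():
    """Stand-in for a Parquet/RLDS reader: pairs where action = 2 * obs + 1."""
    return [{"obs": float(i), "action": 2.0 * i + 1.0} for i in range(8)]


def transform(record):
    # Per-sample transform: scale observations into [0, 1).
    return {"obs": record["obs"] / 8.0, "action": record["action"]}


def collate(samples):
    # Stack per-sample fields into batch lists.
    return {"obs": [s["obs"] for s in samples], "action": [s["action"] for s in samples]}


def train(epochs=2000, lr=0.5, batch_size=4):
    w, b = 0.0, 0.0  # parameters of the toy policy: action = w * obs + b
    samples = [transform(r) for r in load_records()]
    losses = []
    for _ in range(epochs):
        for start in range(0, len(samples), batch_size):
            batch = collate(samples[start:start + batch_size])
            preds = [w * o + b for o in batch["obs"]]  # forward
            errors = [p - a for p, a in zip(preds, batch["action"])]
            n = len(errors)
            loss = sum(e * e for e in errors) / n  # action loss (MSE)
            # Backprop: analytic gradients of the MSE with respect to w and b.
            grad_w = 2.0 * sum(e * o for e, o in zip(errors, batch["obs"])) / n
            grad_b = 2.0 * sum(errors) / n
            w -= lr * grad_w  # optimizer step (plain SGD)
            b -= lr * grad_b
            losses.append(loss)
    return w, b, losses
```

After the transform, the generated data satisfies action = 16 * obs + 1, so `train()` recovers w ≈ 16 and b ≈ 1. The first recorded batch loss is 21.0.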
