The powerful generalization capabilities of Vision-Language-Action (VLA) models rest on massive, redundant datasets of highly uneven sample value, and this heavy data dependence hinders their widespread application. Existing model-centric remedies fail to address the challenge at the data level: model compression often degrades performance, and policy distillation produces outputs that are tied to a specific model and therefore lack versatility. We instead propose a fundamentally different, data-centric generative data distillation framework, FT-NCFM. Our framework employs a self-contained Fact-Tracing (FT) engine that combines causal attribution with programmatic contrastive verification to assess the intrinsic value of each sample; this assessment guides an NCFM adversarial game that synthesizes a model-agnostic, information-dense, reusable data asset. On several mainstream VLA benchmarks, models trained on just 5% of our distilled coreset achieve over 90% of the task success rate of models trained on the full dataset, while reducing training time by 75.4%. Our work demonstrates that intelligent data distillation is a highly promising path toward building efficient, high-performance VLA models. Code and datasets will be made available upon acceptance.
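To make the value-guided coreset idea concrete, here is a minimal, hypothetical sketch: it assumes each sample has already been assigned a scalar value score (standing in for the FT engine's attribution-plus-verification assessment, whose actual mechanics the abstract does not detail) and selects the top 5% as the coreset. Function names and the scoring proxy are illustrative assumptions, not the paper's implementation.

```python
import random

def sample_value(loss_without: float, loss_with: float) -> float:
    # Hypothetical attribution proxy: how much including the sample
    # reduces held-out loss. The real FT engine combines causal
    # attribution with programmatic contrastive verification.
    return loss_without - loss_with

def select_coreset(scores, fraction=0.05):
    # Keep the top-`fraction` highest-value samples as the coreset
    # (the paper synthesizes new data rather than merely selecting,
    # but ranking by value conveys the data-centric idea).
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

random.seed(0)
scores = [random.random() for _ in range(1000)]
coreset = select_coreset(scores, fraction=0.05)
print(len(coreset))
```

In the full framework, such a value signal would guide a generative adversarial game that synthesizes information-dense samples, rather than simply subsetting the original data.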
