Event cameras provide microsecond latency and high dynamic range, making them ideal for 3D perception tasks in traffic scenes with challenging lighting conditions. Yet existing methods often struggle to generalize to out-of-domain environments due to the limited availability of diverse training data. While synthetic data offers an easily accessible alternative, it introduces a significant sim-to-real gap, particularly in motion patterns. We tackle this challenge by introducing Motion-Adaptation Mamba (MA-Mamba), a dual-track framework that advances both architecture and data augmentation. At the architectural level, we introduce a lightweight Spatio-Temporal Association module that captures motion-induced appearance variations at arbitrary scales, and an Adaptive Memory Balancing module, built on the Mamba state-space framework, that adaptively filters memory updates to maintain stable scene context under diverse dynamics. At the data level, we design event-oriented augmentations that simulate varied motion patterns and apply priority-based masked sequence modeling to strengthen long-range spatio-temporal reasoning. Trained solely on synthetic data, MA-Mamba delivers substantial zero-shot gains on multiple real-world benchmarks, demonstrating strong robustness and generalizability.
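The adaptive memory idea described above can be illustrated with a toy recurrence. The sketch below is not the authors' actual Adaptive Memory Balancing module (the abstract does not give its equations); it only shows the general pattern of a Mamba-style linear state-space update whose write strength is gated by the current input, so that erratic inputs overwrite less of the carried scene memory. All names (`adaptive_memory_step`, `w_gate`, the gate form) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_memory_step(h_prev, x_t, A, B, w_gate):
    """One recurrent step (illustrative, not the paper's module):
    a data-dependent scalar gate decides how much of the candidate
    state-space update replaces the carried memory."""
    candidate = A @ h_prev + B @ x_t           # standard linear SSM update
    g = sigmoid(w_gate @ x_t)                  # input-dependent gate in (0, 1)
    return g * candidate + (1.0 - g) * h_prev  # blend new update with old memory

def run_sequence(xs, d_state, rng):
    """Roll the gated recurrence over a sequence of input vectors."""
    d_in = xs.shape[1]
    A = 0.9 * np.eye(d_state)                  # stable, slowly decaying memory
    B = 0.1 * rng.standard_normal((d_state, d_in))
    w_gate = 0.1 * rng.standard_normal(d_in)
    h = np.zeros(d_state)
    for x_t in xs:
        h = adaptive_memory_step(h, x_t, A, B, w_gate)
    return h
```

When the gate saturates toward zero, the old state passes through nearly unchanged, which is the stabilizing behavior the abstract attributes to filtered memory updates under diverse motion dynamics.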
