Understanding motion is crucial for visual object tracking in complex and dynamic scenarios. However, existing methods often rely on simple template updates or temporal feature propagation, neglecting the effective mining and exploitation of motion information. To address this, we propose a motion-aware spatio-temporal framework that achieves motion perception by explicitly matching motion patterns and modeling inter-frame motion relationships. Specifically, our method introduces a motion pattern dictionary that encodes diverse, representative motion patterns as learnable features, enabling effective motion modeling. During tracking, features from the search region retrieve the most relevant motion patterns from the dictionary to capture the current motion dynamics, and the decoder then integrates temporal motion correlations for enhanced motion awareness. Additionally, we incorporate geometric cues into the search-region features to strengthen spatial perception, mitigate occlusion-induced ambiguity, and improve foreground-background separation. Extensive experiments on seven challenging benchmarks show that our approach consistently outperforms existing methods, confirming that motion pattern modeling and geometry-guided enhancement alleviate tracking drift. Our MoDTrack achieves a 1.2% higher AUC score on the LaSOT benchmark than the latest state-of-the-art methods, further validating the superiority of our approach.
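The abstract does not specify how the dictionary lookup is implemented; one natural reading is a soft-attention retrieval in which search-region features act as queries and the learnable dictionary entries act as keys/values. The sketch below illustrates that reading only; the function names, shapes, and toy data are assumptions, not the authors' actual implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_motion_pattern(query, dictionary):
    """Hypothetical soft retrieval: return a convex combination of
    dictionary entries weighted by their similarity to the query
    (search-region) feature."""
    scores = [dot(query, entry) for entry in dictionary]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * entry[i] for w, entry in zip(weights, dictionary))
            for i in range(dim)]

# Toy dictionary with two 2-D "motion patterns" (assumed, for illustration).
patterns = [[1.0, 0.0], [0.0, 1.0]]
query = [4.0, 0.0]  # a search-region feature that strongly matches pattern 0
motion_feat = retrieve_motion_pattern(query, patterns)
```

In a real tracker the weighted sum would be computed per spatial location with learned projections, but the core idea, matching current features against a bank of learnable motion prototypes, is the same.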