Gait recognition has emerged as a promising biometric technique for long-distance, non-intrusive human identification. While Transformers have revolutionized vision tasks, their adaptation to gait recognition remains underexplored due to domain-specific challenges such as the sparse silhouette modality, spatial-temporal dynamics, fine-grained motion cues, and limited training data. In this paper, we propose Gait Transformer (GaT), an end-to-end Transformer backbone specifically tailored for silhouette-based gait recognition. GaT introduces three key components: (1) a hybrid patch embedding module that combines convolutional stems with group-batch normalization to enhance structural preservation; (2) a decomposed token mixer that explicitly models both short-range and long-range dependencies across spatial-temporal dimensions; and (3) a hybrid positional encoding strategy that integrates absolute, relative, and rotary embeddings to support efficient training under data scarcity. Without relying on any pretraining, GaT achieves state-of-the-art performance on Gait3D, GREW, and CCGR-MINI.
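To make the rotary part of component (3) concrete, the sketch below illustrates standard rotary position embedding (RoPE) applied to a token sequence; it is a minimal NumPy illustration of the general technique, not the paper's exact implementation, and the function name and channel-pairing scheme are assumptions:

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embedding to a (seq_len, dim) token array.

    Channels are split into two halves; each (x1_i, x2_i) pair is rotated
    by a position-dependent angle, so token norms are preserved and the
    dot product between two rotated tokens depends only on their relative
    positions. `dim` must be even.
    """
    seq_len, dim = x.shape
    half = dim // 2
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) positions
    freqs = base ** (-np.arange(half) / half)  # (half,) per-pair frequencies
    angles = pos * freqs                       # (seq_len, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each channel pair by its angle
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=1)
```

Because the rotation is norm-preserving and encodes position in the phase, RoPE injects relative-position information directly into attention scores without adding learned parameters, which is one reason it is attractive under the data scarcity the abstract mentions.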