Multi-person eyeblink detection in untrimmed in-the-wild videos is an emerging and challenging task. Because eyeblinks are far more fine-grained in space and time than general actions, we empirically find that general action detectors, though effective in broader domains, struggle with this task (Blink-AP $<$ 2\%). Specialized eyeblink methods alleviate this through fine-grained spatio-temporal operations. The SOTA method proposes a unified model that combines instance-aware face localization and eyeblink detection through joint multi-task learning and feature sharing. While effective, it exhibits two critical limitations that may contribute to its unsatisfactory performance (Blink-AP $=$ 10.11\%): (1) Face localization and eyeblink detection require distinct spatio-temporal feature granularities, making joint modeling in a unified feature space suboptimal. (2) Eyeblink training could be largely affected by unstable face-eye feature learning under the joint training paradigm. To address these limitations, we propose DeFB, a decomposed feature learning paradigm with favorable effectiveness and efficiency: (1) We model the face and the eye in feature spaces of different granularities, which greatly enhances fine-grained perception while reducing computational cost compared with a unified feature space; (2) To address the instability in face-eye feature learning, we adopt an asynchronous learning mechanism for the face and eye feature spaces, in which eye feature learning serves as a refinement process built on well-trained coarse face features; this also maintains the efficient feature sharing of the existing unified model. Compared with the SOTA method, DeFB more than doubles the performance (Blink-AP: 24.65\% vs. 10.11\%) while boosting efficiency by nearly 35\%. DeFB can also be integrated as a plugin to substantially augment the eyeblink detection capabilities of general action detectors. Code will be released to facilitate research in related fields.