
AAAI 2026

January 25, 2026

Singapore, Singapore


Video-based human pose estimation has vast applications such as action recognition, sports analytics, and crime detection. However, this task is challenging as it involves interpreting both spatial context and temporal dynamics to accurately localize human anatomical keypoints in video sequences. Current approaches, often based on attention mechanisms, perform well but struggle in challenging scenarios like rapid motion and pose occlusion. We attribute these failures to two fundamental limitations: spatial uniformity, where models indiscriminately assign attention to both joint-relevant features and background clutter, thereby introducing spatial noise; and temporal rigidity, an inability to adapt to large joint displacements, resulting in severe feature misalignment during rapid motion. To overcome these challenges, we introduce PSTPose, a novel progressive spatiotemporal refinement framework. Specifically, to address the spatial uniformity problem, we propose a Discriminative Feature Enhancement (DFE) module that emphasizes joint-relevant features and a Feature Cluster Grouping (FCG) module that forms compact, semantically meaningful regions. For the temporal rigidity problem, we introduce a Deformable Spatiotemporal Fusion (DSF) module that adaptively aligns features across consecutive frames via deformation-aware sampling. This design ensures robust keypoint localization, particularly in cluttered and dynamic scenes. Extensive experiments on four large-scale benchmarks, PoseTrack2017, PoseTrack2018, PoseTrack21, and Sub-JHMDB, demonstrate that PSTPose establishes a new state-of-the-art. The implementation is anonymously released and available in the supplementary material.
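The paper's actual implementation is released in the supplementary material. Purely as an illustration of the deformation-aware sampling idea behind the DSF module, the core operation — warping the previous frame's features toward the current frame using per-pixel learned offsets, with bilinear interpolation at the resulting fractional positions — can be sketched as follows. All function names, shapes, and the use of plain NumPy here are assumptions for exposition, not the authors' code.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a single-channel feature map feat (H, W)
    at the fractional location (y, x), clamping to the map borders."""
    H, W = feat.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_align(prev_feat, offsets):
    """Align the previous frame's features to the current frame.

    prev_feat: (H, W) feature map from frame t-1.
    offsets:   (H, W, 2) per-pixel (dy, dx) displacements, which in the
               full model would be predicted by a small offset network.
    Returns a warped (H, W) map sampled at the displaced locations, so
    large joint motion no longer causes rigid feature misalignment.
    """
    H, W = prev_feat.shape
    out = np.zeros_like(prev_feat, dtype=float)
    for i in range(H):
        for j in range(W):
            dy, dx = offsets[i, j]
            out[i, j] = bilinear_sample(prev_feat, i + dy, j + dx)
    return out
```

With zero offsets the warp is the identity; with a uniform (0, +1) offset each output pixel reads the feature one column to its right, mimicking compensation for rightward joint motion between frames. A practical implementation would vectorize this (e.g. a grid-sampling primitive) and predict the offsets from the concatenated frame features.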

