AAAI 2025

February 27, 2025

Philadelphia, United States

Keywords: motion tracking, cv

Human mesh recovery (HMR) is crucial in many computer vision applications, from health to entertainment. HMR from monocular images has predominantly been addressed by deterministic methods that output a single prediction for a given 2D image. However, HMR from a single image is an ill-posed problem due to depth ambiguity and occlusions. Probabilistic methods have attempted to address this by generating and fusing multiple plausible 3D reconstructions, but their performance has often lagged behind that of deterministic approaches. In this paper, we introduce GenHMR, a novel generative framework that reformulates monocular HMR as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. GenHMR comprises two key components: (1) a pose tokenizer that converts 3D human poses into a sequence of discrete tokens in a latent space, and (2) an image-conditional masked transformer that learns the probability distributions of the pose tokens, conditioned on the input image prompt along with the randomly masked token sequence. During inference, the model samples from the learned conditional distribution to iteratively decode high-confidence pose tokens, thereby reducing 3D reconstruction uncertainties. To further refine the reconstruction, we propose a 2D pose-guided refinement technique that directly fine-tunes the decoded pose tokens in the latent space, forcing the projected 3D body mesh to align with the 2D pose cues. Experiments on benchmark datasets demonstrate that GenHMR significantly outperforms state-of-the-art methods. The project website is available at https://anonymous-ai-model.github.io/GenHMR/
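The abstract does not say how the pose tokenizer discretizes poses. One common way to learn discrete latent tokens is vector quantization, as in VQ-VAE: an encoder maps the input to a sequence of latent vectors, and each vector is replaced by the index of its nearest entry in a learned codebook. The minimal Python sketch below shows only that nearest-neighbour lookup, assuming a fixed toy codebook. The encoder, the decoder, codebook training, and every name used here (`quantize`, `dequantize`, `CODEBOOK`) are hypothetical and not taken from GenHMR.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index (token) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look tokens back up in the codebook; a decoder would map these to a 3D pose."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical toy codebook: 4 entries of dimension 2.
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical encoder output: one latent vector per token position.
latents = [[0.9, 0.1], [0.2, 0.8], [0.05, -0.1]]
tokens = quantize(latents, CODEBOOK)
print(tokens)  # [1, 2, 0]
```

In a trained tokenizer the codebook would be learned jointly with the encoder and decoder; the resulting index sequence is the kind of discrete representation a masked transformer can then model.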
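Iteratively decoding "high-confidence pose tokens" resembles the confidence-based parallel decoding popularized by MaskGIT. The sketch below assumes that scheme: start from a fully masked sequence, predict a distribution for every masked position, commit the most confident predictions, and leave the rest masked according to a cosine schedule. The predictor is a hypothetical stand-in for the image-conditioned masked transformer, the cosine schedule is an assumption, and for determinism the sketch takes the argmax instead of sampling from the distribution as the abstract describes.

```python
import math
from typing import Callable, List, Optional

# A masked position is represented by None; decoded positions hold a token id.
Tokens = List[Optional[int]]
# Hypothetical predictor signature: current (partially masked) sequence ->
# one probability distribution over the codebook for every position.
Predictor = Callable[[Tokens], List[List[float]]]


def iterative_decode(predict: Predictor, length: int, steps: int) -> List[int]:
    """Confidence-based iterative decoding of a fully masked token sequence.

    At each step, every still-masked position is assigned its most likely
    token; only the most confident assignments are committed and the rest stay
    masked, following a cosine schedule that reaches zero masked positions at
    the final step. Committed tokens are never revised.
    """
    tokens: Tokens = [None] * length
    for step in range(1, steps + 1):
        probs = predict(tokens)
        # How many positions should still be masked after this step.
        num_masked_after = math.floor(length * math.cos(math.pi / 2 * step / steps))
        candidates = []
        for i, tok in enumerate(tokens):
            if tok is None:
                best = max(range(len(probs[i])), key=lambda k: probs[i][k])
                candidates.append((probs[i][best], i, best))
        # Commit the most confident candidates; the others remain masked.
        candidates.sort(reverse=True)
        num_keep = len(candidates) - num_masked_after
        for _, i, best in candidates[:num_keep]:
            tokens[i] = best
    assert all(t is not None for t in tokens), "schedule should unmask every position"
    return [t for t in tokens if t is not None]


# Hypothetical stand-in for the image-conditioned masked transformer: it
# ignores its input and returns fixed distributions over a 3-token codebook.
TOY_PROBS = [
    [0.1, 0.8, 0.1],
    [0.4, 0.3, 0.3],
    [0.05, 0.05, 0.9],
    [0.6, 0.2, 0.2],
]


def toy_predict(tokens: Tokens) -> List[List[float]]:
    return TOY_PROBS


print(iterative_decode(toy_predict, length=4, steps=2))  # [1, 0, 2, 0]
```

With two steps, positions 2 and 0 (confidences 0.9 and 0.8) are committed first and the remaining two in the final step. Replacing the argmax with a draw from `probs[i]` (for example via `random.choices`) would give sampling behaviour closer to the abstract's description.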
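The 2D pose-guided refinement can be read as an optimization problem: adjust the latent representation so that the projected 3D joints match detected 2D keypoints. The sketch below assumes a continuous latent vector (for example, the codebook embeddings of the decoded tokens), a hypothetical linear decoder from latent to 3D joints, and a weak-perspective projection, and minimizes the squared 2D reprojection error with plain gradient descent using a hand-derived gradient. None of these choices (decoder, projection, loss, step size, or the names `decode_joints`, `refine_latent`, and so on) come from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vec = List[float]


def matvec(m: Matrix, v: Vec) -> Vec:
    """Matrix-vector product for plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def decode_joints(z: Vec, weight: Matrix, bias: Vec) -> List[Vec]:
    """Hypothetical linear decoder: latent z -> (X, Y, Z) for each joint.

    Rows 3j, 3j+1 and 3j+2 of `weight` produce X, Y and Z of joint j.
    """
    flat = [x + b for x, b in zip(matvec(weight, z), bias)]
    return [flat[i:i + 3] for i in range(0, len(flat), 3)]


def project(joints: List[Vec], scale: float) -> List[Vec]:
    """Weak-perspective (scaled orthographic) projection: keep X and Y, drop depth."""
    return [[scale * x, scale * y] for x, y, _ in joints]


def reprojection_loss(z: Vec, weight: Matrix, bias: Vec,
                      keypoints: List[Vec], scale: float) -> float:
    """Sum of squared distances between projected joints and 2D keypoints."""
    proj = project(decode_joints(z, weight, bias), scale)
    return sum((p - k) ** 2
               for pj, kj in zip(proj, keypoints)
               for p, k in zip(pj, kj))


def refine_latent(z: Vec, weight: Matrix, bias: Vec, keypoints: List[Vec],
                  scale: float, lr: float = 0.05, iters: int = 200) -> Vec:
    """Gradient descent on the reprojection loss with respect to the latent z."""
    z = list(z)
    for _ in range(iters):
        proj = project(decode_joints(z, weight, bias), scale)
        grad = [0.0] * len(z)
        for j, (pj, kj) in enumerate(zip(proj, keypoints)):
            for axis in range(2):  # axis 0 uses the X row, axis 1 the Y row of joint j
                residual = pj[axis] - kj[axis]
                row = weight[3 * j + axis]
                for d in range(len(z)):
                    # d/dz_d of residual**2 is 2 * residual * scale * row[d]
                    grad[d] += 2.0 * residual * scale * row[d]
        z = [zd - lr * gd for zd, gd in zip(z, grad)]
    return z


# Hypothetical toy setup: 2 joints, 2-dimensional latent, zero bias.
WEIGHT = [
    [1.0, 0.0], [0.0, 1.0], [0.5, 0.5],   # joint 0: X, Y, Z rows
    [1.0, 1.0], [0.0, -1.0], [0.0, 0.0],  # joint 1: X, Y, Z rows
]
BIAS = [0.0] * 6
KEYPOINTS = [[0.5, -0.2], [0.3, 0.2]]  # toy 2D detections

z0 = [0.0, 0.0]
z_refined = refine_latent(z0, WEIGHT, BIAS, KEYPOINTS, scale=1.0)
print(reprojection_loss(z0, WEIGHT, BIAS, KEYPOINTS, 1.0))         # about 0.42
print(reprojection_loss(z_refined, WEIGHT, BIAS, KEYPOINTS, 1.0))  # close to 0
```

With a real body-mesh decoder, the gradient would typically come from automatic differentiation rather than the closed-form expression used here, and the loss would usually be weighted by keypoint detection confidence.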
