AAAI 2026

January 25, 2026

Singapore, Singapore


3D human motion generation has seen a substantial rise in interest in recent years. While considerable performance gains have been made, many state-of-the-art approaches still struggle to generate complex, detailed motions unseen in the original data. This is commonly attributed to the scarcity of available motion datasets and the prohibitive cost of producing more training examples. Motivated by these challenges, we introduce CoMA, a multimodal framework for complex human motion generation, editing, and comprehension. CoMA employs multiple independent agents, powered by large language and vision models, together with a mask-transformer-based motion generator that uses body-part-specific encoders and codebooks for fine-grained, detailed generation. This design enables generating short and long motion sequences from detailed instructions, editing generations with user-provided text instructions, and self-correcting output sequences to further improve motion quality. We evaluate our method on the two most popular human motion benchmark datasets, using novel splits that separate them into basic and complex actions, and compare CoMA's performance against state-of-the-art methods.
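To make the "body-part-specific codebooks" idea concrete, the following is a minimal sketch of per-part motion tokenization: each body part's per-frame features are quantized independently against that part's own codebook, yielding one token stream per part. The part names, codebook size, and feature dimension here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Illustrative body-part partition (assumed, not the paper's actual split).
BODY_PARTS = ["torso", "left_arm", "right_arm", "left_leg", "right_leg"]

rng = np.random.default_rng(0)
CODEBOOK_SIZE, DIM = 512, 64

# One independent codebook per body part, here randomly initialized;
# in practice these would be learned (e.g. via a VQ-VAE-style objective).
codebooks = {p: rng.standard_normal((CODEBOOK_SIZE, DIM)) for p in BODY_PARTS}

def quantize(part, feats):
    """Map each frame's feature vector to the id of its nearest codebook entry."""
    book = codebooks[part]                                      # (CODEBOOK_SIZE, DIM)
    # Squared Euclidean distance from every frame to every code, via broadcasting.
    d = ((feats[:, None, :] - book[None, :, :]) ** 2).sum(-1)   # (T, CODEBOOK_SIZE)
    return d.argmin(axis=1)                                     # (T,) token ids

T = 16  # frames in the motion clip
tokens = {p: quantize(p, rng.standard_normal((T, DIM))) for p in BODY_PARTS}
print({p: t.shape for p, t in tokens.items()})  # one (T,) token stream per part
```

The separate token streams would then be what a mask-transformer generator predicts, letting each body part be modeled and edited at its own granularity.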


