
Although predicting others' behavior is a fundamental capacity of the human mind, building this intuitive psychology into machines has remained a challenge. To advance this aim, we introduce the Animate Agent World Modeling Benchmark, which features agents engaged in a diverse repertoire of behaviors, such as goal-directed interactions with objects and multi-agent interactions, all governed by realistic physics. Humans tend to predict the future in terms of expected events rather than by simulating it step by step. Our benchmark therefore includes a cognitively inspired evaluation pipeline designed to assess whether the trajectories simulated by world models capture the correct sequences of events. To perform well, models need to leverage predictive cues in the observations to accurately simulate the goals of animate agents over long horizons. Although recent state-of-the-art model-based reinforcement learning (RL) agents incorporate world models, we demonstrate that these models perform poorly in our evaluations. A hierarchical oracle model sets an upper bound for performance, suggesting that, to excel, a model should scaffold its predictions with abstractions such as goals that guide the simulation process toward relevant future events.
Authors:
Logan Matthew Cross: Stanford University; Violet Xiang: Stanford University; Nick Haber: Stanford University; Daniel Yamins: Stanford University
