CogSci 2024

July 25, 2024

Rotterdam, Netherlands

Humans are remarkably adept at inferring the causes of events in our environment; doing so often requires incorporating information from multiple sensory modalities. For instance, if a car slows down in front of us, our inferences about why it did so are rapidly revised if we also hear sirens in the distance. Here, we investigate the ability to reconstruct others' actions and events from the past by integrating multimodal information. Participants were asked to infer which of two agents performed an action in a household setting given either visual evidence, auditory evidence, or both. We evaluate humans, a large language model (GPT-4), and a large multimodal model (GPT-4V) on this task. We find that humans are relatively accurate overall and perform best when given multimodal evidence, seeming to put more emphasis on visual evidence than on auditory evidence. GPT-4's overall accuracy closely matches that of humans in all modalities, but is only weakly correlated with human accuracy across trials, suggesting different reasoning mechanisms. Meanwhile, GPT-4V has lower accuracy and exhibits no evidence of incorporating multimodal information. People's ability to reconstruct the behavior of others relies on successfully integrating evidence across different senses. Such multimodal reasoning presents an intriguing challenge for multimodal AI systems.

Authors:

Sarah A Wu: Stanford University; Erik Brockbank: Stanford University; Hannah Cha: Stanford University; Jan-Philipp Fränken: University of Edinburgh; Emily Jin: Stanford University; Zhuoyi Huang: Stanford University; Weiyu Liu: Stanford University; Ruohan Zhang: Stanford University; Jiajun Wu: Stanford University; Tobias Gerstenberg: Stanford University
