keywords: theory of mind, computer science, artificial intelligence, reasoning
We introduce the INtuitive Theory Use and Inference Test (INTUIT), a cognitive test battery targeting common-sense physical and social reasoning. INTUIT adapts classic story-based question-and-answer methods for AI evaluation using VIGNET, a novel tool that addresses some limitations of existing test batteries through procedurally generated vignettes. We administered INTUIT to three GPT models (GPT-4o, GPT-4o-mini, GPT-4.1-mini), one reasoning model (o3-mini), and a human sample (N = 147). Humans generally outperformed the models, especially on the object function and agent intention inference types. These results highlight INTUIT's sensitivity to intuitive reasoning capabilities and VIGNET's broader applicability to evaluating cognitive capabilities in humans and AI.