AAAI 2026

January 24, 2026

Singapore, Singapore


As large language models (LLMs) are increasingly deployed in high-stakes domains such as education, healthcare, and law, accurately evaluating their nuanced reasoning processes becomes essential to ensuring their safety, reliability, and trustworthiness. However, most existing benchmarks evaluate LLMs at a coarse granularity: they emphasize end results and neglect the complex reasoning steps that lead to them. This masks latent deficits, produces misleadingly high scores, and ultimately limits accurate assessment of model suitability for complex real-world scenarios. To address these limitations, we introduce CogProbe, a diagnostic benchmark that decomposes complex reasoning processes into orthogonal cognitive operations. It features the multilingual CogEval datasets and cognitively informed metrics for fine-grained evaluation of LLM cognitive capabilities. Drawing on cognitive psychology, we design a comprehensive taxonomy of model capabilities comprising 5 macro-cognitive capabilities and 17 corresponding micro-cognitive operations. The taxonomy facilitates precise identification of latent weaknesses and provides detailed assessments of model capabilities, supporting informed deployment of LLMs in real-world scenarios. Experimental results demonstrate that our method can effectively assess implicit cognitive capabilities. They further reveal that, despite achieving high scores on traditional benchmarks, current LLMs exhibit significant cognitive deficits, particularly in metacognitive capability. Merely training models on coarse-grained datasets does not effectively enhance their underlying cognitive capabilities.


