AAAI 2026

January 23, 2026

Singapore, Singapore

Motivated by applications in forecasting, we study chronological reasoning in LLMs. We test LLMs’ ability to understand and enforce chronological order in three types of tasks: sorting randomly shuffled historical events; conditional sorting of events that satisfy specified conditions; and anachronism detection based on intersections of multiple timelines. Our experiments use events that we first confirm are known to the LLM; this ensures that we test chronological understanding on an LLM’s pretrained internal knowledge. Across three LLM families, namely GPT-4.1 (standard), GPT-5 (hybrid-reasoning), and Claude 3.7 Sonnet (large-reasoning, with and without Extended Thinking), we find that performance degrades rapidly with problem complexity but improves greatly for reasoning models with test-time extended reasoning.
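
To make the first task type concrete, here is a minimal sketch of how a shuffled-event sorting item could be built and scored. It is an illustration under stated assumptions, not the harness used in this work: the event list, the function names (`make_sorting_task`, `exact_match`, `pairwise_accuracy`), and both scoring rules are hypothetical, and the abstract does not say which metrics the experiments report.

```python
import random
from itertools import combinations

# Illustrative events with widely known years (assumed data, not from the paper).
EVENTS = [
    ("Fall of Constantinople", 1453),
    ("Storming of the Bastille", 1789),
    ("First powered flight by the Wright brothers", 1903),
    ("Apollo 11 Moon landing", 1969),
    ("Fall of the Berlin Wall", 1989),
]


def make_sorting_task(events, seed=0):
    """Return the event names in a reproducibly shuffled order (the prompt input)."""
    rng = random.Random(seed)
    names = [name for name, _ in events]
    rng.shuffle(names)
    return names


def exact_match(predicted, events):
    """True only if the predicted order equals the full chronological order."""
    gold = [name for name, _ in sorted(events, key=lambda e: e[1])]
    return predicted == gold


def pairwise_accuracy(predicted, events):
    """Fraction of event pairs that the prediction places in the correct order."""
    year = dict(events)
    pairs = list(combinations(range(len(predicted)), 2))
    correct = sum(year[predicted[i]] < year[predicted[j]] for i, j in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    shuffled = make_sorting_task(EVENTS)
    print("Prompt items:", shuffled)
    # A model's answer would replace this stand-in prediction.
    prediction = [name for name, _ in sorted(EVENTS, key=lambda e: e[1])]
    print("Exact match:", exact_match(prediction, EVENTS))
    print("Pairwise accuracy:", pairwise_accuracy(prediction, EVENTS))
```

Exact match is all-or-nothing, so it drops quickly as the list grows, while pairwise accuracy gives partial credit; pairing the two is one possible way to separate small local mistakes from a wholesale failure to order events.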
