Content not yet available

This lecture has no active video or poster.

AAAI 2026

January 22, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Recent studies have explored the capabilities of large language models (LLMs) in solving knowledge-intensive mathematical reasoning problems. However, existing benchmarks predominantly involve static theorems that LLMs have encountered during pretraining, making it difficult to assess whether these models can incorporate new or evolving knowledge into their reasoning processes. In this work, we introduce TaxReasoning, a novel benchmark designed to evaluate LLMs’ abilities in real-world tax calculation scenarios. These tasks require not only mathematical reasoning and numerical computation, but also the extraction and application of complex, frequently updated tax regulations. Through extensive experiments with state-of-the-art LLMs using diverse prompting strategies and knowledge augmentation techniques, we uncover substantial limitations in their ability to handle dynamic, knowledge-intensive questions—primarily due to missing domain-specific knowledge and ineffective retrieval. Even the best-performing models fall significantly short of human-level performance. Our analysis points to key avenues for improvement, including enhancing LLMs’ reasoning capabilities, developing more effective knowledge summarization techniques, and improving retrieval strategies. TaxReasoning offers a challenging new testbed for advancing LLMs toward more reliable reasoning in real-world, evolving, and knowledge-intensive domains.

Downloads

Paper

Next from AAAI 2026

Diffusion-based Personalized Pathology Disentanglement for Impaired Gait Analysis
poster

Diffusion-based Personalized Pathology Disentanglement for Impaired Gait Analysis

AAAI 2026

Xiaoyue Wan and 1 other author

22 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved