AAAI 2026

January 22, 2026

Singapore, Singapore


Despite their success in board games and reinforcement learning (RL), Monte-Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) had, until recently, seen limited success in domain-independent classical planning. Previous work (Wissow and Asai, 2024) showed that UCB1, designed for bounded rewards, performs poorly when applied to cost-to-go estimates in classical planning because those estimates are unbounded, and demonstrated improved performance with a Gaussian-reward MAB instead. This paper further sharpens our understanding of the ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as $(-\infty,\infty)$, which we can narrow down; second, the Full-Bellman backup (Schulte and Keller, 2014), which backpropagates sample max/min, lacks theoretical justification. We use \emph{Peaks-Over-Threshold Extreme Value Theory} to resolve both issues at once and propose a new bandit algorithm, UCB1-Uniform. We formally prove its regret bound and empirically demonstrate its performance in classical planning.
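For context on the bounded-reward assumption the abstract refers to: standard UCB1 selects the arm maximizing the empirical mean plus an exploration bonus $\sqrt{2\ln t / n_i}$, whose guarantees assume rewards in $[0,1]$; cost-to-go estimates violate this. The sketch below shows only the classic UCB1 rule on Bernoulli arms (all names and the simulation setup are illustrative, not from the paper; UCB1-Uniform itself is defined in the paper, not here).

```python
import math
import random

def ucb1_select(counts, means, t):
    """Standard UCB1: pick the arm maximizing mean + sqrt(2 ln t / n).
    Assumes every arm has been pulled at least once (counts[i] > 0)."""
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

def run_ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    est = [0.0] * k
    # Initialization round: pull each arm once.
    for i in range(k):
        counts[i] = 1
        est[i] = 1.0 if rng.random() < arm_means[i] else 0.0
    for t in range(k + 1, horizon + 1):
        i = ucb1_select(counts, est, t)
        reward = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        est[i] += (reward - est[i]) / counts[i]  # incremental mean update
    return counts

# Over a long horizon, the best arm (mean 0.8) accumulates most pulls.
counts = run_ucb1([0.2, 0.5, 0.8], 2000)
```

The confidence radius shrinks as $O(\sqrt{\ln t / n_i})$ only because bounded rewards admit Hoeffding-style concentration; for unbounded cost-to-go samples this calibration no longer holds, which motivates distribution-aware bandits like the ones discussed in the abstract.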


