Despite being successful in board games and reinforcement learning (RL), Monte-Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) has seen limited success in domain-independent classical planning until recently. Previous work [Wissow and Asai, 2024] showed that UCB1, designed for bounded rewards, does not perform well when applied to cost-to-go estimates in classical planning, because cost-to-go estimates are unbounded, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as $(-\infty,\infty)$, which we can narrow down. Second, the Full-Bellman backup [Schulte and Keller, 2014], which backpropagates sample max/min, lacks theoretical justification. We use \emph{Peaks-Over-Threshold Extreme Value Theory} to resolve both issues at once and propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.
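As background, the following is a minimal sketch of standard UCB1, whose exploration bonus is derived under the bounded-reward assumption that the abstract points out is violated by cost-to-go estimates. This is generic illustrative code, not the paper's UCB1-Uniform, whose details are not given here; the Bernoulli arms and the `pull` callback are hypothetical.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Standard UCB1: pull each arm once, then repeatedly pick the arm
    maximizing empirical mean reward plus an exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for arm in range(n_arms):            # initialization: one pull per arm
        sums[arm] += pull(arm)
        counts[arm] += 1
    for t in range(n_arms, horizon):
        # The bonus sqrt(2 ln t / n_i) assumes rewards bounded in [0, 1];
        # unbounded cost-to-go estimates break this assumption.
        scores = [sums[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i])
                  for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: scores[i])
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Toy usage: two Bernoulli arms with success probabilities 0.2 and 0.8;
# after 500 pulls the better arm should dominate the pull counts.
random.seed(0)
probs = [0.2, 0.8]
counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0, 2, 500)
```

With bounded rewards the bonus shrinks at the right rate and UCB1 concentrates its pulls on the better arm; the paper's contribution concerns what replaces this bonus when the reward support is instead characterized via Peaks-Over-Threshold Extreme Value Theory.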
