Singapore

Multi-agent path finding (MAPF) is the challenging problem of finding conflict-free paths with minimal costs for multiple agents. While traditional MAPF solvers are centralized using heuristic search, reinforcement learning (RL) is becoming increasingly popular due to its potential to learn decentralized and generalizing policies. RL-based MAPF must cope with spatial coordination, which is often addressed by combining independent training with ad hoc measures like replanning and communication. Such ad hoc measures often complicate the approach and require knowledge beyond the actual accessible information in RL, such as the full map occupation or broadcast communication channels, which limits generalizability, effectiveness, and sample efficiency. In this paper, we propose Partitioned Attention-based Reverse Curricula for Enhanced Learning (PARCEL), considering a bounding region for each agent. PARCEL trains all agents with overlapping regions jointly via self-attention to avoid potential conflicts. By employing a reverse curriculum, where the bounding regions grow as the policies improve, all agents will eventually merge into a single coordinated group. We evaluate PARCEL in two simple coordination tasks and four MAPF benchmark maps. Compared with state-of-the-art RL-based MAPF methods, PARCEL demonstrates better effectiveness and sample efficiency without ad hoc measures.
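As an illustration of the grouping idea described above, the following minimal sketch partitions agents into groups whose bounding regions overlap, directly or transitively. It is not the PARCEL implementation: the square region shape, the `radius` parameter, and the function names `regions_overlap` and `group_agents` are assumptions made for this example only.

```python
from itertools import combinations


def regions_overlap(a, b, radius):
    """Assumed region shape: a square of half-width `radius` centred on a grid cell.

    Two such squares share at least one cell when their centres are at most
    2 * radius apart along both axes.
    """
    return abs(a[0] - b[0]) <= 2 * radius and abs(a[1] - b[1]) <= 2 * radius


def group_agents(positions, radius):
    """Return sorted lists of agent indices whose regions overlap transitively."""
    parent = list(range(len(positions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(positions)), 2):
        if regions_overlap(positions[i], positions[j], radius):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_agents` with a growing `radius` mimics the reverse curriculum at a coarse level: small regions give many independent groups, and a sufficiently large radius merges all agents into one group.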

AAAI 2026

Spatially Grouped Curriculum Learning for Multi-Agent Path Finding

multi-agent pathfinding

curriculum learning

reinforcement learning

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Rank aggregation is the task of combining the rankings of items from multiple users into a single ranking that best represents the users' rankings. Alabi et al. (AAAI'22) present differentially private (DP) polynomial-time approximation schemes (PTASes) and $5$-approximation algorithms with certain additive errors for the Kemeny rank aggregation problem in both central and local models.
In this paper, we present improved DP PTASes with smaller additive error in the central model. Furthermore, we are the first to study the footrule rank aggregation problem under DP. We give a near-optimal algorithm for this problem; as a corollary, this leads to $2$-approximation algorithms with the same additive error as the $5$-approximation algorithms of Alabi et al. for the Kemeny rank aggregation problem in both central and local models.
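For background on why a footrule result can carry over to Kemeny aggregation, the sketch below computes the two distances involved. Kemeny aggregation minimizes the total Kendall tau distance $K$ to the input rankings, while footrule aggregation minimizes the total Spearman footrule distance $F$. The classical Diaconis-Graham inequality $K \le F \le 2K$ is the standard reason a footrule-optimal ranking is a 2-approximation for Kemeny. The abstract does not spell out its argument, so this connection is offered as an assumption, and the code is non-private and purely illustrative; the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def footrule(r1, r2):
    """Sum over items of the absolute difference between their two positions."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1))
```

Both functions assume the two rankings are permutations of the same item set, given as lists ordered from most to least preferred.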

Improved Differentially Private Algorithms for Rank Aggregation

Decentralized partially observable Markov decision processes with communication (Dec-POMDP-Com) provide a framework for multiagent decision making under uncertainty, but the NEXP-complete complexity for finite-horizon problems renders solutions intractable in general. While sharing actions and observations can reduce the complexity to PSPACE-complete, we propose an approach that bridges POMDPs and Dec-POMDPs by communicating only suggested joint actions, eliminating the need to share observations while maintaining performance comparable to fully centralized planning and execution. Our algorithm estimates joint beliefs using shared actions to prune infeasible beliefs. Each agent maintains possible belief sets for other agents, pruning them based on suggested actions to form an estimated joint belief usable with any centralized policy. This approach requires solving a POMDP for each agent, reducing computational complexity while preserving performance. We demonstrate its effectiveness on several Dec-POMDP benchmarks, showing performance comparable to centralized methods when shared actions enable effective belief pruning. This action-based communication framework offers a natural avenue for integrating human-agent cooperation, opening new directions for scalable multiagent planning under uncertainty, with applications in both autonomous systems and human-agent teams.

Efficient Multiagent Planning via Shared Action Suggestions

Algorithms for resolving majority cycles in preference aggregation have been studied extensively in computational social choice. Several sophisticated cycle-resolving methods, including Tideman's Ranked Pairs, Schulze's Beat Path, and Heitzig's River, are refinements of the Split Cycle (SC) method, which resolves majority cycles by discarding the weakest pairwise majority victories in each cycle. Recently, Holliday and Pacuit proposed a new refinement of Split Cycle, dubbed Stable Voting, and a simplification thereof, called Simple Stable Voting (SSV). They conjectured that SSV is a refinement of SC whenever no two pairwise majority victories are of the same size. In this paper, we prove the conjecture for up to 6 alternatives and refute it for more than 6 alternatives. While our proof of the conjecture for up to 5 alternatives uses traditional mathematical reasoning, our 6-alternative proof and 7-alternative counterexample were obtained with the use of SAT solving. The SAT encoding underlying this proof and counterexample is applicable far beyond SC and SSV: it can be used to test properties of any voting method whose choice of winners depends only on the ordering of margins of victory between alternatives by size.
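To make the Split Cycle rule concrete, here is a minimal sketch that computes SC winners from a margin matrix. It relies on the known characterization that a majority victory of a over b survives exactly when its margin exceeds the strength of the strongest majority path from b back to a, where a path's strength is its smallest margin. The function name and the matrix input format are assumptions for this example; the sketch does not implement Stable Voting, SSV, or the SAT encoding mentioned above.

```python
def split_cycle_winners(margin):
    """Return indices of Split Cycle winners.

    margin[a][b] = (voters preferring a to b) - (voters preferring b to a).
    """
    n = len(margin)
    # strength[a][b]: largest s such that some path a -> ... -> b uses only
    # majority victories with margin >= s (0 if no such path exists).
    strength = [
        [margin[a][b] if margin[a][b] > 0 else 0 for b in range(n)]
        for a in range(n)
    ]
    # Floyd-Warshall variant for widest (max-min) paths.
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if a != b:
                    via_k = min(strength[a][k], strength[k][b])
                    strength[a][b] = max(strength[a][b], via_k)

    defeated = set()
    for a in range(n):
        for b in range(n):
            # a's victory over b is kept only if it is not the weakest
            # victory in some majority cycle through a and b.
            if a != b and margin[a][b] > 0 and margin[a][b] > strength[b][a]:
                defeated.add(b)
    return [x for x in range(n) if x not in defeated]
```

For a three-way cycle in which alternative 0 beats 1 by 5, 1 beats 2 by 3, and 2 beats 0 by 1, the weakest victory (2 over 0) is discarded and alternative 0 is the unique winner.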

Stable Voting and the Splitting of Cycles

Recent vision-language models (VLMs) show strong reasoning capabilities through training with reinforcement learning from verifiable rewards (RLVR). Despite their impressive capabilities, current VLMs focus on a limited range of reasoning tasks, such as mathematical and logical reasoning, due to the lack of readily available verifiable reward data in broader domains. As a result, these models struggle to generalize their reasoning abilities to the wide variety of challenges encountered in real-world environments. To address this limitation, we collect and assemble a comprehensive RL-ready visual reasoning training dataset encompassing 46 datasets across 13 dimensions of 5 domains, covering a wide range of realistic scenarios such as infographic reasoning, mathematical reasoning, spatial reasoning, and general science reasoning. Based on this dataset, we propose an influence function-based data filtering strategy and a multi-round data curriculum method to iteratively strengthen general visual reasoning abilities. Using this approach, we train a general reasoning VLM, namely Vision-G1. Our 7B model achieves state-of-the-art performance across nine visual reasoning benchmarks, surpassing previous similar-sized VLMs and even GPT-4o and Gemini-1.5 Flash. The code and dataset will be publicly available to facilitate future research.

Vision-G1: Towards General Reasoning Vision-Language Models via Reinforcement Learning

Counterfactual reasoning is widely recognized as one of the most challenging and intricate aspects of causality in artificial intelligence. In this paper, we evaluate the performance of large language models (LLMs) in counterfactual reasoning. In contrast to previous studies that primarily focus on commonsense causal reasoning, where LLMs often rely on prior knowledge for inference, we specifically assess their ability to perform counterfactual inference using a set of formal rules. To support this evaluation, we introduce a new benchmark dataset, **CounterBench**, comprising 1.2K counterfactual reasoning questions. The dataset is designed with varying levels of difficulty, diverse causal graph structures, distinct types of counterfactual questions, and multiple nonsensical name variants. Our experiments demonstrate that counterfactual reasoning poses a significant challenge for LLMs, with most models performing at levels comparable to random guessing. To enhance LLMs' counterfactual reasoning ability, we propose a novel reasoning paradigm, **CoIn**, which guides LLMs through iterative reasoning and backtracking to systematically explore counterfactual solutions. Experimental results show that our method significantly improves LLM performance on counterfactual reasoning tasks and consistently enhances performance across different LLMs. Our dataset is available at https://huggingface.co/datasets/CounterBench/CounterBench.

CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models

The Profiled Vehicle Routing Problem (PVRP) extends the classical VRP by incorporating vehicle–client-specific preferences and constraints, reflecting real‑world requirements such as zone restrictions and service‑level preferences. While recent reinforcement‑learning solvers have shown promising performance, they require retraining for each new profile distribution, suffer from poor representation ability, and struggle to generalize to out‑of‑distribution instances. In this paper, we address these limitations by introducing **U**nified **S**olver for **P**rofiled **R**outing (USPR), a novel framework that natively handles arbitrary profile types. USPR introduces three key innovations: (i) Profile Embeddings (PE) to encode any combination of profile types; (ii) Multi‑Head Profiled Attention (MHPA), an attention mechanism that models rich interactions between vehicles and clients; and (iii) Profile‑aware Score Reshaping (PSR), which dynamically adjusts decoder logits using profile scores to improve generalization. Empirical results on diverse PVRP benchmarks demonstrate that USPR achieves state‑of‑the‑art results among learning‑based methods while offering significant gains in flexibility and computational efficiency. We make our source code publicly available to foster future research.

USPR: Learning a Unified Solver for Profiled Routing

Adversarial Missingness (AM) attacks aim to manipulate model fitting by carefully engineering a *missing* data problem to achieve a specific malicious objective.
AM attacks are significantly different from prior data poisoning attacks in that no malicious data is inserted and no data is maliciously perturbed. Current AM attacks are feasible only under the assumption that the modeler (victim) uses full-information maximum likelihood methods to handle missingness. This work aims to remedy this limitation of AM attacks; in the approach taken here, the adversary achieves their goal by solving a bi-level optimization problem to engineer the adversarial missingness mechanism, where the lower-level problem incorporates a differentiable approximation of the targeted missingness remediation technique. As instantiations of this framework, AM attacks are provided for three popular techniques: (i) complete case analysis, (ii) mean imputation, and (iii) regression-based imputation for general *empirical risk minimization* (ERM) problems.
Experiments on real-world data show that AM attacks are successful with modest levels of missingness (less than 20%). 
Furthermore, we show on the real-world *Twins* dataset that AM attacks can manipulate the estimated average treatment effect (ATE) as an instance of the general ERM problems: the adversary succeeds not only in reversing the sign but also in substantially inflating the ATE from a true value of $-1.61$% to a manipulated one as high as $10$%. These experimental results hold when the ATE is calculated using multiple regression-based estimators with different architectures, even when the adversary is restricted to modifying only a subset of the training data. The goals of this work are to: (i) establish the vulnerability to AM attacks of a significantly wider class of missingness remediation strategies than established in prior work, and (ii) bring the AM threat model to the attention of the community, as there are currently no defense strategies for these attacks.

Exploiting Missing Data Remediation Strategies Using Adversarial Missingness Attacks

Understanding multimodal metaphors represents a crucial pathway for machines to comprehend human cognition. However, current research remains constrained by superficial dataset annotations, insufficient systematic evaluation of large language models, and fragmented task frameworks. To bridge these gaps, this paper proposes a systematic solution with three components: (I) We present the largest fine-grained **M**ulti-task **M**ultimodal **M**etaphor **U**nderstanding **C**hallenge **D**ataset (**M$^{3}$UCD**), built via multi-perspective collaborative annotation. It contains 15,345 samples, each annotated with 12 manual attribute labels. (II) We systematically benchmark LLMs' capacity boundaries in metaphor understanding. Evaluation results reveal the persistent challenges LLMs face in this domain while validating M$^{3}$UCD's effectiveness and potential. (III) We develop a concise and unified multi-task baseline framework and demonstrate its effectiveness in enhancing the metaphor understanding capabilities of MLLMs. M$^{3}$UCD will be publicly released to advance metaphor research.

*Disclaimer*: M$^{3}$UCD contains samples with potentially sensitive content (e.g., sarcasm, offensiveness, fake news, cultural references).

M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs

Current Zero-Shot Temporal Action Localization (ZSTAL) methods, whether training-based or training-free, still predominantly rely on a single, unified query to localize an entire action. This unified representation is fundamentally ill-suited for complex real-world activities, as it fails to capture their internal compositional structure and to adapt to dynamic, multi-stage variations across videos. To address this, we regard ZSTAL as a compositional reasoning task and introduce CASCADE, a Context-Aware Staged Action DEcomposition framework. Inspired by the human cognitive process of perceiving context, decomposing events, and reconstructing instances, CASCADE follows a training-free pipeline. It first perceives the video's context by leveraging a Multimodal Large Language Model (MLLM) to filter out irrelevant actions and then generate a rich, video-specific caption for each action present in the video. An LLM then decomposes this caption into multiple, temporally ordered stages, which serve as fine-grained queries to guide the MLLM in estimating frame-level confidence scores. Recognizing that this decomposition can fragment a single action, a novel hierarchical merging logic then reconstructs complete instances by fusing these preliminary temporal segments based on their semantic progression and coherence. Extensive experiments and ablation studies on THUMOS14 and ActivityNet-1.3 show that CASCADE not only sets a new state-of-the-art among training-free methods but, most notably, significantly outperforms all prior training-based approaches on ActivityNet-1.3.

Decompose and Conquer: Compositional Reasoning for Zero-Shot Temporal Action Localization

Pre-trained gaze models learn to identify useful patterns commonly found across users, but subtle user-specific variations (e.g., eyelid shape or facial structure) can degrade model performance.
Test-time personalization (TTP) adapts pre-trained models to these user-specific domain shifts using only a few unlabeled samples.
Efficient fine-tuning is critical in performing this domain adaptation: data and computation resources can be limited, especially for on-device customization.
While popular parameter-efficient fine-tuning (PEFT) methods address adaptation costs by updating only a small set of weights, they may not be taking full advantage of structures encoded in pre-trained filters.
To more effectively leverage existing structures learned during pre-training, we reframe personalization as a process to reweight existing features rather than learning entirely new ones.

We present Attentive Low-Rank Filter Adaptation (Alfa) to adapt gaze models by reweighting semantic patterns in pre-trained filters.
With Alfa, singular value decomposition (SVD) extracts dominant spatial components that capture eye and facial characteristics across users.
Via an attention mechanism, we need only a few unlabeled samples to adjust and reweight pre-trained structures, selectively amplifying those relevant to a target user.
Alfa achieves the lowest average gaze errors across four cross-dataset gaze benchmarks, outperforming existing TTP methods and low-rank adaptation (LoRA)-based variants.
We also show that Alfa's attentive low-rank methods can be applied to applications beyond vision, such as diffusion-based language models.
