Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training–inference gap and lack the capacity for fine-grained token selection across multiple dimensions, such as queries, key-values (KV), and heads, leading to suboptimal performance and limited acceleration gains. In this paper, we introduce OmniSparse, a training-aware, fine-grained sparse attention framework for long-video MLLMs that is applied in both training and inference with dynamic token budget allocation. Specifically, OmniSparse comprises three adaptive and complementary mechanisms: (1) query selection via lazy-active classification, which retains active queries that capture broad semantic similarity while discarding most lazy queries, which attend only to limited local context and are functionally redundant with their neighbors; (2) KV selection with head-level dynamic budget allocation, where a shared budget is determined by the flattest head and applied uniformly across all heads to ensure attention recall after selection; and (3) KV cache slimming, which alleviates head-level redundancy by selectively fetching the visual KV cache according to each head's decoding query pattern. Experiments show that OmniSparse achieves performance comparable to full attention while delivering a 2.7× prefill speedup and a 2.4× memory reduction during decoding.
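The head-level budget rule in mechanism (2) can be illustrated with a minimal NumPy sketch. The idea, as described above, is that the head with the flattest attention distribution needs the most keys to preserve a given fraction of its attention mass, so its requirement is taken as the shared budget for all heads. The function names, the recall threshold, and the recall-based sizing rule here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def shared_kv_budget(attn, recall=0.9):
    """Choose one KV budget shared by all heads, sized by the flattest head.

    attn: (num_heads, seq_len) nonnegative attention weights for one query.
    Returns the smallest k such that every head retains at least `recall`
    of its attention mass when keeping only its top-k keys.
    """
    # Sort each head's weights in descending order and accumulate mass.
    sorted_w = -np.sort(-attn, axis=-1)
    cum = np.cumsum(sorted_w, axis=-1) / attn.sum(axis=-1, keepdims=True)
    # Minimal k per head to reach the recall target; a flat (near-uniform)
    # head needs a large k, a peaked head a small one. Take the max so the
    # flattest head still meets the target under the uniform budget.
    per_head_k = (cum < recall).sum(axis=-1) + 1
    return int(per_head_k.max())

def select_kv(keys, values, attn, recall=0.9):
    """Keep the same number of top-scoring keys/values in every head.

    keys, values: (num_heads, seq_len, head_dim); attn as above.
    """
    k = shared_kv_budget(attn, recall)
    idx = np.argsort(-attn, axis=-1)[:, :k]  # (num_heads, k) top indices
    sel_k = np.take_along_axis(keys, idx[..., None], axis=1)
    sel_v = np.take_along_axis(values, idx[..., None], axis=1)
    return sel_k, sel_v, k
```

For example, a head that puts 93% of its mass on one key would need only k = 1 on its own, but if another head is perfectly uniform over 8 keys, the shared budget becomes 8, so the flat head's attention recall is not sacrificed.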