Context-based offline meta-reinforcement learning (meta-RL) is a paradigm that integrates meta-learning with offline reinforcement learning. It learns a strategy for extracting task-specific contexts from the trajectories of meta-training tasks and applies this strategy to adapt to unseen target tasks. However, existing methods struggle to generate generalizable contexts for adaptation because of context shift, which arises when the behavior policy overfits to the offline data. We argue that leveraging the internal relationships among tasks, rather than treating each task in isolation, is crucial for mitigating the impact of context shift. Hence, we propose a framework called cross-task contexts for improving generalization in meta-RL (CTMRL). Specifically, we design a context quantization variational auto-encoder (CQ-VAE), which clusters the task-specific contexts of meta-training tasks into discrete codes based on the internal relationships among tasks. Cross-task contexts are constructed from these codes and reflect information shared across similar tasks. These cross-task contexts serve as high-level structures that capture similarity across tasks, and they also provide a foundation for hard contrastive learning that sharpens the distinguishability of similar yet distinct tasks, thereby improving the generalization of contexts and facilitating adaptation to unseen target tasks. Evaluations in meta-RL environments confirm the performance advantage of CTMRL over existing methods.
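To make the quantization step concrete, the sketch below shows the nearest-code lookup at the heart of a VQ-style context quantizer: a continuous task-specific context is mapped to its closest entry in a shared codebook, so that similar tasks collapse onto the same discrete code. This is a minimal illustration of the general technique, not CTMRL's actual implementation; the function and variable names are assumptions.

```python
import numpy as np

def quantize_context(context, codebook):
    """Map a task-specific context vector to its nearest discrete code.

    context:  (d,) continuous context embedding for one task
    codebook: (K, d) code vectors shared across all meta-training tasks
    Returns the code index and the quantized (cross-task) context.
    (Illustrative sketch; names and shapes are assumptions.)
    """
    # Squared Euclidean distance from the context to every codebook entry
    dists = np.sum((codebook - context) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Toy usage: two codes; a context near the second code snaps to it,
# so tasks with similar contexts share one cross-task code.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
context = np.array([0.9, 0.8])
idx, code = quantize_context(context, codebook)
# idx == 1; code == [1.0, 1.0]
```

In a full VQ-VAE this lookup is paired with a straight-through gradient estimator and codebook/commitment losses so the encoder and codes are trained jointly; the lookup itself is what groups similar tasks under a shared code.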