In multimodal sentiment analysis, missing and quality-degraded modalities are common. Existing methods often rely on batch-level modality generation but neglect sample-level missingness, which severely limits their flexibility in real-world scenarios. To address this, Sample-specific Modality Diagnosis and Cross-modal Enhancement for Incomplete Multimodal Representations (SMCIR) is proposed. Specifically, a Dynamic Multi-feature Fusion Detector (DMFD) is presented, which detects missingness and its severity at the sample level using indicators such as information entropy, modality similarity, and mutual information. Unlike batch-based methods, DMFD provides fine-grained detection and adaptive responses, improving sensitivity to modality disturbances. Meanwhile, a Context-aware Modality Completion Generator (CMCG) is developed to restore missing modalities through context-guided reconstruction based on multiscale feature fusion and cross-modal attention. In this way, CMCG avoids redundancy and inconsistency, enhancing the consistency and discriminability of the fused representation. Within CMCG, the text modality serves as a stable guide to improve context consistency. Experiments on the CMU-MOSI and CMU-MOSEI datasets show that SMCIR outperforms existing full-modality and non-recovery-based methods, validating its efficacy and superiority in multimodal learning.
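To make the sample-level diagnosis idea concrete, the sketch below illustrates how per-sample indicators such as information entropy and cross-modal similarity could flag a degraded modality. This is a minimal illustration under assumed conventions, not the authors' DMFD implementation: the function names (`diagnose_sample`), the thresholds, and the choice of treating a normalized feature vector as a distribution are all hypothetical simplifications, and mutual information is omitted for brevity.

```python
import numpy as np

def entropy(p, eps=1e-8):
    # Shannon entropy of a feature vector, normalized so its absolute
    # values form a distribution (hypothetical simplification).
    p = np.abs(p) / (np.abs(p).sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))

def cosine_sim(a, b, eps=1e-8):
    # Cosine similarity between two modality feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def diagnose_sample(text_f, audio_f, video_f, ent_thresh=1.0, sim_thresh=0.1):
    """Flag degraded modalities for ONE sample from entropy and mean
    cross-modal similarity (a simplified, DMFD-style per-sample check;
    thresholds are illustrative, not from the paper)."""
    feats = {"text": text_f, "audio": audio_f, "video": video_f}
    report = {}
    for name, f in feats.items():
        others = [v for k, v in feats.items() if k != name]
        mean_sim = float(np.mean([cosine_sim(f, o) for o in others]))
        ent = entropy(f)
        report[name] = {
            "entropy": ent,
            "mean_sim": mean_sim,
            # Low entropy (near-constant features) or low agreement with
            # the other modalities suggests missingness or corruption.
            "degraded": ent < ent_thresh or mean_sim < sim_thresh,
        }
    return report
```

For example, a sample whose audio features are all zeros (a fully missing modality) would receive both near-zero entropy and near-zero similarity to the other modalities, so its `degraded` flag is raised while intact modalities pass.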
