Text-attributed graphs (TAGs), which associate rich textual descriptions with each node, are widely employed to represent complex relationships among real-world textual entities. Current representation learning for TAGs leverages large language models (LLMs) to transform node-level textual descriptions into node features or labels, followed by message passing in graph neural networks (GNNs) to further improve the expressiveness of the learned graph representations. Nevertheless, a simple experiment we conducted demonstrates that not all LLMs are readily compatible with GNNs. A salient finding is that architectural heterogeneity among LLMs manifests as substantial performance gaps across diverse TAG representation learning tasks. Moreover, the node semantics encoded by LLMs are often misaligned with the message passing in GNNs, causing $\textit{performance collapse}$. Motivated by these observations, we propose a novel self-supervised graph learning framework called $\underline{S}$tage-$\underline{A}$ware $\underline{G}$raph $\underline{C}$ontrastive $\underline{L}$earning (SAGCL). In particular, we propose a node-oriented mixture of experts (NodeMoE) that assigns suitable candidate experts to each node, flexibly balancing the strengths of different language experts via low-rank decomposition and reparameterization strategies. Subsequently, to align the inductive biases of graph structures with the semantic perception capabilities of LLMs, message passing in GNNs is decoupled into a feature transformation stage and a feature propagation stage. Given these two stage views, stage-aware graph contrastive learning matches the node semantics encoded by the LLM with the locally aware topological patterns within the GNN via self-supervised contrastive learning. Experiments on eight datasets and three diverse tasks demonstrate the effectiveness of SAGCL.
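To make the decoupling concrete, the sketch below separates message passing into a node-wise feature transformation stage and a structure-only feature propagation stage, then contrasts the two stage views of each node. This is a minimal illustration, not the paper's implementation: the function names (`feature_transformation`, `feature_propagation`, `info_nce`) are hypothetical, and an InfoNCE-style objective is assumed for the contrastive step.

```python
import numpy as np

def feature_transformation(X, W):
    """Stage 1 view: per-node transformation of (LLM-derived) features.

    Ignores graph structure entirely; a single ReLU layer stands in
    for a learnable MLP (hypothetical stand-in, not the paper's model).
    """
    return np.maximum(X @ W, 0.0)

def feature_propagation(H, A):
    """Stage 2 view: mean aggregation over the graph with self-loops.

    Pure structural smoothing with no learnable parameters, so the two
    stages are fully decoupled.
    """
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    deg_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return deg_inv * (A_hat @ H)                    # row-normalized propagation

def info_nce(H, Z, tau=0.5, eps=1e-8):
    """Contrastive loss matching each node's two stage views.

    Node i's transformed view H[i] is the positive for its propagated
    view Z[i]; all other nodes act as in-batch negatives.
    """
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + eps)
    sim = Hn @ Zn.T / tau
    sim = sim - sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # -log softmax on the diagonal
```

In a full pipeline one would feed LLM-encoded node features as `X`, compute `H = feature_transformation(X, W)` and `Z = feature_propagation(H, A)`, and minimize `info_nce(H, Z)` so that the semantics captured before propagation stay aligned with the locally aware topological patterns produced after it.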