Singapore

Images are generally represented by pixel intensities or color values, which are usually fed directly into learning models. This study proposes a geometric image representation and reformulates general learning models (e.g., autoencoders) in the diffeomorphic space. Based on the theory of geometric optimal transport and quasiconformal mapping, we equivalently transform the intensity representation into a shape representation. The image space becomes a diffeomorphic space, where any image can be uniquely represented as a Beltrami coefficient function defined on a uniform reference grid, and vice versa. This geometric image representation (G-IR) captures the fine-grained structure inherent in the entire image, unlike traditional feature extraction, which focuses on geometric objects inside the image (such as boundaries and axes). The diffeomorphic property preserves structure during generation, which is essential in real physical settings. G-IR can be added to existing pipelines as a plug-in, providing structure-preserving properties for the entire framework. Applications to classical problems such as image reconstruction and interpolation verify the efficiency, efficacy, and applicability of G-IR and show that it outperforms common pixel-level appearance representations.
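For readers unfamiliar with the term, the Beltrami coefficient mentioned above can be written out as follows. This is the standard textbook definition for a quasiconformal map $f$ of a planar domain, not a statement of how G-IR computes it; the symbols $f$, $\mu$, and $J_f$ are notation introduced here for illustration.

```latex
% Complex derivatives of f(z), with z = x + i y
\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right),
\qquad
\frac{\partial f}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)

% Beltrami equation: \mu is the Beltrami coefficient of f
\frac{\partial f}{\partial \bar{z}} = \mu(z)\,\frac{\partial f}{\partial z},
\qquad \|\mu\|_{\infty} < 1

% Jacobian of f: positive whenever |\mu| < 1, so f preserves orientation
J_f = \left|\frac{\partial f}{\partial z}\right|^{2} - \left|\frac{\partial f}{\partial \bar{z}}\right|^{2}
    = \left|\frac{\partial f}{\partial z}\right|^{2}\left(1 - |\mu|^{2}\right)
```

When $\mu \equiv 0$ the map is conformal; the bound $\|\mu\|_{\infty} < 1$ keeps the Jacobian positive, which is the sense in which such a representation cannot fold the domain.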

AAAI 2026

G-IR: Geometric Image Representation for Learning

ml: representation learning, cv: representation learning for vision, ml: deep generative models & autoencoders

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Large Language Models (LLMs) have recently been integrated into Graph Neural Networks (GNNs) to improve learning on text-attributed graphs (TAGs), combining semantic-rich node features with structural information. However, this integration introduces dual vulnerabilities: GNNs are sensitive to structural perturbations, while LLM-derived features are vulnerable to prompt injection and adversarial phrasing. While existing adversarial attacks largely perturb structure or text independently, we find that uni-modal attacks cause only modest degradation in LLM-enhanced GNNs. Moreover, many existing attacks assume unrealistic capabilities, such as white-box access or direct modification of graph data.

To address these gaps, we propose GraphTextack, the first black-box, multi-modal node injection attack designed specifically for LLM-enhanced GNNs. GraphTextack injects nodes with carefully crafted structure and semantics to degrade model performance, operating under a realistic threat model without relying on model internals or surrogate models. To navigate the combinatorial, non-differentiable search space of connectivity and feature assignments, GraphTextack introduces a novel evolutionary optimization framework with a multi-objective fitness function that balances local prediction disruption and global graph influence. Extensive experiments on multiple benchmark datasets and state-of-the-art LLM-enhanced GNN models show that GraphTextack significantly outperforms strong baselines, achieving a larger drop in accuracy and lower runtime on average.

GraphTextack: A Realistic Black-Box Node Injection Attack on LLM-Enhanced GNNs

Most commonsense reasoning models overlook the influence of personality traits, limiting their effectiveness in personalized systems such as dialogue generation. To address this limitation, we introduced the Personality-aware Commonsense Knowledge Graph (PCoKG), a structured dataset comprising $521,316$ quadruples. We began by employing three evaluators to score and filter events from the ATOMIC dataset, selecting those that are likely to elicit diverse reasoning patterns across different personality types. For knowledge graph construction, we leveraged the role-playing capabilities of large language models (LLMs) to perform reasoning tasks. To enhance the quality of the generated knowledge, we incorporated a debate mechanism consisting of a supporter, an opposer, and a judge, which iteratively refined the outputs through feedback loops. We evaluated the dataset from multiple perspectives and conducted fine-tuning and ablation experiments using multiple LLM backbones to assess PCoKG's robustness and the effectiveness of its construction pipeline. Our LoRA-based fine-tuning results indicated a positive correlation between model performance and the parameter scale of the base models. Finally, we applied PCoKG to persona-based dialogue generation, where it demonstrated improved consistency between generated responses and reference outputs. This work bridges the gap between commonsense reasoning and individual cognitive differences, enabling more personalized and context-aware AI systems.

PCoKG: Personality-aware Commonsense Reasoning with Debate

Traffic prediction plays an important role in urban management. However, existing methods rely on centralized traffic data, which may raise privacy concerns. Federated traffic prediction offers a promising solution for clients (e.g., traffic management administrations) in different regions to collaboratively train models in a distributed manner without exposing private data. Nonetheless, data isolation inherently breaks the correlations between nodes (i.e., traffic sensors collecting data) in different regions, so the inter-client dependency is lost. Consequently, current works either fail to capture the missing inter-client dependency or compromise data privacy to recover it. To address this issue, we propose a novel Federated method that recovers the inter-client dependency with HIdden global componeNTs (FedHINT). We find that traffic data from different local regions actually contain hidden global components that reflect cross-regional traffic changes. FedHINT therefore extracts hidden global components from each client to generate proxy nodes that represent global information, which are then used to recover the inter-client dependency. Specifically, we employ an attention module, guided by shared global queries, to capture hidden global components from local traffic data and generate proxy nodes. FedHINT then adaptively learns the correlations between proxy nodes and local nodes through a global encoder. During this process, the global information in the proxy nodes compensates for the information lost from cross-regional nodes, thereby recovering the missing inter-client dependency. Extensive experiments on multiple datasets demonstrate that FedHINT significantly outperforms state-of-the-art methods, with average decreases of 3.73 and 4.81 in MAE and RMSE, respectively.
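The idea of shared global queries pooling local node features into proxy nodes can be sketched with plain dot-product cross-attention. This is only an illustration under strong assumptions: the abstract does not specify the attention module, so the function `proxy_nodes`, the absence of learned projections, and all array shapes here are hypothetical.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def proxy_nodes(global_queries, local_features):
    """Hypothetical sketch: each shared query pools local node features into one proxy node.

    global_queries: (num_queries, dim), identical on every client.
    local_features: (num_local_nodes, dim), private to one client.
    """
    dim = global_queries.shape[-1]
    # Scaled dot-product scores, shape (num_queries, num_local_nodes)
    scores = global_queries @ local_features.T / np.sqrt(dim)
    attention = softmax(scores, axis=-1)
    # Each proxy node is an attention-weighted average of local node features
    return attention @ local_features, attention

rng = np.random.default_rng(0)
shared_queries = rng.normal(size=(3, 16))    # assumed: 3 global queries of width 16
client_a_nodes = rng.normal(size=(20, 16))   # assumed: client A has 20 sensors
client_b_nodes = rng.normal(size=(35, 16))   # assumed: client B has 35 sensors

proxies_a, attention_a = proxy_nodes(shared_queries, client_a_nodes)
proxies_b, attention_b = proxy_nodes(shared_queries, client_b_nodes)
```

Because the queries are shared, every client produces the same number of proxy nodes regardless of how many sensors it holds, which is what makes the proxies comparable across regions in this sketch.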

Inter-Client Dependency Recovery with Hidden Global Components for Federated Traffic Prediction

Many methods have demonstrated promising results in zero-shot anomaly detection (ZSAD) by incorporating prompt learning (PL) to fine-tune Vision-Language Models. However, the prompt learners proposed in recent studies remain relatively simple, such as static learnable textual and/or visual prompts. Relying solely on the current PL paradigm restricts the ability to generate more precise prompts, thereby hindering improved ZSAD performance, particularly in detecting nuanced anomalies. To address this limitation, this paper proposes a high-order-aware prompt learning framework, termed HPL, which facilitates the detection of unseen anomalies through prompts fortified by a hypergraph. Specifically, HPL models high-order correlations among patches through a dynamically constructed hypergraph structure. We then propose a hypergraph semantic convolution to capture potential collaborative information (generic semantic information). Meanwhile, HPL introduces a Mixture-of-Experts prompt learner (MoEPLer), whose specialized experts generate multiple distinct prompts based on the modeled high-order correlations. The final elaborate and dynamic prompts are then generated by jointly considering each expert's prompt. This enables a comprehensive understanding of potential anomalous patterns, thereby improving ZSAD performance. Large-scale experiments conducted on 12 datasets, spanning natural, industrial, and medical domains, demonstrate the validity of the proposed HPL. The code will be made available upon acceptance.

Exploring High-order-aware Prompt Learning for Zero-shot Anomaly Detection

Training large-scale models is computationally intensive and often constrained by the availability of labeled data. Model merging offers a compelling alternative by directly integrating the weights of multiple source models without requiring additional data or extensive training. However, conventional model merging techniques, such as parameter averaging, suffer from unintentional merging of non-generalizable features, especially in non-IID scenarios where source models exhibit significant weight disparities. Alternatively, model ensembling, which aggregates multiple models by averaging their outputs, typically provides more stable and superior performance. However, it incurs higher inference costs and increased storage requirements. Previous studies showed the similarities between model merging and ensembling experimentally, but theoretical evidence and evaluation metrics are lacking. To bridge this gap, we introduce M-loss, a novel evaluation metric that quantifies the compatibility of merging source models using only unlabeled data. By measuring the discrepancy between parameter averaging and model ensembling at both layer and node levels, M-loss facilitates more effective merging strategies. Specifically, M-loss serves as a quantitative criterion showing the theoretical feasibility of model merging, and as a guide to parameter significance in model pruning strategies. Our theoretical analysis and empirical evaluations demonstrate that incorporating M-loss into the merging process significantly improves the alignment between merged models and model ensembling, offering a scalable and efficient framework for accurate model consolidation.
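The gap between parameter averaging and output ensembling can be made concrete on a single dense layer. The sketch below is not the paper's M-loss definition, which the abstract does not give; `merge_ensemble_gap` is a hypothetical helper that measures the mean squared difference between the two on unlabeled inputs, and the layer sizes and random weights are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def layer_output(weight, bias, inputs, activation=relu):
    """Forward pass of one dense layer: activation(inputs @ weight + bias)."""
    return activation(inputs @ weight + bias)

def merge_ensemble_gap(params_a, params_b, inputs, activation=relu):
    """Mean squared gap between the parameter-averaged layer and the output ensemble.

    Hypothetical illustration only; not the M-loss formula from the paper.
    """
    (w_a, b_a), (w_b, b_b) = params_a, params_b
    # Merging: average the parameters, then run one forward pass
    merged = layer_output((w_a + w_b) / 2, (b_a + b_b) / 2, inputs, activation)
    # Ensembling: run both models, then average their outputs
    ensemble = (layer_output(w_a, b_a, inputs, activation)
                + layer_output(w_b, b_b, inputs, activation)) / 2
    return float(np.mean((merged - ensemble) ** 2))

rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(256, 8))                     # no labels are needed
model_a = (rng.normal(size=(8, 4)), rng.normal(size=4))   # assumed toy weights
model_b = (rng.normal(size=(8, 4)), rng.normal(size=4))

gap_linear = merge_ensemble_gap(model_a, model_b, unlabeled, activation=lambda x: x)
gap_relu = merge_ensemble_gap(model_a, model_b, unlabeled)
gap_same = merge_ensemble_gap(model_a, model_a, unlabeled)
```

For a purely linear layer the two operations coincide up to rounding, so the gap vanishes; the nonlinearity is what makes merging and ensembling diverge, and the gap shrinks again as the source models become more similar.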

M-Loss: Quantifying Model Merging Compatibility with Limited Unlabeled Data

Graph contrastive learning (GCL) has demonstrated great promise for learning generalizable graph representations from unlabeled data. However, conventional GCL approaches face two critical limitations: (1) the restricted expressive capacity of multilayer perceptron (MLP) based encoders, and (2) suboptimal negative samples, which either come from random augmentations that fail to provide effective 'hard negatives' or are generated as hard negatives without addressing the semantic distinctions crucial for discriminating graph data. To this end, we propose Khan-GCL, a novel framework that integrates the Kolmogorov–Arnold Network (KAN) into the GCL encoder architecture, substantially enhancing its representational capacity. Furthermore, we exploit the rich information embedded within KAN coefficient parameters to develop two novel critical feature identification techniques that enable the generation of semantically meaningful hard negative samples for each graph representation. These strategically constructed hard negatives guide the encoder to learn more discriminative features by emphasizing critical semantic differences between graphs. Extensive experiments demonstrate that our approach achieves state-of-the-art performance compared to existing GCL methods across a variety of datasets and tasks.

Khan-GCL: Kolmogorov–Arnold Network Based Graph Contrastive Learning with Hard Negatives

Unsupervised industrial anomaly detection requires accurately identifying defects without labeled data. Traditional autoencoder-based methods often struggle with incomplete anomaly suppression and loss of fine details, as their single-pass decoding fails to effectively handle anomalies with varying severity and scale. We propose a recursive autoencoder architecture (RcAE), which performs reconstruction iteratively to progressively suppress anomalies while refining normal structures. Unlike traditional single-pass models, this recursive design naturally produces a sequence of reconstructions, progressively exposing suppressed abnormal patterns. To leverage these reconstruction dynamics, we introduce a Cross Recursion Detection (CRD) module that tracks inconsistencies across recursion steps, enhancing detection of both subtle and large-scale anomalies. Additionally, we incorporate a Detail Preservation Network (DPN) to recover high-frequency textures typically lost during reconstruction. Extensive experiments demonstrate that our method significantly outperforms existing non-diffusion methods and achieves performance on par with recent diffusion models while using only 10% of their parameters and offering substantially faster inference. These results highlight the practicality and efficiency of our approach for real-world applications.

RcAE: Recursive Reconstruction Framework for Unsupervised Industrial Anomaly Detection

Schrödinger Bridge-based diffusion models have demonstrated promising performance in signal denoising. However, since ground truth signals are unavailable during the sampling process, neural networks must be employed to learn the mapping, which breaks the theoretical coupling between diffusion and sampling processes. This paper reveals a critical inconsistency between the theoretical diffusion path and the learned sampling trajectory across different frequency bands. This diffusion-sampling inconsistency directly undermines denoising effectiveness. To address this limitation, we propose the Frequency-Dependent Scheduled Schrödinger Bridge (FDSSB), which leverages power spectral density to adaptively schedule diffusion processes across frequencies. This mechanism assigns asynchronous diffusion schedules to different frequency components, correcting the diffusion schedule to better match the sampling process. As a result, FDSSB effectively mitigates the mismatch and enhances the consistency between diffusion and sampling processes. Extensive experiments demonstrate that FDSSB achieves state-of-the-art performance, with an average scale-invariant signal-to-noise ratio improvement of 7.9066 dB over competitive approaches.
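The reported metric, scale-invariant signal-to-noise ratio (SI-SNR), has a commonly used definition that can be sketched in a few lines. The code below follows that common formulation and is not taken from the paper; the function name `si_snr_db`, the `eps` guard, and the synthetic sine-plus-noise signals are assumptions made for illustration.

```python
import numpy as np

def si_snr_db(estimate, reference, eps=1e-12):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove the mean so a constant offset does not affect the score
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)                        # assumed toy signal
noisy = clean + 0.5 * rng.normal(size=t.shape)
lightly_noisy = clean + 0.1 * rng.normal(size=t.shape)

score_noisy = si_snr_db(noisy, clean)
score_light = si_snr_db(lightly_noisy, clean)
score_rescaled = si_snr_db(3.0 * noisy, clean)            # same signal, different gain
```

An SI-SNR improvement is usually reported as the score of the denoised output minus the score of the noisy input, both measured against the same clean reference. Rescaling the estimate leaves the score unchanged, which is the property the name refers to.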

Frequency-Dependent Scheduled Schrödinger Bridge for Underwater Acoustic Signal Denoising

Instruction-following is a critical capability of Large Language Models (LLMs). While existing works primarily focus on assessing how well LLMs adhere to user instructions, they often overlook scenarios where instructions contain conflicting constraints, a common occurrence in complex prompts. The behavior of LLMs under such conditions remains under-explored. To bridge this gap, we introduce ConInstruct, a benchmark specifically designed to assess LLMs' ability to detect and resolve conflicts within user instructions. Using this dataset, we evaluate LLMs' conflict detection performance and analyze their conflict resolution behavior. Our experiments reveal two key findings: (1) Proprietary LLMs exhibit strong conflict detection capabilities, with Claude-3.5-Sonnet and GPT-4o achieving average F1-scores of 86.6% and 84.9%, ranking first and third, respectively. (2) Despite their strong conflict detection abilities, LLMs rarely explicitly notify users about the conflicts or request clarification when faced with conflicting constraints. These results underscore a critical shortcoming in current LLMs and highlight an important area for future improvement when designing instruction-following LLMs.

ConInstruct: Evaluating Large Language Models on Conflict Detection and Resolution in Instructions

We propose a mesh-free policy iteration framework based on physics-informed neural networks (PINNs) for solving entropy-regularized stochastic control problems. The method iteratively alternates between soft policy evaluation and improvement using automatic differentiation and neural approximation, without relying on spatial discretization. We present a detailed $L^2$ error analysis that decomposes the total approximation error into three sources: iteration error, policy network error, and PDE residual error. The proposed algorithm is validated with a range of challenging control tasks, including high-dimensional linear-quadratic regulation in 5D and 10D, as well as nonlinear systems such as pendulum and cartpole problems. Numerical results confirm the scalability, accuracy, and robustness of our approach across both linear and nonlinear benchmarks.
