Prompt Tuning (PT) is a widely used strategy for adapting pre-trained Vision-Language Models (VLMs) to various downstream tasks. Conventional PT methods evaluate performance separately on known (base) and unknown (new) classes. However, in real-world scenarios, models often encounter inputs without prior knowledge of their class domain. This challenge has motivated the development of Open-world Prompt Tuning (OPT), which requires models to first determine whether a sample belongs to base or new classes and then classify it accordingly. In this work, we carefully review existing OPT methods and identify three key limitations: (L1) incomplete evaluation metrics, (L2) time-consuming and memory-intensive OOD detection methods, and (L3) insufficiently comprehensive optimization strategies. To address these issues, we first tackle L1 by proposing two novel metrics to explicitly evaluate adaptability and generalization under the OPT setting, forming a more comprehensive evaluation framework. For L2, we propose a training-free OOD detection method called Entropy-weighted Rank-normalized Fusion (ERF), which first applies rank normalization to both the maximum and the sum of base-class probabilities, followed by an entropy-weighted fusion of the normalized values. For L3, we propose a plug-and-play Gated Dual-Merging (GDM) strategy to strengthen the classifier’s capability. GDM performs selective merging at the weight level based on an adaptive criterion and combines fine-tuned and LLM-boosted logits at the output level. Extensive experiments on three PT baselines across 11 datasets demonstrate the effectiveness of our proposed ERF and GDM.
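The ERF scoring step described above can be sketched in a few lines. This is a minimal illustration under assumptions not spelled out in the abstract: the function name `erf_scores`, the use of batch-wise ranks for normalization, and the choice of the base-class distribution's normalized entropy as the fusion weight are all hypothetical; the paper's exact formulation may differ.

```python
import numpy as np

def erf_scores(base_probs):
    """Sketch of Entropy-weighted Rank-normalized Fusion (ERF).

    base_probs: (N, C) array of softmax probabilities assigned to the C
    base classes for a batch of N samples. Returns one fused score per
    sample; higher values suggest the sample belongs to a base class.
    """
    n, c = base_probs.shape
    # Two base-class confidence signals per sample.
    s_max = base_probs.max(axis=1)   # maximum base-class probability
    s_sum = base_probs.sum(axis=1)   # total base-class probability mass

    def rank_normalize(s):
        # Map raw scores to their ranks in [0, 1] across the batch.
        ranks = np.argsort(np.argsort(s))
        return ranks / max(n - 1, 1)

    r_max = rank_normalize(s_max)
    r_sum = rank_normalize(s_sum)

    # Entropy of the renormalized base-class distribution per sample,
    # scaled to [0, 1]; a confident (low-entropy) prediction leans on
    # the max signal, an uncertain one on the summed mass (assumption).
    p = base_probs / base_probs.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    w = entropy / np.log(c)

    return (1 - w) * r_max + w * r_sum
```

Because both signals are reduced to ranks before fusion, the method needs no training, no stored features, and no tuning of score scales, which is what makes it lightweight compared with learned OOD detectors.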
