In recent years, human-AI cognitive consistency has emerged as a crucial perspective for evaluating the perceptual quality and interpretability of AI-generated content (AIGC). This paper proposes a biologically inspired saliency prediction framework that models six core regions of the human visual system (V1, V2, V4, MT, LIP, and FEF) using liquid neurons, capturing dynamic saliency features aligned with human gaze behavior. To align AIGC models with human cognitive mechanisms, we introduce a cross-domain dual-teacher distillation strategy and construct a large-scale multimodal dataset comprising natural images, eye-tracking data, AIGC-generated images, and their corresponding cross-attention maps. Furthermore, we propose the Human-AI Mutual Cognitive Index (HAMCI), a novel metric that quantitatively assesses the spatial and semantic alignment between predicted saliency maps and model attention distributions. The proposed method demonstrates promising performance on saliency prediction and cognitive alignment tasks, with results comparable to or surpassing recent state-of-the-art methods on several benchmarks. The code and dataset will be released upon acceptance to facilitate future research on cognitively aligned AIGC evaluation.
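
The abstract does not give the formula for HAMCI, only that it scores spatial and semantic alignment between a predicted saliency map and a model's attention distribution. Purely as an illustration of how such an alignment score could be computed, the sketch below blends two standard saliency-evaluation terms: Pearson's correlation coefficient (CC) for spatial co-location and histogram intersection (the SIM metric) for distributional overlap. The function name `hamci_sketch`, the weighting `w`, and the choice of these two terms are hypothetical, not the paper's actual definition.

```python
import numpy as np

def _normalize(m: np.ndarray) -> np.ndarray:
    """Shift a map to be non-negative and scale it to sum to 1,
    so it can be treated as a spatial probability distribution."""
    m = m.astype(np.float64)
    m = m - m.min()
    s = m.sum()
    return m / s if s > 0 else np.full(m.shape, 1.0 / m.size)

def hamci_sketch(saliency: np.ndarray, attention: np.ndarray, w: float = 0.5) -> float:
    """Hypothetical HAMCI-style score in [0, 1] for two HxW maps.

    Combines:
      - CC:  Pearson correlation of the two maps (spatial co-location),
      - SIM: histogram intersection of the normalized maps (mass overlap).
    """
    s, a = _normalize(saliency), _normalize(attention)
    if s.std() > 0 and a.std() > 0:
        cc = float(np.corrcoef(s.ravel(), a.ravel())[0, 1])  # in [-1, 1]
    else:
        cc = 0.0  # constant map: correlation undefined, treat as no alignment
    sim = float(np.minimum(s, a).sum())  # in [0, 1]
    # Map CC to [0, 1] and take a weighted blend of the two terms.
    return w * (cc + 1.0) / 2.0 + (1.0 - w) * sim

# Example with random maps standing in for a saliency prediction
# and a diffusion model's cross-attention map:
rng = np.random.default_rng(0)
pred = rng.random((32, 32))
attn = rng.random((32, 32))
print(f"HAMCI-style score: {hamci_sketch(pred, attn):.3f}")
```

A per-token semantic term (e.g., comparing saliency inside regions attended by each prompt token) could be added on top of this spatial score, but that design is not specified in the abstract.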
