Singapore

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad-hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. This often leads to learned solutions that generalize poorly. We introduce Next-Latent Prediction (NextLat), which extends standard next-token training with self-supervised predictions in the latent space. Specifically, NextLat trains a transformer to learn latent representations that are predictive of its next latent state given the next output token. Theoretically, we show that these latents provably converge to belief states: compressed information about the history that is necessary to predict the future. This simple auxiliary objective also injects a recurrent inductive bias into transformers, while leaving their architecture, parallel training, and inference unchanged. NextLat effectively encourages the transformer to form compact internal world models with its own belief states and transition dynamics, a crucial property absent in standard next-token prediction transformers. Empirically, across benchmarks targeting core sequence modeling competencies (world modeling, reasoning, planning, and language modeling), NextLat demonstrates significant gains over standard next-token training in downstream accuracy, representation compression, and lookahead planning. NextLat stands as a simple and efficient paradigm for shaping transformer representations toward stronger generalization.

AAAI 2026

Next-Latent Prediction Transformers Learn Compact World Models

technical paper
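The auxiliary objective described in the NextLat abstract can be illustrated with a small sketch. The abstract does not give the loss, so everything specific here is an assumption: the function name `next_latent_loss`, the squared-error distance, and the toy predictors are hypothetical, and details such as how the term is weighted against the next-token loss are omitted.

```python
def next_latent_loss(latents, token_embs, predictor):
    """Mean squared error between predicted and observed next latents.

    latents[t]    : latent state after reading token t (list of floats)
    token_embs[t] : embedding of token t (list of floats)
    predictor     : hypothetical function (h_t, e_{t+1}) -> predicted h_{t+1}

    Assumption: squared error is used as the distance; the abstract only
    says that latents should be predictive of the next latent state given
    the next token.
    """
    if len(latents) < 2 or len(latents) != len(token_embs):
        raise ValueError("need at least two aligned latents and embeddings")

    total, count = 0.0, 0
    for t in range(len(latents) - 1):
        predicted = predictor(latents[t], token_embs[t + 1])
        target = latents[t + 1]
        total += sum((p - q) ** 2 for p, q in zip(predicted, target))
        count += len(target)
    return total / count


# Toy example: latents that follow the rule h_{t+1} = h_t + e_{t+1}.
embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[1.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

additive = lambda h, e: [a + b for a, b in zip(h, e)]
identity = lambda h, e: list(h)

consistent_loss = next_latent_loss(latents, embs, additive)
mismatched_loss = next_latent_loss(latents, embs, identity)
```

In this toy setting the additive predictor matches the latent dynamics exactly, so its loss is zero, while the identity predictor ignores the next token and is penalized. In training, a term of this kind would be added to the usual next-token loss.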

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Post-training alignment is crucial for refining the reasoning capabilities of Large Language Models (LLMs). A dominant paradigm for this involves optimizing the model's policy using reinforcement learning, powered by techniques such as Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). The success of these methods, whether using an explicit reward model or optimizing directly on preference data, is critically dependent on the quality of the guiding signal. However, these signals are conventionally derived from task-specific outcomes, such as correctness in math or fluency in summarization. This approach often limits the model's ability to generalize its reasoning skills across diverse domains and can lead to reward hacking or model collapse. This paper challenges this outcome-based paradigm by introducing a conceptual framework, GRIT (Generalizable Reasoning via Intrinsic Traits). This novel framework aims to shift the emphasis from rewarding what the model answers to how it reasons. To accomplish this, we define a set of universal, task-agnostic traits of sound cognition inspired by human reasoning. These intrinsic traits are encoded as distinct reward components: (1) ensuring sequential logical coherence, (2) penalizing cyclic or redundant reasoning, (3) rewarding successful and integrated tool utilization, and (4) maintaining semantic alignment with the user's query. By fine-tuning an LLM to optimize for these intrinsic traits, we hypothesize that the model will develop a more robust and generalizable cognitive process.

Rethinking Reward Models! A Conceptual Framework for Enhancing LLM Reasoning through Intrinsic Traits

Online continual learning (OCL) methods adapt to changing environments without forgetting past knowledge. Similarly, online time series forecasting (OTSF) is a real-world problem where data evolve in time and success depends on both rapid adaptation and long-term memory. Indeed, time-varying and regime-switching forecasting models have been extensively studied, offering a strong justification for the use of OCL in these settings. Building on recent work that applies OCL to OTSF, this paper aims to strengthen the theoretical and practical connections between time series methods and OCL. First, we reframe neural network optimization as a parameter filtering problem, showing that natural gradient descent is a score-driven method and proving its information-theoretic optimality. Then, we show that using a Student’s t likelihood in addition to natural gradient induces a bounded update, which improves robustness to outliers. Finally, we introduce Natural Score-driven Replay (NatSR), which combines our robust optimizer with a replay buffer and a dynamic scale heuristic that improves fast adaptation at regime drifts. Empirical results demonstrate that NatSR achieves stronger forecasting performance than more complex state-of-the-art methods.

Online Continual Learning for Time Series: a Natural Score-driven Approach
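The robustness claim in the NatSR abstract (a Student's t likelihood yields a bounded update) can be checked numerically for the simplest case. The sketch below covers only the likelihood score with respect to a location parameter; it does not implement the paper's natural-gradient optimizer, replay buffer, or dynamic scale heuristic. The function names and default parameter values are illustrative assumptions.

```python
import math


def gaussian_score(error, sigma=1.0):
    """Gradient of the Gaussian log-likelihood w.r.t. the location.

    Grows linearly with the prediction error, so one outlier can
    produce an arbitrarily large update.
    """
    return error / sigma**2


def student_t_score(error, nu=5.0, sigma=1.0):
    """Gradient of the Student's t log-likelihood w.r.t. the location.

    error = observation - predicted location, nu = degrees of freedom.
    The error appears squared in the denominator, so the score shrinks
    back toward zero for very large errors.
    """
    return (nu + 1.0) * error / (nu * sigma**2 + error**2)


def student_t_score_bound(nu=5.0, sigma=1.0):
    """Largest absolute score, attained at |error| = sqrt(nu) * sigma."""
    return (nu + 1.0) / (2.0 * math.sqrt(nu) * sigma)


errors = [-1e6, -10.0, -1.0, 0.0, 1.0, math.sqrt(5.0), 10.0, 1e6]
bound = student_t_score_bound()
t_scores = [student_t_score(e) for e in errors]
gauss_scores = [gaussian_score(e) for e in errors]
```

With these definitions every entry of `t_scores` stays within `bound` in absolute value, whereas `gauss_scores` scales with the error itself.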

Current Large Language Model (LLM) training paradigms, while effective at pattern matching and knowledge retrieval, often fall short of replicating the complex, adaptive, and generalizable reasoning characteristic of human intelligence. We argue that this arises from a fundamental disconnect between the static, data-driven training of LLMs and the dynamic, lifelong learning process inherent to human cognitive development. To bridge this gap, we introduce Learn-Master-Teach Tuning (LMT), a novel, end-to-end training framework that simulates the complete human 'student-to-teacher' life-cycle. Our paradigm guides the model through a comprehensive developmental trajectory, from a beginner who is trying to adjust and learn a curriculum, to a seasoned educator capable of lifelong learning and knowledge synthesis. By situating learning within a comprehensive, cognitive-inspired framework, we explore two fundamental research questions: Can the deep simulation of a human persona, in this case a developing academic, lead to a genuine replication of that persona's capabilities? And does this student-teacher life-cycle offer a superior training paradigm for fostering robust and generalizable reasoning in LLMs? We present the complete LMT methodology and position it within the landscape of existing training paradigms, arguing that by emulating the human journey of learning, we can unlock new frontiers in artificial general intelligence such as lifelong learning and continual adaptation with history retention. Initial results using small language models in the challenging domain of differential equations already demonstrate significant improvements of about 33% over baselines, indicating the transformative potential of the proposed human-pedagogy-inspired framework.

Human-Pedagogy Inspired LLM Fine-Tuning Paradigm for Lifelong Learning and Continual Adaptation

Given the recent tremendous success of large language models (LLMs), there has been an increasing trend of applying them to downstream tasks via fine-tuning (FT). However, two significant challenges persist in FT: (1) curation of high-quality task-specific data and (2) expensive, time-consuming model adaptation via gradient descent optimization. To mitigate these limitations, we leverage prior work on large-scale parameter generation for LLMs and propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), which opens up a new paradigm of parameter-level meta-learning and thereby serves as a critical advance in the AI Scientist domain. SOLAR is an open-ended, autonomous, foundation-model-based agent capable of self-improvement by discovering and learning the rich meta-knowledge present in large neural network weights, enabling efficient adaptation of LLMs to unseen domains through parameter-level weight modifications. To achieve this, SOLAR uses a multi-level reinforcement learning approach to train models to understand the parameter space efficiently. It offers a higher degree of flexibility than prior work on self-evolution by giving the model the freedom to choose its own adaptation strategy, thereby breaking out of the regime of scaling solely through data. Early experiments demonstrate the superior performance of SOLAR in the common-sense reasoning domain, where it outperforms task-specific FT by 23.6% on average and even some of the most recent works in parameter generation (by 10.4%), model merging (by 24.3%), and test-time learning (by 25.2%), as well as on out-of-domain tasks such as coding, social, logical, and mathematical reasoning.

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

Earth Observation (EO) systems generate continuous multimodal data streams at unprecedented scales. However, in this context, the literature offers solutions based on foundation models that operate within static training paradigms, which limit their effectiveness. Trained once on historical datasets and deployed without further learning, these models face critical issues when confronted with the dynamic nature of the environment, which includes emerging phenomena, sensor degradation, and evolving environmental patterns. This vision paper identifies three fundamental gaps: (1) the absence of memory-efficient anti-forgetting mechanisms at the foundation scale, (2) static cross-modal fusion strategies that cannot adapt to changing observational contexts, and (3) temporal representations that fail to distinguish cyclical patterns from distributional drift. Addressing these limitations requires convergence of foundation models, Continual Learning, and Streaming Machine Learning. This work envisions three key research directions: efficient model updating through selective replay and parameter regularization, explicit drift detection mechanisms, and context-dependent fusion strategies. These directions aim to enable EO systems that continuously learn from terabyte-per-day satellite streams while maintaining transfer learning capabilities and computational feasibility essential for operational deployment.

Towards Streaming Continual Learning for Earth Observation Multimodal Foundation Models

In streaming scenarios, models must learn continuously, detect concept drifts, and adapt without erasing previously acquired knowledge. However, existing research communities address these challenges in isolation. Continual Learning (CL) focuses on long-term retention and mitigating catastrophic forgetting, often without strict real-time constraints. Stream Learning (SL) emphasizes rapid, efficient adaptation to high-frequency data streams, but typically neglects forgetting. Recent efforts have tried to combine these paradigms, yet no clear algorithmic overlap exists. We argue that large in-context tabular models (LTMs) provide a natural bridge for Streaming Continual Learning (SCL). In our view, unbounded streams should be summarized on-the-fly into compact sketches that can be consumed by LTMs. This recovers the classical SL motivation of compressing massive streams with fixed-size guarantees, while simultaneously aligning with the experience-replay desiderata of CL. To clarify this bridge, we show how the SL and CL communities implicitly adopt a divide-to-conquer strategy to manage the tension between plasticity (performing well on the current distribution) and stability (retaining past knowledge), while also imposing a minimal complexity constraint that motivates diversification (avoiding redundancy in what is stored) and retrieval (re-prioritizing past information when needed). Within this perspective, we propose structuring SCL around two core principles of data selection for in-context learning: (1) distribution matching, which balances plasticity and stability, and (2) distribution compression, which controls memory size through diversification and retrieval mechanisms.

Bridging Streaming Continual Learning via In-Context Large Tabular Models
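The idea in the abstract above of summarizing an unbounded stream on the fly into a fixed-size sketch can be illustrated with reservoir sampling. This is a swapped-in classical technique, not the selection method proposed by the authors: the abstract names no specific algorithm, and the function name and parameters below are illustrative assumptions.

```python
import random


def reservoir_sample(stream, capacity, seed=0):
    """Keep a uniform random sample of at most `capacity` items.

    Memory use is fixed at `capacity` however long the stream is,
    and every item seen so far has the same chance of being retained.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < capacity:
            # Fill the reservoir with the first `capacity` items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, index]; replace only if it falls inside.
            slot = rng.randint(0, index)
            if slot < capacity:
                reservoir[slot] = item
    return reservoir


# A stream of 10,000 observations compressed into a 32-item context.
context = reservoir_sample(range(10_000), capacity=32)
short = reservoir_sample(range(5), capacity=32)
```

A uniform sample addresses only the memory budget. The distribution-matching and diversification principles described in the abstract would call for a more selective retention rule than uniform replacement.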

Modern industrial monitoring systems must detect anomalies in real time under evolving operating conditions and without reliance on labeled data. Traditional online anomaly detectors offer fast adaptation but struggle when normal behavior shifts or when rare anomalies are unintentionally learned as normal. On the other hand, recently introduced foundation models for time series capture richer structure but are computationally expensive for continuous deployment. We propose a dual-learner anomaly detection framework that bridges a fast online learner based on Half-Space Trees with a time-series foundation model (MOMENT) acting as a background learner. A confidence-based routing mechanism determines, for each incoming instance, whether to trust the online model, defer to the foundation model, or combine both through confidence-weighted ensembling. The confidence estimation method is fully unsupervised and robust to drift, requiring no labels or sliding windows. We validate the approach on two real-world elevator (hoist) installations, demonstrating that the system operates efficiently in streaming conditions and matches or surpasses strong online baselines. Furthermore, we show that fine-tuning the foundation model on one installation provides measurable performance gains when transferred to a different installation, indicating that foundation-model adaptation can support cross-site knowledge transfer in industrial monitoring. The results highlight the promise of integrating online learning with foundation models to achieve both responsiveness and robustness in long-term industrial anomaly detection.

Online Learning Supported by Foundation Models for Anomaly Detection in Industrial Settings
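The three-way routing decision described in the abstract above (trust the online model, defer to the foundation model, or blend the two) can be sketched as follows. The thresholds, the function name `route`, and the callable interface for the foundation model are hypothetical choices made for illustration. The abstract does not describe how confidence is estimated, so confidences are taken as given inputs here.

```python
def route(online_score, online_conf, foundation_fn, high=0.8, low=0.3):
    """Return (anomaly_score, source) for one incoming instance.

    online_score, online_conf : output of the fast online detector
    foundation_fn             : zero-argument callable returning
                                (score, confidence) from the expensive
                                model; invoked only when needed
    high, low                 : hypothetical confidence thresholds
    """
    if online_conf >= high:
        # Online model is confident: skip the expensive model entirely.
        return online_score, "online"

    fm_score, fm_conf = foundation_fn()
    if online_conf <= low:
        # Online model is unreliable: defer to the foundation model.
        return fm_score, "foundation"

    # In between: confidence-weighted ensemble of both scores.
    total = online_conf + fm_conf
    if total == 0.0:
        return (online_score + fm_score) / 2.0, "ensemble"
    blended = (online_conf * online_score + fm_conf * fm_score) / total
    return blended, "ensemble"


calls = []

def fake_foundation_model():
    """Stand-in for a foundation-model call; records each invocation."""
    calls.append(1)
    return 0.9, 0.6

confident = route(0.2, 0.95, fake_foundation_model)
calls_after_confident = len(calls)
deferred = route(0.2, 0.10, fake_foundation_model)
blended_score, blended_source = route(0.2, 0.60, fake_foundation_model)
```

Passing the foundation model as a callable reflects the cost concern raised in the abstract: the expensive model runs only for instances the online learner cannot handle confidently.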

Graph anomaly detection (GAD), which aims to identify rare observations in graphs, has attracted rapidly increasing attention in recent years due to its significance in a wide range of high-impact application domains such as abusive review detection and malicious behavior detection in online shopping applications, web attack detection, and suspicious activity detection in online/offline financial services. A foundation model on GAD refers to a generalist model trained on specific graph data, enabling it to generalize effectively across different domains and tasks. In recent years, such models have attracted increasing attention due to their ability to provide strong zero-shot and few-shot performance without task-specific retraining. By learning domain-invariant and transferable representations across tasks, a GAD foundation model can be readily adapted to new anomaly detection scenarios, making it applicable to a wide range of use cases such as privacy-preserving anomaly detection, transferable cybersecurity and threat detection, and cross-platform anomaly detection in social networks.

In this tutorial, we aim to present a comprehensive review of deep learning methods specifically designed for GAD and foundation models for detecting abnormal activities on graphs. Specifically, we will first elaborate on the key concepts and taxonomies in GAD. We will then review popular state-of-the-art deep anomaly detection methods from various perspectives of methodology design on graph data, including GNN backbone design, proxy task design, and anomaly measures. Next, we will establish the connection between conventional methods and foundation models on GAD, highlighting how recent advancements build upon or differ from conventional approaches. Following this, we will provide a comprehensive overview of existing foundation models that have been proposed for detecting abnormal activities on graphs from cross-domain and cross-task perspectives, respectively. We will discuss their underlying principles, design choices, and effectiveness across various settings. Finally, future directions will be presented to help researchers gain a deep understanding of this area and promote more high-quality research and real-world applications. The website of this tutorial is https://sites.google.com/view/aaai26-tutorial-gad/home?read_current=1

Toward Foundation Models for Detecting Abnormal Activities on Graphs

How to find a natural grouping of a large real data set? Clustering requires a balance between abstraction and representation. To identify clusters, we need to abstract from superfluous details of individual objects, such as background or lighting in images. But we also need a rich representation that emphasizes the key features shared by groups of objects that distinguish them from other groups of objects. Each clustering algorithm implements a different trade-off between abstraction and representation. Classical K-means implements a high level of abstraction – details are simply averaged out – combined with a very simple representation – all clusters are Gaussians in the original data space. We will see how approaches to subspace and deep clustering support high-dimensional and complex data by allowing richer representations. However, with increasing representational expressiveness comes the need to explicitly enforce abstraction in the objective function to ensure that the resulting method performs clustering and not just representation learning. We will see how current deep clustering methods define and enforce abstraction through centroid-based and density-based clustering losses. Balancing the conflicting goals of abstraction and representation is challenging. Ideas from subspace clustering help by learning one latent space for the information that is relevant to clustering and another latent space to capture all other information in the data. The tutorial ends with an outlook on future research in clustering. Future methods will more adaptively balance abstraction and representation to improve performance, energy efficiency and interpretability.
This tutorial is for machine learning researchers and professionals interested in learning more about clustering high-dimensional data. Practitioners will receive an overview of different approaches to clustering high-dimensional data, along with insights into their benefits and limitations. This knowledge will enable them to select an appropriate method for their problem. Researchers will find starting points for contributing to the topic. We will illustrate foundational and current approaches with Python code examples. We will summarize the evaluation methodology and provide pointers to benchmark data. We will also highlight open problems that require further research. This tutorial is a starting point for actively contributing to this active and fascinating research topic. To illustrate, we will use real use cases from collaborative projects in biology, neuroscience, and archeology, in addition to benchmark data. Basic knowledge in machine learning, data mining, linear algebra and Python programming is beneficial but not required.

Website: https://dm.cs.univie.ac.at/research/aaai26/

Clustering High-dimensional Data: Balancing Abstraction and Representation

Computational Pathology Foundation Models (CPathFMs) have emerged as a transformative approach for automating histopathological analysis by leveraging self-supervised learning on large-scale, unlabeled whole-slide images (WSIs). These models, categorized into uni-modal and multi-modal frameworks, facilitate tasks such as segmentation, classification, biomarker discovery, and prognosis prediction. However, the development of CPathFMs faces significant challenges, including limited dataset availability, domain-specific adaptation requirements, and the absence of standardized evaluation benchmarks. This tutorial will provide a comprehensive overview of the current state of CPathFMs, covering key datasets, adaptation strategies such as contrastive learning and multi-modal integration, and a taxonomy of evaluation tasks. We will discuss how these models are trained, fine-tuned, and assessed, addressing the critical gaps in generalization, bias mitigation, and clinical applicability. Additionally, we will explore emerging research directions in fairness, transparency, security, and standardization of evaluation protocols. This tutorial will serve as an essential resource for researchers, clinicians, and AI practitioners looking to advance the field of AI-driven computational pathology.

Website: https://sites.google.com/view/aaai26tutorial-cpath/home
