Representation Finetuning (ReFT) has recently emerged as an efficient paradigm for adapting pretrained language models by editing hidden representations rather than model weights. However, our preliminary experiments reveal that ReFT is notably more sensitive to training data quality than traditional parameter-efficient finetuning methods, particularly to samples with incorrect labels, which can severely degrade performance. Inspired by prior work demonstrating that the hidden representations of generalizable neural networks exhibit low-dimensional manifold structure, we hypothesize that effective generalization in ReFT requires geometrically structured transformations between pre- and post-intervention representations. This implies that the intervention vectors representing these transformations should form a low-dimensional manifold, making the inconsistent transformations induced by label noise detectable as geometric outliers. To leverage this insight, we introduce Aligning Interventions on a learned Manifold (AIM), a representation-based data filtering method for ReFT, which identifies high-quality training samples by measuring the geometric consistency of their intervention vectors with respect to a robust reference manifold derived via principal component analysis on trusted data. Extensive experiments on both commonsense and arithmetic reasoning tasks confirm the effectiveness of AIM, showing consistent improvements over strong data selection baselines across multiple model scales.
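The core filtering idea — fit a PCA subspace on intervention vectors from trusted data, then score each candidate sample by its distance to that subspace — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the rank `k`, and the median-based keep threshold are all assumptions introduced here for clarity.

```python
import numpy as np

def fit_reference_manifold(trusted_vectors, k):
    """PCA via SVD on mean-centered trusted intervention vectors.

    Returns the mean and the top-k principal directions, which together
    define the linear reference manifold (an assumption of this sketch;
    the paper's manifold estimate may be more robust).
    """
    mu = trusted_vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(trusted_vectors - mu, full_matrices=False)
    return mu, vt[:k]

def geometric_consistency(vectors, mu, components):
    """Distance from each intervention vector to the reference manifold:
    the norm of the residual after projecting onto the PCA subspace.
    Smaller scores mean more geometrically consistent samples."""
    centered = vectors - mu
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

# Toy usage: trusted vectors lie (by construction) in a rank-8 subspace
# of a 64-dim representation space; candidates are unstructured noise.
rng = np.random.default_rng(0)
trusted = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 64))
candidates = rng.normal(size=(50, 64))

mu, comps = fit_reference_manifold(trusted, k=8)
scores = geometric_consistency(candidates, mu, comps)
# Keep the most consistent half; the 50% cutoff is illustrative only.
keep = scores < np.quantile(scores, 0.5)
```

In this sketch, label-noise-induced interventions would show up as large residuals (high `scores`) and be filtered out, while samples whose intervention vectors align with the trusted subspace are retained.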
