Singapore

Backdoor attacks pose a significant threat to Large Language Models (LLMs), where adversaries can embed hidden triggers to manipulate LLM&#39;s outputs. 
Most existing defense methods, primarily designed for classification tasks, are ineffective against the autoregressive nature and vast output space of LLMs, thereby suffering from poor performance and high latency. 
To address these limitations, we investigate the behavioral discrepancies between benign and backdoored LLMs in output space.
We identify a critical phenomenon which we term sequence lock: a backdoored model generates the target sequence with abnormally high and consistent confidence compared to benign generation. 
Building on this insight, we propose $\mathtt{ConfGuard}$, a lightweight and effective detection method that monitors a sliding window of token confidences to identify sequence lock. 
Extensive experiments demonstrate $\mathtt{ConfGuard}$ achieves a near 100\% true positive rate (TPR) and a negligible false positive rate (FPR) in the vast majority of cases.
Crucially, the $\mathtt{ConfGuard}$ enables real-time detection almost without additional latency, making it a practical backdoor defense for real-world LLM deployments.

AAAI 2026

ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

peai: privacy & security

robustness & trustworthiness

peai: safety

fairness & equity

peai: bias

Backdoor attacks pose a significant threat to Large Language Models (LLMs), where adversaries can embed hidden triggers to manipulate LLM's outputs. 
Most existing defense methods, primarily designed for classification tasks, are ineffective against the autoregressive nature and vast output space of LLMs, thereby suffering from poor performance and high latency. 
To address these limitations, we investigate the behavioral discrepancies between benign and backdoored LLMs in output space.
We identify a critical phenomenon which we term sequence lock: a backdoored model generates the target sequence with abnormally high and consistent confidence compared to benign generation. 
Building on this insight, we propose $\mathtt{ConfGuard}$, a lightweight and effective detection method that monitors a sliding window of token confidences to identify sequence lock. 
Extensive experiments demonstrate $\mathtt{ConfGuard}$ achieves a near 100\% true positive rate (TPR) and a negligible false positive rate (FPR) in the vast majority of cases.
Crucially, the $\mathtt{ConfGuard}$ enables real-time detection almost without additional latency, making it a practical backdoor defense for real-world LLM deployments.

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Moving infrared small target detection (IRSTD) plays a critical role in practical applications, such as surveillance of unmanned aerial vehicles (UAVs) and UAV-based search system. Moving IRSTD still remains highly challenging due to weak target features and complex background interference. Accurate spatio-temporal feature modeling is crucial for moving target detection, typically achieved through either temporal differences or spatio-temporal (3D) convolutions. Temporal difference can explicitly leverage motion cues but exhibits limited capability in extracting spatial features, whereas 3D convolution effectively represents spatio-temporal features yet lacks explicit awareness of motion dynamics along the temporal dimension. In this paper, we propose a novel moving IRSTD network (TDCNet), which effectively extracts and enhances spatio-temporal features for accurate target detection. Specifically, we introduce a novel temporal difference convolution (TDC) re-parameterization module that comprises three parallel TDC blocks designed to capture contextual dependencies across different temporal ranges. Each TDC block fuses temporal difference and 3D convolution into a unified spatio-temporal convolution representation. This re-parameterized module can effectively capture multi-scale motion contextual features while suppressing pseudo-motion clutter in complex backgrounds, significantly improving detection performance. Moreover, we propose a TDC-guided spatio-temporal attention mechanism that performs cross-attention between the spatio-temporal features extracted from the TDC-based backbone and a parallel 3D backbone. This mechanism models their global semantic dependencies to refine the current frame’s features, thereby guiding the model to focus more accurately on critical target regions. To facilitate comprehensive evaluation, we construct a new challenging benchmark, IRSTD-UAV, consisting of 15,106 real infrared images with diverse low signal-to-clutter ratio scenarios and complex backgrounds. Extensive experiments on IRSTD-UAV and public infrared datasets demonstrate that our TDCNet achieves state-of-the-art detection performance in moving target detection. Dataset and source code will be released upon acceptance.

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection

Estimating Individual Treatment Effects (ITE) from observational data is challenging due to confounding bias. Most studies tackle this bias by balancing distributions globally, but ignore individual heterogeneity and fail to capture the local structure that represents the natural clustering among individuals, which ultimately compromises ITE estimation. While instance-level alignment methods consider heterogeneity, they similarly overlook the local structure information. To address these issues, we propose an end-to-end Multi-$\textbf{P}$rototype alignment method for $\textbf{ITE}$ estimation ($\textbf{PITE}$). PITE effectively captures local structure within groups and enforces cross-group alignment, thereby achieving robust ITE estimation. Specifically, we first define prototypes as cluster centroids based on similar individuals under the same treatment. To identify local similarity and the distribution consistency, we perform instance-to-prototype matching to assign individuals to the nearest prototype within groups, and design a multi-prototype alignment strategy to encourage the matched prototypes to be close across treatment arms in the latent space. PITE not only reduces distribution shift through fine-grained, prototype-level alignment, but also preserves the local structures of treated and control groups, which provides meaningful constraints for ITE estimation. Extensive evaluations on benchmark datasets demonstrate that PITE outperforms 13 state-of-the-art methods, achieving more accurate and robust ITE estimation.

PITE: Multi-Prototype Alignment for Individual Treatment Effect Estimation

Color polarization demosaicking (CPDM) aims to reconstruct full-resolution polarization images of four directions from the color-polarization filter array (CPFA) raw image. Due to the challenge of predicting numerous missing pixels and the scarcity of high-quality training data, existing network-based methods, despite effectively recovering scene intensity information, still exhibit significant errors in reconstructing polarization characteristics (degree of polarization, DOP, and angle of polarization, AOP). To address this problem, we introduce the image diffusion prior from text-to-image (T2I) models to overcome the performance bottleneck of network-based methods, with the additional diffusion prior compensating for limited representational capacity caused by restricted data distribution. To effectively leverage the diffusion prior, we explicitly model the polarization uncertainty during reconstruction and use uncertainty to guide the diffusion model in recovering high error regions. Extensive experiments demonstrate that the proposed method accurately recovers scene polarization characteristics with both high fidelity and strong visual perception.

Polarization Uncertainty-Guided Diffusion Model for Color Polarization Image Demosaicking

Affective Forecasting is an psychology task that involves predicting an individual's future emotional responses, often hampered by reliance on external factors leading to inaccuracies, and typically remains at a qualitative analysis stage. To address these challenges, we narrows the scope of Affective Forecasting by introducing the concept of Human-interaction-based Emotion Forecasting (EF). This task is set within the context of a two-party interaction, positing that an individual's emotions are significantly influenced by their interaction partner's emotional expressions and informational cues. This dynamic provides a structured perspective for exploring the patterns of emotional change, thereby enhancing the feasibility of emotion forecasting.

Hi-EF: Benchmarking Emotion Forecasting in Human-interaction

Existing vision-and-language navigation models often deviate from the correct trajectory when executing instructions. However, these models lack effective error correction capability, hindering their recovery from errors. To address this challenge, we propose Self-correction Flywheel, a novel post-training paradigm. Instead of considering the model’s error trajectories on the training set as a drawback, our paradigm emphasizes their significance as a valuable data source. We have developed a method to identify deviations in these error trajectories and devised innovative techniques to automatically generate self-correction data for perception and action. These self-correction data serve as fuel to power the model’s continued training. The brilliance of our paradigm is revealed when we re-evaluate the model on the training set, uncovering new error trajectories. At this time, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively enhance our monocular RGB-based VLA navigation model CorrectNav. Experiments on R2R-CE and RxR-CE benchmarks show CorrectNav achieves new state-of-the-art success rates of 65.1% and 69.3%, surpassing prior best VLA navigation models by 8.2% and 16.4%. Real robot tests in various indoor and outdoor environments demonstrate \method's superior capability of error correction, dynamic obstacle avoidance, and long instruction following.

CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model

Black-box algorithms aim to optimize functions without access to their analytical structure or gradient information, making them essential when gradients are unavailable or computationally expensive to obtain. Traditional methods for black-box optimization (BBO) primarily utilize non-parametric models, but these approaches often struggle to scale effectively in large input spaces. Conversely, parametric approaches, which rely on neural estimators and gradient signals via backpropagation, frequently encounter substantial gradient estimation errors, limiting their reliability. Explicit Gradient Learning (EGL), a recent advancement, directly learns gradients using a first-order Taylor approximation and has demonstrated superior performance compared to both parametric and non-parametric methods. However, EGL inherently remains local and myopic, often faltering on highly non-convex optimization landscapes. In this work, we address this limitation by integrating global statistical insights from the evolutionary algorithm CMA-ES into the gradient learning framework, effectively biasing gradient estimates towards regions with higher optimization potential. Moreover, we enhance the gradient learning process by estimating the Hessian matrix, allowing us to correct the second-order residual of the Taylor series approximation. Our proposed algorithm, EvoGrad² (Evolutionary Gradient Learning with second-order approximation), achieves state-of-the-art results on the synthetic COCO test suite, exhibiting significant advantages in high-dimensional optimization problems. We further demonstrate EvoGrad²'s effectiveness in challenging real-world machine learning tasks, including adversarial training and code generation, highlighting its ability to generate more robust and high-quality solutions. Our results underscore EvoGrad\textsuperscript{2}'s potential as a powerful tool for researchers and practitioners facing complex, high-dimensional, and non-linear optimization problems.

EvoGrad: Evolutionary-Weighted Gradient and Hessian Learning for Black-Box Optimization

Trained on various human-authored corpora, Large Language Models (LLMs) have demonstrated a certain capacity to be prompted towards reflecting specific human-like traits (e.g., personality or values), benefiting downstream personalized LLMs and social simulations. However, current methods suffer from superficial elicitation: LLMs are only steered to mimic shallow, unstable stylistic patterns, failing to embody the desired traits precisely and consistently across diverse tasks like humans. To address this challenge, we propose IROTE, a novel in-context method for stable and transferable trait elicitation. Drawing on psychological theories that traits are formed through identity-related reflection, our method automatically generates and optimizes a textual self-reflection within prompts, which comprises self-perceived experience, to stimulate LLMs' trait-driven behavior. The optimization is performed by iteratively maximizing an information-theoretic objective that enhances the connections between LLMs' behavior and the target trait, while reducing noisy redundancy in reflection without any fine-tuning, leading to evocative and compact trait reflection. Extensive experiments across three human trait systems manifest that one single IROTE-generated self-reflection can induce LLMs' stable impersonation of the target trait across diverse downstream tasks beyond simple questionnaire answering, consistently outperforming existing strong baselines.

IROTE: Human-like Traits Elicitation of Large Language Model via In-Context Self-Reflective Optimization

Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higher-order kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: First, in infinite samples, regression coefficients coincide if and only if no unobserved confounders exist. Second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.

Detecting Unobserved Confounders: A Kernelized Regression Approach

Uncovering causal structures from observational data is crucial for understanding complex systems and making informed decisions. While reinforcement learning (RL) has shown promise in identifying these structures in the form of a directed acyclic graph (DAG), existing methods often lack efficiency, making them unsuitable for online applications. In this paper, we propose MARLIN, an efficient multi-agent RL-based approach for incremental DAG learning. MARLIN uses a DAG generation policy that maps a continuous real-valued space to the DAG space as an intra-batch strategy, then incorporates two RL agents—state-specific and state-invariant—to uncover causal relationships and integrates these agents into an incremental learning framework. Furthermore, the framework leverages a factored action space to enhance parallelization efficiency. Extensive experiments on synthetic and real datasets demonstrate that MARLIN outperforms state-of-the-art methods in terms of both efficiency and effectiveness.

MARLIN: Multi-Agent Reinforcement Learning for Incremental DAG Discovery

The emergence of accurate open large language models (LLMs) has sparked a push for advanced quantization techniques to enable efficient deployment on end-user devices. In this paper, we revisit the challenge of extreme LLM compression---targeting ultra-low-bit quantization for both activations and weights---from a Fourier frequency domain perspective. We propose SpecQuant, a two-stage framework that tackles activation outliers and cross-channel variance. In the first stage, activation outliers are smoothed and transferred into the weight matrix to simplify downstream quantization. In the second stage, we apply channel-wise low-frequency Fourier truncation to suppress high-frequency components while preserving essential signal energy, improving quantization robustness. Our method builds on the principle that most of the weight energy is concentrated in low-frequency components, which can be retained with minimal impact on model accuracy. To enable runtime adaptability, we introduce a lightweight truncation module during inference that adjusts truncation thresholds based on channel characteristics. On LLaMA-3 8B, SpecQuant achieves 4-bit quantization for both weights and activations, narrowing the zero-shot accuracy gap to only 1.5% compared to full precision, while delivering 2× faster inference and 3× lower memory usage.

Downloads

Next from AAAI 2026

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2026

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads