We identify a jailbreaking vulnerability in multiple open-source LLMs: by augmenting dangerous requests with certain "distractors" to obfuscate their intent, we elicit specific, actionable responses on a wide variety of harmful topics. We find that such an attack noticeably alters the contents of these models' chains of thought, including changed frequencies of seemingly unrelated $n$-grams and heightened ethical scrutiny of harmful requests, even when the response is ultimately jailbroken.
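As an illustration of the kind of $n$-gram frequency comparison described in the abstract, here is a minimal sketch in Python. It is not the paper's actual analysis pipeline (which the abstract does not specify); the function names, the word-level tokenization, and the relative-frequency shift metric are all assumptions made for the example.

```python
from collections import Counter

def ngram_counts(text, n=2):
    """Count word-level n-grams in a single chain-of-thought transcript."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def frequency_shift(baseline_texts, attacked_texts, n=2, top_k=10):
    """Return the n-grams whose relative frequency changes most between
    baseline and distractor-augmented chains of thought."""
    base, attacked = Counter(), Counter()
    for t in baseline_texts:
        base += ngram_counts(t, n)
    for t in attacked_texts:
        attacked += ngram_counts(t, n)
    base_total = sum(base.values()) or 1
    att_total = sum(attacked.values()) or 1
    # Difference in relative frequency; large absolute values indicate
    # n-grams that become much more or less common under the attack.
    shifts = {
        g: attacked[g] / att_total - base[g] / base_total
        for g in set(base) | set(attacked)
    }
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]

# Hypothetical usage with placeholder transcripts:
# print(frequency_shift(["the model reasons step by step about the request"],
#                       ["wait this request seems harmful but the puzzle says"]))
```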