Singapore

AAAI 2026

Feature Compression May Be the Root Cause of Adversarial Fragility in Neural Network Classifiers (Student Abstract)

In this paper, we study the adversarial robustness of deep
neural networks (DNNs) for classification relative to optimal
classifiers. We look at the smallest magnitude of additive
perturbation that can change a classifier's
output. We provide a matrix-theoretic explanation of the
adversarial fragility of DNNs for classification. In
particular, our theoretical results show that the
adversarial robustness of a neural network can degrade as
the input dimension d increases. Analytically, we show
that the adversarial robustness of neural networks can be
only 1/√d of the best possible adversarial
robustness of optimal classifiers. Our theoretical predictions
match empirical results remarkably well. The
matrix-theoretic explanation aligns with an earlier
information-theoretic feature-compression-based explanation
for the adversarial fragility of neural networks.
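
As a rough numerical illustration of the 1/√d scaling, the sketch below compares two toy classifiers on a pair of random Gaussian points: a nearest-neighbour rule, whose output flips only at the midpoint between the points, and a linear rule whose weight vector `w` is drawn independently of the data. This setup, and the names `min_flip_ratio` and `mean_ratio`, are assumptions made for illustration; they are not the construction or the proof used in the paper.

```python
import math
import random


def min_flip_ratio(d, rng):
    """Flip distance of a toy linear rule divided by that of a nearest-neighbour rule."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Assumption: the linear rule's direction is unrelated to x1 - x2.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    diff = [a - b for a, b in zip(x1, x2)]

    # Nearest-neighbour rule: the boundary is the perpendicular bisector,
    # so x1 must move ||x1 - x2|| / 2 before its label changes.
    optimal = math.sqrt(sum(v * v for v in diff)) / 2.0

    # Linear rule with threshold midway between w.x1 and w.x2:
    # distance from x1 to the hyperplane is |w.(x1 - x2)| / (2 ||w||).
    w_norm = math.sqrt(sum(v * v for v in w))
    linear = abs(sum(a * b for a, b in zip(w, diff))) / (2.0 * w_norm)
    return linear / optimal


def mean_ratio(d, trials=300, seed=0):
    """Average the ratio over independent random draws."""
    rng = random.Random(seed)
    return sum(min_flip_ratio(d, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for d in (25, 100, 400):
        r = mean_ratio(d)
        print(f"d={d:4d}  ratio={r:.4f}  ratio*sqrt(d)={r * math.sqrt(d):.3f}")
```

In this toy setting the product `ratio * sqrt(d)` should stay roughly constant as `d` grows, which is what a 1/√d gap between the two rules would look like.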

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Large Language Models (LLMs) are increasingly employed for literature reviews, academic drafting, and scholarly writing. While their fluency accelerates knowledge synthesis, they frequently produce fabricated or erroneous references, known as citation hallucinations (CHs). Recent studies report hallucination rates ranging from 18% in GPT-4 to over 70% in other frontier models, with domain-specific rates as high as 88% in legal contexts. Benchmarks such as CiteME further highlight the gap between LLMs (4.2–18.5% accuracy) and human annotators (69.7%), while retrieval-augmented systems like CiteAgent demonstrate partial progress. This study examines methods for automatically detecting hallucinated citations. We present a benchmark of machine-generated references labelled with three fine-grained categories (valid, partially valid, and hallucinated), and propose a hybrid detection pipeline combining bibliographic retrieval, fuzzy similarity, and LLM-based verification. Preliminary experiments indicate improvements over exact matching baselines. We argue that scalable, real-time citation verification is a crucial step toward developing trustworthy LLM-based scholarly assistants and generating reproducible scientific knowledge, and outline directions for multilingual and domain-specific extensions.

Detecting Citation Hallucinations in Large Language Model Outputs (Student Abstract)
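
To make the three-way labelling concrete, here is a minimal sketch of the retrieval and fuzzy-similarity stages only, using Python's standard `difflib`. The in-memory `KNOWN_REFERENCES` list stands in for a real bibliographic index, and the `0.85` threshold, the title-plus-year record shape, and the function names are all assumptions for illustration; the abstract does not specify them, and the LLM-based verification stage is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a bibliographic index (assumption: title + year records).
KNOWN_REFERENCES = [
    {"title": "Attention Is All You Need", "year": 2017},
    {"title": "Deep Residual Learning for Image Recognition", "year": 2016},
]


def normalize(title):
    """Lower-case and collapse whitespace so formatting noise does not affect matching."""
    return " ".join(title.lower().split())


def classify_citation(title, year, index=KNOWN_REFERENCES, threshold=0.85):
    """Label a generated reference as valid, partially valid, or hallucinated."""
    best, best_score = None, 0.0
    for ref in index:
        score = SequenceMatcher(None, normalize(title), normalize(ref["title"])).ratio()
        if score > best_score:
            best, best_score = ref, score

    # No sufficiently similar record: treat the reference as fabricated.
    if best is None or best_score < threshold:
        return "hallucinated"
    # Exact title and matching year: treat the reference as correct.
    if best_score == 1.0 and best["year"] == year:
        return "valid"
    # Close title match, or right title with wrong metadata.
    return "partially valid"


if __name__ == "__main__":
    print(classify_citation("Attention Is All You Need", 2017))
    print(classify_citation("Attention Is All You Needed", 2017))
    print(classify_citation("A Unified Theory of Quantum Llamas", 2021))
```

An exact-matching baseline would reject the second query outright; the fuzzy score is what lets a near-miss fall into the intermediate category rather than being counted as fabricated.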

Encrypted traffic classification has become increasingly
important in network security. To address the difficulty of
existing architectures in collaboratively modeling
spatio-temporal features, we propose BiST-Mamba, a novel
dual-branch spatio-temporal Mamba network that
synchronously extracts spatio-temporal features. To the
best of our knowledge, this is the first work to introduce
VMamba into encrypted traffic classification. Preliminary
experiments on a small-scale dataset show that our method
reaches an accuracy of 92.74% and an F1 score of 83.43%,
demonstrating the potential of the model for effective
spatio-temporal modeling.

BiST-Mamba: A Dual-branch Spatio-Temporal Mamba Network for Encrypted Traffic Classification (Student Abstract)

Current video understanding models struggle with temporal
reasoning and efficient processing while balancing detail
preservation with computational efficiency. We propose a
hierarchical memory system that segments videos into action
and scene units, combined with question-aware agentic
keyframe selection. Our method achieves 70.3% overall
accuracy on VideoMME short video benchmarks.

HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)

We identify a jailbreaking vulnerability in multiple open-source LLMs: by augmenting dangerous requests using certain "distractors" to obfuscate their intent, we elicit specific, actionable responses on a wide variety of harmful topics. We find that such an attack noticeably alters the contents of these models' chains of thought, including changed frequencies of seemingly unrelated n-grams and heightened ethical scrutiny about harmful requests even when their response is ultimately jailbroken.

Distractor-Based Jailbreaking Attacks in Language Models and Associated Changes in Chain-of-Thought Content (Student Abstract)

Vision-Language Models (VLMs) are increasingly deployed
across downstream tasks, yet their training data often
encode social biases that surface in outputs. Unlike
humans, who interpret images through contextual and social
cues, VLMs process them through statistical associations,
often leading to reasoning that diverges from human
reasoning. By analyzing how a VLM reasons, we can
understand how inherent biases are perpetuated and can
adversely affect downstream performance. To examine this
gap, we systematically analyze social biases in five
open-source VLMs on an occupation prediction task using the
FairFace dataset. Across 32 occupations and three different
prompting styles, we elicit both predictions and reasoning.
Our findings show that biased reasoning patterns
systematically underlie intersectional disparities,
highlighting the need to align VLM reasoning with human
values before downstream deployment.

How Reasoning Influences Intersectional Biases in Vision–Language Models (Student Abstract)

Existing federated prompt learning methods for
vision-language models like CLIP rely solely on text-based
prompts and final-layer visual features, missing crucial
multiscale visual details and client-specific style
variations. This limits generalization across non-IID
distributions and novel classes. We introduce FedCSAP
(Federated Cross-Modal Style-Aware Prompt Generation),
which harnesses multiscale features from CLIP's vision
encoder alongside domain-aware style statistics from client
data. By fusing these visual representations with textual
context, FedCSAP generates adaptive, context-aware prompts
that enhance robustness across seen and unseen classes. Our
privacy-preserving approach operates through local training
and global aggregation, effectively handling heterogeneous
client distributions. Experiments on multiple image
classification datasets demonstrate that FedCSAP
significantly outperforms existing federated prompt
learning methods in both accuracy and generalization.

Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract)

Hyperdimensional Computing (HDC) represents data as
high-dimensional hypervectors that are robust and efficient
for learning. Existing methods often rely on pseudo-random
hypervector generation, which can suffer from poor
orthogonality and high variance across runs, ultimately
slowing convergence. These approaches typically require
numerous iterations (20–100) to achieve acceptable accuracy.
We propose a method that utilizes deterministic Sobol-based
linear projections and rank-based retraining to construct
more stable and discriminative hypervectors, thereby
reducing class confusion. Unlike pseudo-random
initialization, our projections guarantee reproducibility
and better coverage of the feature space. As a result, our
approach achieves up to 97% accuracy in only 5 iterations.
This makes our model up to 20× faster while simultaneously
improving accuracy.

Deterministic Hyperdimensional Learning with Rank Refinement (Student Abstract)

Reliable uncertainty quantification (UQ) is crucial for
deploying deep learning models in safety-critical domains.
Existing UQ methods often either rely on multi-pass
inference, which increases computational cost, or restrict
expressiveness by using only final layer embeddings. In
this work, we propose a lightweight evidential meta-model
that leverages multi-layer feature fusion from a frozen
classifier, capturing both low-level textures and
high-level semantics to better estimate uncertainty. To
further enhance epistemic fidelity, we integrate maximum
weight-entropy (Max-WEnt) regularization, which encourages
hypothesis diversity without altering the base network or
adding test-time overhead. Experiments across seven
benchmarks, including medical (BACH, HAM10000, BreakHIS)
and natural image datasets (SVHN, Fashion-MNIST,
ImageNet-C), demonstrate consistent improvements in AUROC
and calibration compared to prior post-hoc UQ methods. Our
findings show that combining multi-layer evidential
modelling with Max-WEnt provides a robust, efficient, and
practical framework for trustworthy AI in high-stakes
applications.

Weight Entropy-Maximised Evidential Metamodel for Uncertainty Quantification (Student Abstract)
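
For readers unfamiliar with evidential outputs, the sketch below shows the standard subjective-logic reading of a Dirichlet head that is common in evidential deep learning: non-negative per-class evidence is turned into Dirichlet parameters, expected class probabilities, and a single uncertainty mass. It is an assumption that the meta-model described above uses this exact parameterization; the function name `dirichlet_uncertainty` is hypothetical, and the multi-layer feature fusion and Max-WEnt regularizer are not shown.

```python
def dirichlet_uncertainty(evidence):
    """Map non-negative per-class evidence to expected probabilities and uncertainty mass.

    Assumed parameterization: alpha_k = evidence_k + 1, S = sum(alpha),
    p_k = alpha_k / S, u = K / S.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                  # total Dirichlet strength S
    probs = [a / strength for a in alpha]  # expected class probabilities
    uncertainty = k / strength             # equals 1.0 when there is no evidence at all
    return probs, uncertainty


if __name__ == "__main__":
    for ev in ([0.0, 0.0, 0.0], [9.0, 0.0, 0.0], [40.0, 1.0, 1.0]):
        probs, u = dirichlet_uncertainty(ev)
        print(ev, [round(p, 3) for p in probs], round(u, 3))
```

Because the uncertainty comes from a single forward pass through the evidence head, a scheme of this kind avoids the multi-pass inference cost mentioned above.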

Embodied agents must reason causally, as correlation-based
models fail under intervention and distribution shift. This
challenge arises in domains like robotics and cyber-physical
systems, where agents balance efficiency and comfort under
uncertainty. We introduce POLICYGRID, unifying causal
discovery and control by treating each action as both
decision and experiment. Leveraging constraint-based search,
neural causal models, and language model priors with
interventional validation, POLICYGRID yields adaptive,
interpretable policies. Across synthetic, real-world, and
live deployments, it achieves superior causal recovery
(F1 = 0.89) and 2.8× better multi-objective performance than
correlation-based baselines, demonstrating safe,
generalizable decision-making.

POLICYGRID: Causal Discovery for Adaptive Policy Optimization in Embodied Agents (Student Abstract)

Well log datasets are often scarce, which hinders the
development of machine learning models for reservoir
analysis, a common challenge in the oil and gas industry.
We present VAEc-tMC, a Conditional Variational Autoencoder
designed to generate synthetic well log data conditioned on
rock type. By embedding geological context into the
generative process, our model addresses a critical gap
overlooked by existing methods. Our approach integrates a
Student’s t-distribution loss, a smoothed Kullback–Leibler
divergence, and low-variance Monte Carlo sampling to
improve robustness and fidelity. When used for data
augmentation, the synthetic data preserve key statistical
properties of real logs and improve downstream lithology
classification by about 80% in AUC, 62% in accuracy, and
71% in F1. These findings validate the model’s ability to
generate geologically consistent synthetic data, extending
its applicability to reservoir modeling and downstream ML
workflows in data-scarce environments.
