Singapore

Large language models (LLMs) often generate
hallucinations—fluent yet factually incorrect
responses—that undermine reliability in knowledge-intensive
tasks.
Existing approaches for hallucination mitigation typically
rely on external retrieval modules or probability
heuristics, which either require additional resources or
lack interpretability. In this work, we propose a
diffusion-based hallucination detection framework (DHDF)
that leverages U-Net denoising to reconstruct consensus
answers from multiple LLM outputs. If the diffusion process
exhibits spurious convergence away from factual ground
truth, it provides a clear signal of hallucination. To
quantify factual correctness, we incorporate TruthfulQA
scores as a fact-grounded evaluation metric, distinguishing
well-aligned models (high scores) from hallucination-prone
models (low scores). Experimental results demonstrate that
convergence dynamics under diffusion, combined with
fact-grounded QA evaluation, offer an effective and
interpretable pathway for hallucination detection without
relying on external knowledge bases.

AAAI 2026

Diffusion for Combating the Hallucination in Large Language Models (Student Abstract)

fact-checking / misinformation detection (nlp focus)

(large) language models

deep generative models & autoencoders

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Extending LLM context windows is crucial for long-range tasks. RoPE-based position interpolation (PI) methods such as linear and frequency-aware scaling extend input lengths without retraining, while post-training quantization (PTQ) enables practical deployment. We show that combining PI with PTQ degrades accuracy because of four coupled effects (long-context aliasing, dynamic range dilation, axis grid anisotropy, and outlier shifting) that induce position-dependent logit noise. We provide the first systematic analysis of PI plus PTQ and introduce two diagnostics: Interpolation Pressure (per-band phase scaling sensitivity) and Tail Inflation Ratios (outlier shift from short to long contexts). To address this degradation, we propose Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into a few frequency bands and performs a small search over per-band scales for W_Q and W_K, with an optional symmetric variant to preserve logit scale. The diagnostics-guided search uses a tiny long-context dev set and requires no fine-tuning, kernel, or architecture changes. Empirically, Q-ROAR recovers up to 0.7% accuracy on standard tasks and reduces GovReport perplexity by more than 14%, while preserving short-context performance and compatibility with existing inference stacks.

Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs (Student Abstract)
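
The per-band rescaling idea in the Q-ROAR abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the contiguous band grouping, the interleaved pair layout (rows 2i and 2i+1 form RoPE pair i), the toy weights, and all function names are assumptions made for illustration, and the diagnostics-guided search over scales is left out. The sketch only shows why a symmetric variant (scale W_Q by s, W_K by 1/s within a band) leaves full-precision logits unchanged.

```python
import math

def rope_inv_frequencies(head_dim, base=10000.0):
    """Standard RoPE inverse frequency for each rotated pair of dimensions."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def band_of_pair(num_pairs, num_bands):
    """Assumed grouping: contiguous bands, ordered from high to low frequency."""
    return [i * num_bands // num_pairs for i in range(num_pairs)]

def scale_rows(weight, band_scales):
    """Scale each output row of a projection by the scale of its band.

    Assumes rows 2i and 2i+1 form RoPE pair i (interleaved layout), so both
    halves of a pair share one scale and the rotation is unaffected.
    """
    bands = band_of_pair(len(weight) // 2, len(band_scales))
    return [[band_scales[bands[d // 2]] * x for x in row]
            for d, row in enumerate(weight)]

def symmetric_rescale(w_q, w_k, band_scales):
    """Scale W_Q by s and W_K by 1/s per band (hypothetical symmetric variant)."""
    inverse = [1.0 / s for s in band_scales]
    return scale_rows(w_q, band_scales), scale_rows(w_k, inverse)

def matvec(weight, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

# Toy single-head example: head_dim = 4, hidden size = 2, two bands.
w_q = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]
w_k = [[0.5, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.0, 3.0]]
x = [1.0, 2.0]

new_q, new_k = symmetric_rescale(w_q, w_k, band_scales=[2.0, 0.5])
logit_before = dot(matvec(w_q, x), matvec(w_k, x))
logit_after = dot(matvec(new_q, x), matvec(new_k, x))
print(math.isclose(logit_before, logit_after))
```

In a real pipeline the rescaled weights would be quantized afterwards, and candidate `band_scales` would be compared on a small long-context dev set, as the abstract describes.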

Catastrophic forgetting remains a central challenge in
lifelong learning, where newly acquired knowledge
interferes with previously learned tasks, degrading
performance over time. Mitigation strategies such as
rehearsal and regularization have been proposed, but both
introduce limitations, either by retaining old data or by
constraining model updates in ways that may impair
learning. Complicating matters, recent findings show that
feature-space overlap between tasks can produce similar
performance drops even in models that memorize data, making
it difficult to distinguish true forgetting from
representational interference. Current accuracy-based
metrics fail to disentangle these effects, undermining
diagnostic clarity.
In this paper, we introduce the Overlap Index, an
incremental cluster validity index adapted from the
inter-cluster component of the iCONN index, which
quantifies overlap between feature representations in input
or latent space. We then introduce the Overshadowing and
Forgetting Index, an online meta-metric that leverages the
Overlap Index to attribute performance degradation to
catastrophic forgetting, class overshadowing, or both. Our
experimental results demonstrate that these tools enable
more precise online and batch-mode evaluation of continual
learning systems, paving the way for more targeted
mitigation strategies.

New Metrics for Disambiguating Feature Overlap and Catastrophic Forgetting in Incremental Learning Contexts (Student Abstract)

Traffic accidents pose a significant societal challenge,
with many fatalities being avoidable through timely
emergency response. We introduce IMPACT (Integrated
Multimodal Pipeline for Rapid Accident Causality Tracking),
a scalable AI framework designed for autonomous, rapid
traffic incident analysis using existing urban CCTV
infrastructure. IMPACT integrates a low-latency, CPU-based
classical computer vision module for efficient key-frame
filtering with the advanced causal reasoning of Multimodal
Large Language Models (MLLMs). Our pre-processing runs in
real time (approx. 24 FPS) on a consumer-grade CPU (Intel
Core-i3 11th Gen.) and reduces expensive MLLM
invocations by over 92% compared with naive sparse sampling.
We also release code to support further research.

IMPACT: Integrated Multimodal Pipeline for Rapid Accident Causality Tracking (Student Abstract)

Falls are a major cause of injury and loss of independence
among older adults, making prevention a critical priority
for healthy aging. Early detection of fall risk through
screening can enable timely interventions that reduce
these adverse outcomes. Traditional clinical methods, such
as using the history of falls and simple
questionnaire-based screening, provide a quick and low-cost
means of assessment but often have poor predictive accuracy
and fail in the presence of missing information. To support
cost-effective screening and intervention, there is a need
for tools that can assess fall risk in the presence
of missing information with better accuracy than current
approaches. In this study, we developed a k-Nearest
Neighbors (KNN) model using data from 2,291 older
Singaporeans and achieved an AUC of 0.62 and an F1 score of
0.40. This model is capable of simultaneously imputing
missing feature values while screening an individual for
fall risk.

Assessing the Risk of Falls in Older Adults Living in the Community Using Machine Learning Models with Imputation (Student Abstract)
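
One way a nearest-neighbor model can screen and impute at the same time is sketched below in Python. This is an illustrative assumption, not the study's model: the distance over co-observed features, the neighbor-mean imputation, the feature names (age, gait speed, prior falls), and the toy records are all hypothetical, and the features are left unscaled for brevity.

```python
import math

def masked_distance(a, b):
    """Root mean squared difference over features observed in both records."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b)
             if x is not None and y is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf

def screen_and_impute(query, records, labels, k=3):
    """Return (risk score, query with missing values filled from neighbors)."""
    order = sorted(range(len(records)),
                   key=lambda i: masked_distance(query, records[i]))
    neighbors = order[:k]
    # Risk score: share of the k nearest neighbors labeled as fallers.
    risk = sum(labels[i] for i in neighbors) / len(neighbors)
    completed = []
    for j, value in enumerate(query):
        if value is None:
            observed = [records[i][j] for i in neighbors
                        if records[i][j] is not None]
            # Stay missing if no neighbor observed this feature.
            value = sum(observed) / len(observed) if observed else None
        completed.append(value)
    return risk, completed

# Hypothetical records: [age, gait speed in m/s, prior falls]; None = missing.
records = [
    [70, 1.2, 0],
    [82, 0.7, 2],
    [85, 0.6, 1],
    [68, 1.3, 0],
    [80, None, 1],
]
labels = [0, 1, 1, 0, 1]  # 1 = fell during follow-up (made-up labels)

risk, completed = screen_and_impute([83, None, 1], records, labels, k=3)
print(risk, completed)
```

In practice the features would be standardized before computing distances, and k would be tuned on held-out data.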

Mental health monitoring faces challenges from fragmented
data and opaque risk scores. We present BRI-MH, an
interpretable multimodal framework combining behavioral
signals with cognitive features from large language models to
produce a weekly Behavioral Risk Index. Unlike prior work
with isolated or black-box scores, BRI-MH offers
transparent, actionable insights and links continuous monitoring to
adaptive feedback and therapeutic support, bridging digital
phenotyping and clinical care.

BRI-MH: Behavioral Risk Index for Mental Health — An Interpretable Multimodal LLM-Augmented Framework (Student Abstract)

Large Multimodal Models (LMMs) often hallucinate objects and struggle with compositional reasoning in complex visual scenes. Structured Scene Graph (SG) representations, which explicitly encode objects, attributes, and relations, can mitigate these issues; however, fine-tuning on them risks catastrophic forgetting. Recent zero-shot approaches prompt LMMs with scene graphs, yet they typically rely on a single SG generated in one step, which limits their capture of holistic context and question-specific details. We introduce a Dual-Layer Scene Graph Chain-of-Thought (DLSG-CoT) framework that enriches reasoning by combining two structured SGs: a Global Scene Graph (G-SG) that offers comprehensive image context, and a Query-Specific Scene Graph (Q-SG) produced through a two-step process targeting information relevant to the input query. Extensive experiments demonstrate that DLSG-CoT substantially improves LMM performance on compositional and context-sensitive tasks.

Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract)

Automated negotiation, a form of interaction among
autonomous agents, plays a central role in multi-agent
systems, yet the application of large language models
(LLMs) in this domain remains underexplored. An LLM can
serve as a meta-strategist, adaptively selecting explicit
strategies for execution by external strategic tools based
on its capabilities.
We propose LLM negotiators equipped with explicit strategic
tools, including time-dependent and tit-for-tat negotiation
strategies. Our results show that negotiators enhanced with
strategic tools achieve approximately 16% higher average
utility than baseline negotiators built on the latest LLMs.

Strategic Tool Enhanced AI Agent for Multi-Issue Negotiation (Student Abstract)
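
The two strategy families named in the abstract can be sketched in Python as follows. The time-dependent form is the common polynomial concession curve and the tit-for-tat rule mirrors the opponent's last concession; the function names, parameters, and the `next_target` dispatcher (standing in for the choice an LLM meta-strategist would make) are hypothetical and not taken from the paper.

```python
def time_dependent_target(t, e, u_min=0.0, u_max=1.0):
    """Target utility at normalized time t in [0, 1].

    e < 1 concedes late (Boulware), e > 1 concedes early (Conceder).
    """
    t = min(max(t, 0.0), 1.0)
    return u_min + (1.0 - t ** (1.0 / e)) * (u_max - u_min)

def tit_for_tat_target(prev_target, opp_utils, u_min=0.0, u_max=1.0):
    """Mirror the opponent's last concession.

    opp_utils holds our own utility for each of the opponent's offers so far.
    """
    if len(opp_utils) < 2:
        return prev_target
    concession = opp_utils[-1] - opp_utils[-2]
    return min(u_max, max(u_min, prev_target - concession))

def next_target(strategy, t, prev_target, opp_utils):
    """Hypothetical dispatcher: `strategy` is the name picked by the LLM."""
    if strategy == "boulware":
        return time_dependent_target(t, e=0.5)
    if strategy == "conceder":
        return time_dependent_target(t, e=2.0)
    if strategy == "tit_for_tat":
        return tit_for_tat_target(prev_target, opp_utils)
    raise ValueError(f"unknown strategy: {strategy}")

# Halfway through the deadline, compare the targets each tool would propose.
for name in ("boulware", "conceder", "tit_for_tat"):
    print(name, next_target(name, t=0.5, prev_target=0.9, opp_utils=[0.2, 0.3]))
```

Keeping the strategies as plain functions means the LLM only has to choose a strategy name and parameters, while the numeric concession logic stays deterministic and easy to audit.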

The Completely Automated Public Turing test to Tell Computers and Humans Apart (CAPTCHA) is widely deployed on the web as a security mechanism to distinguish humans from automated bots. However, its robustness is being challenged by rapid advances in AI, as models capable of near-human-level character recognition threaten to render CAPTCHA obsolete. This research aims to systematically study the effect of multiple image corruptions, including elastic transformations, blur, noise, and occlusions, on human readability and automated solvers in text-based CAPTCHA recognition. We conduct experiments on multimodal large language models (MLLMs), a traditional deep learning-based optical character recognition (OCR) system, and human subjects. Using an existing CAPTCHA dataset and artificially corrupted versions, we analyze the recognition performance of AI models and humans, identifying vulnerabilities and patterns of robustness. The findings will contribute to a better understanding of CAPTCHA vulnerabilities and inform potential methods to increase the robustness of CAPTCHA in the era of advanced AI models.

Improving CAPTCHA Robustness via Controlled Image Corruptions (Student Abstract)

In a Network Restoration Game with Quotas, there
is an underlying graph in which a subset of the edges has to
be restored by a set of agents. Each agent has a creation
cost for each such edge, a traversal cost for every edge of
the graph, and a quota on the number
of edges it has to restore. Then, given a set of edges
that fulfill the quota, the cost of an agent is the cost of
creating these edges, plus the cost of reaching them, i.e.,
the traversal cost. We prove that any cost-minimizing
allocation is swap-stable, i.e., there is no profitable
exchange of edges between any pair of agents, but computing
one is hard even on trees. We complement this by designing
an algorithm that finds a swap-stable allocation on trees
in polynomial time and we quantify its cost against the
optimal one.

Network Restoration Games with Quotas (Student Abstract)
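
A swap-stability check can be sketched in a few lines of Python. The abstract does not spell out the exact definitions, so this sketch makes two loudly labeled assumptions: a swap is a one-for-one exchange of edges between two agents (which keeps every quota satisfied), and it counts as profitable only when it strictly lowers the cost of both agents. The cost function is supplied by the caller; the toy example uses creation costs only and omits traversal costs.

```python
from itertools import combinations

def is_swap_stable(allocation, cost):
    """Check that no pair of agents has a mutually profitable one-for-one swap.

    allocation: dict mapping agent -> frozenset of edges to restore.
    cost: callable (agent, edges) -> float, assumed to include creation and
          traversal costs in a full model.
    """
    for a, b in combinations(allocation, 2):
        for edge_a in allocation[a]:
            for edge_b in allocation[b]:
                new_a = (allocation[a] - {edge_a}) | {edge_b}
                new_b = (allocation[b] - {edge_b}) | {edge_a}
                if (cost(a, new_a) < cost(a, allocation[a])
                        and cost(b, new_b) < cost(b, allocation[b])):
                    return False
    return True

# Hypothetical creation costs; traversal costs are omitted in this toy case.
creation = {
    "A": {"e1": 5, "e2": 1},
    "B": {"e1": 1, "e2": 5},
}

def creation_only_cost(agent, edges):
    """Toy cost: sum of the agent's creation costs for its edges."""
    return sum(creation[agent][e] for e in edges)

unstable = {"A": frozenset({"e1"}), "B": frozenset({"e2"})}
stable = {"A": frozenset({"e2"}), "B": frozenset({"e1"})}
print(is_swap_stable(unstable, creation_only_cost))
print(is_swap_stable(stable, creation_only_cost))
```

The check enumerates every pair of agents and every pair of their edges, so it runs in time polynomial in the size of the allocation for a cost function that is itself cheap to evaluate.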

We present a context-aware diffusion model for multivariate
time series generation in dynamic and partially observed
environments, with applications to data-center compute-node
telemetry and beyond. The model integrates
pretrained textual embeddings to represent feature
semantics, enabling flexible, context-guided generation and
improved adaptability to unseen or re-ordered input
features. Built on a transformer architecture, it employs
both time-wise and feature-wise masking to support missing
data during training and inference. We show that the model
is robust to permutations with respect to the feature
dimension, maintaining stable performance in settings where
input configurations vary. Empirical evaluations on HPC
sensor data illustrate the model’s versatility across
generation and imputation tasks. This work introduces a
modular and generalizable framework for time series
modeling in complex, high-dimensional systems, which can
serve as a digital twin for data-center compute-node
telemetry.
