Singapore

“Refusals must be resilient, not brittle.” Yet guarding
refusals against adversarial phrasing and shifting user
contexts remains difficult: large language models (LLMs)
still yield to jailbreak prompts that evade safety filters
and surface harmful content. Despite gains from methods
like reinforcement learning from human feedback (RLHF) and
supervised fine-tuning (SFT), these global controls blur
refusal boundaries across domains such as violence, fraud,
and privacy, and frequently collapse under adversarial
variation. We propose Refusal Activation Steering (RAS), a
training-free, inference-time method that uses contrastive
activations to shift LLM responses, biasing generation
trajectories toward refusals without altering model
weights. The approach is modular and domain-targetable,
avoiding collateral refusals on benign queries while
strengthening activation-space boundaries for unsafe
content. On adversarial evaluations with an 8B
instruction-tuned model, we find that steering improves
refusal rate by 52% and reduces attack success rate by 40%,
establishing a lightweight and interpretable safety layer
for robust refusal consistency.

AAAI 2026

Always Refuse: Steering LLMs Against Jailbreaks with Contrastive Activations (Student Abstract)

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Deep learning has advanced medical imaging, but limited
interpretability hinders clinical adoption. Class
activation maps (CAMs) provide visual explanations, yet
methods such as Score-CAM are computationally expensive,
requiring a forward pass for each activation map and
limiting real-time applicability despite their high
fidelity. To overcome this limitation, LowRank-CAM is
proposed, which aggregates activation maps into a global
matrix and applies singular value decomposition (SVD) to
extract dominant spatial modes. The resulting top-r
attention masks, with r much smaller than K, replace
per-channel perturbations and require only r forward passes
through the classifier head. This low-rank formulation
substantially reduces complexity while preserving
class-discriminatory importance. Experiments on
musculoskeletal radiographs with Inception-v3 demonstrate
that LowRank-CAM achieves a 4.73× speedup over Score-CAM
while maintaining comparable visual clarity and diagnostic
relevance.

LowRank-CAM: A Computationally Efficient and Interpretable Framework for Medical Image Analysis (Student Abstract)

Quantum machine learning (QML) has attracted growing
interest for their ability to achieve superior performance
with significantly fewer parameters.
However, the high cost and scarcity of current hardware
push inference to cloud-hosted quantum devices, creating a
tension between verifiability and confidentiality.
This work proposes a novel framework that converts quantum
neural network operations into classical arithmetic
circuits that faithfully approximate genuine quantum
computations. By encrypting these circuits with
zero-knowledge proofs, it ensures computational validity
while concealing internal parameters. Experimental results
show that our classical circuits achieve fidelity above
0.9996 and total variation distance below 1% compared to
actual quantum computations, verifying the practicality of
trustworthy and privacy-preserving quantum inference.

zkQML: Verifiable and Privacy-Preserving Inference for Quantum Machine Learning (Student Abstract)

We present Graph Neural ODEs (GNODEs) for modeling tumor
microenvironment dynamics with mathematically guaranteed
stability and conservation properties. Unlike bulk ODEs
that miss spatial heterogeneity or discrete GNNs that
inadequately capture continuous biological processes,
GNODEs provide continuous-time evolution with explicit
adjacency-aware dynamics while maintaining provable
trajectory bounds. Our framework ensures: (1) existence and
uniqueness of solutions under dynamic graph topology, (2)
Lyapunov stability preventing unphysical states like
negative cell counts, and (3) exact conservation of
biological invariants through architectural constraints.
Benchmarking on synthetic tumor data demonstrates that
GNODE accurately captures resistant cell fraction dynamics
(0.282 predicted vs 0.242 true) while graph-free
alternatives fail completely (0.000), proving that
stability-constrained local interactions are essential for
modeling emergent resistance.

Graph Neural ODEs with Stability and Conservation Guarantees for Tumor Microenvironment Dynamics (Student Abstract)

Large language models (LLMs) often generate
hallucinations—fluent yet factually incorrect
responses—that undermine reliability in knowledge-intensive
tasks.
Existing approaches for hallucination mitigation typically
rely on external retrieval modules or probability
heuristics, which either require additional resources or
lack interpretability. In this work, we propose a
diffusion-based hallucination detection framework (DHDF)
that leverages U-Net denoising to reconstruct consensus
answers from multiple LLM outputs. If the diffusion process
exhibits spurious convergence away from factual ground
truth, it provides a clear signal of hallucination. To
quantify factual correctness, we incorporate TruthfulQA
scores as a fact-grounded evaluation metric, distinguishing
well-aligned models (high scores) from hallucination-prone
models (low scores). Experimental results demonstrate that
convergence dynamics under diffusion, combined with
fact-grounded QA evaluation, offer an effective and
interpretable pathway for hallucination detection without
relying on external knowledge bases.

Diffusion for Combating the Hallucination in Large Language Models (Student Abstract)

Extending LLM context windows is crucial for long range tasks. RoPE-based position interpolation (PI) methods like linear and frequency-aware scaling extend input lengths without retraining, while post-training quantization (PTQ) enables practical deployment. We show that combining PI with PTQ degrades accuracy due to coupled effects long context aliasing, dynamic range dilation, axis grid anisotropy, and outlier shifting that induce position-dependent logit noise. We provide the first systematic analysis of PI plus PTQ and introduce two diagnostics: Interpolation Pressure (per-band phase scaling sensitivity) and Tail Inflation Ratios (outlier shift from short to long contexts). To address this, we propose Q-ROAR, a RoPE-aware, weight-only stabilization that groups RoPE dimensions into a few frequency bands and performs a small search over per-band scales for W_Q, W_K, with an optional symmetric variant to preserve logit scale. The diagnostics guided search uses a tiny long-context dev set and requires no fine-tuning, kernel, or architecture changes. Empirically, Q-ROAR recovers up to 0.7% accuracy on standard tasks and reduces GovReport perplexity by more than 14%, while preserving short-context performance and compatibility with existing inference stacks.

Q-ROAR: Outlier-Aware Rescaling for RoPE Position Interpolation in Quantized Long-Context LLMs (Student Abstract)

Catastrophic forgetting remains a central challenge in
lifelong learning, where newly acquired knowledge
interferes with previously learned tasks, degrading
performance over time. Mitigation strategies such as
rehearsal and regularization have been proposed, but both
introduce limitations, either by retaining old data or by
constraining model updates in ways that may impair
learning. Complicating matters, recent findings show that
feature-space overlap between tasks can produce similar
performance drops even in models that memorize data, making
it difficult to distinguish true forgetting from
representational interference. Current accuracy-based
metrics fail to disentangle these effects, undermining
diagnostic clarity.
In this paper, we introduce the Overlap Index, an
incremental cluster validity index adapted from the
inter-cluster component of the iCONN index, which
quantifies overlap between feature representations in input
or latent space. We then introduce the Overshadowing and
Forgetting Index, an online meta-metric that leverages the
Overlap Index to attribute performance degradation to
catastrophic forgetting, class overshadowing, or both. Our
experimental results demonstrate that these tools enable
more precise online and batch-mode evaluation of continual
learning systems, paving the way for more targeted
mitigation strategies.

New Metrics for Disambiguating Feature Overlap and Catastrophic Forgetting in Incremental Learning Contexts (Student Abstract)

Traffic accidents pose a significant societal challenge,
with many fatalities being avoidable through timely
emergency response. We introduce IMPACT (Integrated
Multimodal Pipeline for Rapid Accident Causality Tracking),
a scalable AI framework designed for autonomous, rapid
traffic incident analysis using existing urban CCTV
infrastructure. IMPACT integrates a low-latency, CPU-based
classical computer vision module for efficient key-frame
filtering with the advanced causal reasoning of Multimodal
Large Language Models (MLLMs). Our pre-processing runs in
real-time (approx. 24 FPS) on a consumer-grade CPU (Intel
Core-i3 11th Gen.), and drastically reduces expensive MLLM
invocations by over 92% compared to naive sparse-sampling.
We also release code to support further research.

IMPACT: Integrated Multimodal Pipeline for Rapid Accident Causality Tracking (Student Abstract)

Falls are a major cause of injury and loss of inde-pendence
among older adults, making prevention a critical priority
for healthy aging. Early detection of fall risk through
screening can enable timely inter-ventions that reduce
these adverse outcomes. Tradi-tional clinical methods, such
as using the history of falls and simple
questionnaire-based screening, provide a quick and low-cost
means of assessment but often have poor predictive accuracy
and fail in presence of missing information. To support
cost-effective screening and intervention, there is a need
for tools that can accurately assess fall-risk in pres-ence
of missing information with better accuracy than current
approaches. In this study, we devel-oped a k-Nearest
Neighbors (KNN) model using da-ta from the 2,291 older
Singaporeans and achieved an AUC of 0.62 and a F1 score of
0.40. This model is capable of simultaneously imputing
missing fea-ture values while screening an individual for
fall-risk.

Assessing the Risk of Falls in Older Adults Living in the Community Using Machine Learning Models with Imputation (Student Abstract)

Mental health monitoring faces challenges from fragmented
data and opaque risk scores. We present BRI-MH, an in-
terpretable multimodal framework combining behavioral sig-
nals with cognitive features from large language models to
produce a weekly Behavioral Risk Index. Unlike prior work
with isolated or black-box scores, BRI-MH offers transpar-
ent, actionable insights and links continuous monitoring to
adaptive feedback and therapeutic support, bridging digital
phenotyping and clinical care

BRI-MH: Behavioral Risk Index for Mental Health — An Interpretable Multimodal LLM-Augmented Framework (Student Abstract)

Large Multimodal Models (LMMs) often hallucinate objects and struggle with compositional reasoning in complex visual scenes. Structured Scene Graph (SG) representations explicitly encoding objects, attributes, and relations can mitigate these issues, however finetuning risks catastrophic forgetting. Recent zero-shot approaches prompt LMMs with scene graphs, yet typically rely on a single SG generated in one step, limiting capture of holistic context and question-specific details. We introduce a Dual-Layer Scene Graph Chain-of-Thought DLSG-CoT framework that enriches reasoning by combining two structured SGs: a Global Scene Graph (G-SG) that offers comprehensive image context, and a Query-Specific Scene Graph (Q-SG) produced through a two-step process targeting information relevant to the input query. Extensive experiments demonstrate that DLSG-CoT substantially improves LMM performance on compositional and context-sensitive tasks

Downloads

Next from AAAI 2026

LowRank-CAM: A Computationally Efficient and Interpretable Framework for Medical Image Analysis (Student Abstract)

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2026

LowRank-CAM: A Computationally Efficient and Interpretable Framework for Medical Image Analysis (Student Abstract)

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads