Singapore

With the widespread application of Large Language Models (LLMs), ensuring their safety and preventing harmful responses has become a significant concern. While current safety-alignment methods based on instruction fine-tuning and Reinforcement Learning from Human Feedback (RLHF) can effectively reduce harmful responses from LLMs, they often require high-quality datasets and incur heavy computational overhead during model training. Another way to align language models is to modify the logits of tokens in model outputs without heavy training. Recent studies have shown that contrastive decoding can enhance the performance of language models by reducing the likelihood of confused tokens. However, these methods require the manual selection of contrastive models or instruction templates, limiting the degree of contrast. To this end, we propose Adversarial Contrastive Decoding (ACD), an optimization-based framework that generates two opposite soft system prompts, the Safeguarding Prompt (SP) and the Adversarial Prompt (AP), for prompt-based contrastive decoding. The SP aims to promote safer outputs while the AP aims to exploit the harmful parts of the model, providing a strong contrast to align the model with safety. ACD only needs lightweight prompt tuning on a rather small anchor dataset, without training the target model. Experiments on a wide range of models and benchmarks demonstrate that the proposed method achieves much better safety performance than previous training-free decoding methods without sacrificing the model's original generation ability.
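To make the idea of prompt-based contrastive decoding concrete, the following is a minimal sketch rather than the authors' implementation. It assumes one common form of contrastive decoding, in which next-token logits obtained under a safe prompt are pushed away from logits obtained under an adversarial prompt. The function names, the coefficient `alpha`, and the toy logit values are all hypothetical; the abstract does not specify the exact formula.

```python
import math

def contrastive_logits(safe_logits, adv_logits, alpha=0.5):
    """Combine two logit vectors over the same vocabulary.

    Assumed form (not taken from the abstract):
        (1 + alpha) * safe - alpha * adversarial
    Tokens favoured by the adversarial prompt are penalised.
    """
    return [(1 + alpha) * s - alpha * a for s, a in zip(safe_logits, adv_logits)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy three-token vocabulary; token 2 stands in for a harmful continuation.
safe = [2.0, 1.0, 0.5]   # hypothetical logits under the Safeguarding Prompt
adv = [0.5, 1.0, 3.0]    # hypothetical logits under the Adversarial Prompt

contrasted = contrastive_logits(safe, adv, alpha=0.5)
probs = softmax(contrasted)
```

In this toy setting the token that the adversarial prompt favours ends up with a lower probability than it had under the safe prompt alone, which is the contrast effect the abstract describes.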

AAAI 2026

Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions

ml: privacy

gtep: adversarial learning

robustness & trustworthiness

peai: safety

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


With the growing use of Large Language Model (LLM)-based
Question-Answering (QA) systems in education, it is
critical to evaluate their performance across individual
pipeline components. In this work, we introduce EduMod-LLM,
a modular function-calling LLM pipeline, and present a
comprehensive evaluation along three key axes: function
calling strategies, retrieval methods, and generative
language models. Our framework enables fine-grained
analysis by isolating and assessing each component. We
benchmark function-calling performance across LLMs, compare
our novel structure-aware retrieval method to vector-based
and LLM-scoring baselines, and evaluate various LLMs for
response synthesis. This modular approach reveals specific
failure modes and performance patterns, supporting the
development of interpretable and effective educational QA
systems. Our findings demonstrate the value of modular
function calling in improving system transparency and
pedagogical alignment.

EduMod-LLM: A Modular Approach for Designing Flexible and Transparent Educational Assistants

Automated Essay Scoring (AES) and Automatic Essay Feedback
(AEF) systems aim to reduce the workload of human raters in
educational assessment. However, most existing systems
prioritize numeric scoring accuracy over feedback quality
and are primarily evaluated on school-level writing. This
paper presents Multi-Agent Argumentation and Grammar
Integrated Critiquer (MAGIC), a framework using five
specialized agents to evaluate prompt adherence,
persuasiveness, organization, vocabulary, and grammar for
both holistic scoring and detailed feedback generation. To
support evaluation at the college level, we collated a
dataset of Graduate Record Examination (GRE) practice
essays with expert-evaluated scores and feedback. MAGIC
achieves substantial to near-perfect scoring agreement with
humans on the GRE data, outperforming baseline LLMs
while providing enhanced interpretability through its
multi-agent approach. For feedback quality evaluation, we
employ human annotators using a structured rubric and
report inter-annotator agreement.
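The phrase "substantial to near-perfect agreement" usually refers to bands of a kappa statistic. The abstract does not name the metric, so the sketch below assumes quadratic weighted kappa (QWK), a common choice in automated essay scoring. The function name and the example scores are hypothetical, and the code assumes integer scores with at least two score levels and raters who do not all give one identical score.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores."""
    k = max_score - min_score + 1          # number of score levels
    n = len(rater_a)

    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        observed[x - min_score][y - min_score] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2      # quadratic penalty
            expected = hist_a[i] * hist_b[j] / n      # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    return 1.0 - weighted_observed / weighted_expected

# Hypothetical scores on a 1-4 scale.
human = [1, 2, 3, 4]
model_same = [1, 2, 3, 4]
model_reversed = [4, 3, 2, 1]

kappa_same = quadratic_weighted_kappa(human, model_same, 1, 4)
kappa_reversed = quadratic_weighted_kappa(human, model_reversed, 1, 4)
```

Identical scores give a kappa of 1, and fully reversed scores in this example give -1. Under the widely used Landis and Koch convention, values of 0.61 to 0.80 are read as substantial agreement and values above 0.80 as almost perfect.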

MAGIC: Multi-Agent Argumentation and Grammar Integrated Critiquer

AI-supported tools have entered K-12 classrooms in recent
years to reshape student learning and skill-building. We
are particularly interested in AI’s application in literacy
subjects, such as English, where students are expected to
hone their critical thinking and public speaking skills
through AI interactions. This report details the pilot
implementation of Debate Guru, an AI-enhanced debate
education platform, across two secondary schools with
varying instructional contexts. Over the course of a summer
school course, educators integrated Debate Guru in one of
two ways: 1) by using Debate Guru’s complete curriculum; or
2) by combining platform resources with their own
instruction, such as a literary text. The pilot was
implemented with approximately 50 8th-11th grade students.
Findings suggest significant increases in students’
confidence, argumentative reasoning, and engagement.
Teachers reported high usability and strong pedagogical
value, while students responded positively to interactive
features and AI feedback integration.

“Debate Guru”: Honing Public Speaking Skills Among Secondary School Students with AI Tutoring Systems

Effective classroom teaching requires instructors to be
responsive to their students, such as by pivoting their
lectures in real-time to address common misconceptions that
their students may have developed. Classroom response
systems such as multiple-choice "clicker" systems are one
method by which instructors can gauge their students’
understanding during classroom lectures, but open-ended
questions that prompt students to engage in
self-explanation are better suited to promoting critical
thinking. However, analyzing students’ natural
language responses typically requires time-consuming manual
analysis, which makes it challenging to implement in a
classroom setting. To address this challenge, we present an
LLM-driven method for automatically assessing students'
responses and generating an aggregated summary of LLM-based
evaluations for their self-explanations during
undergraduate classroom lectures. Our approach extracts
relevant knowledge components for a given question, tags
students’ responses according to whether they correctly
address each knowledge component, and generates class-level
summaries that highlight common misconceptions and gaps in
knowledge to support instructors in pivoting their lectures
in real time. We evaluate the system’s effectiveness at
these tagging and summarization tasks on data from an
undergraduate computer science course, using quantitative
and qualitative metrics such as relevance, sufficiency,
hallucination rate, and alignment with instructional goals
and desired feedback format gathered through instructor
interviews. Results suggest that the explanation-based
classroom response system can accurately analyze students’
natural language explanations.

An Explanation-Based Classroom Response System for Real-Time Analysis of Undergraduate Students’ Natural Language Explanations

We study the problem of (approximate) maximin share
(MMS) allocation of indivisible items among a set of agents.
We focus on the graphical valuation model, previously studied
in (Christodoulou et al. 2023), in which the input is given
by a graph where edges correspond to items, and vertices
correspond to agents. An edge may have non-zero marginal
value only for its incident vertices. We study additive, XOS
and subadditive valuations and we present positive and negative
results for (approximate) MMS fairness, and also for
(approximate) pair-wise maximin share (PMMS) fairness.
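As a reminder of what the maximin share means, the sketch below computes it by brute force for a single agent with additive valuations: the agent splits the items into as many bundles as there are agents and receives the least valuable bundle, so the MMS value is the best achievable minimum. This is only an illustration of the standard definition, not an algorithm from the paper; the function name and the example values are hypothetical, and the enumeration is exponential in the number of items.

```python
from itertools import product

def maximin_share(values, n_agents):
    """Brute-force MMS value of one agent with additive valuations.

    `values` lists the agent's value for each item. In the graphical
    model, only edges incident to the agent's vertex can be non-zero,
    and zero-valued items do not change the result.
    """
    best = 0
    # Each assignment maps every item to one of n_agents bundles.
    for assignment in product(range(n_agents), repeat=len(values)):
        bundle_totals = [0] * n_agents
        for value, bundle in zip(values, assignment):
            bundle_totals[bundle] += value
        best = max(best, min(bundle_totals))
    return best

# Hypothetical agent whose four incident edges are worth 3, 2, 2 and 1.
mms_two_agents = maximin_share([3, 2, 2, 1], 2)
mms_three_agents = maximin_share([3, 2, 2, 1], 3)
```

With two agents the items split evenly as {3, 1} and {2, 2}, so the MMS value is 4. With three agents the total value of 8 cannot give every bundle a value of 3, so the MMS value drops to 2.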

Exact and Approximate Maximin Share Allocations in Multi-Graphs

Context-based Offline Meta Reinforcement Learning (COMRL) has shown promising results in improving the cross-task generalization ability of meta-policies. However, current methods often lead to entangled task representations, in which each latent dimension is influenced by multiple causal factors that govern variations in environment dynamics and reward mechanisms. This entanglement can degrade generalization performance, particularly when multiple causal factors vary simultaneously across tasks. To address this limitation, we propose CAusally disentangled TAsk representation Learning (CATAL), a method for COMRL that aims to improve the generalization ability of the meta-policy by aligning each latent dimension in the task representations with a single causal factor. Theoretically, we show that under mild conditions, the task representations learned by CATAL are causally disentangled. Empirically, extensive results on multi-task MuJoCo benchmarks show that CATAL consistently outperforms existing COMRL baselines in both in-distribution and out-of-distribution generalization.

CATAL: Causally Disentangled Task Representation Learning for Offline Meta-Reinforcement Learning

Recent advances in software vulnerability detection have been driven by Language Model (LM)-based approaches. However, these models remain vulnerable to adversarial attacks that exploit lexical and syntax perturbations, allowing critical flaws to evade detection. Existing black-box attacks on LM-based vulnerability detectors primarily rely on isolated perturbation strategies, limiting their ability to efficiently explore the adversarial code space for optimal perturbations. To bridge this gap, we propose HogVul, a black-box adversarial code generation framework that integrates both lexical and syntax perturbations under a unified dual-channel optimization strategy driven by Particle Swarm Optimization (PSO). By systematically coordinating two-level perturbations, HogVul effectively expands the search space for adversarial examples, enhancing the attack efficacy. Extensive experiments on four benchmark datasets demonstrate that HogVul achieves an average attack success rate improvement of 26.05% over state-of-the-art baseline methods. These findings highlight the potential of hybrid optimization strategies in exposing model vulnerabilities.

HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors

This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines, which includes statements on the bounds of generalisation and approximation errors, and on sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem efficiently by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure to a given data point. A further advantage of our solution is that it requires neither multiple epochs of local optimisation by clients using stochastic gradient descent, nor rounds of communication between client and server for optimising the global model. Numerous experiments show that the proposed method is a competitive alternative to the state of the art.

Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent

Small language models (SLMs) are increasingly deployed on edge devices, making their safety alignment crucial yet challenging. 
Current shallow alignment methods that rely on direct refusal of malicious queries fail to provide robust protection, particularly against adversarial jailbreaks. 
While deliberative safety reasoning alignment offers deeper alignment for defending against sophisticated attacks, effectively implanting such reasoning capability in SLMs with limited capabilities remains an open challenge. 
Moreover, safety reasoning incurs significant computational overhead as models apply reasoning to nearly all queries, making it impractical for resource-constrained edge deployment scenarios that demand rapid responses. 
We propose EASE, a novel framework that enables practical and Efficient Safety Alignment for Small languagE models. 
Our approach first identifies the optimal safety reasoning teacher that can effectively distill safety reasoning capabilities to SLMs.
We then align models to selectively activate safety reasoning for dangerous adversarial jailbreak queries while providing direct responses to straightforward malicious queries and general helpful tasks. 
This selective mechanism enables small models to maintain robust safety guarantees against sophisticated attacks while preserving computational efficiency for benign interactions. Experimental results demonstrate that EASE reduces jailbreak attack success rates by up to 17% compared to shallow alignment methods while reducing inference overhead by up to 90% compared to deliberative safety reasoning alignment, making it practical for real-world edge deployment of SLMs.

EASE: Practical and Efficient Safety Alignment for Small Language Models

This paper introduces a novel system for in-home cognitive health assessment using ambient sensors and a machine learning technology that can robustly detect mild cognitive impairment (MCI) despite noisy, sparse, and limited data. The learned model can transparently explain which aspects of individuals' daily lives led to the prediction while reliably predicting MCI, providing more insights to healthcare workers for further clinical interventions. We developed the robust, transparent machine learning model based on a fusion adaptive resonance theory (Fusion ART) neural network to learn individuals' daily patterns of activity from continuous sensor data in terms of a suite of digital biomarkers reflecting four key domains: physical activity, daily routines, cognitive engagement, and sleep patterns.
Based on a longitudinal study of over one hundred participants, whose homes were fitted with non-intrusive sensors and who underwent parallel clinical evaluation over a period of five years, our model successfully identified individuals with MCI, achieving high predictive accuracy despite the noisy and sparse data. As a transparent neural network, the learned model can also serve as a set of classification rules to distinguish MCI from normal cognition (NC) cases based on the digital biomarkers. These results demonstrate that passively collected, sensor-derived digital biomarkers can be leveraged to indicate cognitive status and can potentially provide clinically meaningful insights into the impairment conditions. We also discuss the practical challenges and lessons learned from this real-world deployment to inform future large-scale implementations of such AI-driven health monitoring systems.
