Singapore


AAAI 2026

SceneGuard: Training-Time Voice Protection with Scene-Consistent Audible Background Noise

Voice cloning technology poses significant privacy threats by enabling unauthorized speech synthesis from limited audio samples. Existing defenses based on imperceptible adversarial perturbations are vulnerable to common audio preprocessing such as denoising and compression. We propose SceneGuard, a training-time voice protection method that applies scene-consistent audible background noise to speech recordings. Unlike imperceptible perturbations, SceneGuard leverages naturally occurring acoustic scenes (e.g., airport, street, park) to create protective noise that is contextually appropriate and robust to countermeasures. We evaluate SceneGuard on text-to-speech training attacks, demonstrating a 5.5% degradation in speaker similarity that is highly statistically significant (p < 10^{-15}, Cohen's d = 2.18) while preserving 98.6% speech intelligibility (STOI = 0.986). Robustness evaluation shows that SceneGuard maintains or enhances protection under five common countermeasures, including MP3 compression, spectral subtraction, lowpass filtering, and downsampling. Our results suggest that audible, scene-consistent noise provides a more robust alternative to imperceptible perturbations for training-time voice protection. The source code is available at https://github.com/richael-sang/SceneGuard.
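
The abstract does not state how the scene noise is scaled relative to the speech. As a minimal sketch, assuming the noise is mixed in at a fixed target signal-to-noise ratio (an assumption, not the paper's documented procedure), the core operation could look like the following. The helper names `mean_power` and `mix_at_snr` are hypothetical and are not taken from the SceneGuard repository.

```python
import math


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a mono signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the speech-to-noise ratio is `snr_db`.

    Hypothetical helper: SNR-based scaling is an assumption made for this
    sketch. The noise is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise so that it covers the whole utterance.
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    noise_power = mean_power(looped)
    if noise_power == 0.0:
        raise ValueError("noise is silent")
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the noise power.
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, looped)]


if __name__ == "__main__":
    # Toy sinusoids standing in for a speech clip and a scene recording.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise = [0.3 * math.sin(2 * math.pi * 50 * t / 16000) for t in range(4000)]
    mixed = mix_at_snr(speech, noise, snr_db=10.0)
    added = [m - s for m, s in zip(mixed, speech)]
    achieved = 10.0 * math.log10(mean_power(speech) / mean_power(added))
    print(f"achieved SNR: {achieved:.2f} dB")
```

A lower `snr_db` makes the background scene louder. Which SNR balances protection against intelligibility is an empirical question that this sketch does not answer.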

workshop paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Artificial intelligence offers powerful methods for audio
processing and analysis, but complex workflows and required
programming skills often limit access for students and
domain experts such as marine bioacousticians and soundscape
ecologists. We present “Sound-AI”, a code-free, interactive
tool that lowers these barriers by enabling users to
construct and explore a complete AI pipeline for audio data
analysis. Starting from raw recordings, users can choose
from various feature extraction techniques (MFCC, OpenL3),
apply a dimensionality reduction method (PCA, t-SNE, UMAP),
and optionally perform unsupervised clustering (K-Means,
GMM, DBSCAN). The results are displayed in an interactive
2D visualization where users can compare multiple plots
produced by different techniques, e.g., t-SNE vs. PCA.
Interactive plots allow users to select points or clusters
of interest, view spectrograms in a desired frequency
range, and play the audio clips associated with selected
points. An integrated ‘Help’ feature explains each method
(what it is, how it works, and its practical use in domains
such as bioacoustics), fostering both conceptual
understanding and practical skill. For precomputed features
or embeddings, the tool also supports training and
evaluating various machine learning algorithms with visual
feedback. By merging accessibility, interactivity,
pedagogy, and domain relevance, “Sound-AI” demystifies AI
methods for interdisciplinary education and supports
research in audio analysis.

Sound-AI: A Pedagogical Tool for Exploring AI in Audio and Bioacoustic Research

This paper considers the development of an AI-based
provably-correct mathematical proof tutor. While Large
Language Models (LLMs) allow seamless communication in
natural language, they are error-prone. Theorem provers
such as Lean guarantee provable correctness, but they are
hard for students to learn. We present a proof-of-concept
system (LeanTutor) that combines the complementary strengths
of LLMs and theorem provers. LeanTutor is composed of three
modules: (i) an autoformalizer/proof-checker, (ii) a
next-step generator, and (iii) a natural language feedback
generator. To evaluate the system, we introduce PeanoBench,
a dataset of 371 Peano Arithmetic proofs in human-written
natural language and formal language, derived from the
Natural Numbers Game.

LeanTutor: Towards a Verified AI Mathematical Proof Tutor
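
To give a sense of the kind of statement a Peano Arithmetic dataset might contain, here is a small Lean 4 sketch in the style of the Natural Numbers Game. It is an illustration only and is not drawn from PeanoBench. The type `MyNat`, the function `add`, and the theorem name `zero_add` are defined here for the example and are assumptions.

```lean
-- A self-contained Peano-style natural number type (illustrative only).
inductive MyNat where
  | zero : MyNat
  | succ : MyNat → MyNat

namespace MyNat

-- Addition defined by recursion on the second argument.
def add : MyNat → MyNat → MyNat
  | m, zero => m
  | m, succ n => succ (add m n)

-- Informal proof: by induction on n. The base case holds by definition;
-- in the step case, unfold `add` once and apply the induction hypothesis.
theorem zero_add (n : MyNat) : add zero n = n := by
  induction n with
  | zero => rfl
  | succ k ih => simp [add, ih]

end MyNat
```

The comment above the theorem plays the role of the natural-language proof, and the tactic block is its formal counterpart.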

Educational question generation (EQG) is a crucial
component of intelligent educational systems, significantly
aiding self-assessment, active learning, and personalized
education. While EQG systems have emerged, existing
datasets typically rely on predefined, carefully edited
texts, failing to represent real-world classroom content,
including lecture speech with a set of complementary
slides. To bridge this gap, we collect a dataset of
educational questions based on lectures from real-world
classrooms. On this realistic dataset, we find that current
methods for EQG struggle with accurately generating
questions from educational videos, particularly in aligning
with specific timestamps and target answers. Common
challenges include selecting informative contexts from
extensive transcripts and ensuring generated questions
meaningfully incorporate the target answer. To address these
challenges, we introduce a novel framework utilizing large
language models (LLMs) for dynamically selecting and
rewriting contexts based on target timestamps and answers
in lecture videos. First, our framework selects contexts
from both lecture transcripts and video keyframes based on
answer relevance and temporal proximity. Then, we integrate
the contexts selected from both modalities and rewrite them
into answer-containing knowledge statements, to enhance the
logical connection between the contexts and the desired
answer. This approach significantly improves the quality
and relevance of the generated questions.

Context Selection and Rewriting for Video-based Educational Question Generation
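
The selection step described above ranks candidate contexts by answer relevance and temporal proximity. The sketch below illustrates that idea under strong simplifying assumptions: relevance is plain token overlap and proximity is an exponential decay, whereas the paper uses LLMs for selection. All names (`Segment`, `select_contexts`, `alpha`, `scale`) are hypothetical.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcript text (or a keyframe caption)


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for an LLM-based scorer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(segment, answer):
    """Fraction of answer tokens that also occur in the segment."""
    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokenize(segment.text)) / len(answer_tokens)


def proximity(segment, timestamp, scale=30.0):
    """Exponential decay with distance (seconds) from the target timestamp."""
    if segment.start <= timestamp <= segment.end:
        distance = 0.0
    else:
        distance = min(abs(timestamp - segment.start),
                       abs(timestamp - segment.end))
    return math.exp(-distance / scale)


def select_contexts(segments, answer, timestamp, k=2, alpha=0.5):
    """Return the k highest-scoring segments, restored to time order."""
    def score(seg):
        return (alpha * relevance(seg, answer)
                + (1 - alpha) * proximity(seg, timestamp))

    top = sorted(segments, key=score, reverse=True)[:k]
    return sorted(top, key=lambda seg: seg.start)


if __name__ == "__main__":
    segments = [
        Segment(0, 30, "Welcome to the course and an overview of the syllabus."),
        Segment(30, 60, "Gradient descent updates parameters along the negative gradient."),
        Segment(60, 90, "The learning rate controls the step size of each update."),
        Segment(300, 330, "Next week we cover convolutional networks."),
    ]
    for seg in select_contexts(segments, answer="learning rate", timestamp=70.0):
        print(seg.start, seg.text)
```

The rewriting step, which turns the selected contexts into answer-containing knowledge statements, requires a language model and is not sketched here.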

Educational assessment requires understanding student
problem-solving processes, not just final answers. Current
AI-driven analytics focus on static outcomes, missing
valuable insights from temporal dynamics. We present
Explain-from-Stroke, a practical framework that captures
invisible learning processes by integrating handwriting
dynamics with vision-language models. Our approach extracts
temporal features—writing speed, pauses, and
revisions—providing supplementary context for generating
meaningful insights into hidden aspects of student
reasoning. Deployed with real classroom data from a
Japanese secondary school, our system demonstrates an 18.2%
improvement in cognitive depth analysis over static
approaches. This work provides educators with accessible
process-oriented analysis that reveals invisible learning
processes using standard tablet technology.

Explain-from-Stroke: Capturing Invisible Learning Processes Through Handwriting Dynamics Analysis
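
The temporal features named above (writing speed and pauses) can be illustrated with a short sketch. It assumes each stroke is logged as a list of `(x, y, t)` samples and that a pause is a pen-up gap of at least `pause_threshold` seconds. Both the data layout and the threshold are assumptions, not details from the paper. Revisions are left out because detecting them depends on how erasures are recorded.

```python
import math


def stroke_length(stroke):
    """Path length of one stroke given as a list of (x, y, t) samples."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:])
    )


def temporal_features(strokes, pause_threshold=2.0):
    """Summarize writing speed and pauses for strokes ordered by time.

    Each stroke is a non-empty list of (x, y, t) samples, t in seconds.
    """
    total_length = sum(stroke_length(s) for s in strokes)
    # Pen-down time only: last timestamp minus first, per stroke.
    writing_time = sum(s[-1][2] - s[0][2] for s in strokes)
    # Pen-up gaps between the end of one stroke and the start of the next.
    gaps = [nxt[0][2] - cur[-1][2] for cur, nxt in zip(strokes, strokes[1:])]
    pauses = [g for g in gaps if g >= pause_threshold]
    return {
        "mean_speed": total_length / writing_time if writing_time > 0 else 0.0,
        "pause_count": len(pauses),
        "total_pause_time": sum(pauses),
    }


if __name__ == "__main__":
    strokes = [
        [(0, 0, 0.0), (3, 4, 1.0)],
        [(3, 4, 4.0), (3, 10, 6.0)],
        [(0, 0, 6.5), (1, 0, 7.0)],
    ]
    print(temporal_features(strokes))
```

A summary of this kind could be passed to a vision-language model as supplementary text alongside the image of the finished answer, though the prompt format used by the authors is not given in the abstract.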

As generative AI rapidly enters higher education, its
cognitive, motivational, and social impacts across various
disciplines remain underexplored. This qualitative study
investigates the impact of disciplinary epistemologies and
individual digital literacy on AI-assisted academic English
reading among Chinese EFL undergraduates. Guided by
Cognitive Load Theory and Self-Determination Theory, we
studied 45 students at a Hong Kong university across
hard/pure and soft/applied fields. Open-ended
questionnaires and 32 interviews were analyzed via hybrid
thematic analysis, with intercoder agreement above 85% and
member-checking. Participants in soft/applied fields more
often reported AI reducing extraneous load and supporting
deeper semantic elaboration, whereas reports in hard/pure
fields more frequently described surface-level support
(e.g., glossing terminology). Excessive reliance was
associated with cognitive offloading and an illusory sense
of mastery, shaped by digital literacy and metacognitive
awareness. Socially, AI sometimes displaced routine
exchanges but, when integrated into group contexts,
facilitated higher-order collaboration. The study
elaborates applications of CLT and SDT by showing how
disciplinary and individual factors shape AI’s cognitive
and motivational roles. Practically, it proposes
discipline-sensitive design principles, metacognitive
prompts, clear usage boundaries, and interaction-focused
affordances, pointing to deployable interventions (e.g.,
self-check tasks, staged AI fade-out scaffolds, reflective
prompts). Ethical approval and consent were obtained.

Generative AI as a Cognitive Co-Participant: Disciplinary Modulation of EFL Academic Reading Load and Motivation

Large Language Models (LLMs) have shown immense potential
in education, automating tasks like quiz generation and
content summarization. However, generating effective
presentation slides introduces unique challenges due to the
complexity of multimodal content creation and the need for
precise, domain-specific information. Existing LLM-based
solutions often fail to produce reliable and informative
outputs, limiting their educational value. To address these
limitations, we introduce SlideBot - a modular, multi-agent
slide generation framework that integrates LLMs with
retrieval, structured planning, and code generation.
SlideBot is organized around three pillars:
informativeness, ensuring deep and contextually grounded
content; reliability, achieved by incorporating external
sources through retrieval; and practicality, which enables
customization and iterative feedback through instructor
collaboration. It incorporates evidence-based instructional
design principles from Cognitive Load Theory (CLT) and the
Cognitive Theory of Multimedia Learning (CTML), using
structured planning to manage intrinsic load and consistent
visual macros to reduce extraneous load and enhance
dual-channel learning. Within the system, specialized
agents collaboratively retrieve information, summarize
content, generate figures, and format slides using LaTeX,
aligning outputs with instructor preferences through
interactive refinement. Evaluations from domain experts and
students in AI and biomedical education show that SlideBot
consistently enhances conceptual accuracy, clarity, and
instructional value. These findings demonstrate SlideBot’s
potential to streamline slide preparation while ensuring
accuracy, relevance, and adaptability in higher education.

SlideBot: A Multi-Agent Framework for Generating Informative, Reliable, Multi-Modal Presentations

In computer-supported collaborative learning environments,
analyzing student dialogue is essential for understanding
collaborative problem-solving behaviors and supporting
effective learning. Prior work often treats all dialogue
interactions uniformly, failing to capture how specific
dialogue interactions differentially impact learning
experiences and outcomes. To address this limitation, we
introduce a dialogue-based learning analytics framework
that integrates weighted temporal clustering of dialogue
with large language model-based interpretation. Our
framework identifies student interaction patterns most
predictive of group learning gains and uses these insights
to enable early prediction of learning outcomes and
generate pedagogically meaningful interpretations. We
evaluate our framework on collaborative dialogue from
middle school students engaged in a collaborative
game-based learning environment. Our results show that our
framework achieves 83.1% accuracy in learning outcome
prediction. In addition, expert evaluations and case
studies demonstrate that the identified weighted dialogue
patterns reflect key collaborative problem-solving
behaviors recognized as important in collaborative
learning. By surfacing high-impact interaction patterns and
enabling prioritized interpretation generation, our
framework provides a promising approach for accurately
analyzing students’ collaborative dialogue.

A Dialogue-Based Learning Analytics Framework for Collaborative Game-Based Learning

Automated scoring of written constructed responses
typically relies on separate models per task, straining
computational resources, storage, and maintenance in
real-world education settings. We propose
UniMoE-Guided, a knowledge-distilled multi-task
Mixture-of-Experts (MoE) approach that transfers expertise
from multiple task-specific large models (teachers) into a
single compact, deployable model (student). The student
combines (i) a shared encoder for cross-task
representations, (ii) a gated MoE block that balances
shared and task-specific processing, and (iii) lightweight
task heads. Trained with both ground-truth labels and
teacher guidance, the student matches strong task-specific
models while being far more efficient to train, store, and
deploy. Beyond efficiency, the MoE layer improves transfer
and generalization: experts develop reusable skills that
boost cross-task performance and enable rapid adaptation to
new tasks with minimal additions and tuning. On nine
NGSS-aligned science-reasoning tasks (seven for
training/evaluation and two held out for adaptation),
UniMoE-Guided attains performance comparable to
per-task models while using ~6x less storage than
maintaining separate students, and 87x less than the
20B-parameter teacher. The method offers a practical path
toward scalable, reliable, and resource-efficient automated
scoring for classroom and large-scale assessment systems.

Generalizable and Efficient Automated Scoring with a Knowledge-Distilled Multi-Task Mixture-of-Experts
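
Training a student "with both ground-truth labels and teacher guidance" is commonly done by combining a cross-entropy term with a divergence from the teacher's softened predictions. The sketch below shows that standard knowledge-distillation objective for a single example. Whether UniMoE-Guided uses exactly this form is an assumption, and the weight `lam` and `temperature` are illustrative values.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      lam=0.5, temperature=2.0):
    """Weighted sum of cross-entropy on the label and KL(teacher || student)."""
    # Hard-label term: negative log-probability of the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL divergence between temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scaling by temperature**2 is the usual convention in distillation.
    return (1 - lam) * ce + lam * temperature ** 2 * kl


if __name__ == "__main__":
    student = [1.0, 0.2, -0.5]
    teacher = [2.0, 0.1, -1.0]
    print(f"loss: {distillation_loss(student, teacher, label=0):.4f}")
```

With several task-specific teachers, one such term per task would be computed from the matching task head. The gated MoE block itself is not sketched here.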

Theories of embodied learning emphasize that learning
processes are grounded in bodily actions and interactions
with the environment, suggesting that movements play a
fundamental role in problem solving, decision making, and
learning. This perspective holds particular relevance for
making-based learning settings, where patterns of movement
and spatial engagement can reveal strategic expertise.
Prior research has examined distinctions between students
who learned and did not learn, but manual coding of actions
presents scalability and real-time application challenges.
To address this gap, we develop a computer vision–based
analysis pipeline for automated detection and
characterization of hand movements during complex assembly
tasks. We apply this approach to video data of students
engaged in the assembly of a differential gearbox,
quantifying metrics such as amount and speed of movement.
Results indicate that learners show fewer right-hand
movements than novices and exhibit reduced movement speed,
with a progressive decline in speed as the task unfolds.
Non-learners, by contrast, display more uneven hand
movement speed. These findings highlight measurable
differences in actions of learners and non-learners, and
therefore have implications for learning support.
Specifically, the ability to computationally distinguish
movement profiles can inform the design of adaptive
learning interventions, providing real-time performance
assessment and targeted feedback for making-based learning.

Thinking Through the Hands: An Exploratory Study of Hand Movements to Assess Students Problem-Solving in Mechanistic Reasoning Tasks

Membership inference attacks (MIAs) test whether a data point was part of a model's training set, posing serious privacy risks. Existing methods often depend on shadow models or heavy query access, which limits their practicality. We propose GP-MIA, an efficient and interpretable approach based on Gaussian process (GP) meta-modeling. Using post-hoc metrics such as accuracy, entropy, dataset statistics, and optional sensitivity features (e.g. gradients, NTK measures) from a single trained model, GP-MIA trains a GP classifier to distinguish members from non-members while providing calibrated uncertainty estimates. Experiments on synthetic data, real-world fraud detection data, CIFAR-10, and WikiText-2 show that GP-MIA achieves high accuracy and generalizability, offering a practical alternative to existing MIAs.
