The Completely Automated Public Turing test to Tell Computers and Humans Apart (CAPTCHA) is widely deployed on the web as a security mechanism to distinguish humans from automated bots. However, its robustness is being challenged by rapid advances in AI, with models capable of near-human-level character recognition rendering CAPTCHA increasingly obsolete. This research systematically studies the effect of multiple image corruptions, including elastic transformations, blur, noise, and occlusions, on human readability and on automated solvers in text-based CAPTCHA recognition. We conduct experiments on multimodal large language models (MLLMs), a traditional deep learning-based optical character recognition (OCR) system, and human subjects. Using an existing CAPTCHA dataset and artificially corrupted versions of it, we analyze the recognition performance of AI models and humans, identifying vulnerabilities and patterns of robustness. The findings will contribute to a better understanding of CAPTCHA vulnerabilities and inform potential methods for increasing the robustness of CAPTCHA in the era of advanced AI models.
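To make the corruption pipeline concrete, the following is a minimal sketch of how three of the corruptions named in the abstract (blur, additive noise, and occlusion) could be applied to a CAPTCHA-like image using Pillow and NumPy. The function name, parameter values, and the horizontal-bar occlusion are illustrative assumptions, not the authors' actual implementation; elastic transformations are omitted for brevity.

```python
# Illustrative corruption pipeline (hypothetical; not the paper's actual code).
import numpy as np
from PIL import Image, ImageDraw, ImageFilter


def corrupt(img: Image.Image, blur_radius: float = 2.0,
            noise_sigma: float = 20.0, occlude: bool = True,
            seed: int = 0) -> Image.Image:
    """Apply Gaussian blur, additive Gaussian noise, and an occluding bar."""
    # 1. Blur: smooth character edges, degrading stroke detail.
    out = img.convert("L").filter(ImageFilter.GaussianBlur(blur_radius))

    # 2. Noise: add pixel-wise Gaussian noise, then clip back to [0, 255].
    arr = np.asarray(out, dtype=np.float32)
    rng = np.random.default_rng(seed)
    arr += rng.normal(0.0, noise_sigma, arr.shape)
    out = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # 3. Occlusion: draw a dark horizontal bar across the midline,
    #    partially hiding the characters.
    if occlude:
        draw = ImageDraw.Draw(out)
        w, h = out.size
        draw.rectangle([0, h // 2 - 2, w, h // 2 + 2], fill=0)
    return out


# Usage: corrupt a blank synthetic canvas standing in for a CAPTCHA image.
base = Image.new("L", (160, 60), color=255)
corrupted = corrupt(base)
print(corrupted.size)  # (160, 60)
```

In a study like the one described, each corruption would typically be applied both in isolation and in combination, at several severity levels, so that recognition accuracy can be compared across human subjects, MLLMs, and the OCR baseline.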