Singapore

Deep learning is increasingly applied to intraoperative and
surgical video analysis to enable real-time workflow
recognition and decision support for improved surgical
precision. A key direction is modeling surgical activity as
triplets of instrument, action, and target, which provide a
richer representation of procedures. However, existing
approaches often depend on bounding-box annotations or lack
temporal context. We propose TWiST (Temporal Weakly
Supervised Triplet detection), a framework that combines
weakly supervised instrument localization, temporal
attention for triplet prediction, and grounding of triplets
with detected instruments. Our experiments show that TWiST
outperforms prior weakly supervised baselines.

AAAI 2026

TWiST: Temporal Weakly-Supervised Triplets Recognition in Surgical Videos (Student Abstract)

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



In this work, we propose C2R-KD, which applies a
Complex-to-Real projection to map complex-domain features
into the real domain. C2R-KD mitigates the complex-real
domain mismatch to strengthen the representational capacity
of the student model, and further improves distillation
performance through simultaneous hybrid distillation of
features and logits. Experimental results demonstrate
higher accuracy than conventional KD across all test
environments.
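
The hybrid objective described above can be illustrated with a minimal sketch. The abstract does not specify the projection or the loss weights, so everything here is an assumption: `complex_to_real` simply stacks real and imaginary parts, the teacher is taken to be the complex-valued model, and `alpha` and `temperature` are hypothetical hyperparameters.

```python
import math

def complex_to_real(features):
    # Hypothetical projection (assumption): stack real parts, then imaginary parts.
    return [z.real for z in features] + [z.imag for z in features]

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax with max-subtraction for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    # Mean squared error between two equal-length real vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hybrid_kd_loss(teacher_feats, student_feats, teacher_logits,
                   student_logits, temperature=2.0, alpha=0.5):
    # Feature term: compare projected (real-valued) teacher features
    # with the student's real-valued features.
    feature_loss = mse(complex_to_real(teacher_feats), student_feats)
    # Logit term: standard softened-softmax distillation.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_loss = temperature ** 2 * kl_divergence(p_teacher, p_student)
    # Weighted sum of both terms, applied simultaneously.
    return alpha * feature_loss + (1 - alpha) * logit_loss
```

In this sketch the student feature vector must have twice the length of the complex teacher vector; a learned projection layer would remove that constraint.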

C2R-KD: Complex to Real Knowledge Distillation (Student Abstract)

Efficient spam detection in resource-constrained
environments remains challenging due to class imbalance,
noisy text, and the computational demands of large
Transformer models. We introduce a novel coreset selection
framework based on a unified Entropy–Class-Balanced
Uncertainty-Density Ranking (CBUDR) scheme. Our method
prioritizes highly informative and uncertain samples while
ensuring diversity and class balance within the selected
subset. The framework flexibly supports multiple selection
strategies, including Top-K, Bottom-K, and adaptive
class-wise schemes, enabling robust performance even when
training on as little as 5% of the dataset. Extensive
experiments on benchmark datasets (UCI SMS, UTKML Twitter,
LingSpam) show that our ranking scheme achieves competitive
accuracy, precision, and recall while significantly
reducing computational cost. These results demonstrate that
carefully designed coreset strategies can surpass full-data
performance in both balanced and imbalanced settings,
highlighting the potential for deployment on low-power
devices and mobile platforms.
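
As a rough illustration of uncertainty-ranked, class-balanced selection, the sketch below scores each sample by the entropy of its predicted class probabilities and picks an equal quota per class. It is not the CBUDR scheme itself: the density term is omitted, and the function names, the equal-quota rule, and the input layout are assumptions made for illustration.

```python
import math
from collections import defaultdict

def entropy(probs):
    # Shannon entropy of a predicted class distribution (higher = more uncertain).
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_coreset(samples, fraction=0.05, mode="top"):
    """Select a class-balanced subset ranked by predictive entropy.

    samples: list of (sample_id, label, predicted_probs) tuples.
    mode: "top" keeps the most uncertain samples, "bottom" the least uncertain.
    """
    by_class = defaultdict(list)
    for sample_id, label, probs in samples:
        by_class[label].append((entropy(probs), sample_id))

    # Assumption: split the overall budget equally across classes.
    budget = max(1, round(fraction * len(samples)))
    per_class = max(1, budget // len(by_class))

    selected = []
    for scored in by_class.values():
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=(mode == "top"))
        selected.extend(sample_id for _, sample_id in ranked[:per_class])
    return selected
```

Calling `select_coreset(samples, fraction=0.05)` would return the IDs to train on; switching `mode` to `"bottom"` gives the Bottom-K variant.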

Adaptive Coreset Selection via Uncertainty-Density for Efficient Spam Detection (Student Abstract)

This paper introduces a multimodal masked autoencoder (MMAE) that jointly denoises and classifies signals by fusing time-domain IQ sequences and constellation diagrams within a cross-attentive transformer. The approach treats noise as a learnable modality to enhance robustness. A dynamic masking curriculum combines with domain-adversarial training and a hybrid loss function to promote domain-invariant features. Experiments on RadioML 2018.01A and RadioML22 demonstrate superior accuracy across different SNR conditions while using substantially less labeled data than state-of-the-art approaches.

Fusing Time-Domain and Constellation Views: A Multimodal MAE for Wireless Signals (Student Abstract)

Longitudinal behavioral research relies on consistent
measurement across time, yet real-world constraints force
survey instruments to evolve, creating analytical
discontinuities that compromise validity. This challenge
intensifies during crises when researchers must rapidly
incorporate new behavioral domains while preserving
historical comparability. We address this problem through a
dual-path architecture that maintains analytical continuity
despite instrument changes. Using 15 waves of vaccination
surveys as a testbed, we demonstrate how modern AI
techniques can bridge both temporal gaps (from missing
data) and semantic gaps (from question evolution).
Our approach leverages LLM-generated semantic embeddings of
survey questions, enabling the Deep & Cross Network to
model responses as a joint function of item meaning,
individual characteristics, and temporal context. The
framework demonstrates exceptional resilience to missing
data, with semantic embeddings proving critical for bridging
questionnaire evolution. To address data sparsity
constraints, we develop cluster-informed synthetic data
generation via hierarchical prompting that produces
synthetic responses with strong distributional fidelity and
delivers substantial performance gains through mixed
real-synthetic training while reproducing empirical cluster
dynamics.

Semantic Embedding and Synthetic Augmentation for Longitudinal Survey Prediction (Student Abstract)

Modern generative and diffusion models produce highly realistic images that can mislead human perception and even sophisticated automated detection systems. Most detection methods operate in RGB space and thus analyze only three spectral channels. We propose HSI-Detect, a two-stage pipeline that reconstructs a 31-channel hyperspectral image from a standard RGB input and performs detection in the hyperspectral domain. Expanding the input representation into denser spectral bands amplifies manipulation artifacts that are often weak or invisible in the RGB domain, particularly in specific frequency bands. We evaluate HSI-Detect on the FaceForensics++ dataset and show consistent improvements over RGB-only baselines, illustrating the promise of spectral-domain mapping for deepfake detection.

Exposing DeepFakes via Hyperspectral Domain Mapping (Student Abstract)

Recent advances in Stable Diffusion have extended its applications beyond image generation, such as zero-shot segmentation. In this work, we propose a training-free method that leverages both self- and cross-attention maps to achieve fine-grained hair segmentation. The proposed approach achieves promising fine-grained results without additional training.

Spatially-Guided Self-Attention Refinement for Zero-Shot Hair Segmentation (Student Abstract)

When evaluating large language models (LLMs) for question
answering tasks, a common protocol is multiple-choice
question-answering (MCQA), where the model selects from a
fixed set of choices.
In contemporary robustness testing, researchers typically
perturb instructions or introduce confusion into factual
statements; however, model behavior also hinges on choice
compliance: whether models remain within the canonical set
A-D.
We formalize this setting by asking whether the model
continues to respect the interface's rules when the problem
presents a tempting alternative.
Our approach is interface-preserving: we append a single
selectable option E while keeping the question and A-D
unchanged.
Then, we introduce three types of malicious option
injection to assess LLMs' robustness.
Experimental results highlight the vulnerability of LLMs to
contradict-type content in the additional option E.
Our evaluation framework serves as a low-cost audit of rule
adherence on existing datasets and black-box models,
surfaces off-policy items, and supports interpretable model
comparison for deployment.
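
The interface-preserving setup can be sketched in a few lines. The item layout, the instruction wording in `format_prompt`, and the helper names are hypothetical; the abstract does not define the three injection types in detail, so the injected text is left to the caller.

```python
def inject_option_e(item, injected_text):
    # Return a copy of the item with one extra option E; question and A-D stay unchanged.
    options = dict(item["options"])
    options["E"] = injected_text
    return {"question": item["question"], "options": options, "answer": item["answer"]}

def format_prompt(item):
    # Render the question and its options; the final instruction line is an assumption.
    lines = [item["question"]]
    lines += [f"{label}. {text}" for label, text in sorted(item["options"].items())]
    lines.append("Answer with a single letter from A-D.")
    return "\n".join(lines)

def compliance_rate(predictions, canonical=("A", "B", "C", "D")):
    # Fraction of model answers that stay inside the canonical choice set.
    if not predictions:
        return 0.0
    kept = sum(1 for p in predictions if p.strip().upper() in canonical)
    return kept / len(predictions)
```

A model's raw letter outputs on the injected prompts can be passed to `compliance_rate` to measure how often it stays within A-D.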

Obedience or Vigilance? How Large Language Models React to Malicious Multiple-Choice Options (Student Abstract)

Retrieval-augmented generation (RAG) is the backbone of
knowledge-intensive NLP, yet its progress is hindered by a
long-standing asymmetry: Generators are refined while
retrievers remain static, and full end-to-end optimization
is prohibitively unstable. We present BPO-RAG, a bi-level
preference-learning framework that redefines the training
paradigm by jointly optimizing retrieval and generation
with a single supervision signal, pairwise preferences.
Stage 1 (Retrieval Preference Optimization) learns to
select superior evidence sets, while Stage 2 (Generation
Preference Optimization) aligns answer generation with the
same evidence, closing the gap between what to read and
what to write. This label-free recipe requires no reward
model or online RL, integrates seamlessly with standard RAG
pipelines, and transforms preferences into a unifying
training currency. Across open-domain QA benchmarks,
BPO-RAG consistently advances retrieval quality and yields
more accurate, faithful answers, surpassing strong RAG
baselines with remarkable stability. By coupling retrieval
and generation under a unified preference framework,
BPO-RAG establishes a practical and principled path toward
the next generation of reliable, modular, and trustworthy
knowledge-intensive language models.
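
To make the idea of pairwise preferences as a training signal concrete, here is a minimal sketch of a Bradley-Terry style logistic loss. The abstract does not state the exact objective used in either stage, so this form, the `beta` scale, and the function names are assumptions; the scores could stand for evidence-set scores in Stage 1 or answer log-likelihoods in Stage 2.

```python
import math

def pairwise_preference_loss(score_preferred, score_rejected, beta=1.0):
    # Loss = -log(sigmoid(beta * (preferred - rejected))), computed stably.
    margin = beta * (score_preferred - score_rejected)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def mean_preference_loss(pairs, beta=1.0):
    # Average loss over (preferred_score, rejected_score) pairs.
    return sum(pairwise_preference_loss(w, l, beta) for w, l in pairs) / len(pairs)
```

The loss falls toward zero as the preferred item outscores the rejected one, so no separate reward model is needed to produce a gradient signal.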

Bi-Level Preference Optimization for Retrieval-Augmented Generation (Student Abstract)

Esports is growing rapidly, yet the data available to
researchers is limited by game company policies.
Consequently, vision-based approaches utilizing game
screens are gaining attention as a practical alternative.
We focus on the League of Legends minimap and address the
challenges of champion detection when extracting champion
information from the minimap. The challenges in this domain
include small objects, rapid movement, and frequent
occlusions.
We propose a transfer-learning-based object detection
pipeline that combines synthetic data with a subset of replay
data. Synthetic data enables the rapid generation of
diverse scenarios and improves training scalability, while
replay data reduces the data distribution gap. This approach
achieves 0.588 mean average precision, improving over
replay-only by 0.261 and synthetic-only by 0.312, with 6.4 ms
latency. Furthermore, we construct a dataset
encompassing all champions, enabling comparative analysis
of detection models and supporting reproducible
benchmarking for various application studies.

Synthetic-to-Real Transfer Learning for League of Legends Minimap Object Detection (Student Abstract)

We present Magnol.AI Copilot, an extension of the Magnol.AI digital biomarker platform that integrates multimodal large language models (LLMs) to transform digital health technology (DHT) trial dashboards into conversational systems. Copilot augments the platform with a multi-agent orchestration layer and vision-enabled LLMs that interpret visualizations, tabular summaries, and textual metadata. The
system enables natural language queries and automatic generation of contextual insights, allowing researchers to interact with wearable data through dialogue rather than static inspection. A case study with an actigraphy device demonstrates Copilot’s ability to identify nightly compliance gaps and provide contextual explanations, reducing cognitive load compared to manual dashboard review. This work presents
a novel integration of IoMT infrastructure with multimodal LLMs, advancing digital biomarker research toward conversational and accessible DHT trial platforms.
