This paper introduces a multimodal masked autoencoder (MMAE) that jointly denoises and classifies signals by fusing time-domain IQ sequences and constellation diagrams within a cross-attentive transformer. The approach treats noise as a learnable modality to improve robustness. A dynamic masking curriculum, combined with domain-adversarial training and a hybrid loss function, promotes domain-invariant features. Experiments on RadioML 2018.01A and RadioML22 demonstrate superior accuracy across SNR conditions while using substantially less labeled data than state-of-the-art approaches.
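The cross-modal fusion the abstract mentions can be illustrated with a minimal sketch: tokens from one modality (IQ sequence embeddings) act as queries that attend to keys/values from the other (constellation-diagram embeddings). This is a generic scaled dot-product cross-attention, not the paper's exact architecture; the token counts and embedding dimension below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    """Scaled dot-product cross-attention: queries from one modality
    attend over keys/values from the other modality."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)   # (n_q, n_kv) attention logits
    return softmax(scores, axis=-1) @ kv_tokens    # (n_q, d) fused representation

rng = np.random.default_rng(0)
iq_emb = rng.normal(size=(128, 64))    # hypothetical: 128 IQ-sequence tokens, dim 64
const_emb = rng.normal(size=(16, 64))  # hypothetical: 16 constellation-patch tokens

fused = cross_attention(iq_emb, const_emb)
print(fused.shape)  # (128, 64): each IQ token now carries constellation context
```

In a full transformer block this would be wrapped with learned query/key/value projections, multiple heads, residual connections, and layer normalization; the sketch shows only the attention mechanism that lets the two signal views inform one another.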
