Visual neural decoding is an important research topic at the intersection of cognitive neuroscience and machine learning. While recent progress has been made in EEG-based neural decoding, reconstructing dynamic visual content remains challenging. Current EEG decoding models either rely on pre-trained encoders for feature extraction or employ graph neural networks to embed spatio-temporal information, resulting in limited representational power and high model complexity. We propose EVOKE, an innovative framework for zero-shot decoding of high-fidelity videos from EEG signals. EVOKE employs Implicit Neural Representations (INRs) to perform complete spatial modeling of EEG and continuously decouples information in the EEG-INR perceptual space. Additionally, we construct a Hierarchical-aware Attention Module (HAM) that decodes EEG along three feature anchors (visual, semantic, and motion) and progressively controls task inference. The Motion Attention Flow (MAF) we developed overcomes the limitations of capturing motion features in dynamic stimuli, yielding a more robust representation that improves reconstruction consistency. Comprehensive experiments demonstrate that EVOKE achieves state-of-the-art performance (0.353 SSIM, 0.715 CLIP-pcc). Our method provides an effective way to convert brain activity into rich visual experiences and sets a new benchmark for multimodal generation from brain signals.
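The abstract does not include implementation details, so the PyTorch sketch below only illustrates, under stated assumptions, the two mechanisms it names: an implicit neural representation that maps continuous electrode-time coordinates to EEG features, and a cross-attention read-out with three learned anchor queries (visual, semantic, motion). All class names, dimensions, and the coordinate parameterization are hypothetical and are not the authors' implementation.

```python
import torch
import torch.nn as nn

class EEGINR(nn.Module):
    """Hypothetical EEG-INR: an MLP mapping continuous (x, y, t)
    coordinates to latent EEG features, so the signal can be queried
    at arbitrary spatial positions rather than only at discrete
    electrode channels."""
    def __init__(self, coord_dim=3, hidden=256, feat_dim=128, layers=4):
        super().__init__()
        blocks, d = [], coord_dim
        for _ in range(layers):
            blocks += [nn.Linear(d, hidden), nn.GELU()]
            d = hidden
        blocks.append(nn.Linear(hidden, feat_dim))
        self.mlp = nn.Sequential(*blocks)

    def forward(self, coords):   # coords: (B, N, coord_dim)
        return self.mlp(coords)  # features: (B, N, feat_dim)

class ThreeAnchorAttention(nn.Module):
    """Illustrative stand-in for a hierarchical attention read-out:
    three learned query anchors (visual, semantic, motion) cross-attend
    to the EEG-INR features and each extracts a task-specific summary."""
    def __init__(self, feat_dim=128, heads=4):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(3, feat_dim))  # one query per anchor
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)

    def forward(self, feats):    # feats: (B, N, feat_dim)
        q = self.anchors.unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.attn(q, feats, feats)     # (B, 3, feat_dim)
        visual, semantic, motion = out.unbind(dim=1)
        return visual, semantic, motion

# Usage: query the INR at 64 sampled (x, y, t) points, then read out
# one summary vector per anchor for downstream video generation.
inr, ham = EEGINR(), ThreeAnchorAttention()
coords = torch.rand(2, 64, 3)                  # batch of 2 EEG segments
visual, semantic, motion = ham(inr(coords))
print(visual.shape)                            # torch.Size([2, 128])
```

One appeal of learned anchor queries, if this is indeed the shape of the design, is that the read-out cost stays fixed regardless of how densely the INR is sampled, and each anchor can condition a different stage of the reconstruction pipeline.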
