Recent advances in large-scale code generation models, trained in a self-supervised manner on extensive unlabeled code corpora, have led to notable progress in generating high-quality code. Despite their success in generative tasks, these decoder-only models often underperform on code understanding tasks such as code search and clone detection, due to the generation-oriented nature of their training objectives. While training a large encoder-only model from scratch on massive code data may enhance understanding performance, this approach is typically resource-intensive and time-consuming. In this paper, we explore a more efficient alternative by transferring knowledge from pre-trained decoder-only code generation models to code understanding tasks. We investigate effective strategies for enabling decoder-only architectures to learn meaningful code representations suitable for comprehension. To this end, we propose CL4D, a contrastive learning framework tailored to strengthen the representation capabilities of decoder-only models. Extensive experiments on benchmark datasets demonstrate that our approach achieves competitive or superior performance compared to existing methods on tasks such as code search and clone detection. The results indicate that CL4D improves the semantic alignment of code representations by reducing the distance between semantically similar code snippets.
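The abstract does not spell out the training objective of CL4D, but the stated goal, reducing the embedding distance between semantically similar code snippets while separating dissimilar ones, is what standard contrastive objectives achieve. The following is a minimal, hypothetical sketch of an InfoNCE-style in-batch contrastive loss of the kind such frameworks typically use; the function name, temperature value, and NumPy implementation are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Illustrative InfoNCE-style contrastive loss (not CL4D's exact objective).

    anchors, positives: (batch, dim) arrays where row i of each matrix is a
    semantically equivalent pair (e.g., two clones of the same function);
    all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal; minimizing this loss pulls
    # similar snippets together and pushes dissimilar ones apart.
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss drives the representation space toward the property the abstract reports: semantically similar code snippets end up closer together, which directly benefits retrieval-style tasks such as code search and clone detection.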