Singapore

Generating code from a natural language programming task is one of the most successful applications of Large Language Models (LLMs). Yet, the generated program may be buggy. Without an oracle, such as an existing, correct implementation or a formal specification, can we still estimate how likely it is that the generated program is correct?

In this paper, we propose a measure of incorrectness, called incoherence, that can be estimated efficiently in the absence of an oracle and provides a lower bound on the error, i.e., the probability that the LLM-generated program for that specification is incorrect.
In our experiments, our incoherence-based methodology can automatically identify about two-thirds of incorrect programs without reports of false positives for the average task.
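The abstract does not spell out how incoherence is computed, so the following is only a minimal sketch under one assumed reading: sample several candidate programs for the same task and measure how often two of them disagree on some input. If two deterministic programs disagree on an input, at most one of them can be correct on it, which is why disagreement can serve as oracle-free evidence of error. The names `estimate_incoherence`, `candidates`, and `test_inputs` are hypothetical, and the pairwise formulation is an assumption, not the paper's definition.

```python
import itertools
from typing import Any, Callable, Sequence


def estimate_incoherence(
    programs: Sequence[Callable[[Any], Any]],
    inputs: Sequence[Any],
) -> float:
    """Fraction of program pairs that disagree on at least one input.

    Assumed illustrative definition; no oracle (reference output) is used.
    """

    def behaviour(prog: Callable[[Any], Any], x: Any):
        # Treat a raised exception as observable behaviour too.
        try:
            return ("ok", prog(x))
        except Exception as exc:
            return ("error", type(exc).__name__)

    pairs = list(itertools.combinations(programs, 2))
    if not pairs:
        return 0.0
    disagreeing = sum(
        any(behaviour(p, x) != behaviour(q, x) for x in inputs)
        for p, q in pairs
    )
    return disagreeing / len(pairs)


# Hypothetical candidates for the task "return the absolute value of x".
candidates = [
    abs,
    lambda x: x if x > 0 else -x,
    lambda x: x,  # buggy: wrong for negative inputs
]
test_inputs = [-2, 0, 3]
print(estimate_incoherence(candidates, test_inputs))
```

In this toy run, two of the three pairs disagree (each pair that includes the buggy candidate), so the estimate is 2/3. In practice the inputs would come from a generator or fuzzer rather than a hand-written list.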

In fact, an oracle-based evaluation of LLMs can be reliably replaced by an incoherence-based evaluation. In particular, we find a very strong agreement between the ranking of LLMs by the number of programs deemed correct via an oracle (pass@1) and the ranking of LLMs by the number of programs deemed correct via incoherence.
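To illustrate what agreement between two rankings of LLMs means, here is a small sketch that computes Spearman's rank correlation in plain Python. The abstract does not say which agreement statistic is used, so Spearman's rho is an assumption, and the model names and scores below are made-up placeholders, not results from the paper.

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank models from best (1) to worst; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def spearman_rho(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Spearman rank correlation for two score tables without ties."""
    if set(a) != set(b):
        raise ValueError("both tables must cover the same models")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two models")
    rank_a, rank_b = ranks(a), ranks(b)
    sum_sq = sum((rank_a[m] - rank_b[m]) ** 2 for m in a)
    return 1 - 6 * sum_sq / (n * (n * n - 1))


# Hypothetical scores: oracle-based pass@1 vs. share of programs
# deemed correct without an oracle.
pass_at_1 = {"model_a": 0.80, "model_b": 0.60, "model_c": 0.40, "model_d": 0.70}
coherent = {"model_a": 0.90, "model_b": 0.75, "model_c": 0.50, "model_d": 0.70}
rho = spearman_rho(pass_at_1, coherent)
print(round(rho, 3))
```

A value of 1 means the two rankings are identical, and values near 1 indicate strong agreement. In the toy data only `model_b` and `model_d` swap places, which gives rho = 0.8.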

AAAI 2026

Incoherence as Oracle-less Measure of Error in LLM-Based Code Generation

incoherence measure

probabilistic correctness

oracle-less evaluation

llm-based code generation

error estimation

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


Evolutionary algorithms (EAs) are optimization algorithms that simulate natural selection and genetic mechanisms. Despite advancements, existing EAs have two main issues: (1) they rarely update next-generation individuals based on global correlations, thus limiting comprehensive learning; (2) it is challenging to balance exploration and exploitation: excessive exploitation leads to premature convergence to local optima, while excessive exploration results in an excessively slow search. Moreover, existing EAs rely heavily on manual parameter settings, and inappropriate parameters can disrupt the exploration-exploitation balance, further impairing model performance. To address these challenges, we propose a novel evolutionary algorithm framework called Graph Neural Evolution (GNE). Unlike traditional EAs, GNE represents the population as a graph, where nodes correspond to individuals and edges capture their relationships, thus effectively leveraging global information. Meanwhile, GNE utilizes spectral graph neural networks (GNNs) to decompose evolutionary signals into their frequency components and designs a filtering function to fuse these components. High-frequency components capture diverse global information, while low-frequency components capture more consistent information. This explicit frequency filtering strategy directly controls global-scale features through frequency components, overcoming the limitations of manual parameter settings and making the exploration-exploitation control more interpretable and effective. Extensive evaluations on nine benchmark functions (e.g., Sphere, Rastrigin, and Rosenbrock) demonstrate that GNE consistently outperforms both classical algorithms (GA, DE, CMA-ES) and advanced algorithms (SDAES, RL-SHADE) under various conditions, including original, noise-corrupted, and optimal solution deviation scenarios. GNE achieves solution quality several orders of magnitude better than other algorithms (e.g., 3.07e-20 mean on Sphere vs. 1.51e-07).

Learn from Global Correlations: Enhancing Evolutionary Algorithm via Spectral GNN

Backdoor attacks pose a persistent security risk to deep neural networks (DNNs) due to their stealth and durability. While recent research has explored leveraging model unlearning mechanisms to enhance backdoor concealment, existing attack strategies still leave persistent traces that may be detected through static analysis. In this work, we introduce the first paradigm of revocable backdoor attacks, where the backdoor can be proactively and thoroughly removed after the attack objective is achieved. We formulate the trigger optimization in revocable backdoor attacks as a bilevel optimization problem: by simulating both backdoor injection and unlearning processes, the trigger generator is optimized to achieve a high attack success rate (ASR) while ensuring that the backdoor can be easily erased through unlearning. To mitigate the optimization conflict between injection and removal objectives, we employ a deterministic partition of poisoning and unlearning samples to reduce sampling-induced variance, and further apply the Projected Conflicting Gradient (PCGrad) technique to resolve the remaining gradient conflicts. Experiments on CIFAR-10 and ImageNet demonstrate that our method maintains ASR comparable to state-of-the-art backdoor attacks, while enabling effective removal of backdoor behavior after unlearning. This work opens a new direction for backdoor attack research and presents new challenges for the security of machine learning systems.
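The gradient-conflict step mentioned above can be illustrated with a minimal two-objective sketch of PCGrad-style projection: when the two gradients have a negative dot product, each is projected onto the plane orthogonal to the other before they are combined. This is a generic illustration on plain Python lists, not the authors' training code. The names `g_inject` and `g_unlearn`, and the choice to sum the projected gradients, are assumptions.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_if_conflicting(g: Vector, other: Vector) -> Vector:
    """Remove from g its component along `other` when the two conflict."""
    d = dot(g, other)
    if d >= 0:  # no conflict (or `other` is zero): keep g unchanged
        return list(g)
    scale = d / dot(other, other)
    return [a - scale * b for a, b in zip(g, other)]


def pcgrad_two(g_inject: Vector, g_unlearn: Vector) -> Vector:
    """Combine two objective gradients after mutual conflict projection."""
    p_inject = project_if_conflicting(g_inject, g_unlearn)
    p_unlearn = project_if_conflicting(g_unlearn, g_inject)
    return [a + b for a, b in zip(p_inject, p_unlearn)]


# Toy gradients with a negative dot product (they conflict).
g_inject = [1.0, 0.0]
g_unlearn = [-1.0, 1.0]
combined = pcgrad_two(g_inject, g_unlearn)
print(combined)
```

For the toy vectors, the projected injection gradient becomes `[0.5, 0.5]` and the projected unlearning gradient becomes `[0.0, 1.0]`, so neither update works against the other objective any more.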

Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning

Written Multi-Party Conversations (WMPCs) are widely studied across disciplines, with social media as a primary data source due to its accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties.
For example, interactions between speakers may be limited due to rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., one-to-one "reply-to" links). This work explores the feasibility of generating synthetic WMPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as dialogue structure and participants’ stance. We investigate two complementary strategies of leveraging LLMs in this context: (i) LLMs as WMPC generators, where we task the LLM to generate a whole WMPC at once, and (ii) LLMs as WMPC parties, where the LLM generates one turn of the conversation at a time (made of speaker, addressee and message), given the conversation history. We next introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the quality of the obtained WMPCs via human and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some being able to generate high-quality WMPCs. We also find that turn-by-turn generation yields better conformance to constraints and higher linguistic variability than generating WMPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality WMPCs.

Don’t Stop the Multi-Party! On Generating Synthetic Written Multi-Party Conversations with Constraints

With the widespread adoption of multi-view data in numerous fields, multi-view unsupervised feature selection (MUFS) has made notable strides in both feature pruning and missing-view completion. Nonetheless, existing MUFS methods typically rely on centralized servers, which cannot meet real-world demands for privacy preservation and distributed learning, and they often suffer from suboptimal solutions and weak convergence guarantees. To address these challenges, we propose IMUFFS, an incomplete multi-view unsupervised federated feature selection method based on cooperative particle swarm optimization (CPSO) and tensor-aligned learning (TAL). Specifically, each client executes CPSO-TAL in two stages: (i) an external optimization phase in which a CPSO, inspired by the co-evolutionary mechanism of the hybrid breeding optimization algorithm, performs a global search in the feature space, and (ii) an internal optimization phase that leverages TAL with imputation and CP decomposition, where CP decomposition reduces dimensionality by decomposing the original tensor into a sum of core components, to learn low-dimensional embeddings while simultaneously updating anchor graphs and view preference weights, thereby harmonizing imputation and representation learning. On the server side, a federated aggregation strategy using adaptive normalized mutual information (NMI) weighting combines the locally optimized feature selection (FS) weights and NMI scores from clients, ensuring privacy while improving the quality of FS and convergence. Extensive experiments on multiple datasets demonstrate that IMUFFS consistently outperforms state-of-the-art methods, yielding more effective and robust FS and better missing-view completion.

Incomplete Multi-View Unsupervised Federated Feature Selection via Cooperative Particle Swarm Optimization and Tensor-Aligned Learning

Modern AI services must continually adapt to newly joined domains, yet delivering high-quality customized models is hampered by label sparsity, domain shifts, and tight budgets. We formulate this challenge as the learning system expansion problem and introduce HaT, an efficient heterogeneity-aware knowledge-transfer framework. HaT first selects a small set of high-quality source models with minimal overhead, and then fuses their imperfect predictions through a sample-wise attention mixer. Later, it adaptively distills the fused knowledge into target models via a knowledge dictionary. Extensive experiments on different tasks and modalities show that HaT outperforms state-of-the-art baselines by up to 16.5% accuracy, and saves 31.1% training time and up to 93.0% traffic.

Learning Systems Expansion with Efficient Heterogeneity-aware Knowledge Transfer

Promptable segmentation models such as SAM have established a powerful paradigm, enabling strong generalization to unseen objects and domains with minimal user input, including points, bounding boxes, and text prompts. Among these, bounding boxes stand out as particularly effective, often outperforming points while significantly reducing annotation costs. However, current training and evaluation protocols typically rely on synthetic prompts generated through simple heuristics, offering limited insight into real-world robustness. In this paper, we investigate the robustness of promptable segmentation models to natural variations in bounding box prompts. First, we conduct a controlled user study and collect thousands of real bounding box annotations. Our analysis reveals substantial variability in segmentation quality across users for the same model and instance, indicating that SAM-like models are highly sensitive to natural prompt noise. Then, since exhaustive testing of all possible user inputs is computationally prohibitive, we reformulate robustness evaluation as a white-box optimization problem over the bounding box prompt space. We introduce BREPS, a method for generating adversarial bounding boxes that minimize or maximize segmentation error while adhering to naturalness constraints. Finally, we benchmark state-of-the-art models across 10 datasets, spanning everyday scenes to medical imaging. All code and data will be released upon publication.

BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation

Visual impairment is a common condition worldwide, and cortical electrical stimulation is one of the approaches to aid in visual restoration. However, existing methods suffer from limited precision, flexibility, and generalization in generating the desired visual perception. In this paper, we propose a novel deep learning-based algorithm for cortical electrical stimulation, named "MindSight", aimed at enhancing the clarity and accuracy of induced visual perceptions. Our framework introduces three key innovations: (1) a differentiable biophysical model simulating cortical state transitions under electrical stimulation, enabling end-to-end training; (2) a dual-path training architecture combining neural decoding fidelity with phosphene simulation constraints; (3) an attention-guided background gated network for input filtration and a multi-channel activation constraint to ensure the effectiveness of electrical stimulation. We validated our approach through novel experiments with macaque monkeys, demonstrating superior performance in visual perception tasks. These results highlight the potential of our approach in assisting individuals with visual impairments.

MindSight: A Bio-Inspired Neural Architecture for Visual Restoration via Cortical Electrical Stimulation

Retrieval-Augmented Generation (RAG) has revolutionized Large Language Models' ability to access external knowledge, but current graph-based RAG approaches face critical limitations in managing hierarchical knowledge: they impose rigid compression quotas per layer that damage local graph structures, and they focus primarily on topological structure while neglecting semantic coherence. We introduce T-Retriever, a novel framework that reformulates attributed graph retrieval as tree-based retrieval using a semantic and structure guided encoding tree. Our approach integrates two key innovations: (1) Adaptive Compression Encoding, which eliminates artificial layer-specific compression quotas in favor of a global optimization strategy that preserves the graph's natural hierarchical organization, and (2) Semantic-Structural Entropy (S²-Entropy), which jointly optimizes for both topological cohesion and semantic consistency when creating hierarchical partitions. Extensive experiments across diverse graph reasoning benchmarks demonstrate that T-Retriever significantly outperforms state-of-the-art RAG methods.

T-Retriever: Tree-based Hierarchical Retrieval Augmented Generation for Textual Graphs

The surrogate gradient (SG) method has shown significant promise in enhancing the performance of deep spiking neural networks (SNNs), but it also introduces vulnerabilities to adversarial attacks. Although spike coding strategies and neural dynamics parameters have been extensively studied for their impact on robustness, the critical role of gradient magnitude, which reflects the model's sensitivity to input perturbations, remains underexplored. In SNNs, the gradient magnitude is primarily determined by the interaction between the membrane potential distribution (MPD) and the SG function. In this study, we investigate the relationship between the MPD and SG and its implications for improving the robustness of SNNs. Our theoretical analysis reveals that reducing the proportion of membrane potential lying within the gradient-available range of the SG function effectively mitigates the sensitivity of SNNs to input perturbations. Building upon this insight, we propose a novel MPD-driven surrogate gradient regularization (MPD-SGR) method, which enhances robustness by explicitly regularizing the MPD based on its interaction with the SG function. Extensive experiments across multiple image classification benchmarks and diverse network architectures confirm that the MPD-SGR method significantly enhances the resilience of SNNs to adversarial perturbations and exhibits strong generalizability across diverse network configurations, SG function variants, and spike encoding schemes.

MPD-SGR: Robust Spiking Neural Networks with Membrane Potential Distribution-Driven Surrogate Gradient Regularization

Recent advances in transformer-based text-to-motion generation have significantly improved motion quality. However, achieving both real-time performance and long-horizon scalability remains an open challenge. In this paper, we present MOGO (Motion Generation with One-pass), a novel autoregressive framework for efficient and scalable 3D human motion generation. MOGO consists of two key components. First, we introduce MoSA-VQ, a motion scale-adaptive residual vector quantization module that hierarchically discretizes motion sequences through learnable scaling parameters, which dynamically regulate the information flow at each layer to produce compact yet expressive multi-level representations. Second, to fully utilize the high-quality motion representations, we further design the RQHC-Transformer, a residual quantized hierarchical causal transformer that structurally aligns with the multi-level latent hierarchy produced by MoSA-VQ. Each level is decoded by a dedicated transformer block, enabling efficient multi-scale generation in a single forward pass. Compared to diffusion-based and LLM-based approaches, it achieves lower inference latency while maintaining high motion quality. Notably, our hierarchical latent modeling—through the synergy of MoSA-VQ and RQHC-Transformer—empowers MOGO with seamless and coherent infinite-length generation. By iteratively extending motion from any given frame and allowing control signals to be updated at arbitrary points, the model produces stable transitions and responds adaptively to new conditions, enabling real-time, controllable long-horizon synthesis with strong temporal consistency. Extensive experiments on HumanML3D and KIT-ML validate the quality and efficiency of our approach, while evaluation on the unseen CMP dataset demonstrates strong zero-shot generalization capabilities.
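For readers unfamiliar with residual vector quantization, the sketch below shows the plain technique that modules like the one described above build on: each level quantizes what the previous levels left unexplained, so a vector is encoded as one codebook index per level. This is generic residual VQ with fixed toy codebooks. It does not include the learnable scaling parameters of MoSA-VQ, and all names and values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]


def nearest(codebook: Codebook, vec: Vector) -> int:
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec: Vector, codebooks: List[Codebook]) -> Tuple[List[int], Vector]:
    """Quantize level by level; each level encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct by summing the selected codeword from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy two-level codebooks: a coarse level and a fine correction level.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, -1.0]],
]
indices, leftover = rvq_encode([5.0, 3.0], codebooks)
print(indices, leftover, rvq_decode(indices, codebooks))
```

In the toy example the coarse level picks `[4.0, 4.0]`, the second level picks the correction `[1.0, -1.0]`, and the reconstruction matches the input exactly. With real data some residual usually remains after the last level.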
