Singapore

While Large Language Models (LLMs) excel at code generation, their inherent tendency toward verbatim memorization of training data introduces critical risks such as copyright infringement, emission of insecure code, and use of deprecated APIs. A straightforward yet promising defense is unlearning, i.e., erasing or down-weighting the offending snippets through post-training. However, we find that its application to source code tends to spill over, damaging the basic knowledge of programming languages learned by the LLM and degrading its overall capability. To ease this challenge, we propose PROD for precise source code unlearning. PROD surgically zeroes out the prediction probability of the prohibited tokens and renormalizes the remaining distribution so that the generated code stays correct. By excising only the targeted snippets, PROD achieves precise forgetting without much degradation of the LLM's overall capability. To facilitate in-depth evaluation of PROD, we establish an unlearning benchmark consisting of three downstream tasks (i.e., unlearning of copyrighted code, insecure code, and deprecated APIs), and introduce the Pareto Dominance Ratio (PDR) metric, which reflects both forget quality and LLM utility. Our comprehensive evaluation demonstrates that PROD achieves a superior overall balance between forget quality and model utility compared to existing unlearning approaches across the three downstream tasks, while consistently exhibiting improvements when applied to LLMs of different series. PROD also exhibits superior robustness against adversarial attacks without generating or exposing the data to be forgotten. These results underscore that our approach not only successfully extends the application boundary of unlearning techniques to source code, but also holds significant implications for advancing reliable code generation.
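The abstract describes PROD as zeroing out the probability of prohibited tokens and renormalizing what remains. The following is a minimal sketch of that distribution-level operation only, not the authors' implementation: the function name `suppress_and_renormalize`, the toy logits, and the choice to start from a softmax over raw logits are all assumptions made for illustration, and the abstract does not say how the resulting distribution is used during post-training.

```python
import math


def suppress_and_renormalize(logits, prohibited_ids):
    """Softmax the logits, zero out prohibited token ids, renormalize the rest.

    Hypothetical helper: names and interface are illustrative assumptions.
    """
    # Numerically stable softmax over the vocabulary.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Remove all probability mass from the prohibited tokens.
    kept = [0.0 if i in prohibited_ids else p for i, p in enumerate(probs)]

    # Renormalize so the remaining tokens keep their relative proportions.
    mass = sum(kept)
    if mass == 0.0:
        raise ValueError("all tokens are prohibited; nothing to renormalize")
    return [p / mass for p in kept]


# Toy vocabulary of four tokens; token 0 stands in for a prohibited token.
target = suppress_and_renormalize([2.0, 1.0, 0.5, -1.0], prohibited_ids={0})
```

In this sketch the surviving tokens keep their original ranking and relative odds, which is one way to read the claim that only the targeted tokens are affected.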

AAAI 2026

Large Language Model Unlearning for Source Code

machine unlearning

large language model

code generation

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Large Language Models (LLMs) demonstrate impressive capabilities in natural language understanding and generation, but incur high communication overhead and privacy risks in cloud deployments, while facing compute and memory constraints when confined to edge devices. Cloud–edge inference has emerged as a promising paradigm for improving privacy in LLM services by retaining sensitive computations on local devices. However, existing cloud–edge inference approaches apply uniform privacy protection without considering input sensitivity, resulting in unnecessary perturbation and degraded utility even for non-sensitive tokens. To address this limitation, we propose Privacy-aware Routing for Inference with Semantic Modulation (PRISM), a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module, also on the edge, selects an execution mode (cloud, edge, or collaboration); (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt, which is then refined by the edge-side small language model (SLM) using local context. Our results show that PRISM consistently achieves superior privacy-utility trade-offs in various scenarios, reducing energy consumption and latency to 40–50% of baseline methods such as Uniform and Selective LDP, while maintaining high output quality under strong privacy constraints. These findings are validated through comprehensive evaluations involving realistic prompts, actual energy measurements, and heterogeneous cloud–edge model deployments.

PRISM: Privacy-Aware Routing for Adaptive Cloud–Edge LLM Inference via Semantic Sketch Collaboration

Graph learning faces major challenges under noisy and sparse supervision, where corrupted labels mislead representation learning and impair generalization. Prior work proposes robust training strategies such as correction, reweighting, and denoising to reduce the influence of noisy labels. However, most methods still optimize directly on training nodes using their possibly corrupted labels as supervision signals. In this work, we propose a prototype-guided framework that replaces direct label supervision over training nodes with semantic supervision derived from class-level prototypes. Each prototype is formed by aggregating representations of nodes sharing the same observed label and serves as a semantic anchor for guiding the classifier. To address the inherent supervision sparsity introduced by limited prototype instances, we introduce a dual-branch mixup strategy that integrates prototypes with high-confidence nodes through intra- and inter-class interpolation, which enhances supervision coverage and improves representation continuity. We further constrain the spatial variance of these samples to promote intra-class compactness. Theoretically, we demonstrate that the constructed prototypes remain aligned with true class semantics under bounded noise rates. Experiments on node classification tasks confirm the effectiveness of our approach under label noise and limited supervision.
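The prototype construction described above (aggregating the representations of nodes that share an observed label) can be illustrated with a short sketch. The abstract does not name the aggregation function, so the per-class mean used here is an assumption, as are the function name `class_prototypes` and the toy embeddings.

```python
from collections import defaultdict


def class_prototypes(embeddings, observed_labels):
    """Average the embeddings of nodes that share the same observed label.

    Assumption: mean aggregation; the source only says "aggregating".
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, observed_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the embedding into its class-wise running sum.
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # Divide each class sum by the number of nodes observed with that label.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Three toy nodes with 2-d embeddings; labels may be noisy in practice.
prototypes = class_prototypes(
    [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]],
    [0, 0, 1],
)
```

Because each prototype pools many nodes, a few mislabeled nodes shift it less than they would affect per-node supervision, which is the intuition behind the bounded-noise alignment claim.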

Prototype-Guided Supervision for Graph Learning with Noisy and Sparse Labels

Federated Edge Learning (FEL) has emerged as a promising approach for enabling edge devices to collaboratively train machine learning models while preserving data privacy. Despite its advantages, practical FEL deployment faces significant challenges related to device constraints and device-server interactions, necessitating heterogeneous, user-adaptive model training with limited and uncertain communication. While knowledge cache-driven federated learning offers a promising FEL solution for demanding edge environments, its logits-based interaction design conveys too little information for effective on-device model optimization. To tackle this issue, we introduce DistilCacheFL, a novel personalized FEL architecture that enhances the exchange of optimization insights while delivering state-of-the-art performance with efficient communication. DistilCacheFL incorporates the benefits of both dataset distillation and knowledge cache-driven federated learning by storing and organizing distilled data as knowledge in the server-side knowledge cache, allowing devices to periodically download and utilize personalized knowledge for local model optimization. Moreover, a device-centric cache sampling strategy is introduced to tailor transferred knowledge for individual devices within controlled communication bandwidth. Extensive experiments on five datasets covering image recognition, audio understanding, and mobile sensor data mining tasks demonstrate that (1) DistilCacheFL significantly outperforms state-of-the-art methods regardless of model structures, data distributions, and modalities; and (2) DistilCacheFL can train high-quality personalized on-device models with at least 28.6 improvement in communication efficiency.

Re-architecting Personalized Federated Learning for Demanding Edge Environments

Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property: the method should produce larger prediction sets for more difficult examples and smaller ones for easier examples. Existing evaluation methods for adaptiveness typically analyze coverage rate violation or average set size across bins of examples grouped by difficulty. However, these approaches often suffer from imbalanced binning, which can lead to inaccurate estimates of coverage or set size. To address this issue, we propose a binning method that leverages input transformations to sort examples by difficulty, followed by uniform-mass binning. Building on this binning, we introduce two metrics to better evaluate adaptiveness. These metrics provide more reliable estimates of coverage rate violation and average set size due to balanced binning, leading to more accurate adaptivity assessment. Through experiments, we demonstrate that our proposed metrics correlate more strongly with the desired adaptiveness property than existing ones. Furthermore, motivated by our findings, we propose a new adaptive prediction set algorithm that groups examples by estimated difficulty and applies group-conditional conformal prediction. This allows us to determine appropriate thresholds for each group. Experimental results on both (a) an image classification task (ImageNet) and (b) a medical task (visual acuity prediction) show that our method outperforms existing approaches according to the new metrics.
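To make the binning step concrete, here is a minimal sketch of uniform-mass binning followed by per-bin coverage and average set size. It assumes a per-example difficulty score is already available; how the paper derives that score from input transformations is not specified in the abstract, so the scores, the function names `uniform_mass_bins` and `per_bin_stats`, and the toy data are all hypothetical.

```python
def uniform_mass_bins(difficulty, num_bins):
    """Sort examples by difficulty and split them into near-equal-sized bins."""
    n = len(difficulty)
    if num_bins < 1 or num_bins > n:
        raise ValueError("num_bins must be between 1 and the number of examples")
    order = sorted(range(n), key=lambda i: difficulty[i])
    # Integer boundaries keep bin sizes within one example of each other.
    return [
        order[b * n // num_bins:(b + 1) * n // num_bins]
        for b in range(num_bins)
    ]


def per_bin_stats(bins, covered, set_sizes):
    """Return (empirical coverage, average set size) for each bin."""
    stats = []
    for idx in bins:
        coverage = sum(1 for i in idx if covered[i]) / len(idx)
        avg_size = sum(set_sizes[i] for i in idx) / len(idx)
        stats.append((coverage, avg_size))
    return stats


# Toy data: six examples with hypothetical difficulty scores.
difficulty = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
covered = [True, True, True, True, False, True]  # true label inside the set?
set_sizes = [5, 1, 3, 2, 4, 1]

bins = uniform_mass_bins(difficulty, num_bins=3)
stats = per_bin_stats(bins, covered, set_sizes)
```

With equal-sized bins, each per-bin estimate rests on the same number of examples, which is the property the abstract credits for more reliable coverage and set-size estimates.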

Quantifying and Improving Adaptivity in Conformal Prediction Through Input Transformations

Tensor network structure search (TN-SS) aims to automatically discover optimal network topologies and rank configurations for efficient tensor decomposition in high-dimensional data representation. Despite recent advances, existing TN-SS methods face significant limitations in computational tractability, structure adaptivity, and optimization robustness across diverse tensor characteristics. Current approaches struggle with three fundamental challenges: single-scale optimization that misses multi-scale structures, discrete search spaces that prevent smooth structure evolution, and separation of structure and parameter optimization that creates computational inefficiency. We propose RGTN (Renormalization Group guided Tensor Network search), a novel physics-inspired framework that fundamentally transforms tensor network structure search through multi-scale renormalization group flows. Unlike existing methods that search through discrete structure spaces at fixed scales, RGTN implements a dynamic scale-transformation strategy where network structures evolve continuously across resolution levels. The key innovation lies in introducing learnable edge gates that enable topology modification during optimization, combined with intelligent structure proposals based on two physical quantities: node tension, which measures local stress, and edge information flow, which quantifies connectivity importance. By starting optimization at coarse scales with exponentially reduced complexity and progressively refining toward finer scales, RGTN discovers more compact structures while naturally escaping local minima through scale-induced perturbations. Our code is available in the supplementary materials for reproducibility.

Renormalization Group Guided Tensor Network Structure Search

Capsule Network (CapsNet) has demonstrated significant potential in visual recognition by capturing spatial relationships and part-whole hierarchies for learning equivariant feature representations. However, existing CapsNet and variants often rely on a single high-level feature map, overlooking the rich complementary information provided by multi-scale features. Furthermore, conventional feature fusion strategies, such as addition and concatenation, struggle to reconcile multi-scale feature discrepancies, leading to suboptimal classification performance. To address these limitations, we propose the Multi-Scale Patchify Capsule Network (MSPCaps), a novel architecture that integrates multi-scale feature learning and efficient capsule routing. Specifically, MSPCaps consists of three key components: a Multi-Scale ResNet Backbone (MSRB), a Patchify Capsule Layer (PatchifyCaps), and a Cross-Agreement Routing (CAR) block. First, the MSRB extracts diverse multi-scale feature representations from input images, preserving both fine-grained details and global contextual information. Second, the PatchifyCaps partitions these multi-scale features into primary capsules using a uniform patch size, equipping the model with the ability to learn from diverse receptive fields. Finally, the CAR block adaptively routes the multi-scale capsules by identifying cross-scale prediction pairs with maximum agreement. Unlike the simple concatenation of multiple self-routing blocks, CAR ensures that only the most coherent capsules (best part-to-whole pairs) contribute to the final voting. Our proposed MSPCaps achieves remarkable scalability and superior robustness, consistently surpassing multiple baseline methods in terms of classification accuracy, with configurations ranging from a highly efficient Tiny model (344.3K parameters) to a powerful Large model (10.9M parameters), highlighting its potential in advancing feature representation learning.

MSPCaps: A Multi-Scale Patchify Capsule Network with Cross-Agreement Routing for Visual Recognition

Large Language Models (LLMs) have achieved remarkable success in instruction-following and dialogue tasks, yet aligning them with human preferences remains a critical challenge. Recent advances such as Direct Preference Optimization (DPO) simplify the alignment pipeline by bypassing explicit reward modeling, but they often suffer from suboptimal reward margin distributions, leading to weak supervision signals and reduced discriminative capacity. In this work, we propose Reward Margin Optimization (RMO), a framework that reshapes reward margin distributions during training to improve alignment performance. RMO comprises three components: (1) a Dual Denoising Filtering strategy that filters ambiguous and noisy preference pairs based on reward margin dynamics; (2) Batch Margin Diversification, which maximizes intra-batch margin variance to enhance learning signal diversity; and (3) Pairwise Margin Amplification, an auxiliary regularization term that encourages larger margins between preferred and dispreferred responses. Extensive experiments on multiple LLMs and datasets demonstrate that RMO consistently improves win rates over strong baselines such as DPO and SimPO, while remaining compatible with various preference-based optimization methods. Our results highlight the critical role of reward margin distribution in preference alignment and establish RMO as an effective and scalable enhancement to existing alignment techniques.

RMO: Towards Better LLM Alignment via Reshaping Reward Margin Distributions

Optical Chemical Structure Recognition (OCSR) plays a pivotal role in modern chemical informatics, enabling the automated conversion of chemical structure images from scientific literature, patents, and educational materials into machine-readable molecular representations. This capability is essential for large-scale chemical data mining, drug discovery pipelines, and Large Language Model (LLM) applications in related domains. However, existing OCSR systems face significant challenges in accurately recognizing stereochemical information due to the subtle visual cues that distinguish stereoisomers, such as wedge and dash bonds, ring conformations, and spatial arrangements.
To address these challenges, we propose MolSight, a comprehensive learning framework for OCSR that employs a three-stage training paradigm. In the first stage, we conduct pre-training on large-scale but noisy datasets to endow the model with fundamental perception capabilities for chemical structure images. In the second stage, we perform multi-granularity fine-tuning using datasets with richer supervisory signals, systematically exploring how auxiliary tasks (specifically chemical bond classification and atom localization) contribute to molecular formula recognition. Finally, we employ reinforcement learning for post-training optimization and introduce a novel stereochemical structure dataset. Remarkably, we find that even with MolSight's relatively compact parameter size, the Group Relative Policy Optimization (GRPO) algorithm can further enhance the model's performance on stereochemical structures. Through extensive experiments across diverse datasets, our results demonstrate that MolSight achieves state-of-the-art performance in (stereo)chemical optical structure recognition.

MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning

Electrocautery and lasers inevitably generate surgical smoke, which hinders the visual guidance that laparoscopic videos provide for surgical procedures. Surgical smoke can be classified into different types based on its motion patterns, leading to distinctive spatio-temporal characteristics across smoky laparoscopic videos. However, existing desmoking methods fail to account for such smoke-type-specific distinctions. Therefore, we propose the first Smoke-Type-Aware Laparoscopic Video Desmoking Network (STANet) by introducing two smoke types: Diffusion Smoke and Ambient Smoke. Specifically, a smoke mask segmentation sub-network is designed to jointly conduct smoke mask and smoke type predictions based on attention-weighted mask aggregation, while a smokeless video reconstruction sub-network is proposed to perform type-specific desmoking on smoky features guided by the two types of smoke masks. To address the entanglement of the two smoke types, we further embed a coarse-to-fine disentanglement module into the mask segmentation sub-network, which yields more accurate disentangled masks through smoke-type-aware cross attention between non-entangled and entangled regions. In addition, we construct the first large-scale synthetic video desmoking dataset with smoke type annotations. Extensive experiments demonstrate that our method not only outperforms state-of-the-art approaches in quality evaluations, but also exhibits superior generalization across multiple downstream surgical tasks.

Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset

Multi-modal Sentiment Analysis (MSA) enables machines to perceive human sentiments by integrating multiple modalities such as text, video, and audio. Despite recent progress, most existing methods assume distribution consistency between training and test data—a condition rarely met in real-world scenarios. To address domain shifts without relying on source data or target labels, Test-Time Adaptation (TTA) has emerged as a promising paradigm. However, applying TTA methods to MSA faces two challenges: a representation bottleneck inherent to the regression formulation and the inconsistency in modality fusion caused by modality-specific data augmentation techniques. To overcome these issues, we propose Group-aware Multiscale Ensemble Learning (GMEL), which leverages a von Mises-Fisher (vMF) mixture distribution to model latent sentiment groups and integrates a multi-scale re-dropout strategy for modality-agnostic feature augmentation, preserving fusion consistency. Extensive experiments on three benchmark datasets using two backbone architectures show that GMEL significantly outperforms existing baselines, demonstrating strong robustness to test-time distribution shifts in multi-modal sentiment analysis.
