Singapore

Machine unlearning has emerged as a promising approach to removing specific knowledge from large language models (LLMs), especially for safety-critical applications. However, existing representation-based methods such as RMU lack guidance for selecting which representation locations to unlearn and therefore lack precision, while probability-based methods are vulnerable to fine-tuning attacks that fine-tune the model on unrelated, safe data. To address these problems, this paper presents an adaptive knowledge guidance and memory perturbation mechanism called ALMPU (Adaptive Localized Memory Perturbation Unlearning), which supplies the knowledge guidance missing from representation-based unlearning methods and mitigates the impact of fine-tuning attacks on unlearned models. Specifically, we apply scaling factors to attention heads and select the most sensitive heads as knowledge guidance. Guided by this knowledge localization, we integrate enhanced memory perturbation, which forces the model to preserve specific knowledge, into the standard representation-based unlearning process at these sensitive positions. Through this perturbation mechanism, the model achieves more thorough elimination of the target knowledge. By adding interventions to the selected attention heads and explicitly optimizing against fine-tuning attacks during the unlearning process, ALMPU creates a controlled divergence from the original model that is inherently resistant to relearning attempts. Experimental evaluation on the WMDP benchmark demonstrates that ALMPU consistently outperforms baseline methods across different scales of fine-tuning attacks.

AAAI 2026

A Robust Unlearning Method with Adaptive Knowledge Guidance and Memory Preservation

nlp: safety and robustness

nlp: (large) language models

krr: knowledge engineering

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.



Large Language Models (LLMs) are increasingly employed in applications that require processing information from heterogeneous formats, including text, tables, infoboxes, and knowledge graphs. However, systematic biases toward particular formats may undermine LLMs' ability to integrate heterogeneous data impartially, potentially resulting in reasoning errors and increased risks in downstream tasks. Despite these concerns, it remains uncertain whether such format biases are systematic, which data-level factors contribute to them, and what internal mechanisms in LLMs underlie their emergence.

In this paper, we make the first attempt to investigate and analyze format bias in LLMs. To systematically investigate the aforementioned questions, we conduct a three-stage empirical study by constructing a heterogeneous data conflict scenario for the exploration of bias. The first stage explores the presence and direction of bias across a diverse range of LLMs. The second stage examines how key data-level factors, including information richness, structure quality, and format type, influence these biases. The third stage analyzes how format bias emerges within LLMs' attention patterns and evaluates a lightweight intervention to test its potential mitigability. Based on these investigations, we identify three future research directions to reduce format bias: improving data preprocessing through format sanitization and normalization, introducing inference-time interventions such as attention re-weighting, and developing format-balanced training corpora. These directions will support the design of more robust and fair heterogeneous data processing systems.

Format as a Prior: Quantifying and Analyzing Bias in LLMs for Heterogeneous Data

While classical control theory assumes that the controller has access to measurements of the entire state (or output) at every time instant, this paper investigates a setting where the feedback controller can only access a randomly selected subset of the state vector at each time step. Because the random sparsification selects only a subset of the state components at each step, we analyze the stability of the closed-loop system in terms of Asymptotic Mean-Square Stability (AMSS), which ensures that the system state converges to zero in the mean-square sense. We consider the problem of designing both a feedback gain matrix and a measurement sparsification strategy that minimizes the number of state components required for feedback, while ensuring AMSS of the closed-loop system. Specifically, (1) we provide conditions on the dynamics of the system under which it is possible to find a sparsification strategy, and (2) we propose a Linear Matrix Inequality (LMI) based algorithm that jointly computes a stabilizing gain matrix and a randomized sparsification strategy that minimizes the expected number of measured state coordinates while preserving AMSS. Our approach is then extended to the case where the sparsification probabilities vary across the state components. Based on these theoretical findings, we propose an algorithmic procedure to compute the vector of sparsification parameters, along with the corresponding feedback gain matrix. To the best of our knowledge, this is the first study to investigate the stability properties of control systems that rely solely on randomly selected state measurements. Numerical simulations demonstrate that, in some settings, the system achieves performance comparable to full-state feedback while requiring measurements from only 0.3% of the state coordinates.

Just Few States Are Enough: Randomized Sparse Feedback for Stability of Dynamical Systems
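To make the AMSS notion in the abstract above concrete, here is a minimal sketch for a scalar system. It is an illustration under stated assumptions, not the paper's LMI-based algorithm: the dynamics are assumed to be `x[t+1] = a*x[t] + b*u[t]` with `u[t] = k*x[t]` when the state is measured (independently with probability `p` at each step) and `u[t] = 0` otherwise. Under these assumptions `E[x[t+1]^2] = (p*(a + b*k)^2 + (1 - p)*a^2) * E[x[t]^2]`, so the loop is AMSS exactly when that factor is below 1. All function names are hypothetical.

```python
import random


def ms_growth_factor(a, b, k, p):
    """Factor by which E[x_t^2] is multiplied at every step."""
    closed_loop = (a + b * k) ** 2  # step where the measurement is available
    open_loop = a ** 2              # step where the measurement is dropped
    return p * closed_loop + (1.0 - p) * open_loop


def is_amss(a, b, k, p):
    """The scalar loop is AMSS exactly when the growth factor is below 1."""
    return ms_growth_factor(a, b, k, p) < 1.0


def measurement_probability_bound(a, b, k):
    """Return p* such that every p with p* < p <= 1 gives AMSS.

    Returns None when the gain k does not stabilize the loop even with
    full measurement (p = 1), and 0.0 when the open loop is already stable.
    """
    closed_loop = (a + b * k) ** 2
    open_loop = a ** 2
    if closed_loop >= 1.0:
        return None
    if open_loop < 1.0:
        return 0.0
    return (open_loop - 1.0) / (open_loop - closed_loop)


def empirical_mean_square(a, b, k, p, steps=40, trials=2000, seed=0):
    """Monte Carlo estimate of E[x_steps^2] starting from x_0 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = 1.0
        for _ in range(steps):
            measured = rng.random() < p       # Bernoulli(p) sparsification
            u = k * x if measured else 0.0    # no measurement, no feedback
            x = a * x + b * u
        total += x * x
    return total / trials


if __name__ == "__main__":
    a, b, k = 1.2, 1.0, -0.9  # unstable open loop, stabilizing gain
    print("bound on p:", measurement_probability_bound(a, b, k))
    print("AMSS at p=0.5:", is_amss(a, b, k, 0.5))
    print("empirical E[x^2]:", empirical_mean_square(a, b, k, 0.5))
```

In this toy case the bound depends only on how unstable the open loop is and how strongly the gain contracts the state. The vector-valued problem in the abstract needs the matrix conditions the authors derive.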

Combinatorial optimization problems (COPs) are fundamental to many real-world applications, where efficiently producing high-quality solutions is critical. Recent advances in diffusion-based non-autoregressive models have reformulated solving COPs as a generative process, achieving promising results. However, these methods still suffer from accumulated errors and high inference costs due to the multi-step stochastic denoising process. To address these issues, we propose EFLOCO, an efficient discrete flow matching method for solving COPs that learns structured and deterministic solution trajectories. EFLOCO replaces noise-driven updates with smooth and guided transitions, thereby improving inference stability and quality. Furthermore, we introduce an adaptive time-step scheduler that allocates more steps to critical transition regions, enabling strong performance under few-step constraints. Experiments on standard TSP and ATSP benchmarks show that our method consistently outperforms both learning-based and heuristic baselines in terms of solution quality and inference speed.

Efficient Few-Step Solution Generation via Discrete Flow Matching for Combinatorial Optimization

Document clustering plays an important role in text mining and information retrieval. Existing methods primarily focus on document-intrinsic features, overlooking dataset-level features and consequently failing to construct superior representations. We propose a Contrastive Gaussian Fusion Network (CGFN) that can construct superior representations beyond the original documents.
Specifically, CGFN fuses the Gaussian distributions of neighbor-derived information and intrinsic textual features in the latent space. By incorporating contrastive learning into the fusion process, our proposed method is able to learn high-quality representations while simultaneously mitigating noise and minimizing information loss. Experiments on four real-world datasets demonstrate that CGFN outperforms state-of-the-art methods, achieving superior clustering by robustly capturing holistic distributions and neighbor patterns.

Constructing Superior Representations Beyond the Original Documents via a Contrastive Gaussian Fusion Network for Clustering

Typical deep clustering methods, while achieving notable progress, can only provide one clustering result per dataset. This limitation arises from their assumption of a fixed underlying data distribution, which may fail to meet user needs and lead to unsatisfactory clustering outcomes. Our work investigates how multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering, emphasizing their adaptability to user-specified semantic requirements. However, directly using MLLM output for clustering risks producing unstructured and generic image descriptions instead of feature-specific and concrete ones. To address these issues, we first observe that MLLMs' hidden states of text tokens are strongly related to the corresponding features, and we leverage these embeddings to perform clustering under any user-defined criterion. We also employ a lightweight clustering head augmented with pseudo-label learning, significantly enhancing clustering accuracy. Extensive experiments demonstrate competitive performance on diverse datasets and metrics. Code and datasets are available in the anonymous repository.

ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering

Table images present unique challenges for effective and efficient understanding due to the need for question-specific focus and the presence of redundant background regions.
Existing Multimodal Large Language Model (MLLM) approaches often overlook these characteristics, resulting in uninformative and redundant visual representations.
To address these issues, we aim to generate visual features that are both informative and compact for improved table understanding.
We first propose progressive question conditioning, which injects the question into Vision Transformer layers with gradually increasing frequency, considering each layer’s capacity to handle additional information, to generate question-aware visual features.
To reduce redundancy, we introduce a pruning strategy that discards background tokens, thereby improving efficiency.
To mitigate information loss from pruning, we further propose token focusing, a training strategy that encourages the model to concentrate essential information in the retained tokens.
By combining these approaches, we present TabFlash, an efficient and effective MLLM for table understanding.
TabFlash achieves state-of-the-art performance, outperforming both open-source and proprietary MLLMs, while requiring 27% fewer FLOPs and 30% less memory than the second-best MLLM.

TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing

Reinforcement learning from human feedback (RLHF) is widely used to align large language models (LLMs) with human preferences. However, RLHF-trained reward models often exhibit length bias—a systematic tendency to favor longer responses by conflating verbosity with quality. We propose a causal framework for analyzing and mitigating length bias in RLHF reward modeling. Central to our approach is a counterfactual data augmentation method that generates response pairs designed to isolate content quality from verbosity. These counterfactual examples are then used to train the reward model, enabling it to assess responses based on content quality independently of verbosity. Specifically, we construct (1) length-divergent pairs with similar content and (2) content-divergent pairs of similar length. Empirical evaluations show that our method reduces length bias in reward assignment and leads to more concise, content-focused outputs from the policy model. These findings demonstrate that the proposed approach effectively reduces length bias and improves the robustness and content sensitivity of reward modeling in RLHF pipelines.

Mitigating Length Bias in RLHF Through a Causal Lens
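The two kinds of counterfactual pairs named in the abstract above can be illustrated with a small sorting routine. This is only a sketch under loud assumptions. The abstract does not say how content similarity or length similarity is measured, so word-set Jaccard overlap stands in for content similarity, a word-count ratio stands in for length similarity, and every threshold and function name below is hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap, used here as a crude stand-in for content similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def length_ratio(a, b):
    """Shorter word count divided by longer word count (1.0 = same length)."""
    la, lb = len(a.split()), len(b.split())
    if max(la, lb) == 0:
        return 1.0
    return min(la, lb) / max(la, lb)


def classify_pair(a, b, sim_hi=0.6, sim_lo=0.3, len_close=0.8, len_far=0.6):
    """Label a response pair by which factor it isolates.

    - "length-divergent": similar content, clearly different length
    - "content-divergent": similar length, clearly different content
    - "other": the pair isolates neither factor and would be discarded
    """
    sim = jaccard(a, b)
    ratio = length_ratio(a, b)
    if sim >= sim_hi and ratio <= len_far:
        return "length-divergent"
    if sim <= sim_lo and ratio >= len_close:
        return "content-divergent"
    return "other"


if __name__ == "__main__":
    short = "paris is the capital of france"
    padded = "paris is the capital of france and paris is the capital of france"
    unrelated = "berlin hosts many museums and galleries"
    print(classify_pair(short, padded))     # similar content, different length
    print(classify_pair(short, unrelated))  # similar length, different content
```

A reward model trained on length-divergent pairs gets no signal from content differences, and one trained on content-divergent pairs gets none from length. That separation is what the abstract means by isolating content quality from verbosity.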

High-quality multi-hop instruction data is critical for enhancing the reasoning capabilities of large language models (LLMs) in complex long-context scenarios, e.g., long-form reasoning. Nevertheless, such datasets are currently scarce within the community, and existing data synthesis approaches typically fail to model intermediate reasoning steps explicitly, resulting in unverifiable and potentially erroneous samples. To mitigate these issues, we design the **C**oncept-**G**raph based **M**ulti-hop **I**nstruction **S**ynthesis (CGMIS) framework, which constructs long-form reasoning paths via concept graph traversal and automatically generates verifiable multi-hop data. The CGMIS framework not only guarantees the accuracy and verifiability of the synthesized data but also enables the construction of high-quality multi-hop instruction datasets from arbitrary corpora. Experiments show that fine-tuning with CGMIS-generated data achieves state-of-the-art performance across 13 long-context reasoning tasks on various models, using only 10% of the data volume required by existing methods.

CGMIS: Concept-Graph Based Multi-Hop Instructions Synthesis for Enhancing Long-Context Reasoning
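The idea of building reasoning paths by concept graph traversal, as described in the abstract above, can be sketched with a plain depth-first enumeration of k-hop paths. This is an assumption-laden illustration rather than the CGMIS pipeline itself: the graph format, the toy graph, and the function names are all hypothetical, and the abstract does not specify how the graph is built from a corpus or how instructions are generated from a path.

```python
# Toy concept graph: concept -> list of (relation, neighbouring concept).
TOY_GRAPH = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "Radioactivity")],
    "Warsaw": [("capital_of", "Poland")],
    "Poland": [("continent", "Europe")],
}


def k_hop_paths(graph, start, hops):
    """Enumerate all simple paths of exactly `hops` edges from `start`.

    Each path is a list of (source, relation, target) triples, so every
    intermediate reasoning step is explicit and can be checked on its own.
    """
    paths = []

    def dfs(node, path, visited):
        if len(path) == hops:
            paths.append(list(path))
            return
        for relation, nxt in graph.get(node, []):
            if nxt in visited:  # keep paths simple (no repeated concepts)
                continue
            path.append((node, relation, nxt))
            visited.add(nxt)
            dfs(nxt, path, visited)
            visited.remove(nxt)
            path.pop()

    dfs(start, [], {start})
    return paths


def render_chain(path):
    """Render a path as a readable chain of hops."""
    if not path:
        return ""
    text = path[0][0]
    for _, relation, target in path:
        text += f" -[{relation}]-> {target}"
    return text


if __name__ == "__main__":
    for path in k_hop_paths(TOY_GRAPH, "Marie Curie", 2):
        print(render_chain(path))
```

Because each hop is stored as an explicit triple, a synthesized multi-hop question can be verified step by step against the graph, which is the property the abstract emphasizes.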

This paper tackles the fundamental failure of Large Language Models (LLMs) to solve new tasks when prompted with a sufficient, yet overly complex, set of multi-modal episodes. This failure stems from the model's inability to distill underlying patterns from the noisy experiences. We propose Hypothesis-Driven Reasoning (HDR), a framework that enhances LLM reasoning by building an explicit semantic memory, that is, a set of hypotheses induced from the multi-modal episodes. HDR employs a two-stage pipeline: it first extracts potential factors from the episodes and then iteratively refines hypotheses through a generate-verify loop over those factors. We first empirically demonstrate this failure and the potential of semantic memory, showing that oracle hypotheses can boost accuracy from 35.3% to 92.0% on a novel task we designed. We then evaluate HDR, which achieves near-oracle performance and significantly outperforms baselines, especially on smaller models. This paper validates a shift from unstructured in-context recall to explicit knowledge abstraction for robust reasoning.

Hypothesis-Driven Reasoning for Large Language Models
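The two-stage pipeline in the abstract above (extract factors, then refine hypotheses in a generate-verify loop) can be mimicked on symbolic toy data. This is a sketch under explicit assumptions: episodes are plain feature dictionaries with a boolean outcome instead of multi-modal inputs, a hypothesis is a conjunction of `(factor, value)` conditions, and an exhaustive enumerator replaces the LLM that would propose hypotheses. All names, the beam size, and the toy episodes are hypothetical.

```python
def predict(hypothesis, features):
    """A hypothesis predicts True when all of its conditions hold."""
    return all(features.get(name) == value for name, value in hypothesis)


def accuracy(hypothesis, episodes):
    """Verification step: fraction of episodes the hypothesis explains."""
    hits = sum(predict(hypothesis, f) == outcome for f, outcome in episodes)
    return hits / len(episodes)


def induce_hypotheses(episodes, max_rounds=3, beam=3):
    """Generate-verify loop over conjunctions of extracted factors."""
    # Stage 1: extract candidate factors (every observed name/value pair).
    conditions = sorted({(n, v) for f, _ in episodes for n, v in f.items()})
    candidates = [(c,) for c in conditions]
    for _ in range(max_rounds):
        # Verify: rank candidates by how well they explain the episodes.
        scored = sorted(candidates, key=lambda h: (-accuracy(h, episodes), h))
        verified = [h for h in scored if accuracy(h, episodes) == 1.0]
        if verified:
            return verified  # hypotheses kept as the explicit memory
        # Generate: refine the best partial hypotheses with one more condition.
        best = scored[:beam]
        candidates = sorted(
            {tuple(sorted(set(h) | {c})) for h in best for c in conditions if c not in h}
        )
    return []


EPISODES = [
    ({"color": "red", "shape": "round", "size": "big"}, True),
    ({"color": "red", "shape": "square", "size": "big"}, False),
    ({"color": "blue", "shape": "round", "size": "small"}, False),
    ({"color": "red", "shape": "round", "size": "small"}, True),
    ({"color": "blue", "shape": "square", "size": "big"}, False),
]

if __name__ == "__main__":
    print(induce_hypotheses(EPISODES))
```

On these toy episodes no single condition explains every outcome, so the loop refines the strongest single-factor candidates into the conjunction "red and round", which then passes verification.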

In modern software development workflows, the open-source software supply chain contributes significantly to efficient and convenient engineering practices. With increasing system complexity, it has become common practice to use open-source software as third-party dependencies. However, due to the lack of maintenance for underlying dependencies and insufficient community auditing, ensuring the security of source code and the legitimacy of repository maintainers has become a challenge, particularly in the context of high-stealth backdoor attacks such as the XZ Utils incident. To address these problems, we propose a fine-grained project evaluation framework for backdoor risk assessment in open-source software. Our evaluation framework models highly stealthy backdoor attacks from the attacker's perspective and defines targeted metrics for each attack stage. Moreover, to overcome the limitations of static analysis in assessing the reliability of repository maintenance activities (such as irregular committer privilege escalation and insufficient review participation), we employ large language models (LLMs) to perform semantic evaluation of code repositories while avoiding reliance on manually crafted patterns. The effectiveness of our framework is validated on 156 high-priority Debian packages, and the experimental results reveal that the current open-source software supply chain is exposed to a series of security risks.
