Singapore

Large Language Models (LLMs) have emerged as a new information channel. Meanwhile, one critical but under-explored question is: is it possible to bypass the safety alignment and stealthily inject harmful information into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset, EditAttack. Specifically, we focus on two typical safety risks of Editing Attack: Misinformation Injection and Bias Injection. For the first risk, we find that editing attacks can inject both commonsense and long-tail misinformation into LLMs, and that they are particularly effective for the former. For the second risk, we discover that not only can biased sentences be injected into LLMs with high effectiveness, but a single biased sentence injection can also degrade the overall fairness. We then further illustrate the high stealthiness of editing attacks. Our discoveries demonstrate the emerging misuse risks of knowledge editing techniques in compromising the safety alignment of LLMs, and the feasibility of disseminating misinformation or bias with LLMs as new channels.

AAAI 2026

Can Editing LLMs Inject Harm?

nlp: ethics - bias, fairness, transparency & privacy

nlp: safety and robustness

nlp: (large) language models

nlp: interpretability, analysis, and evaluation of nlp models

poster

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.


We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 



Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have been developed for Boolean domains, where inference is provably sound for affine functions and approximately correct for functions close to affine. These results have informed the design of analogy-based classifiers. However, they do not extend to regression tasks or continuous domains.
In this paper, we revisit analogical inference from a foundational perspective. We first present a counterexample showing that existing generalization bounds fail even in the Boolean setting. We then introduce a unified framework for analogical reasoning in real-valued domains based on parameterized analogies defined via generalized means. This model subsumes both Boolean classification and regression, and supports analogical inference over continuous functions. We characterize the class of analogy-preserving functions in this setting and derive both worst-case and average-case error bounds under smoothness assumptions. Our results offer a general theory of analogical inference across discrete and continuous domains.

Generalizing Analogical Inference from Boolean to Continuous Domains

We study non-linear bandit optimization where the learner maximizes a black-box function with a zeroth-order function oracle, a setting that has been successfully applied in many critical applications such as drug discovery and materials design. Existing works have shown that, with the aid of quantum computing, it is possible to break the classical $\Omega(\sqrt{T})$ regret lower bound and achieve a new $O(\mathrm{poly}\log T)$ upper bound. However, they usually assume that the objective function lies within a reproducing kernel Hilbert space, and their algorithms suffer from the curse of dimensionality. In this paper, we propose the new Q-NLB-UCB algorithm, which enjoys an input dimension-free $O(\mathrm{poly}\log T)$ upper bound, making it applicable to high-dimensional tasks. Furthermore, its time complexity is rigorously demonstrated to be lower than that of existing quantum bandit optimization algorithms. At the heart of our algorithm design are a quantum Monte Carlo mean estimator, a parametric function approximation technique, and a new quantum non-linear regression oracle, which can be of independent interest in other quantum machine learning problems. Our algorithm is also validated for its efficiency compared with other quantum algorithms on both high-dimensional synthetic and real-world tasks.

Quantum Non-Linear Bandit Optimization

We study a Stackelberg variant of the classical Most Vital Links problem, modeled as a one-round adversarial game between an attacker and a defender. The attacker strategically removes up to $k$ edges from a flow network to maximally disrupt flow between a source $s$ and a sink $t$, after which the defender optimally reroutes the remaining flow.
To capture this attacker–defender interaction, we introduce a new mathematical model of *discounted cuts*, in which the cost of a cut is evaluated by excluding its $k$ most expensive edges. This model generalizes the Most Vital Links problem and uncovers novel algorithmic and complexity-theoretic properties.
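To make the definition concrete, here is a minimal Python sketch of a discounted cut cost, assuming a directed network stored as a dictionary of edge costs. The function names, the toy graph, and the brute-force search are illustrative assumptions, not the authors' algorithm; the exhaustive enumeration is exponential and only suitable for tiny examples.

```python
from itertools import combinations


def discounted_cut_cost(edges, source_side, k):
    """Cost of the cut (source_side, rest) after dropping its k most expensive edges.

    edges: dict mapping a directed edge (u, v) to a non-negative cost.
    """
    crossing = sorted(
        cost for (u, v), cost in edges.items()
        if u in source_side and v not in source_side
    )
    keep = len(crossing) - k  # number of cheapest crossing edges that still count
    return sum(crossing[:keep]) if keep > 0 else 0


def min_discounted_cut(nodes, edges, s, t, k):
    """Brute-force the s-t cut with the smallest discounted cost (tiny graphs only)."""
    others = [v for v in nodes if v not in (s, t)]
    best = None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            side = {s, *extra}
            cost = discounted_cut_cost(edges, side, k)
            if best is None or cost < best[0]:
                best = (cost, side)
    return best


# Hypothetical toy network, used only for illustration.
nodes = ["s", "a", "b", "t"]
edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 3, ("a", "b"): 1}

plain_cost, _ = min_discounted_cut(nodes, edges, "s", "t", k=0)
discounted_cost, _ = min_discounted_cut(nodes, edges, "s", "t", k=1)
print(plain_cost, discounted_cost)
```

With `k=0` this reduces to an ordinary minimum cut. With `k=1` the minimum discounted cost equals the flow that remains after the attacker removes the single most damaging edge of this toy network.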

We develop a unified algorithmic framework for analyzing various forms of discounted cut problems, including minimizing or maximizing the cost of a cut under discount mechanisms that exclude either the $k$ most expensive or the $k$ cheapest edges. While most variants are NP-complete on general graphs, our main result establishes polynomial-time solvability for all discounted cut problems in our framework when the input is restricted to bounded-genus graphs, a relevant class that includes many real-world networks such as transportation and infrastructure networks.
With this work, we aim to open collaborative bridges between artificial intelligence, algorithmic game theory, and operations research.

Discounted Cuts: A Stackelberg Approach to Network Disruption

Passive surveillance systems (PSS) are used to detect and track various targets by processing the electromagnetic signals they release. The study and design of the resource management algorithm for these systems revealed several phenomena and combinatorial problems with crucial theoretical properties. In this article, we first prove the completeness of the algorithm used to generate receiver settings that determine which frequency bands the PSS monitors. Next, we formulate a new optimization problem called multiple-interval coverage (MIC), which is used to determine how often each of the generated settings must be used by the PSS. We show that the MIC problem is closely related to the multicover problem, which is an extension of the well-known set cover problem. The uniqueness of MIC stems from the fact that both covered elements and covers are multiple-intervals. We propose a notation to distinguish between different variants of the problem and prove that some of them can be solved in polynomial time. Finally, we prove that the MIC problem is NP-hard even when restricted to 2-interval covers.

Multiple-Interval Coverage for Resource Management of Passive Surveillance Systems

We study non-smooth stochastic decentralized optimization problems over time-varying networks, where objective functions are distributed across nodes and network connections may intermittently appear or break. Specifically, we consider two settings: (i) stochastic non-smooth (strongly) convex optimization, and (ii) stochastic non-smooth (strongly) convex–(strongly) concave saddle point optimization. Convex problems of this type commonly arise in deep neural network training, while saddle point problems are central to machine learning tasks such as the training of generative adversarial networks (GANs). Prior works have primarily focused on the smooth setting, or time-invariant network scenarios. We extend the existing theory to the more general non-smooth and stochastic setting over time-varying networks and saddle point problems. Our analysis establishes upper bounds on both the number of stochastic oracle calls and communication rounds, matching lower bounds for both convex and saddle point optimization problems.

Stochastic Decentralized Optimization of Non-Smooth Convex and Convex-Concave Problems over Time-Varying Networks

Deep Neural Networks (DNNs) are shown to be vulnerable to backdoor poisoning attacks, with most research focusing on digital triggers: artificial patterns added to test-time inputs to induce targeted misclassification. Physical triggers, which are natural objects embedded in real-world scenes, offer a promising alternative for attackers, as they can activate backdoors in real time without digital manipulation. However, existing physical backdoor attacks are dirty-label, meaning that attackers must change the labels of poisoned inputs to the target label. The inconsistency between image content and label exposes the attack to human inspection, reducing its stealthiness in real-world settings. To address this limitation, we introduce Clean-Label Physical Backdoor Attack (CLPBA), a new paradigm of physical backdoor attack that requires neither label manipulation nor trigger injection at the training stage. Instead, the attacker injects imperceptible perturbations into a small number of target class samples to backdoor a model. By framing the attack as a Dataset Distillation (DD) problem, we develop three CLPBA variants (Parameter Matching, Gradient Matching, and Feature Matching) that craft effective poisons under both linear probing and full-finetuning training settings. In hard scenarios that require backdoor generalizability in the physical world, CLPBA is shown to even surpass dirty-label attack baselines. We demonstrate the effectiveness of CLPBA via extensive experiments on two collected physical backdoor datasets for facial recognition and animal classification.

Clean-Label Physical Backdoor Attacks with Data Distillation

Agentic AI aims to create systems that set their own goals, adapt proactively to change, and refine behavior through continuous experience. Recent advances suggest that, when facing multiple and unforeseen tasks, agents could benefit from sharing machine-learned knowledge and reusing policies that have already been fully or partially learned by other agents. However, how to query, select, and retrieve policies from a pool of agents, and how to integrate such policies, remains a largely unexplored area. This study explores how an agent decides what knowledge to select, from whom, and when and how to integrate it into its own policy in order to accelerate its own learning. The proposed algorithm, *Modular Sharing and Composition in Collective Learning* (MOSAIC), improves learning in agentic collectives by combining (1) knowledge selection using performance signals and cosine similarity on Wasserstein task embeddings, (2) modular and transferable neural representations via masks, and (3) policy integration, composition and fine-tuning. MOSAIC outperforms isolated learners and global sharing approaches in both learning speed and overall performance, and in some cases solves tasks that isolated agents cannot. The results also demonstrate that selective, goal-driven reuse leads to less susceptibility to task interference. We also observe the emergence of self-organization, where agents solving simpler tasks accelerate the learning of harder ones through shared knowledge.

Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems

The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. In this paper, we introduce an efficient training-free method for navigating and sampling from the molecular space with a generative Chemical Language Model (CLM), using the molecular similarity to the target as a guide. Our method leverages the contextual representations learned by the CLM itself to estimate the molecular similarity, which is then used to adjust the autoregressive sampling strategy of the CLM. At each step of the decoding process, the method tracks the distance of the current generations from the target and updates the logits to encourage the preservation of similarity in generations. We implement the method using a recently proposed ~47M parameter SMILES-based CLM, GP-MoLFormer, and therefore refer to the method as GP-MoLFormer-Sim, which enables a test-time update of the deep generative policy to reflect the contextual similarity to a set of guide molecules. The method is further integrated into a genetic algorithm (GA) and tested on a set of standard molecular optimization benchmarks involving property optimization, molecular rediscovery, and structure-based drug design. Results show that GP-MoLFormer-Sim combined with GA (GP-MoLFormer-Sim+GA) outperforms existing training-free baseline methods when the oracle remains black-box. The findings in this work are a step forward in understanding and guiding the generative mechanisms of CLMs.

GP-MoLFormer-Sim: Test Time Molecular Optimization Through Contextual Similarity Guidance

Vision-Language Models (VLMs) have achieved notable success in tasks such as visual question answering, yet their resilience to distractions in prompts remains underexplored. Understanding how distractions affect VLMs' performance is crucial for real-world applications, as input data often contains noisy or irrelevant content. This paper assesses the robustness of VLMs, including general-purpose models (like GPT-4o) and those specialized for reasoning, against both visual and textual distractions in the context of science question answering. We introduce I-ScienceQA, a new benchmark based on the ScienceQA dataset, which systematically injects distractions into both visual and textual contexts. Using this benchmark, we evaluate how distractions perturb the underlying reasoning processes of these models by analyzing changes in the textual explanations leading to answers. Our findings show that most VLMs are vulnerable to distractions, with noticeable degradation in reasoning when extraneous content is present. Notably, some models (such as GPT-o4 mini) exhibit a higher degree of robustness. We also observe that textual distractions generally cause greater performance declines than visual distractions. Finally, we explore mitigation strategies like prompt engineering. While these strategies improve resilience modestly, our analysis highlights considerable room for further improvement in VLM robustness.

Is Your (Reasoning) Multimodal Language Model Vulnerable Toward Distractions?

Text-to-image diffusion models have demonstrated significant capabilities to generate diverse and detailed visuals in various domains, and story visualization is emerging as a particularly promising application. However, as their use in real-world creative domains increases, the need for providing enhanced control, refinement, and the ability to modify images post-generation in a consistent manner becomes an important challenge. Existing methods often lack the flexibility to apply fine or coarse edits while maintaining visual and narrative consistency across multiple frames, preventing creators from seamlessly crafting and refining their visual stories. To address these challenges, we introduce Plot'n Polish, a zero-shot framework that enables consistent story generation and provides fine-grained control over story visualizations at various levels of detail.
