Singapore

With the rapid advancement of large language models (LLMs), their deployment in real-world applications has become increasingly widespread. LLMs are expected to deliver robust performance across diverse tasks, user preferences, and practical scenarios. However, as demands grow, ensuring that LLMs produce responses aligned with human intent remains a foundational challenge. In particular, aligning model behavior effectively and efficiently during inference, without costly retraining or extensive supervision, is both a critical requirement and a non-trivial technical endeavor. To address this challenge, we propose SDA (Steering-Driven Distribution Alignment), a training-free and model-agnostic alignment framework designed for open-source LLMs. SDA dynamically redistributes model output probabilities based on user-defined alignment instructions, enhancing alignment between model behavior and human intent without fine-tuning. The method is lightweight, resource-efficient, and compatible with a wide range of open-source LLMs. It can function independently during inference or be integrated with training-based alignment strategies. Moreover, SDA supports personalized preference alignment, enabling flexible control over the model’s response behavior. Empirical results demonstrate that SDA consistently improves alignment performance across 8 open-source LLMs of varying scales and diverse origins, evaluated on three key alignment dimensions: helpfulness, harmlessness, and honesty (3H). Specifically, SDA achieves average gains of 64.4% in helpfulness, 30% in honesty, and 11.5% in harmlessness across the tested models, indicating its effectiveness and generalization across diverse models and application scenarios.
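The abstract does not state the exact redistribution rule, so the following is only a minimal sketch of one way inference-time steering of an output distribution can work. It assumes that next-token logits are available both for the plain prompt and for the prompt with an alignment instruction prepended, and that the distribution is pushed along the difference between the two. The function names and the `strength` parameter are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def steer_logits(base_logits, instructed_logits, strength=1.5):
    """Hypothetical steering rule (an assumption, not the paper's formula).

    Moves each logit from its base value toward, and for strength > 1 past,
    the value obtained with the alignment instruction in the prompt.
    strength = 0 returns the base logits; strength = 1 the instructed ones.
    """
    return [b + strength * (i - b) for b, i in zip(base_logits, instructed_logits)]

# Toy three-token vocabulary; the instruction favours token 1.
base = [2.0, 1.0, 0.5]
instructed = [1.0, 2.0, 0.5]
steered_probs = softmax(steer_logits(base, instructed, strength=1.5))
```

With `strength` above 1, the steered distribution places more mass on the instruction-favoured token than the instructed prompt alone does, which is the sense in which such a rule amplifies an instruction without changing model weights.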

AAAI 2026

SDA: Steering-Driven Distribution Alignment for Open LLMs Without Fine-Tuning

nlp: prompt engineering / prompting

nlp: (large) language models

nlp: generation

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.



Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks. However, most of these benchmarks evaluate models primarily through multiple-choice or short-answer formats, which do not take the reasoning process into account. Although some benchmarks assess the reasoning process, their methods are often overly simplistic and only examine reasoning when answers are incorrect. This approach overlooks scenarios where flawed reasoning leads to correct answers. In addition, these benchmarks do not consider the impact of intermodal relationships on reasoning. To address this issue, we propose the Reasoning Process Tree Score (RPTS), a tree structure-based metric to assess reasoning processes. Specifically, we organize the reasoning steps into a reasoning tree and leverage its hierarchical information to assign weighted faithfulness scores to each reasoning step. By dynamically adjusting these weights, RPTS not only evaluates the overall correctness of the reasoning, but also pinpoints where the model fails in the reasoning. To validate RPTS in real-world multimodal scenarios, we construct a new benchmark, RPTS-Eval, comprising 374 images and 390 reasoning instances. Each instance includes reliable visual-textual clues that serve as leaf nodes of the reasoning tree. Furthermore, we define three types of intermodal relationships to investigate how intermodal interactions influence the reasoning process. We evaluated representative LVLMs (e.g., GPT4o, Llava-Next), uncovering their limitations in multimodal reasoning and highlighting the differences between open-source and closed-source commercial LVLMs. We believe that this benchmark will contribute to the advancement of research in the field of multimodal reasoning.
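The abstract does not give the weighting scheme, so the sketch below only illustrates the general idea of scoring a reasoning tree. It assumes that each step already carries a faithfulness score in [0, 1] and that a step at depth `d` receives weight `decay ** d`. Both the `Step` class and the depth-based weighting are hypothetical choices for illustration, not the RPTS definition.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One node of a reasoning tree (hypothetical structure)."""
    name: str
    faithfulness: float  # assumed per-step score in [0, 1]
    children: list = field(default_factory=list)

def tree_score(root, decay=0.5):
    """Return (weighted mean faithfulness, name of the least faithful step).

    Assumed weighting: a node at depth d has weight decay ** d, so steps
    near the conclusion count more than leaf-level clues.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    weakest = root
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        weight = decay ** depth
        total_weight += weight
        weighted_sum += weight * node.faithfulness
        if node.faithfulness < weakest.faithfulness:
            weakest = node
        stack.extend((child, depth + 1) for child in node.children)
    return weighted_sum / total_weight, weakest.name

# A correct conclusion reached through a partly unfaithful intermediate step.
tree = Step("conclusion", 1.0, [
    Step("intermediate step", 0.5, [
        Step("visual clue", 1.0),
        Step("textual clue", 1.0),
    ]),
])
score, weakest_step = tree_score(tree)
```

Under these assumptions the example tree scores below 1 even though its conclusion is correct, and the weakest step is reported separately, mirroring the abstract's point that flawed reasoning can hide behind a correct answer.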

RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation

Graph-based incomplete multi-view clustering algorithms have attracted much attention due to their impressive clustering performance. However, existing methods primarily leverage intra-view correlation from observed views, while ignoring the exploration of explicit compensation relationships between different views. Moreover, these methods need post-processing to obtain labels, and the separate steps lack negotiation, which may lead to sub-optimal solutions. To address these issues, we propose a Cross-view Anchor Graph Learning and Factorization (AGLF) method. AGLF develops an Anchor Graph Completion (AGC) framework that explicitly learns the missing subgraph structures. Instead of requiring post-processing, AGC directly produces soft labels. By establishing a third-order tensor of soft labels, it employs the tensor Schatten $p$-norm to enhance anchor graph learning and factorization. To significantly improve the quality of subgraph learning, AGLF incorporates compensation subgraphs from supplementary views into the AGC framework, enabling the construction of a better anchor graph for label learning. An optimization algorithm is devised to solve the objective function. Experimental results across various datasets demonstrate the effectiveness of our method.

Cross-view Anchor Graph Learning and Factorization for Incomplete Multi-view Clustering

Event cameras provide microsecond latency and high dynamic range, making them ideal for 3D perception tasks in traffic scenes with challenging lighting conditions. Yet existing methods often struggle to generalize to out-of-domain environments due to the limited availability of diverse training data. While synthetic data offers an easily accessible alternative, it introduces a significant sim-to-real gap, particularly in motion patterns. We tackle this challenge by introducing Motion-Adaptation Mamba (MA-Mamba), a dual-track framework that advances both architecture and data augmentation. At the architectural level, we introduce a lightweight Spatio-Temporal Association module that captures motion-induced appearance variations at arbitrary scales, and an Adaptive Memory Balancing module, built on the Mamba state-space framework, that adaptively filters memory updates to maintain stable scene context under diverse dynamics. At the data level, we design event-oriented augmentations that simulate varied motion patterns and apply priority-based masked sequence modeling to strengthen long-range spatio-temporal reasoning. Trained solely on synthetic data, MA-Mamba delivers substantial zero-shot gains on multiple real-world benchmarks, demonstrating strong robustness and generalizability.

Towards Robust Event-Based Depth Estimation: Bridging Synthetic and Real Domains with Motion Adaptation

The Minimum Consistent Subset (MCS) problem arises naturally in the context of supervised clustering and instance selection, both of which are critical in enabling scalable and interpretable learning on large datasets. In supervised clustering, one aims to infer a meaningful partitioning of data using a small labeled subset, where classification is typically performed via nearest neighbor rules. However, the sheer volume of training data in modern applications poses a significant computational challenge. The MCS problem formalizes this goal: given a labeled dataset $\mathcal{X}$ in a metric space, the task is to compute a smallest subset $S \subseteq \mathcal{X}$ such that every point in $\mathcal{X}$ shares its label with at least one of its nearest neighbors in $S$.
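The consistency condition above can be checked directly. The following is a brute-force sketch for tiny inputs, not the algorithm from the paper: it tries subsets in order of increasing size and returns the first consistent one. The function names and the example data are illustrative assumptions.

```python
from itertools import combinations

def is_consistent(points, labels, subset, dist):
    """Check the MCS condition for a candidate subset of point indices.

    Every point must share its label with at least one of its nearest
    neighbours within the subset (ties at minimum distance all count).
    """
    for p in range(len(points)):
        distances = {s: dist(points[p], points[s]) for s in subset}
        d_min = min(distances.values())
        nearest = [s for s, d in distances.items() if d == d_min]
        if not any(labels[s] == labels[p] for s in nearest):
            return False
    return True

def minimum_consistent_subset(points, labels, dist):
    """Exhaustive search by increasing size; exponential, for illustration only."""
    n = len(points)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if is_consistent(points, labels, subset, dist):
                return list(subset)
    return []  # only reached for an empty input

# Five points on a line with two labels.
points = [0, 1, 2, 10, 11]
labels = ["a", "a", "a", "b", "b"]
mcs = minimum_consistent_subset(points, labels, lambda x, y: abs(x - y))
```

The search always terminates with an answer for non-empty inputs because the full set is consistent: each point is its own nearest neighbour at distance zero.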

Recently, the MCS problem has been extended to $\textit{graph metrics}$, where distances are defined by shortest paths. Prior work has shown that MCS remains NP-hard even on simple graph classes like trees, though an algorithm with runtime $\mathcal{O}(2^{6c} \cdot n^6)$ is known for trees, where $c$ is the number of colors and $n$ the number of vertices. This raises the challenge of identifying graph classes that admit algorithms efficient in both $n$ and $c$.

In this work, we study the Minimum Consistent Subset problem on graphs, focusing on two well-established measures: the vertex cover number ($vc$) and the neighborhood diversity ($nd$). Specifically, we design efficient algorithms for graphs exhibiting small $vc$ or small $nd$, which frequently arise in real-world domains characterized by local sparsity or repetitive structure.
These parameters are particularly relevant because they capture structural properties that often correlate with the tractability of otherwise hard problems. Graphs with small vertex cover are "almost independent sets", representing sparse interactions, while graphs with small neighborhood diversity exhibit a high degree of symmetry and regularity. Importantly, small neighborhood diversity can occur even in dense graphs, a property frequently observed in domains such as social networks with modular communities or knowledge graphs with repeated relational patterns. Thus, algorithms designed to work efficiently for graphs with small neighborhood diversity are capable of efficiently solving MCS in complex settings where small vertex covers may not exist.
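As a small illustration of the second parameter, the sketch below computes neighborhood diversity under its usual definition: two vertices $u$ and $v$ have the same type when $N(u) \setminus \{v\} = N(v) \setminus \{u\}$, and $nd$ is the number of types. The adjacency-dictionary representation and the function name are assumptions made for this example.

```python
def neighborhood_types(adj):
    """Partition vertices into types; nd is the number of types.

    adj maps each vertex to the set of its neighbours (simple undirected
    graph). Having the same type is an equivalence relation, so comparing
    against one representative per class is enough.
    """
    classes = []
    for v in adj:
        for cls in classes:
            u = cls[0]
            if adj[u] - {v} == adj[v] - {u}:
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes

# Complete graph K4: dense, yet every vertex has the same type.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}

# Complete bipartite graph K_{2,3}: one type per side.
k23 = {
    "a1": {"b1", "b2", "b3"}, "a2": {"b1", "b2", "b3"},
    "b1": {"a1", "a2"}, "b2": {"a1", "a2"}, "b3": {"a1", "a2"},
}

nd_k4 = len(neighborhood_types(k4))
nd_k23 = len(neighborhood_types(k23))
```

The complete graph shows the point made above: it has no small vertex cover relative to its size, but its neighborhood diversity is 1.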

We develop an algorithm with running time $vc^{O(vc)}\cdot \text{Poly}(n,c)$, and another algorithm with runtime $nd^{O(nd)}\cdot \text{Poly}(n,c)$. In the language of parameterized complexity, this implies that MCS is fixed-parameter tractable (FPT) parameterized by the vertex cover number and the neighborhood diversity. Notably, our algorithms remain efficient for arbitrarily many colors, as their complexity is polynomially dependent on the number of colors.

Learning with Structure: Computing Consistent Subsets on Structurally-Regular Graphs

Aiming to overcome distribution shift and label sparsity that hinder cross-domain generalization of Graph Neural Networks, Unsupervised Graph Domain Adaptation (UGDA) transfers knowledge from a label-rich source to an unlabeled target graph. Yet in practice, strict privacy protocols often withhold the source graph entirely, reducing UGDA to the far more constrained Source-Free UGDA (SFUGDA) where only a pre-trained source GNN remains. Despite recent progress, existing source-free UGDA methods remain hampered by source-knowledge absence: deprived of source graphs, they lose the reference distribution needed to gauge domain shift and must lean on noisy target cues, incurring biased adaptation and catastrophic forgetting. To overcome this drawback, this paper devises SFGAR, a two-stage SFUGDA framework that first generates pseudo-source graphs to recover the source distribution encoded in a frozen pre-trained GNN, then adversarially aligns these synthetic graphs with the unlabeled target. Theoretical analysis shows that this proxy alignment tightly bounds the target-domain generalization error. Extensive experiments on public benchmarks validate the state-of-the-art performance of SFGAR.

Source-Free Graph Foundation Model Adaptation via Pseudo-Source Reconstruction

With the widespread application of Large Language Models (LLMs), ensuring their safety and preventing harmful responses has become a significant concern. While current safe-alignment methods based on instruction fine-tuning and Reinforcement Learning from Human Feedback (RLHF) can effectively reduce harmful responses from LLMs, they often require high-quality datasets and heavy computational overhead during model training. Another way to align language models is to modify the token logits in model outputs without heavy training. Recent studies have shown that contrastive decoding can enhance the performance of language models by reducing the likelihood of confused tokens. However, these methods require the manual selection of contrastive models or instruction templates, limiting the degree of contrast. To this end, we propose Adversarial Contrastive Decoding (ACD), an optimization-based framework to generate two opposite soft system prompts, the Safeguarding Prompt (SP) and the Adversarial Prompt (AP), for prompt-based contrastive decoding. The SP aims to promote safer outputs while the AP aims to exploit the harmful parts of the model, providing a strong contrast to align the model with safety. ACD only needs to apply lightweight prompt tuning on a rather small anchor dataset without training the target model. Experiments on a wide range of models and benchmarks demonstrate that the proposed method achieves much better safety performance than previous training-free decoding methods without sacrificing the model's original generation ability.
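The decoding step of prompt-based contrastive decoding can be sketched as follows. This is an illustration in the commonly used contrastive form, not the paper's exact formulation: it assumes the next-token logits under the Safeguarding Prompt and under the Adversarial Prompt have already been computed, and it leaves out the prompt-tuning stage entirely. The function names, the `alpha` weight, and the toy vocabulary are hypothetical.

```python
def contrastive_scores(safe_logits, adversarial_logits, alpha=1.0):
    """Boost tokens the safe prompt prefers and the adversarial prompt does not.

    Assumed form: score = safe + alpha * (safe - adversarial).
    alpha = 0 falls back to ordinary decoding under the safe prompt.
    """
    return [s + alpha * (s - a) for s, a in zip(safe_logits, adversarial_logits)]

def pick_token(vocab, safe_logits, adversarial_logits, alpha=1.0):
    """Greedy choice of the next token from the contrastive scores."""
    scores = contrastive_scores(safe_logits, adversarial_logits, alpha)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]

# Toy logits for the first token of a reply to a harmful request.
vocab = ["Sure", "Sorry", "Here"]
safe_logits = [2.0, 1.8, 0.5]         # under the Safeguarding Prompt
adversarial_logits = [3.0, 0.2, 0.5]  # under the Adversarial Prompt

without_contrast = pick_token(vocab, safe_logits, adversarial_logits, alpha=0.0)
with_contrast = pick_token(vocab, safe_logits, adversarial_logits, alpha=1.0)
```

In this toy case the safe prompt alone still narrowly prefers the compliant token, while subtracting the adversarial signal tips the choice toward the refusal.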

Safety Alignment of Large Language Models via Contrasting Safe and Harmful Distributions

We study the problem of (approximate) maximin share (MMS) allocation of indivisible items among a set of agents. We focus on the graphical valuation model, previously studied in (Christodoulou et al. 2023), in which the input is given by a graph where edges correspond to items, and vertices correspond to agents. An edge may have non-zero marginal value only for its incident vertices. We study additive, XOS and subadditive valuations and we present positive and negative results for (approximate) MMS fairness, and also for (approximate) pair-wise maximin share (PMMS) fairness.
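To make the setting concrete, the sketch below computes each agent's maximin share by brute force for additive valuations on a small multigraph. It uses the standard definition: an agent's MMS is the best achievable value of the worst bundle when that agent splits all items into as many bundles as there are agents. The edge list, the values, and the function names are made-up illustrations, and the exhaustive search is only feasible for tiny instances.

```python
from itertools import product

# Each edge is an item: (endpoint_u, endpoint_v, {agent: value}).
# Illustrative multigraph on agents 0, 1, 2 with two parallel edges (0, 1).
edges = [
    (0, 1, {0: 5, 1: 4}),
    (0, 1, {0: 3, 1: 1}),
    (0, 2, {0: 2, 2: 6}),
    (1, 2, {1: 2, 2: 3}),
]

def item_values(agent, edges):
    """Additive values of one agent; non-incident edges are worth zero."""
    return [vals.get(agent, 0) if agent in (u, v) else 0 for u, v, vals in edges]

def maximin_share(values, n_agents):
    """Max over all splits into n_agents bundles of the minimum bundle value."""
    best = 0  # valid lower bound because values are non-negative
    for assignment in product(range(n_agents), repeat=len(values)):
        bundles = [0] * n_agents
        for value, bundle in zip(values, assignment):
            bundles[bundle] += value
        best = max(best, min(bundles))
    return best

mms = [maximin_share(item_values(agent, edges), 3) for agent in range(3)]
```

Agent 2 values only two edges positively, so any split into three bundles leaves it one worthless bundle and its maximin share is zero. This shows how the graph structure limits what each agent can be guaranteed.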

Exact and Approximate Maximin Share Allocations in Multi-Graphs

Context-based Offline Meta Reinforcement Learning (COMRL) has shown promising results in improving the cross-task generalization ability of meta-policies. However, current methods often lead to entangled task representations, in which each latent dimension is influenced by multiple causal factors that govern variations in environment dynamics and reward mechanisms. This entanglement can degrade generalization performance, particularly when multiple causal factors vary simultaneously across tasks. To address this limitation, we propose the CAusally disentangled TAsk representation Learning (CATAL) method for COMRL, which aims to improve the generalization ability of the meta-policy by aligning each latent dimension in the task representations with a single causal factor. Theoretically, we show that under mild conditions, the task representations learned by CATAL are causally disentangled. Empirically, extensive results on multi-task MuJoCo benchmarks show that CATAL consistently outperforms existing COMRL baselines in both in-distribution and out-of-distribution generalization.

CATAL: Causally Disentangled Task Representation Learning for Offline Meta-Reinforcement Learning

Recent advances in software vulnerability detection have been driven by Language Model (LM)-based approaches. However, these models remain vulnerable to adversarial attacks that exploit lexical and syntax perturbations, allowing critical flaws to evade detection. Existing black-box attacks on LM-based vulnerability detectors primarily rely on isolated perturbation strategies, limiting their ability to efficiently explore the adversarial code space for optimal perturbations. To bridge this gap, we propose HogVul, a black-box adversarial code generation framework that integrates both lexical and syntax perturbations under a unified dual-channel optimization strategy driven by Particle Swarm Optimization (PSO). By systematically coordinating two-level perturbations, HogVul effectively expands the search space for adversarial examples, enhancing the attack efficacy. Extensive experiments on four benchmark datasets demonstrate that HogVul achieves an average attack success rate improvement of 26.05% over state-of-the-art baseline methods. These findings highlight the potential of hybrid optimization strategies in exposing model vulnerabilities.

HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors

This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines, which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem in an efficient way by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure from a given data point. A further advantage of our solution is that it requires neither multiple epochs of local optimisation using stochastic gradient descent on the clients nor rounds of communication between client and server for optimising the global model. Numerous experiments show that the proposed method is a competitive alternative to the state of the art.
