Singapore

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms, resulting in harmful or inappropriate outputs. Such attacks, including jailbreaking and prompt injection, pose significant risks to the integrity and availability of LLMs in security-critical applications. This paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in input prompts before they are processed by the LLM. The APD framework integrates three key innovations: (1) a mutual information-based semantic decomposition method to isolate adversarial and benign prompt components, ensuring statistical independence; (2) a graph-based intent classification approach that leverages spectral analysis to detect malicious patterns in prompt semantics; and (3) a lightweight transformer-based classifier trained on real-world datasets of toxic and jailbreaking prompts, enabling efficient and accurate adversarial intent detection. Evaluated on diverse datasets containing adversarial prompts, APD demonstrates superior robustness, reducing harmful output generation by over 85% while maintaining negligible impact on model performance. The framework’s computational efficiency supports real-time deployment, making it a practical solution for securing LLMs.

AAAI 2026

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

language and vision; multi-modal vision; applications

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.<br><br>

To access this event page, you need to log in with the **email address you registered with**. <br>Access credentials will be sent to your email from Underline -  subject line "Welcome to AAAI 2026". Please be sure to check your spam email folder if you do not see an email confirmation right away.

Please log in

To access this event page, you are required to register.
Please complete your registration to continue.

We recommend reading [**the registration information**](https://aaai.org/conference/aaai/aaai-26/registration/) first.

**Online Registration Form**: https://aaai.getregistered.net/conference-2026 

Registration Required

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held in Singapore EXPO from January 20 to January 27, 2026.

Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming LLMs into real-world recommendation systems due to semantic and behavioral misalignment. 
To bridge this gap, we propose Align$^3$GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: Dual tokenization fusing user-item semantic and collaborative signals. Enhanced behavior modeling with bidirectional semantic alignment. Progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show Align$^3$GR outperforms the SOTA baseline by +17.8\% in Recall@10 and +20.2\% in NDCG@10 on the public dataset, with significant gains in online A/B tests and full-scale deployment on an industrial large-scale recommendation platform.

Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

While existing social bot detectors perform well on benchmarks, their robustness across diverse real-world scenarios remains limited due to unclear ground truth and varied misleading cues. In particular, the impact of shortcut learning, where models rely on spurious correlations instead of capturing causal task-relevant features, has received limited attention. To address this gap, we conduct an in-depth study to assess how detectors are influenced by potential shortcuts based on textual features, which are most susceptible to manipulation by social bots. We design a series of shortcut scenarios by constructing spurious associations between user labels and superficial textual cues to evaluate model robustness. Results show that shifts in irrelevant feature distributions significantly degrade social bot detector performance, with an average relative accuracy drop of 32 % in the baseline models. To tackle this challenge, we propose mitigation strategies based on large language models, leveraging counterfactual data augmentation. These methods mitigate the problem from data and model perspectives across three levels, including data distribution at both the individual user text and overall dataset levels, as well as model’s ability to extract causal information. Our strategies achieve an average relative performance improvement of 56 % under shortcut scenarios.

Bot Meets Shortcut: How Can LLMs Aid in Handling Unknown Invariance OOD Scenarios?

Modeling complex rigid motion across large spatiotemporal spans remains an unresolved challenge in dynamic reconstruction. Existing paradigms are mainly confined to short-term, small-scale deformation and offer limited consideration for physical consistency. This study proposes PMGS, focusing on reconstructing Projectile Motion via 3D Gaussian Splatting. The workflow comprises two stages: 1) Target Modeling: achieving object-centralized reconstruction through dynamic scene decomposition and an improved point density control; 2) Motion Recovery: restoring full motion sequences by learning per-frame SE(3) poses. We introduce an acceleration consistency constraint to bridge Newtonian mechanics and pose estimation, and design a dynamic simulated annealing strategy that adaptively schedules learning rates based on motion states. Futhermore, we devise a Kalman fusion scheme to optimize error accumulation from multi-source observations to mitigate disturbances. Experiments show PMGS’s superior performance in reconstructing high-speed nonlinear rigid motion compared to mainstream dynamic methods.

PMGS: Reconstruction of Projectile Motion Across Large Spatiotemporal Spans via 3D Gaussian Splatting

Large language models (LLMs) are increasingly applied to sequential decision-making through in-context learning (ICL), yet their effectiveness is highly sensitive to prompt quality. Effective prompts should meet three principles: focus on decision-critical information, provide step-level granularity, and minimize reliance on expert annotations through label efficiency. However, existing ICL methods often fail to satisfy all three criteria simultaneously. Motivated by these challenges, we introduce SkillGen, a skill-based ICL framework for structured sequential reasoning. It constructs an action-centric, domain-level graph from sampled trajectories, identifies high-utility actions via temporal-difference credit assignment, and retrieves step-wise skills to generate fine-grained, context-aware prompts. We further present a theoretical analysis showing that focusing on high-utility segments supports task identifiability and informs more effective ICL prompt design. Experiments on ALFWorld, BabyAI, and ScienceWorld, using both open-source and proprietary LLMs, show that SkillGen achieves consistent gains, improving progress rate by 5.9\%–16.5\% on average across models. The implementation of SkillGen is available at https://anonymous.4open.science/r/SkillGen-C2E1.

SkillGen: Learning Domain Skills for In-Context Sequential Decision Making

Hyperparameter optimization (HPO) is a crucial step in achieving strong predictive performance. The impact of individual hyperparameters on model generalization is highly context-dependent, prohibiting a one-size-fits-all solution and requiring opaque automated machine learning (AutoML) systems to find optimal configurations. However, the black-box nature of most AutoML systems undermines user trust and discourages adoption. To address this, we propose a game-theoretic explainability framework for HPO that is based on Shapley values and interactions. Our approach provides an additive decomposition of a performance measure across hyperparameters, enabling local and global explanations of hyperparameters' contributions to generalization performance and their interactions. The framework, named HyperSHAP, offers insights into ablations, the tunability of learning algorithms, and optimizer behavior across different hyperparameter spaces. We demonstrate HyperSHAP's capabilities on various HPO benchmarks to analyze the interaction structure of the corresponding HPO problems, demonstrating its versatile applicability.

HyperSHAP: Shapley Values and Interactions for Explaining Hyperparameter Optimization

Image generation models trained on large datasets can synthesize high-quality images but often produce spatially inconsistent and distorted images due to limited information about the underlying structures and spatial layouts. In this work, we leverage intrinsic scene properties (e.g., depth, segmentation maps) that provide rich information about the underlying scene, unlike prior approaches that solely rely on image-text pairs or use intrinsics as conditional inputs. Our approach aims to co-generate both images and their corresponding intrinsics, enabling the model to implicitly capture the underlying scene structure and generate more spatially consistent and realistic images. Specifically, we first extract rich intrinsic scene properties from a large image dataset with pre-trained estimators, eliminating the need for additional scene information or explicit 3D representations. We then aggregate various intrinsic scene properties into a single latent variable using an autoencoder. Building upon pre-trained large-scale Latent Diffusion Models (LDMs), our method simultaneously denoises the image and intrinsic domains by carefully sharing mutual information so that the image and intrinsic reflect each other without degrading image quality. Experimental results demonstrate that our method corrects spatial inconsistencies and produces a more natural layout of scenes while maintaining the fidelity and textual alignment of the base model (e.g., Stable Diffusion).

Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models

Real-world sequential decision making problems often require parameterized action spaces that require both, decisions regarding discrete actions and decisions about continuous action parameters governing how an action is executed. However, existing approaches exhibit severe limitations when handling such parameterized action spaces---planning algorithms require hand-crafted action models, and reinforcement learning (RL) paradigms focus on either discrete or continuous actions but not both. This paper extends the scope of RL algorithms to long-horizon, spare-reward settings with parameterized actions through autonomously learned state and action abstractions. We present algorithms for online learning and flexible refinement of such abstractions during RL. Empirical results show that learning such abstractions on-the-fly enable $TD(\lambda)$ to significantly outperform state-of-the-art RL approaches in terms of sample efficiency across diverse problem domains with long horizons, continuous states, and parameterized actions.

Context-Sensitive Abstractions for Reinforcement Learning with Parameterized Actions

Precipitation nowcasting, a critical task for weather-sensitive applications, is highly challenging owing to the chaotic nature of atmospheric dynamics. Despite recent progress, existing deep learning methods are limited in their capacity to model turbulent motions, one of the key drivers of precipitation evolution. Thus, we propose MoCast, a novel physics-guided neural network that explicitly incorporates fluid dynamics knowledge to model and utilize turbulent motions for precipitation nowcasting. Specifically, inspired by the continuity equation for precipitation evolution, MoCast introduces two core innovations: (1) a physics-guided motion modeling module that learns turbulent motions from physically interpretable mean and fluctuating components based on Reynolds, Helmholtz, and Wavelet decomposition techniques, and (2) a motion-guided source-sink modeling module that learns source-sink features considering the multi-scale impact from turbulent motions based on a mixture-of-experts architecture. Extensive experiments on three real-world datasets demonstrate that MoCast achieves the state-of-the-art performance. MoCast and its diffusion-based variant MoCast+ reduce CSI error by an average of 4.9\% and 4.5\% compared to the best deterministic and probabilistic baselines, respectively, which highlights the significance of turbulence modeling for advancing meteorological AI.

MoCast: Learning Turbulent Motions Under Physical Guidance for Precipitation Nowcasting

Reproducibility is a cornerstone of scientific validation and of the authority it confers on its results. Reproducibility in machine learning evaluations leads to greater trust, confidence, and value. However, the ground truth responses used in machine learning often necessarily come from humans, among whom disagreement is prevalent, and surprisingly little research has studied the impact of effectively ignoring disagreement in these responses, as is typically the case. One reason for the lack of research is that budgets for collecting human-annotated evaluation data are limited, and obtaining more samples from multiple annotators for each example greatly increases the per-item annotation costs. We investigate the trade-off between the number of items ($N$) and the number of responses per item ($K$) needed for reliable machine learning evaluation. We analyze a diverse collection of categorical datasets for which multiple annotations per item exist, and simulated distributions fit to these datasets, to determine the optimal $(N, K)$ configuration, given a fixed budget ($N \times K$), for collecting evaluation data and reliably comparing the performance of machine learning models. Our findings show, first, that accounting for human disagreement may come with $N \times K$ at no more than 1000 (and often much lower) for every dataset tested on at least one metric. Moreover, this minimal $N \times K$ almost always occurred for $K > 10$. Furthermore, the nature of the tradeoff between $K$ and $N$---or if one even existed---depends on the evaluation metric, with metrics that are more sensitive to the full distribution of responses performing better at higher levels of $K$. Our methods can be used to help ML practitioners get more effective test data by finding the optimal metrics and number of items and annotations per item to collect to get the most reliability for their budget.

Forest vs Tree: The (N, K) Trade-off in Reproducible ML Evaluation

ATL and Strategy Logic (SL) are important languages for representation and reasoning about strategic abilities of coalitions in multi-agent systems. In analyzing strategies of agents in multi-agent systems, an important concept to consider is rationality. Strategy Logic can express rationality concepts such as Nash Equilibrium (NE). Recently, there has been work on logics for joint abilities incorporating rationality concepts based on iterated elimination of dominated strategies (IEDS). Each of NE and IEDS has its strengths and limitations. However, when the payoff is binary, e.g., whether a goal is satisfied, IEDS has more distinguishing power than NE. In this work, we propose Strategy Logic with IEDS ($SL_{IEDS}$), an extension of Strategy Logic with an IEDS operator, where we can reason about rational strategies that survive IEDS. We prove that $SL_{IEDS}$ is strictly more expressive than SL. Finally, we prove that model checking memoryless $SL_{IEDS}$ is EXPTIME-complete.

Downloads

Next from AAAI 2026

Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from AAAI 2026

Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads