Singapore

Identity-preserving models have led to notable progress in generating personalized content. Unfortunately, such models also exacerbate risks when misused, for instance, by generating threatening content targeting specific individuals. This paper introduces the Attribute Misbinding Attack, a novel method that poses a threat to identity-preserving models by inducing them to produce Not-Safe-For-Work (NSFW) content. The attack's core idea involves crafting benign-looking textual prompts to circumvent text-filter safeguards and leverage a key model vulnerability: flawed attribute binding that stems from its internal attention bias. This results in misattributing harmful descriptions to a target identity and generating NSFW outputs. To facilitate the study of this attack, we present the Misbinding Prompt evaluation set, which examines the content generation risks of current state-of-the-art identity-preserving models across four risk dimensions: pornography, violence, discrimination, and illegality. Additionally, we introduce the Attribute Binding Safety Score (ABSS), a metric for concurrently assessing both content fidelity and safety compliance. Experimental results show that our Misbinding Prompt evaluation set achieves a 5.28% higher success rate in bypassing five leading text filters (including GPT-4o) compared to existing mainstream evaluation sets, while also demonstrating the highest proportion of NSFW content generation. The proposed ABSS metric enables a more comprehensive evaluation of identity-preserving models by concurrently assessing both content fidelity and safety compliance. The dataset and code will be open-sourced. For further experimental data and anonymous open-source links, please see the appendix.

Disclaimer: This paper contains NSFW imagery that might be offensive to some readers.

AAAI 2026

Unveiling the Attribute Misbinding Threat in Identity-Preserving Models

cv: diffusion models for vision

app: security

app: misinformation & fake news

technical paper

We are pleased to announce the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), which will be held at Singapore EXPO from January 20 to January 27, 2026.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-26 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, exhibit programs, and a range of other activities to be announced.

Conversational AI agents are envisioned to provide social support or functional service to human users via natural language interactions. The popularity of conversational AI has grown unprecedentedly with the advent of ChatGPT, which showcases exceptional proficiency in context understanding and response generation with large language models (LLMs). However, typical conversational systems are built to follow instructions, which means that the conversation is led by the user, and the system simply follows the user's instructions or intents. My research endows conversational AI with the ability to create or steer the conversation toward conversational goals by taking the initiative and anticipating impacts on itself or on human users, a paradigm termed Proactive Conversational AI. I will also highlight the importance of moving towards human-centered proactive conversational AI that emphasizes human needs and expectations and considers the ethical and social implications of these agents, rather than focusing solely on technological capabilities.

Towards Human-centered Proactive Conversational AI

Understanding opinion evolution in complex social networks is crucial for modeling social influence and predicting collective behavior. Yet, most models overlook how community structures shape opinion updates, often assuming homogeneous influence. This abstraction neglects individuals’ stronger responsiveness to intra-community peers—an empirically observed driver of localized consensus and inter-group polarization. We propose GCAOFP, a co-evolutionary framework that jointly models opinion dynamics and community formation as an integrated process. In GCAOFP, agents strategically alternate between two coupled modules: (1) a Community Dynamics Module, where agents play a non-cooperative game to optimize community memberships based on opinion alignment and structural cohesion; and (2) an Opinion Adjustment Module, where agents revise opinions via a bounded-confidence mechanism modulated by community-induced influence weights. This dual-stage process captures the feedback loop between structure and opinion. We prove that GCAOFP converges to stable equilibria, ensuring intra-community consensus and inter-community diversity—dynamics that standard models fail to replicate. Experiments on real-world networks show that GCAOFP better reproduces localized opinion clusters, while offering strong scalability and interpretability, illuminating the strategic foundations of polarization.

Game Theory Based Community-Aware Opinion Dynamics
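
The abstract above does not give the update rule of the Opinion Adjustment Module, so the following is only a minimal sketch of what a bounded-confidence update with community-dependent influence weights could look like. It is a Hegselmann-Krause-style averaging step, not the GCAOFP method itself, and it leaves out the game-theoretic Community Dynamics Module entirely. The function names and the parameters `epsilon`, `w_in`, and `w_out` are assumptions made for illustration.

```python
def bounded_confidence_step(opinions, communities, epsilon=0.3, w_in=1.0, w_out=0.2):
    """One synchronous update of all agents.

    Each agent moves to the weighted mean of the opinions that lie within
    `epsilon` of its own. Peers in the same community get weight `w_in`,
    peers in other communities get weight `w_out` (both hypothetical).
    """
    updated = []
    for i, x_i in enumerate(opinions):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, x_j in enumerate(opinions):
            if abs(x_i - x_j) <= epsilon:  # bounded-confidence filter
                w = w_in if communities[i] == communities[j] else w_out
                weighted_sum += w * x_j
                weight_total += w
        # weight_total > 0 because every agent is within epsilon of itself
        updated.append(weighted_sum / weight_total)
    return updated


def run_dynamics(opinions, communities, steps=50, **params):
    """Repeat the update for a fixed number of steps, communities held fixed."""
    for _ in range(steps):
        opinions = bounded_confidence_step(opinions, communities, **params)
    return opinions


if __name__ == "__main__":
    start = [0.1, 0.2, 0.8, 0.9]
    groups = [0, 0, 1, 1]
    print(run_dynamics(start, groups))
```

With two well-separated groups as in the usage example, each group settles on its own shared value while the groups stay apart, which is the intra-community consensus and inter-community diversity pattern the abstract describes.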

As artificial intelligence (AI) becomes increasingly prevalent in society, there is a critical need for accessible K-12 educational resources that introduce students to AI and robotics concepts through engaging, hands-on experiences. In this paper, we present a scalable workshop framework that uses narrative-driven problem solving to teach fundamental AI and autonomous systems concepts to students in grades 5-12. Developed through a collaboration between AI researchers and education specialists, Bot Blitz employs Sphero RVR+ robots within immersive storylines ranging from fairground rescue missions for younger students to urban traffic management scenarios for high schoolers. Preliminary observations from workshops with 56 students show high engagement levels and successful completion of programming challenges.

Bot Blitz: A Scalable Hands-On Workshop for Teaching AI and Robotics Concepts Through Narrative-Driven Problem Solving

Contrastive vision-language models like CLIP have achieved impressive results in image-text retrieval by aligning image and text representations in a shared embedding space. However, these models often treat text as flat sequences, limiting their ability to handle complex, compositional, and long-form descriptions. In particular, they fail to capture two essential properties of language: semantic hierarchy, which reflects the multi-level compositional structure of text, and semantic monotonicity, where richer descriptions should result in stronger alignment with visual content. To address these limitations, we propose HiMo-CLIP, a representation-level framework that enhances CLIP-style models without modifying the encoder architecture. HiMo-CLIP introduces two key components: a hierarchical decomposition (HiDe) module that extracts latent semantic components from long-form text via in-batch PCA, enabling flexible, batch-aware alignment across different semantic granularities, and a monotonicity-aware contrastive loss (MoLo) that jointly aligns global and component-level representations, encouraging the model to internalize semantic ordering and alignment strength as a function of textual completeness. These components work together to produce structured, cognitively aligned cross-modal representations. Experiments on multiple image-text retrieval benchmarks show that HiMo-CLIP consistently outperforms strong baselines, particularly under long or compositional descriptions.

HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment

Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms, resulting in harmful or inappropriate outputs. Such attacks, including jailbreaking and prompt injection, pose significant risks to the integrity and availability of LLMs in security-critical applications. This paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in input prompts before they are processed by the LLM. The APD framework integrates three key innovations: (1) a mutual information-based semantic decomposition method to isolate adversarial and benign prompt components, ensuring statistical independence; (2) a graph-based intent classification approach that leverages spectral analysis to detect malicious patterns in prompt semantics; and (3) a lightweight transformer-based classifier trained on real-world datasets of toxic and jailbreaking prompts, enabling efficient and accurate adversarial intent detection. Evaluated on diverse datasets containing adversarial prompts, APD demonstrates superior robustness, reducing harmful output generation by over 85% while maintaining negligible impact on model performance. The framework’s computational efficiency supports real-time deployment, making it a practical solution for securing LLMs.

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming LLMs into real-world recommendation systems due to semantic and behavioral misalignment. To bridge this gap, we propose Align³GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: (1) dual tokenization fusing user-item semantic and collaborative signals; (2) enhanced behavior modeling with bidirectional semantic alignment; and (3) a progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show Align³GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on the public dataset, with significant gains in online A/B tests and full-scale deployment on an industrial large-scale recommendation platform.

Align³GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation
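
For readers unfamiliar with the two metrics quoted above, here is a small sketch of how Recall@K and NDCG@K are commonly computed for a single user. It assumes binary relevance and the usual log2 position discount; the abstract does not say which variant the authors use, and the function names are illustrative.

```python
import math


def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    ideal_dcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    relevant = {"b", "d"}
    print(recall_at_k(ranking, relevant, k=3))
    print(ndcg_at_k(ranking, relevant, k=3))
```

Recall@K only counts whether relevant items make the cut, while NDCG@K also rewards placing them nearer the top, so reporting both gives a fuller picture of ranking quality.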

While existing social bot detectors perform well on benchmarks, their robustness across diverse real-world scenarios remains limited due to unclear ground truth and varied misleading cues. In particular, the impact of shortcut learning, where models rely on spurious correlations instead of capturing causal task-relevant features, has received limited attention. To address this gap, we conduct an in-depth study to assess how detectors are influenced by potential shortcuts based on textual features, which are most susceptible to manipulation by social bots. We design a series of shortcut scenarios by constructing spurious associations between user labels and superficial textual cues to evaluate model robustness. Results show that shifts in irrelevant feature distributions significantly degrade social bot detector performance, with an average relative accuracy drop of 32% in the baseline models. To tackle this challenge, we propose mitigation strategies based on large language models, leveraging counterfactual data augmentation. These methods mitigate the problem from data and model perspectives across three levels, including data distribution at both the individual user text and overall dataset levels, as well as the model's ability to extract causal information. Our strategies achieve an average relative performance improvement of 56% under shortcut scenarios.

Bot Meets Shortcut: How Can LLMs Aid in Handling Unknown Invariance OOD Scenarios?

Modeling complex rigid motion across large spatiotemporal spans remains an unresolved challenge in dynamic reconstruction. Existing paradigms are mainly confined to short-term, small-scale deformation and offer limited consideration for physical consistency. This study proposes PMGS, focusing on reconstructing Projectile Motion via 3D Gaussian Splatting. The workflow comprises two stages: 1) Target Modeling: achieving object-centralized reconstruction through dynamic scene decomposition and an improved point density control; 2) Motion Recovery: restoring full motion sequences by learning per-frame SE(3) poses. We introduce an acceleration consistency constraint to bridge Newtonian mechanics and pose estimation, and design a dynamic simulated annealing strategy that adaptively schedules learning rates based on motion states. Furthermore, we devise a Kalman fusion scheme to optimize error accumulation from multi-source observations to mitigate disturbances. Experiments show PMGS's superior performance in reconstructing high-speed nonlinear rigid motion compared to mainstream dynamic methods.

PMGS: Reconstruction of Projectile Motion Across Large Spatiotemporal Spans via 3D Gaussian Splatting
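
The abstract does not spell out the form of the acceleration consistency constraint. One plausible reading, sketched below purely as an assumption, is that for drag-free projectile motion the acceleration is constant, so finite-difference accelerations computed from the per-frame translations should all agree. The loss here penalizes their variance; the function names and the variance form are illustrative and are not taken from PMGS.

```python
def second_differences(positions, dt):
    """Finite-difference acceleration estimates from uniformly sampled 3D positions."""
    accelerations = []
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        accelerations.append(
            tuple((n - 2.0 * c + p) / dt**2 for p, c, n in zip(prev, cur, nxt))
        )
    return accelerations


def acceleration_consistency_loss(positions, dt):
    """Mean squared deviation of per-frame accelerations from their average.

    The value is close to zero when the trajectory has constant acceleration
    and grows when individual frame positions are physically inconsistent.
    """
    accs = second_differences(positions, dt)
    mean = tuple(sum(a[axis] for a in accs) / len(accs) for axis in range(3))
    total = sum(sum((a[axis] - mean[axis]) ** 2 for axis in range(3)) for a in accs)
    return total / len(accs)


def ballistic_position(t, p0=(0.0, 0.0, 0.0), v0=(2.0, 0.0, 5.0), g=(0.0, 0.0, -9.81)):
    """Closed-form position under constant acceleration g (hypothetical test data)."""
    return tuple(p + v * t + 0.5 * a * t * t for p, v, a in zip(p0, v0, g))


if __name__ == "__main__":
    dt = 0.1
    clean = [ballistic_position(i * dt) for i in range(10)]
    noisy = list(clean)
    x, y, z = noisy[5]
    noisy[5] = (x, y, z + 0.05)  # perturb one frame's height
    print(acceleration_consistency_loss(clean, dt))
    print(acceleration_consistency_loss(noisy, dt))
```

In an optimization setting, a term like this would be added to the pose loss so that a single badly estimated frame is pulled back toward the physically consistent trajectory.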

Large language models (LLMs) are increasingly applied to sequential decision-making through in-context learning (ICL), yet their effectiveness is highly sensitive to prompt quality. Effective prompts should meet three principles: focus on decision-critical information, provide step-level granularity, and minimize reliance on expert annotations through label efficiency. However, existing ICL methods often fail to satisfy all three criteria simultaneously. Motivated by these challenges, we introduce SkillGen, a skill-based ICL framework for structured sequential reasoning. It constructs an action-centric, domain-level graph from sampled trajectories, identifies high-utility actions via temporal-difference credit assignment, and retrieves step-wise skills to generate fine-grained, context-aware prompts. We further present a theoretical analysis showing that focusing on high-utility segments supports task identifiability and informs more effective ICL prompt design. Experiments on ALFWorld, BabyAI, and ScienceWorld, using both open-source and proprietary LLMs, show that SkillGen achieves consistent gains, improving progress rate by 5.9%–16.5% on average across models. The implementation of SkillGen is available at https://anonymous.4open.science/r/SkillGen-C2E1.

SkillGen: Learning Domain Skills for In-Context Sequential Decision Making

Scaling deep learning to massive and diverse internet data has driven remarkable breakthroughs in domains such as video generation and natural language processing.
Robot learning, however, has thus far failed to replicate this success and remains constrained by a scarcity of available data.
Learning from Videos (LfV) methods aim to address this data bottleneck by augmenting traditional robot data with large-scale internet video.
This video data provides foundational information regarding physical dynamics, behaviours, and tasks, and can be highly informative for general-purpose robots.

This survey systematically examines the emerging field of LfV.
We first outline essential concepts, including detailing fundamental LfV challenges such as distribution shift and missing action labels in video data.
Next, we comprehensively review current methods for extracting knowledge from large-scale internet video, overcoming LfV challenges, and improving robot learning through video-informed training.
The survey concludes with a critical discussion of future opportunities.
Here, we emphasize the need for scalable foundation model approaches that can leverage the full range of available internet video and enhance the learning of robot policies and dynamics models.
Overall, the survey aims to inform and catalyse future LfV research, driving progress towards general-purpose robots.
