As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior research finds conditional DPMs highly susceptible to data extraction attacks using explicit prompts, unconditional models are often assumed to be safe. We challenge this view by introducing Surrogate condItional Data Extraction (SIDE), a general framework that constructs data-driven surrogate conditions to enable targeted extraction from any DPM. Through extensive experiments on CIFAR-10, CelebA, ImageNet, and LAION-5B, we show that SIDE can successfully extract training data from so-called "safe" unconditional models, outperforming baseline attacks even on conditional models. Complementing these findings, we present a unified theoretical framework based on informative labels, demonstrating that all forms of conditioning, explicit or surrogate, amplify memorization. Our work redefines the threat landscape for DPMs, establishing precise conditioning as a fundamental vulnerability and setting a new, stronger benchmark for model privacy evaluation.
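To make the idea of a surrogate condition concrete, the toy sketch below shows one plausible instantiation: steering the reverse diffusion of an unconditional model with the gradient of a separately trained surrogate classifier, in the style of classifier guidance. This is not the authors' implementation; the models, the noise schedule, and the guidance rule are illustrative stand-ins, and the paper's actual construction of surrogate conditions may differ.

```python
# Hypothetical sketch of surrogate-conditional guidance on an unconditional DPM.
# Assumes: a toy unconditional noise predictor `eps_model` and a surrogate
# classifier `surrogate` trained elsewhere; both are placeholders, not the
# SIDE authors' models.
import torch
import torch.nn as nn

T = 100                                   # number of diffusion steps (toy value)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Toy stand-ins operating on 2-D points instead of images.
eps_model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
surrogate = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 10))

def guided_sample(target_class: int, guidance_scale: float = 2.0, n: int = 4):
    """Reverse diffusion on the unconditional model, shifted at each step by
    the surrogate classifier's log-probability gradient toward `target_class`."""
    x = torch.randn(n, 2)
    for t in reversed(range(T)):
        a, ab = alphas[t], alpha_bars[t]
        with torch.no_grad():
            eps = eps_model(x)                    # unconditional noise estimate
        # Surrogate-conditional term: grad of log p(c | x_t) with respect to x_t.
        x_req = x.detach().requires_grad_(True)
        log_probs = torch.log_softmax(surrogate(x_req), dim=-1)
        grad = torch.autograd.grad(log_probs[:, target_class].sum(), x_req)[0]
        # Standard DDPM posterior mean, nudged by the guidance gradient.
        mean = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        mean = mean + guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

# Candidate samples drawn under one surrogate condition; in an attack setting
# these would be compared against the training set to detect memorization.
extracted_candidates = guided_sample(target_class=3)
print(extracted_candidates)
```

The sketch is only meant to illustrate the abstract's core claim: adding a data-driven condition to an otherwise unconditional sampler concentrates generation on narrow regions of the data distribution, which is what makes targeted extraction possible.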
