AAAI 2026 Main Conference

January 24, 2026

Singapore, Singapore

Large Vision-Language Models (LVLMs) have transformed multi-modal understanding, excelling in tasks like image captioning and visual question answering by integrating visual and textual inputs. However, their robustness against adversarial attacks, particularly those that exploit both modalities, remains underexplored, which poses risks to critical applications such as autonomous driving and content moderation. Existing attacks focus on a single modality or require impractical white-box access, limiting their real-world relevance. In this paper, we introduce Multi-Modal Adversarial Synergy (MMAS), a framework that crafts universal, black-box multi-modal attacks against LVLMs. MMAS simultaneously generates a texture scale-constrained Universal Adversarial Perturbation (UAP) for images and a learnable prompt perturbation for text, optimized jointly using only model queries. The image perturbation, bounded in $\ell_{\infty}$-norm, leverages wavelet-based texture constraints to ensure imperceptibility and robustness across diverse visual inputs. The text perturbation, constrained in $\ell_2$-norm in the embedding space, maintains semantic coherence while steering outputs toward a target. A novel cross-modal regularization term aligns the gradient directions of the two perturbations, enhancing their synergistic impact and their transferability across tasks and models. Extensive experiments on prevalent LVLMs, spanning a range of tasks and datasets, verify the strong universal adversarial capabilities of the proposed attack, all without access to the details of the model structures.
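To make the two norm constraints and the query-only setting concrete, the following is a minimal sketch, not the authors' implementation. It shows projection onto an $\ell_{\infty}$ ball (image perturbation), projection onto an $\ell_2$ ball (text-embedding perturbation), and an alternating update driven by a two-point zeroth-order gradient estimate. The abstract does not say which query-based estimator MMAS uses, so the estimator here is an assumption, and the wavelet texture constraint and cross-modal regularizer are omitted. All names (`joint_attack`, `toy_query_loss`, the step sizes and radii) are hypothetical, and the toy loss merely stands in for a black-box model query.

```python
import numpy as np


def project_linf(delta, eps):
    """Project onto the l_inf ball: clip every entry into [-eps, eps]."""
    return np.clip(delta, -eps, eps)


def project_l2(delta, radius):
    """Project onto the l_2 ball: rescale only if the norm exceeds radius."""
    norm = np.linalg.norm(delta)
    return delta if norm <= radius else delta * (radius / norm)


def estimate_gradient(loss_fn, x, rng, num_samples=32, sigma=1e-2):
    """Two-point zeroth-order gradient estimate that only queries loss_fn.

    Assumption: a generic random-direction estimator, chosen for illustration.
    """
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        diff = loss_fn(x + sigma * u) - loss_fn(x - sigma * u)
        grad += (diff / (2.0 * sigma)) * u
    return grad / num_samples


def joint_attack(query_loss, img_shape, txt_dim, eps=8 / 255, radius=1.0,
                 steps=50, lr_img=0.01, lr_txt=0.1, seed=0):
    """Alternately update both perturbations, projecting after every step."""
    rng = np.random.default_rng(seed)
    delta_img = np.zeros(img_shape)  # shared across all images (universal)
    delta_txt = np.zeros(txt_dim)    # lives in the text embedding space
    for _ in range(steps):
        g_img = estimate_gradient(lambda d: query_loss(d, delta_txt), delta_img, rng)
        delta_img = project_linf(delta_img - lr_img * np.sign(g_img), eps)
        g_txt = estimate_gradient(lambda d: query_loss(delta_img, d), delta_txt, rng)
        delta_txt = project_l2(delta_txt - lr_txt * g_txt, radius)
    return delta_img, delta_txt


def toy_query_loss(delta_img, delta_txt):
    """Hypothetical stand-in for a black-box query: lower means closer to target."""
    return float(np.sum((delta_img - 0.1) ** 2) + np.sum((delta_txt - 2.0) ** 2))


if __name__ == "__main__":
    d_img, d_txt = joint_attack(toy_query_loss, (4, 4), 8)
    print("max |delta_img|:", np.max(np.abs(d_img)))
    print("||delta_txt||_2:", np.linalg.norm(d_txt))
    print("final toy loss:", toy_query_loss(d_img, d_txt))
```

Because both projections are applied after every update, the returned perturbations satisfy their norm bounds regardless of how noisy the gradient estimates are. A real attack would replace `toy_query_loss` with a loss computed from the target model's outputs over a batch of inputs.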
