AAAI 2026

January 25, 2026

Singapore, Singapore


Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks, and recent advances allow these models to process multiple images as input. However, the vulnerabilities of multi-image MLLMs remain unexplored: existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning universal adversarial perturbations (UAPs) that target multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. It also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring every input to be modified. Additionally, an index-attention suppression loss yields a robust, position-invariant attack. Experimental results show that LAMP outperforms state-of-the-art baselines, achieving the highest attack success rates across multiple vision-language tasks.
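The abstract does not give the paper's exact loss definitions, but the two attention-based constraints it names can be illustrated with a toy sketch. The snippet below is a hypothetical NumPy illustration, not LAMP's implementation: `aggregation_suppression_loss` measures how much attention mass flows between tokens of different images (the quantity an attack would drive down to block cross-image aggregation), and `contagion_loss` measures how much attention clean tokens pay to perturbed tokens (the quantity an attack would drive up to spread adversarial effects). All function names, shapes, and the uniform token-to-image assignment are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregation_suppression_loss(attn, img_ids):
    """Average attention mass flowing between tokens of *different* images.

    attn:    (T, T) row-stochastic attention matrix.
    img_ids: (T,) image index of each token.
    Driving this quantity down would hinder cross-image aggregation.
    """
    cross = img_ids[:, None] != img_ids[None, :]  # True where query/key come from different images
    return float((attn * cross).sum() / attn.shape[0])

def contagion_loss(attn, perturbed):
    """Average attention that *clean* tokens pay to *perturbed* tokens.

    perturbed: (T,) boolean mask marking tokens of the perturbed image.
    Driving this quantity up would let the perturbation influence clean tokens.
    """
    clean = ~perturbed
    return float(attn[clean][:, perturbed].sum() / max(int(clean.sum()), 1))

# Toy usage: six tokens, two images of three tokens each, image 1 perturbed.
rng = np.random.default_rng(0)
attn = softmax(rng.normal(size=(6, 6)), axis=-1)
img_ids = np.array([0, 0, 0, 1, 1, 1])
perturbed = img_ids == 1
agg = aggregation_suppression_loss(attn, img_ids)
cont = contagion_loss(attn, perturbed)
```

Because each attention row sums to one, both quantities lie in [0, 1] and can be combined into a single objective with sign and weight chosen per the attack's direction.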

