AAAI 2026

January 25, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Multimodal Large Language Models (MLLM) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image, human users naturally use "Visual Prompts" (VP) like bounding boxes to provide reference. However, no existing benchmark systematically evaluates the ability of MLLMs to interpret such VPs. This gap raises uncertainty about whether current MLLMs can effectively recognize VPs, an intuitive prompting method for humans, and utilize them to solve problems. To address this limitation, we introduce VP-Bench, aiming to assess MLLMs’ capability in VP perception and utilization. VP-Bench employs a two-stage evaluation framework: Stage 1 examines models’ ability to perceive VPs in natural scenes, utilizing 100K visualized prompts spanning 8 shapes and 355 attribute combinations. Stage 2 investigates the impact of VPs on downstream tasks, measuring their effectiveness in real-world problem-solving scenarios. Using VP-Bench, we evaluate 21 MLLMs, including proprietary systems (e.g., GPT-4o) and open-source models (e.g., InternVL-2.5 and Qwen2.5-VL). In addition, we conduct a comprehensive analysis of the factors influencing VP understanding, such as attribute variations and model scale. VP-Bench establishes a new reference framework for studying MLLMs’ ability to comprehend and resolve grounded referring questions.

Downloads

Paper

Next from AAAI 2026

HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition
poster

HGLTR: Hierarchical Knowledge Injection for Calibrating Pre-trained Models in Long-Tail Recognition

AAAI 2026

+4Songcan Chen
Songcan Chen and 6 other authors

25 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved