
AAAI 2026

January 22, 2026

Singapore, Singapore


The rapid advancement of Large Vision Language Models (LVLMs) has demonstrated excellent capabilities across diverse visual tasks. Building on these developments, the "thinking with images" paradigm has emerged, enabling models to dynamically edit and re-encode visual information at each reasoning step, mirroring human visual processing. However, this paradigm also introduces significant challenges, as diverse errors may occur during the reasoning process. This naturally calls for Process Reward Models (PRMs) as an essential mechanism for distinguishing correct from incorrect reasoning steps, yet existing benchmarks for PRMs are predominantly text-centric and lack comprehensive assessment of PRMs' capabilities under this paradigm. To address these gaps, this work introduces \ourbench, the first comprehensive benchmark specifically designed for evaluating PRMs under the thinking-with-images paradigm. Our main contributions are as follows: (1) Through extensive analysis of reasoning trajectories under the thinking-with-images paradigm and guided-search experiments with PRMs, we define 7 fine-grained error types and demonstrate both the necessity of specialized PRMs and the potential for improvement. (2) We construct and curate a comprehensive benchmark comprising 1,134 manually annotated, high-quality thinking-with-images reasoning trajectories spanning 4 categories and 16 subcategories for fine-grained evaluation of PRMs. (3) Our experimental analysis reveals that current LVLMs fall short as effective PRMs: they exhibit limited capability in evaluating visual reasoning processes, significant performance disparities across error types, a consistent positive-evaluation bias, and notable sensitivity to the position of reasoning steps in a trajectory. These findings demonstrate the effectiveness of our benchmark and establish crucial foundations for advancing PRMs in LVLMs.
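To make the PRM role described in the abstract concrete, here is a minimal, illustrative sketch of how a process reward model is typically used: it scores each reasoning step, and those scores can either localize the first faulty step in a trajectory or guide search by picking the best-scored candidate continuation. This is not the paper's implementation; the `Step` class, the threshold, and the step texts are all hypothetical placeholders, and a real PRM would be an LVLM scoring image-plus-text steps rather than a stored float.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One reasoning step plus a (hypothetical) PRM score in [0, 1]."""
    text: str
    score: float


def first_error_index(trajectory, threshold=0.5):
    """PRM-style trajectory evaluation: return the index of the first
    step scored below the threshold, or -1 if every step passes
    (i.e., the trajectory is judged correct)."""
    for i, step in enumerate(trajectory):
        if step.score < threshold:
            return i
    return -1


def prm_guided_select(candidates):
    """PRM-guided search (best-of-N): keep the candidate next step
    that the PRM rates highest."""
    return max(candidates, key=lambda s: s.score)
```

For example, a trajectory whose second step is a visual misreading would be flagged at index 1 by `first_error_index`, while `prm_guided_select` would steer generation away from that step when a better-scored alternative exists.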

