Content not yet available

This lecture has no active video or poster.

AAAI 2026

January 25, 2026

Singapore, Singapore

Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.

Mainstream multimodal large language models (MLLMs) rely on patch-based tokenization methods, which compromise the integrity of objects and thereby limit the model's perception capabilities while triggering object-related hallucinations. To address this issue, we propose ObjecTok, an innovative object tokenization framework. ObjecTok generates a single, holistic object token for each object in an image. This token is produced by a specially trained object encoder that embeds the object's semantic, positional, and shape information into a single compact representation, thereby preserving the object's integrity. To mitigate the imperfections of upstream object proposer models, we introduce learnable confidence embeddings. These embeddings enable the MLLM to learn the reliability of each object's information, significantly enhancing the model's robustness. Additionally, ObjecTok employs a hybrid input strategy, combining object tokens with traditional image patch tokens, allowing the model to leverage both object-level information and global scene context. By integrating ObjecTok into the LLaVA architecture, we achieve notable performance improvements on multiple object-centric benchmarks, effectively reducing object hallucinations and enhancing perception capabilities. Experimental results robustly demonstrate that the object tokens generated by our ObjecTok framework hold great potential for building more powerful and reliable MLLMs.

Downloads

Paper

Next from AAAI 2026

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?
poster

When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion?

AAAI 2026

+4Yupeng Hu
Yupeng Hu and 6 other authors

25 January 2026

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Presentations
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved