VIDEO DOI: https://doi.org/10.48448/658c-xz20

poster

ACL 2024

August 14, 2024

Bangkok, Thailand

Modality-Aware Integration with Large Language Models for Knowledge-Based Visual Question Answering

keywords: knowledge-based VQA, multimodal fusion, large language models

Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs). While several methods have been proposed to leverage large language models (LLMs) as an implicit knowledge source, doing so remains challenging because LLMs may generate hallucinations. Moreover, multiple knowledge sources, e.g., images, KGs, and LLMs, cannot be readily aligned in complex scenarios. To tackle these problems, we present a novel modality-aware integration with LLMs for KVQA ($\texttt{MAIL}$). It carefully leverages multimodal knowledge for both image understanding and knowledge reasoning. Specifically, $(i)$ we propose a two-stage prompting strategy with LLMs to densely embody the image into a scene graph with detailed visual features; $(ii)$ we construct a coupled concept graph by linking the mentioned entities with external facts; and $(iii)$ we design a tailored pseudo-siamese graph medium fusion for sufficient multimodal fusion. We utilize the shared mentioned entities in the two graphs as mediums to bridge a tight inter-modal exchange, while maximally preserving insightful intra-modal learning by constraining the fusion within the mediums. Extensive experiments show the superiority of $\texttt{MAIL}$.
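The medium-constrained fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the plain-dict graph representation, and the mixing weight `alpha` are all illustrative assumptions. The point it demonstrates is that cross-modal exchange happens only at entities shared by both graphs (the mediums), while all other nodes keep their intra-modal embeddings untouched.

```python
def medium_fusion(scene_emb, concept_emb, alpha=0.5):
    """Fuse node embeddings of two modality graphs only at shared nodes.

    scene_emb, concept_emb: dict mapping entity name -> embedding (list of floats)
    alpha: cross-modal mixing weight (assumed hyperparameter, not from the paper)
    """
    # Mediums are the entities mentioned in BOTH the scene graph and the
    # concept graph; fusion is constrained to exactly these nodes.
    mediums = scene_emb.keys() & concept_emb.keys()
    fused_scene = dict(scene_emb)
    fused_concept = dict(concept_emb)
    for m in mediums:
        s, c = scene_emb[m], concept_emb[m]
        # Symmetric exchange at the medium node; non-medium nodes are
        # left as-is, preserving intra-modal representations.
        fused_scene[m] = [(1 - alpha) * si + alpha * ci for si, ci in zip(s, c)]
        fused_concept[m] = [(1 - alpha) * ci + alpha * si for si, ci in zip(s, c)]
    return fused_scene, fused_concept
```

For example, if only `"dog"` appears in both graphs, its two embeddings are blended in both directions, while a scene-only node such as `"sky"` passes through unchanged.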
