VIDEO DOI: https://doi.org/10.48448/rm8y-qf88

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

InfiMM: Advancing Multimodal Understanding with an Open-Sourced Visual Language Model

Keywords: multimodal; large language models; in-context learning

In this work, we present InfiMM, an advanced Multimodal Large Language Model that adapts to intricate vision-language tasks. InfiMM, inspired by the Flamingo architecture, distinguishes itself through the utilization of large-scale training data, comprehensive training strategies, and diverse large language models. This approach ensures the preservation of Flamingo's foundational strengths while simultaneously introducing augmented capabilities. Empirical evaluations across a variety of benchmarks underscore InfiMM's remarkable capability in multimodal understanding. The code can be found at: https://anonymous.4open.science/r/infimm-zephyr-F60C/.
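For readers unfamiliar with the Flamingo design the abstract refers to, the sketch below illustrates a Flamingo-style gated cross-attention block: language-model hidden states attend to visual features, and a tanh gate initialized at zero keeps the block an identity at the start of training. This is a minimal, hypothetical illustration of the general technique, not the authors' InfiMM implementation; the class name, dimensions, and layer layout are assumptions.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Illustrative Flamingo-style gated cross-attention block (not InfiMM's code)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Gates start at zero so the pretrained language model is unchanged
        # at initialization; vision information is blended in during training.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ffn_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, vision: torch.Tensor) -> torch.Tensor:
        # text:   (batch, text_len, dim) hidden states from the language model
        # vision: (batch, vis_len, dim) features from the vision encoder
        attn_out, _ = self.attn(query=text, key=vision, value=vision)
        text = text + torch.tanh(self.attn_gate) * attn_out
        text = text + torch.tanh(self.ffn_gate) * self.ffn(text)
        return text
```

In the Flamingo family of models, blocks like this are interleaved with the frozen layers of a pretrained language model, which is the foundational design the abstract says InfiMM preserves while varying the training data, strategies, and underlying LLMs.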
