VIDEO DOI: https://doi.org/10.48448/bafp-hq88

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

Keywords: large multimodal model, generation model, autoregressive model

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages.

We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset. It consists of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs.

Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model. Demos are shown at https://junzhan2000.github.io/AnyGPT.github.io/.
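The abstract's core idea is folding every modality into a single discrete token sequence that an unmodified autoregressive LLM can model. A minimal sketch of that idea in Python, assuming hypothetical per-modality tokenizers and purely illustrative vocabulary sizes and names (none of these constants or functions come from the paper):

```python
# Sketch: unify text and non-text modality codes into one shared vocabulary
# by giving each modality a disjoint ID block plus boundary tokens.
# All sizes and names below are illustrative assumptions, not AnyGPT's.

TEXT_VOCAB_SIZE = 32000        # base text vocabulary (assumed)
IMAGE_CODEBOOK_SIZE = 8192     # e.g. a VQ image tokenizer (assumed)
SPEECH_CODEBOOK_SIZE = 1024    # e.g. a discrete speech-unit tokenizer (assumed)

# Each non-text modality's codes are shifted into their own ID range.
OFFSETS = {
    "text": 0,
    "image": TEXT_VOCAB_SIZE,
    "speech": TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE,
}

# Special tokens marking where a modality segment starts and ends.
SPECIALS_BASE = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE + SPEECH_CODEBOOK_SIZE
BOS = {"image": SPECIALS_BASE, "speech": SPECIALS_BASE + 2}
EOS = {"image": SPECIALS_BASE + 1, "speech": SPECIALS_BASE + 3}

def to_unified_sequence(segments):
    """Flatten (modality, codes) segments into one token-ID sequence.

    `segments` is a list like [("text", [...]), ("image", [...])], where
    each code list already comes from that modality's own tokenizer.
    """
    seq = []
    for modality, codes in segments:
        if modality == "text":
            seq.extend(codes)  # text IDs are used as-is
        else:
            seq.append(BOS[modality])
            seq.extend(OFFSETS[modality] + c for c in codes)
            seq.append(EOS[modality])
    return seq

# Example: a short text prompt followed by discrete image codes; the result
# is one flat sequence a standard next-token-prediction LLM can be trained on.
seq = to_unified_sequence([("text", [5, 17, 256]), ("image", [0, 3, 8191])])
```

Because the unification happens entirely at this data level, adding a new modality only means registering another tokenizer, offset, and pair of boundary tokens, which mirrors the paper's claim that new modalities integrate like new languages.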

Downloads: Slides, Transcript English (automatic)

