VIDEO DOI: https://doi.org/10.48448/dm38-hx67

technical paper

ACL 2024

August 12, 2024

Bangkok, Thailand

A Meta-Learning Perspective on Transformers for Causal Language Modeling

Keywords: meta-learning, transformers, language models

The Transformer architecture has become prominent in the development of large causal language models, yet the mechanisms that explain its capabilities are not well understood. Focusing on the training process, we establish a meta-learning view of the Transformer architecture when it is trained for causal language modeling, by explicating an inner optimization process that may occur within the Transformer. Building on this inner optimization, we identify and theoretically analyze a special characteristic of the norms of learned token representations in Transformer-based causal language models. Our analysis is supported by experiments on pre-trained large language models and real-world data.
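
The abstract refers to measurements of the norms of learned token representations in pre-trained causal language models. As a loose, hypothetical illustration of that kind of measurement (not the authors' code or method), the sketch below computes per-layer L2 norms of token representations, assuming the HuggingFace transformers library and GPT-2 as the pre-trained model.

# Hypothetical illustration only: inspect L2 norms of token representations
# across layers of a pre-trained causal language model (GPT-2 assumed).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # assumed model; the paper's experiments may use others
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

text = "Transformers can be viewed as meta-learners."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq_len, hidden_dim]
for layer_idx, hidden in enumerate(outputs.hidden_states):
    norms = hidden.norm(dim=-1).squeeze(0)  # L2 norm of each token representation
    print(f"layer {layer_idx:2d}: mean token-norm = {norms.mean().item():.3f}")

Running this prints one line per layer, showing how the average representation norm changes with depth; the paper's theoretical analysis concerns a characteristic of these norms, though its exact experimental setup is not reproduced here.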

