VIDEO DOI: https://doi.org/10.48448/13mw-rx23

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending

Keywords:

long context extrapolation

RoPE

LLM

transformer

Self-attention and position embedding are two crucial modules in transformer-based Large Language Models (LLMs). However, the potential relationship between them is far from well studied, especially for long context window extending. In fact, anomalous behaviors that hinder long context extrapolation exist between Rotary Position Embedding (RoPE) and vanilla self-attention. Incorrect initial angles between $Q$ and $K$ can cause misestimation in modeling rotary position embedding of the closest tokens. To address this issue, we propose $\textbf{Co}$llinear $\textbf{C}$onstrained $\textbf{A}$ttention mechanism, namely CoCA. Specifically, we enforce a collinear constraint between $Q$ and $K$ to seamlessly integrate RoPE and self-attention. While only adding minimal computational and spatial complexity, this integration significantly enhances long context window extrapolation ability. We provide an optimized implementation, making it a drop-in replacement for any existing transformer-based models. Extensive experiments demonstrate that CoCA excels in extending context windows. A CoCA-based GPT model, trained with a context length of 512, can extend the context window up to 32K (60$\times$) without any fine-tuning. Additionally, incorporating CoCA into LLaMA-7B achieves extrapolation up to 32K within a training length of only 2K. Our code is publicly available at: https://github.com/codefuse-ai/Collinear-Constrained-Attention
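The core idea of the collinear constraint can be sketched in a few lines. The following is a hedged toy illustration, not the authors' implementation (see the linked repository for the official code): queries are computed as in vanilla attention, while each rotary pair of $K$ is tied to the corresponding pair of $Q$ through a non-negative gate (the projection `w_t` and the use of `relu` are illustrative assumptions), so the pre-rotation angle between $Q$ and $K$ is zero before RoPE is applied.

```python
# Toy sketch of a collinear-constrained attention head with RoPE (PyTorch).
# Illustrative assumption of the idea only, not the official CoCA code; see
# https://github.com/codefuse-ai/Collinear-Constrained-Attention for that.
import torch

def rope_angles(seq_len, dim, base=10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i/dim), one per rotary pair.
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    return torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, dim/2)

def apply_rope(x, angles):
    # Rotate each consecutive pair (x_{2i}, x_{2i+1}) by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def collinear_constrained_attention(x, w_q, w_t, w_v):
    # x: (seq_len, d_model); w_q, w_v: (d_model, d_head); w_t: (d_model, d_head // 2)
    seq_len, _ = x.shape
    q = x @ w_q                                # queries as in vanilla attention
    gate = torch.relu(x @ w_t)                 # non-negative gate (assumed form)
    k = q * gate.repeat_interleave(2, dim=-1)  # each rotary pair of K collinear with Q's
    angles = rope_angles(seq_len, q.shape[-1])
    q_rot, k_rot = apply_rope(q, angles), apply_rope(k, angles)
    scores = (q_rot @ k_rot.T) / q.shape[-1] ** 0.5
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ (x @ w_v)
```

With the pre-rotation angle fixed at zero, the relative rotation alone determines the angular term in the attention score, which is the property motivating the constraint described in the abstract.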
