

poster
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
keywords:
long-sequence processing
transformers
reinforcement learning
Although dominant in natural language processing, transformer-based models still struggle with long-sequence processing due to the computational cost of their self-attention operations, which grows quadratically with the length of the input sequence. To address this challenge, we propose a Simple framework that enhances the long-context processing of off-the-shelf pre-trained transformers via three steps: Chunk, Align, and Select (SimCAS). More specifically, we first divide each long input sequence into a batch of chunks, then align inter-chunk information during the encoding steps, and finally select the most representative hidden states from the encoder for the decoding process. With SimCAS, the computation and memory costs are reduced to linear complexity. In experiments, we demonstrate the effectiveness of the proposed method on various real-world long-text summarization and reading-comprehension tasks, on which SimCAS significantly outperforms prior long-sequence processing baselines. The code is available at https://github.com/xjw-nlp/SimCAS.
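
To make the three steps concrete, below is a minimal, self-contained sketch of a chunk-align-select pipeline. It is an illustration under assumptions, not the authors' implementation: the `chunk` helper, the `ChunkAlignSelect` module, the GRU-based alignment, and the linear scorer are all hypothetical stand-ins (the paper's keywords indicate the selector is trained with reinforcement learning); see the linked repository for the actual code.

```python
# Hypothetical sketch of Chunk -> Align -> Select; names and modules are
# illustrative assumptions, not the SimCAS implementation.
import torch
import torch.nn as nn


def chunk(input_ids: torch.Tensor, chunk_len: int, pad_id: int = 0) -> torch.Tensor:
    """Split a single long token sequence of shape (seq_len,) into fixed-size chunks."""
    pad = (-input_ids.size(0)) % chunk_len
    if pad:
        input_ids = torch.cat([input_ids, input_ids.new_full((pad,), pad_id)])
    return input_ids.view(-1, chunk_len)          # (num_chunks C, chunk_len L)


class ChunkAlignSelect(nn.Module):
    """Encode chunks in parallel, exchange information across chunks,
    and keep only the top-k token states for the decoder."""

    def __init__(self, encoder: nn.Module, d_model: int, top_k: int):
        super().__init__()
        self.encoder = encoder                    # any off-the-shelf chunk encoder
        self.align = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for inter-chunk alignment
        self.scorer = nn.Linear(d_model, 1)       # stand-in for the selector (RL-trained in the paper)
        self.top_k = top_k

    def forward(self, input_ids: torch.Tensor, chunk_len: int) -> torch.Tensor:
        chunks = chunk(input_ids, chunk_len)      # Chunk:  (C, L)
        hidden = self.encoder(chunks)             # encode each chunk independently -> (C, L, d)
        # Align: pass each chunk's leading-token state through a recurrent layer
        # so every chunk's representation reflects its neighbours.
        summary, _ = self.align(hidden[:, 0, :].unsqueeze(0))    # (1, C, d)
        hidden = hidden + summary.squeeze(0).unsqueeze(1)        # broadcast back over tokens
        # Select: score every token state and keep the k most representative ones.
        flat = hidden.reshape(-1, hidden.size(-1))               # (C*L, d)
        idx = self.scorer(flat).squeeze(-1).topk(min(self.top_k, flat.size(0))).indices
        return flat[idx]                          # (top_k, d), fed to a standard decoder


# Toy usage: an embedding layer stands in for a real pre-trained encoder.
toy_encoder = nn.Embedding(1000, 64)              # maps (C, L) token ids -> (C, L, 64)
model = ChunkAlignSelect(toy_encoder, d_model=64, top_k=128)
states = model(torch.randint(0, 1000, (5000,)), chunk_len=512)
print(states.shape)                               # torch.Size([128, 64])
```

The linear complexity follows from this structure: self-attention is confined to fixed-length chunks, so encoding cost grows with the number of chunks rather than with the square of the total sequence length, and the decoder attends only to the fixed number of selected states.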