Characterizing the expressive power of the Transformer architecture is critical to understanding its capacity limits and scaling laws. Recent works have established circuit complexity bounds for Transformer-like architectures. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has emerged as a crucial technique in modern large language models, offering superior performance in capturing positional information compared to traditional position embeddings and showing great promise in applications, particularly in long-context scenarios. Empirical evidence also suggests that $\mathsf{RoPE}$-based Transformer architectures generalize better than conventional Transformer models. In this work, we establish a circuit complexity bound for Transformers with $\mathsf{RoPE}$ attention. Our key contribution is to show that unless $\mathsf{TC}^0 = \mathsf{NC}^1$, a $\mathsf{RoPE}$-based Transformer with $\mathrm{poly}(n)$ precision, $O(1)$ layers, and hidden dimension $d \leq O(n)$ cannot solve the arithmetic formula evaluation problem or the Boolean formula value problem. This result demonstrates a fundamental limitation on the expressivity of $\mathsf{RoPE}$-based Transformers, despite their substantial empirical success. Beyond establishing the complexity bound, our theoretical results also offer insights for designing novel Transformer architectures.
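To make the $\mathsf{RoPE}$ attention mechanism studied here concrete, below is a minimal NumPy sketch of rotary position embedding applied to a single query/key vector. It follows the standard $\mathsf{RoPE}$ formulation (adjacent feature pairs rotated by position-dependent angles with base 10000); the function name and helper variables are illustrative and not drawn from this paper.

```python
import numpy as np

def rotary_embed(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each even/odd feature pair of x by a position-dependent angle."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE expects an even feature dimension"
    # Frequencies theta_i = base^(-2i/d), one per 2-D subspace.
    idx = np.arange(0, d, 2)
    theta = base ** (-idx / d)
    angles = pos * theta                      # one rotation angle per pair
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

# The attention score between a query at position m and a key at position n
# then depends only on the relative offset m - n:
q, k = np.random.randn(8), np.random.randn(8)
score = rotary_embed(q, pos=5) @ rotary_embed(k, pos=2)
```

In a full $\mathsf{RoPE}$-based Transformer this rotation is applied to every query and key before the softmax attention; the circuit complexity bound above concerns such models with $\mathrm{poly}(n)$-precision arithmetic, constantly many layers, and hidden dimension at most $O(n)$.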