EMNLP 2025

November 05, 2025

Suzhou, China


Softmax self-attention often assigns disproportionate weight to semantically uninformative tokens such as punctuation and special tokens, a phenomenon known as attention noise. While recent methods like Cog Attention and the Differential Transformer have addressed this by introducing negative attention scores, they risk discarding useful information. In this paper, we propose the Integral Transformer, a novel self-attention mechanism that denoises attention by integrating signals sampled from the logit distribution. This approach mitigates noise while preserving the contributions of special tokens critical for model performance. Extensive experiments demonstrate that our model outperforms vanilla, Cog, and Differential attention variants on rigorous knowledge and reasoning benchmarks. Moreover, our analysis reveals that employing vanilla self-attention in the lower Transformer layers enhances performance and that the Integral Transformer more effectively balances attention distributions and reduces rank collapse in upper layers.
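The abstract does not spell out the formal mechanism, but a minimal sketch of one plausible reading of "integrating signals sampled from the logit distribution" is given below, assuming a PyTorch-style attention function. The names `integral_style_attention`, `num_samples`, and `noise_scale` are illustrative, not from the paper: the idea shown is simply to average softmax attention over several sampled perturbations of the logits, which smooths spuriously peaked weights without introducing negative attention scores.

```python
# Hypothetical sketch (not the paper's released code): denoise attention by
# averaging softmax over sampled perturbations of the logits.
import torch
import torch.nn.functional as F

def integral_style_attention(q, k, v, num_samples=4, noise_scale=0.1):
    """Toy denoised attention via integration over sampled logits.

    q, k, v: (batch, heads, seq_len, head_dim) tensors.
    num_samples and noise_scale are illustrative hyperparameters,
    not values taken from the paper.
    """
    d = q.size(-1)
    # Standard scaled dot-product logits: (batch, heads, seq_len, seq_len)
    logits = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5

    # Integrate (average) attention distributions over sampled logit
    # perturbations instead of trusting a single softmax pass.
    probs = torch.zeros_like(logits)
    for _ in range(num_samples):
        noisy_logits = logits + noise_scale * torch.randn_like(logits)
        probs = probs + F.softmax(noisy_logits, dim=-1)
    probs = probs / num_samples

    return torch.matmul(probs, v)

# Usage on random tensors:
# q = k = v = torch.randn(2, 8, 16, 64)
# out = integral_style_attention(q, k, v)  # shape (2, 8, 16, 64)
```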

Downloads

  • Slides
  • Paper
  • Transcript English (automatic)

Next from EMNLP 2025

Large Language Models Do Multi-Label Classification Differently
poster

Georgios Chochlakis and 5 other authors

05 November 2025
