EMNLP 2025

November 05, 2025

Suzhou, China

The softmax function is crucial in Transformer attention: it normalizes each row of the attention scores so that it sums to one. Usually, tokens with larger attention scores are important for the final prediction. However, the softmax function can suffer from vanishing gradients for exactly these important tokens (e.g., when their probabilities are close to one), making them difficult to optimize and limiting performance. In this paper, we propose Self-Adjust Softmax (SA-Softmax) to address this issue by modifying $\mathrm{softmax}(z)$ to $z \cdot \mathrm{softmax}(z)$ and its normalized variant $\frac{z - \min(z_{\min}, 0)}{\max(0, z_{\max}) - \min(z_{\min}, 0)} \cdot \mathrm{softmax}(z)$. We theoretically show that SA-Softmax provides better gradient properties than the vanilla softmax function. Moreover, SA-Softmax attention can be seamlessly integrated into the attention mechanisms of existing Transformer models with minor adjustments. We conduct experiments to compare the empirical performance of Transformer models using SA-Softmax against the vanilla softmax function. These experiments, involving models with up to 2.7 billion parameters, cover diverse datasets, language tasks, and positional encoding methods.
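
Below is a minimal PyTorch sketch of the modification described in the abstract. The function name, the `normalized` flag, and the small epsilon added for numerical stability are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def sa_softmax(z: torch.Tensor, dim: int = -1, normalized: bool = True) -> torch.Tensor:
    """Self-Adjust Softmax: scale softmax(z) by z, or by a normalized z."""
    probs = F.softmax(z, dim=dim)
    if not normalized:
        # Plain variant: z * softmax(z)
        return z * probs
    # Normalized variant:
    # (z - min(z_min, 0)) / (max(0, z_max) - min(z_min, 0)) * softmax(z)
    z_min = z.amin(dim=dim, keepdim=True).clamp(max=0.0)   # min(z_min, 0)
    z_max = z.amax(dim=dim, keepdim=True).clamp(min=0.0)   # max(0, z_max)
    scale = (z - z_min) / (z_max - z_min + 1e-6)           # eps (assumed) avoids division by zero
    return scale * probs

# Example: attention logits for a small batch; by design, each row's weights
# no longer sum to one after the rescaling.
scores = torch.randn(2, 4, 4)          # (batch, query, key)
weights = sa_softmax(scores, dim=-1)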

Downloads

  • Slides
  • Paper
  • Transcript English (automatic)

Next from EMNLP 2025

Probing for Arithmetic Errors in Language Models
technical paper

EMNLP 2025

Alessandro Stolfo, Mrinmaya Sachan, and 2 other authors

05 November 2025
