EMNLP 2025

November 07, 2025

Suzhou, China

We propose a novel framework, Continuous-Time Attention, which infuses partial differential equations (PDEs) into the Transformer’s attention mechanism to address the challenges of extremely long input sequences. Instead of relying solely on a static attention matrix, we allow attention weights to evolve over a pseudo-time dimension via diffusion, wave, or reaction-diffusion dynamics. This mechanism systematically smooths local noise, enhances long-range dependencies, and stabilizes gradient flow. Theoretically, our analysis shows that PDE-based attention leads to better optimization landscapes and polynomial rather than exponential decay of distant interactions. Empirically, we benchmark our method across diverse experiments, demonstrating consistent gains over both standard and specialized long-sequence Transformer variants. Our findings highlight the potential of PDE-based formulations to enrich attention mechanisms with continuous-time dynamics and global coherence.
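To make the mechanism concrete, here is a minimal sketch of one plausible reading of the diffusion variant: raw attention scores are evolved for a few explicit-Euler steps of a discretized heat equation along the key axis before the softmax. This is not the authors' implementation; the function name diffuse_attention_scores and the hyperparameters steps and tau are illustrative assumptions.

import torch
import torch.nn.functional as F

def diffuse_attention_scores(scores: torch.Tensor, steps: int = 4, tau: float = 0.2) -> torch.Tensor:
    # scores: raw attention logits of shape [batch, heads, q_len, k_len] (illustrative sketch,
    # not the paper's code). Each pseudo-time step applies an explicit-Euler update of the
    # 1-D heat equation along the key axis, smoothing local noise before normalization.
    a = scores
    for _ in range(steps):
        # Replicate-pad the key dimension so boundary positions diffuse with themselves.
        padded = F.pad(a, (1, 1, 0, 0), mode="replicate")
        laplacian = padded[..., :-2] + padded[..., 2:] - 2.0 * a
        a = a + tau * laplacian  # tau <= 0.5 keeps the explicit scheme stable
    return torch.softmax(a, dim=-1)

# Example: smooth the score matrix of an 8-head attention layer before the softmax.
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
attn = diffuse_attention_scores(q @ k.transpose(-2, -1) / 64 ** 0.5)

Under the same assumptions, the wave or reaction-diffusion variants mentioned in the abstract would swap this update rule for the corresponding discretization while keeping the same pseudo-time loop.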

Downloads

  • Slides
  • Paper
  • Transcript (English, automatic)
