AAAI 2026

January 24, 2026

Singapore, Singapore


Automatic speech recognition (ASR) systems have achieved remarkable performance under common conditions but often struggle to leverage long-context information in contextualized scenarios that require domain-specific knowledge, such as conference presentations. This challenge arises primarily from constrained model context windows and the sparsity of relevant information within extensive contextual noise. To address this, we propose SAP^2, a novel framework that dynamically prunes and integrates relevant contextual keywords in two stages. Each stage leverages our proposed Speech-Driven Attention-based Pooling mechanism, enabling efficient compression of context embeddings while preserving speech-salient information. Experimental results demonstrate state-of-the-art performance of SAP^2 on the SlideSpeech and LibriSpeech datasets, achieving word error rates (WER) of 7.71% and 1.12%, respectively. On SlideSpeech, our method notably reduces the biased word error rate (B-WER) by 41.1% compared to non-contextual baselines. SAP^2 also exhibits robust scalability, consistently maintaining performance under extensive contextual input conditions on both datasets.
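The core mechanism named in the abstract, speech-driven attention-based pooling, can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so the details below are assumptions: dot-product attention, a mean-pooled speech query, a shared embedding dimension, and the function and variable names are hypothetical. The idea is that speech features act as the attention query over context-keyword embeddings, so the pooled vector (and the attention weights, usable for pruning) emphasize context that is salient to the utterance.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def speech_driven_attention_pooling(speech_feats, context_embs):
    """Compress N context-keyword embeddings into one vector, weighted by
    their dot-product attention scores against the speech representation.

    speech_feats: (T, d) frame-level speech encoder outputs (assumed shape)
    context_embs: (N, d) keyword embeddings (assumed shape)
    Returns (pooled (d,), weights (N,)).
    """
    query = speech_feats.mean(axis=0)                 # (d,) speech-side query
    d = query.shape[-1]
    scores = context_embs @ query / np.sqrt(d)        # (N,) scaled dot products
    weights = softmax(scores)                         # (N,) attention over keywords
    pooled = weights @ context_embs                   # (d,) speech-salient summary
    return pooled, weights
```

In a two-stage pruning scheme like the one the abstract describes, the `weights` vector could first be used to keep only the top-scoring keywords before a second pooling pass compresses the survivors, though the paper's actual stage design may differ.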


