AAAI 2026

January 24, 2026

Singapore, Singapore

Recent advances in speech large language models (Speech LLMs) have led to significant progress in speech understanding tasks such as Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER). However, whether these models can achieve human-level auditory perception, particularly the ability to comprehend latent intentions and implicit emotions in real-world spoken language, remains underexplored. To this end, we introduce Human-level Perception in Spoken Speech Understanding (HPSU), a pioneering benchmark for systematically evaluating the human-level perceptual and understanding capabilities of Speech LLMs. HPSU comprises 20k expert-validated English and Chinese spoken language understanding instances. It establishes a comprehensive evaluation framework spanning a spectrum of tasks, from fundamental speaker attribute recognition to complex inference of latent intentions and implicit emotions. To address data scarcity in real-world scenarios and the difficulty of fine-grained annotation, we developed an annotation pipeline that emulates human multimodal cognitive mechanisms. This process fuses audio, textual, and visual information to enable precise speech understanding and labeling, significantly enhancing both annotation efficiency and quality. Our systematic evaluation of various open-source and proprietary Speech LLMs demonstrates that even top-performing models still fall considerably short of human capabilities in understanding genuine spoken interactions. Consequently, HPSU will be instrumental in guiding the development of Speech LLMs toward human-level perception and cognition.
