VIDEO DOI: https://doi.org/10.48448/5e2t-e830

poster

ACL 2024

August 14, 2024

Bangkok, Thailand

Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground

keywords:

discourse and pragmatics

language model evaluation

theory of mind

Evaluating the theory of mind (ToM) capabilities of language models (LMs) has recently received a great deal of attention. However, many existing benchmarks rely on synthetic data, which risks misaligning the resulting experiments with human behavior. We introduce Common-ToM, the first ToM dataset based on naturally occurring spoken dialogs, and show that LMs struggle to demonstrate ToM. We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
