EMNLP 2025

November 07, 2025

Suzhou, China


Despite growing interest in Theory of Mind (ToM) tasks for evaluating language models (LMs), little is known about how LMs internally represent mental states. Understanding these internal mechanisms is critical, not only to move beyond surface-level performance but also for model alignment and safety, where subtle misattributions of mental states may go undetected in generated outputs. In this work, we present the first systematic investigation of belief representations in LMs, probing models across different scales, training regimens, and prompts, and using control tasks to rule out confounds. Our experiments provide evidence that both model size and fine-tuning substantially improve belief representations, which are structured rather than mere by-products of higher dimensionality, yet brittle: even semantically neutral prompt variations can impair them. Crucially, we show that these representations can be strengthened: targeted edits to model activations can correct wrong ToM inferences.
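To make the probing setup concrete, the sketch below shows a minimal linear probe with a random-label control task, in the spirit of the approach the abstract describes. This is not the authors' implementation: the hidden-state activations, belief labels, layer choice, and array shapes are all placeholder assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of a linear belief probe plus a
# control task. Assumptions: `activations` are hidden states extracted from
# some LM layer (shape [n_examples, hidden_dim]) and `labels` encode the
# protagonist's belief (e.g., 0 = false belief, 1 = true belief). Both are
# random placeholders here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_dim = 500, 768
activations = rng.normal(size=(n_examples, hidden_dim))  # placeholder hidden states
labels = rng.integers(0, 2, size=n_examples)             # placeholder belief labels

# Probe: a linear classifier trained to read the belief label off the activations.
X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy:   {probe.score(X_test, y_test):.3f}")

# Control task: shuffle the labels so they carry no information about beliefs.
# If the real probe's accuracy reflects structured representations rather than
# the probe's raw capacity, the control probe should stay near chance.
control_labels = rng.permutation(labels)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    activations, control_labels, test_size=0.2, random_state=0
)
control_probe = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
print(f"Control accuracy: {control_probe.score(Xc_test, yc_test):.3f}")
```

With real activations in place of the random placeholders, the gap between probe and control accuracy is what indicates that belief information is linearly decodable rather than an artifact of high dimensionality.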

Downloads

  • Slides
  • Paper
  • Transcript, English (automatic)
