EMNLP 2025

November 07, 2025

Suzhou, China

To collaborate effectively with humans, language models must be able to explain their decisions in natural language. We study a specific type of self-explanation: self-generated counterfactual explanations (SCEs), where a model explains its own prediction by modifying the input such that it would have predicted a different outcome. We evaluate whether models can produce SCEs that are valid, achieving the intended outcome, and minimal, modifying the input no more than necessary. We find a trade-off. When simply asked to generate counterfactual explanations, models typically produce SCEs that are valid, but far from minimal, despite this being a well-established property of good counterfactuals. Worryingly, when explicitly instructed to provide minimal counterfactual explanations, the resulting SCEs typically fail to change the models' predictions. No model is able to reliably satisfy both criteria. We examine why models are unable to do this task, arguing they do not engage in self-modelling, the ability to internally predict how they would behave in alternative situations. We argue this is unlikely to be incentivised by standard training techniques and suggest that new learning objectives are required for LLMs to reliably explain themselves counterfactually. Our code is available in the anonymous repository: https://anonymous.4open.science/r/SCEs-3747/README.md.
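The two criteria named in the abstract, validity and minimality, can be illustrated with a small sketch. This is not the authors' evaluation code: the token-level Levenshtein distance used as the minimality measure, the function names `token_edit_distance` and `evaluate_sce`, and the keyword classifier `toy_predict` standing in for a language model are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Union


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def evaluate_sce(
    predict: Callable[[str], str],
    original: str,
    counterfactual: str,
    target: str,
) -> Dict[str, Union[bool, int]]:
    """Score one counterfactual: is it valid, and how far is it from the original?"""
    # Valid: the model's prediction on the edited input equals the intended outcome.
    valid = predict(counterfactual) == target
    # Minimality proxy (assumed metric): fewer token edits means a more minimal edit.
    distance = token_edit_distance(original.split(), counterfactual.split())
    return {"valid": valid, "edit_distance": distance}


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for a model: 'negative' if the text contains 'boring'."""
    return "negative" if "boring" in text.lower() else "positive"


original = "The film was long and boring"
minimal_cf = "The film was long and gripping"             # small edit, flips the label
verbose_cf = "I loved every minute of this gripping film"  # flips the label, large edit
invalid_cf = "The film was long and quite boring"          # small edit, label unchanged

for cf in (minimal_cf, verbose_cf, invalid_cf):
    print(cf, evaluate_sce(toy_predict, original, cf, target="positive"))
```

In this toy setting, `verbose_cf` mirrors the first failure mode the abstract describes (valid but far from minimal) and `invalid_cf` mirrors the second (a small edit that does not change the prediction).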


