EMNLP 2025

November 05, 2025

Suzhou, China

We investigate whether internal activations in language models can be used to detect arithmetic errors. Starting with a controlled setting of 3-digit addition, we show that simple probes can accurately decode both the model’s predicted output and the correct answer from hidden states, regardless of whether the model’s output is correct. Building on this, we train lightweight error detectors that predict model correctness with over 90% accuracy. We then extend our analysis to multi-step arithmetic reasoning in the GSM8K dataset and find that probes trained on simple arithmetic generalize well to this more complex setting, maintaining high accuracy and revealing consistent internal representations. Finally, we demonstrate that these probes can guide selective re-prompting of erroneous reasoning steps, improving task accuracy with minimal disruption to correct outputs. Our findings suggest that arithmetic errors can be anticipated from internal activations alone, and that simple probes offer a viable path toward lightweight model self-correction.
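The core recipe the abstract describes — capture hidden-state vectors, label each with whether the model's answer was correct, and fit a lightweight linear probe to predict correctness — can be sketched as follows. This is a minimal illustration, not the paper's code: the synthetic "activations" below stand in for residual-stream vectors captured while a model answers 3-digit addition prompts, and the planted "error direction" `w_true` is a hypothetical construct used only to generate separable labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for model activations: in the real setting these would be
# hidden states from addition prompts, with y = 1 when the model's
# predicted sum was correct. Here we plant a hypothetical linear
# "error direction" so the labels are recoverable by a linear probe.
d_model, n = 64, 600
w_true = rng.normal(size=d_model)
X = rng.normal(size=(n, d_model))
y = (X @ w_true > 0).astype(float)

# Lightweight error detector: logistic regression trained by
# full-batch gradient descent on 400 examples.
w, b, lr = np.zeros(d_model), 0.0, 0.5
X_tr, y_tr = X[:400], y[:400]
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))   # predicted P(correct)
    g = p - y_tr                                # gradient of log loss
    w -= lr * (X_tr.T @ g) / len(y_tr)
    b -= lr * g.mean()

# Evaluate correctness detection on the held-out 200 examples.
pred = (X[400:] @ w + b > 0).astype(float)
acc = (pred == y[400:]).mean()
print(f"held-out correctness-detection accuracy: {acc:.2f}")
```

In the selective re-prompting setup the abstract mentions, a probe like this would be run on the activations of each reasoning step, and only steps the probe flags as likely erroneous would be regenerated, leaving correct outputs untouched.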
