Language models often fail on idiomatic, figurative, or context-sensitive inputs, not because of generation errors but because of misinterpretation during input processing. We propose an input-only, model-agnostic method for anticipating such failures using token-level likelihood features inspired by surprisal and the Uniform Information Density hypothesis. These features capture localised uncertainty in input comprehension and outperform standard baselines across five linguistically challenging datasets. We show that span-localised features improve error detection for larger models, while smaller models benefit from global patterns. Our method requires no access to outputs or activations, offering a lightweight and generalisable approach to pre-generation error prediction.
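As a rough illustration of the kind of features the abstract describes, the sketch below computes token-level surprisal from per-token probabilities and derives simple global and span-localised summaries from it. The probability values, window size, and feature choices here are illustrative assumptions, not the authors' actual feature set; in practice the probabilities would come from a language model's per-token likelihoods over the input.

```python
import math

# Hypothetical per-token probabilities p(token | context) for one input
# sentence. In a real pipeline these would be read off a language model's
# output distribution over the input tokens (assumption for illustration).
token_probs = [0.20, 0.05, 0.30, 0.01, 0.15]

# Surprisal of each token: -log2 p(token | context). High surprisal marks
# tokens the model finds unexpected, e.g. inside an idiomatic span.
surprisals = [-math.log2(p) for p in token_probs]

# Global features in the spirit of Uniform Information Density: the mean
# surprisal and its variance, measuring how unevenly information is spread.
mean_s = sum(surprisals) / len(surprisals)
var_s = sum((s - mean_s) ** 2 for s in surprisals) / len(surprisals)

# A span-localised feature: the highest mean surprisal over a sliding
# window, pointing at the most uncertain region of the input.
window = 2
max_window_mean = max(
    sum(surprisals[i:i + window]) / window
    for i in range(len(surprisals) - window + 1)
)
```

Features like `mean_s`, `var_s`, and `max_window_mean` could then feed a lightweight classifier that predicts, before any generation happens, whether the model is likely to misread the input.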