Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks. However, they remain vulnerable to semantic inconsistency, where minor, semantically equivalent variations in input formatting result in divergent predictions. Our comprehensive evaluation reveals that this brittleness persists even in state-of-the-art models such as GPT-4o, posing a serious challenge to their reliability. Through a mechanistic analysis, we attribute this phenomenon to deep representational failures, whereby semantically equivalent input changes induce instability in the model’s internal representations. We further examine standard mitigation strategies and uncover their fundamental limitations. In particular, even direct fine-tuning on format variations frequently fails to yield format-invariant semantic representations, highlighting the difficulty of the problem. By explaining the failure of existing methods through our representational diagnosis, we underscore the need for representation-aware strategies to achieve robust and reliable LLM behavior.
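As an illustration of the kind of representational diagnosis described above (this is a minimal sketch, not the authors' method), one can compare a model's hidden states for semantically equivalent prompts that differ only in surface formatting. The model name, prompt variants, and the choice of last-token, final-layer embeddings are all illustrative assumptions made here for a small, runnable example.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder open model; the paper evaluates state-of-the-art models such as GPT-4o.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Semantically equivalent prompts that differ only in formatting (illustrative).
variants = [
    "Question: Is the sky blue? Answer:",
    "question: is the sky blue?\nanswer:",
    "Q: Is the sky blue?\nA:",
]

def last_token_embedding(text: str) -> torch.Tensor:
    """Return the final-layer hidden state of the last token for a prompt."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    return outputs.hidden_states[-1][0, -1]

embeddings = [last_token_embedding(v) for v in variants]

# Pairwise cosine similarity: low similarity between format variants signals
# representational instability under purely superficial input changes.
for i in range(len(variants)):
    for j in range(i + 1, len(variants)):
        sim = torch.nn.functional.cosine_similarity(
            embeddings[i], embeddings[j], dim=0
        ).item()
        print(f"variant {i} vs {j}: cosine similarity = {sim:.3f}")
```

A complementary check, in the same spirit, is to compare the model's actual predictions (e.g., next-token distributions or task labels) across such format variants; divergence there corresponds to the semantic inconsistency the abstract describes.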