Would you like to see your presentation here, made available to a global audience of researchers?
Add your own presentation or have us affordably record your next conference.
Large Language Models (LLMs) can improve commonsense reasoning through generating intermediate knowledge. However, the effectiveness of this knowledge introspection is not always guaranteed. This paper first systematically investigates and reveals an introspection paradox: while simple introspection tends to benefit weaker models, it often degrades the performance of stronger ones, particularly on simpler tasks. Our deep analysis indicates that this paradox arises from a complex interplay among model capability, task difficulty and the quality of generated knowledge. Further interpretability analysis reveals the origins of low-quality knowledge generation. To better employ introspected knowledge in LLM, this paper proposes a training-free Adaptive Introspection Strategy that operates in two stages using only the model's internal states: Knowledge Detection, which dynamically identifies and discards potentially low-quality knowledge, and Knowledge Regeneration, which employs attention smoothing to guide the model away from harmful failure modes during knowledge generation. Extensive experiments on five Llama models with different sizes and eight commonsense reasoning benchmarks demonstrate that our approach effectively mitigates the limitations of standard introspection and has consistent performance gains across almost all settings.