

poster
Inference to the Best Explanation in Large Language Models
keywords:
natural language explanations
explanation evaluation
causal reasoning
reasoning
While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes \textit{IBE-Eval}, a framework inspired by philosophical accounts of \emph{Inference to the Best Explanation (IBE)} to advance the interpretation and evaluation of LLMs' explanations. \textit{IBE-Eval} estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including \emph{consistency}, \emph{parsimony}, \emph{coherence}, and \emph{uncertainty}. Extensive experiments are conducted on \emph{Causal Question Answering (CQA)}, where \textit{IBE-Eval} is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that \textit{IBE-Eval} can successfully identify the best explanation with up to 77\% accuracy ($\approx 27\%$ above random), improving upon a GPT 3.5-as-a-Judge baseline ($\approx+17\%$) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that \textit{IBE-Eval} is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
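
To illustrate the selection step described above, the following is a minimal Python sketch of how an IBE-style selector could combine per-criterion scores to rank competing explanations. The criterion names mirror the abstract, but the data structures, the linear weighting, and the example scores are illustrative assumptions, not the paper's actual implementation of IBE-Eval.

```python
# Illustrative sketch of IBE-style explanation selection (assumptions only).
# IBE-Eval itself computes consistency, parsimony, coherence, and uncertainty
# with dedicated logical and linguistic tools; here each criterion is simply
# assumed to be a precomputed score in [0, 1].

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Explanation:
    text: str                   # candidate natural language explanation
    scores: Dict[str, float]    # per-criterion scores in [0, 1]


def ibe_score(expl: Explanation, weights: Dict[str, float]) -> float:
    """Combine per-criterion scores into a single plausibility estimate."""
    return sum(weights[c] * expl.scores.get(c, 0.0) for c in weights)


def select_best(candidates: List[Explanation],
                weights: Dict[str, float]) -> Explanation:
    """Return the candidate explanation with the highest combined score."""
    return max(candidates, key=lambda e: ibe_score(e, weights))


if __name__ == "__main__":
    # Hypothetical scores for two competing causal explanations.
    candidates = [
        Explanation("Explanation A", {"consistency": 0.9, "parsimony": 0.6,
                                      "coherence": 0.8, "uncertainty": 0.7}),
        Explanation("Explanation B", {"consistency": 0.7, "parsimony": 0.9,
                                      "coherence": 0.6, "uncertainty": 0.5}),
    ]
    # Uniform weights are a placeholder; the relative importance of the
    # criteria is an empirical question studied in the paper.
    weights = {"consistency": 0.25, "parsimony": 0.25,
               "coherence": 0.25, "uncertainty": 0.25}
    print(select_best(candidates, weights).text)
```

In this sketch the best explanation is simply the argmax of a weighted sum; the key design point is that the decision is made from explicit, interpretable features rather than from an opaque judge model.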