

poster
RORA: Robust Free-Text Rationale Evaluation
keywords:
free-text rationale
information theory
interpretability
Free-text rationales play a pivotal role in explainable NLP, bridging the knowledge and reasoning gaps behind a model's decision-making. However, due to the diversity of potential reasoning paths and a corresponding lack of definitive ground truth, their evaluation remains a challenge. Existing metrics rely on the degree to which a rationale \emph{supports} a target label, but we find these fall short in evaluating rationales that inadvertently \emph{leak the label}. To address this problem, we propose RORA, a \underline{RO}bust free-text \underline{RA}tionale evaluation against label leakage. RORA quantifies the new information supplied by a rationale to justify the label. This is achieved by assessing the conditional $\mathcal{V}$-information (Hewitt et al., 2021) with a predictive family robust against leaky features that can be exploited by a small model. RORA consistently outperforms existing approaches in evaluating human-written, synthetic, or model-generated rationales, particularly demonstrating robustness against label leakage. We also show that RORA aligns well with human judgment, providing a more reliable and accurate measurement across diverse free-text rationales.
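As a point of reference, a minimal sketch of the conditional $\mathcal{V}$-information quantity referenced above, following Hewitt et al. (2021). Here $X$ is the input, $Y$ the target label, $R$ the rationale, and $\mathcal{V}$ a predictive family; the specific leakage-robust construction of $\mathcal{V}$ used by RORA, and the exact form of its final score, are assumptions left to the full paper.

% Conditional V-entropy: the best expected log-loss on Y attainable by any
% predictor in the family V, given the conditioning variables.
\begin{align*}
  H_{\mathcal{V}}(Y \mid X) &= \inf_{f \in \mathcal{V}} \; \mathbb{E}\big[-\log f[X](Y)\big], \\
  % Conditional V-information: how much the rationale R reduces the
  % V-entropy of Y beyond what the input X alone already provides.
  I_{\mathcal{V}}(R \to Y \mid X) &= H_{\mathcal{V}}(Y \mid X) - H_{\mathcal{V}}(Y \mid X, R).
\end{align*}

Intuitively, a rationale that merely restates (leaks) the label adds little beyond what a leakage-robust family can already extract, so it contributes little under this measure, whereas a rationale supplying genuinely new justification lowers $H_{\mathcal{V}}(Y \mid X, R)$ and scores higher.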