Automatic readability assessment plays a key role in ensuring effective communication between humans and language models. Despite significant progress, the field is hindered by inconsistent definitions of readability and by measurements that rely on surface-level text properties. In this work, we investigate the factors shaping human perceptions of readability through the analysis of 1.2k judgments, finding that, beyond surface-level cues, information content and topic strongly shape text comprehensibility. Furthermore, we evaluate 15 popular readability metrics across 5 datasets, contrasting them with 5 more nuanced, model-based metrics. Our results show that four model-based metrics consistently place among the top four in rank correlations with human judgments, while the best-performing traditional metric achieves an average rank of 7.8. These findings highlight a mismatch between current readability metrics and human perceptions, pointing to model-based approaches as a more promising direction.
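
To illustrate the kind of evaluation the abstract describes, the sketch below compares a traditional surface-level readability metric (Flesch reading ease, via the `textstat` package) against human judgments using Spearman rank correlation. The texts and human ratings here are hypothetical placeholders, and the specific metrics, model-based scorers, and datasets used in the paper are not reproduced; this is only a minimal sketch of how such rank correlations can be computed.

```python
# Minimal sketch: correlating a surface-level readability metric with human
# judgments via Spearman rank correlation. Texts and scores are hypothetical
# placeholders, not data from the paper.
import textstat
from scipy.stats import spearmanr

texts = [
    "The cat sat on the mat.",
    "Quantum entanglement links the states of distant particles.",
    "Preheat the oven to 180 degrees and bake for twenty minutes.",
]
human_scores = [4.8, 2.1, 4.2]  # e.g., mean readability ratings on a 1-5 scale

# Surface-level metric: Flesch reading ease (higher = easier to read).
metric_scores = [textstat.flesch_reading_ease(t) for t in texts]

# Rank correlation between the metric and human judgments; in a study like
# the one described, each metric would be ranked by this correlation and the
# ranks averaged across datasets.
rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```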
