EMNLP 2025

November 06, 2025

Suzhou, China

Wikipedia is the largest open knowledge corpus, widely used worldwide and serving as a key resource for training large language models (LLMs) and retrieval-augmented generation (RAG) systems. Ensuring its accuracy is therefore critical. But how accurate is Wikipedia? In this paper, we focus on inconsistencies, a specific type of factual inaccuracy. We introduce the task of corpus-level inconsistency detection and present WikiCollide, a human-annotated dataset for this task. We also propose CLAIRE, an agent-based system combining an LLM with information retrieval to effectively identify inconsistencies, which outperforms strong LLM baselines by 2.1% in terms of AUROC on our dataset. Based on our findings, we estimate that at least 79.9 million facts (approximately 3.3%) in the English Wikipedia contradict at least one other fact within the corpus (99% confidence interval: 37.6 million to 121.9 million). We further show that these inconsistencies propagate into widely used NLP datasets, affecting gold labels in at least 7.3% of examples in the fact-verification dataset FEVEROUS and 4.0% in the question-answering dataset AmbigQA. In a user study with experienced Wikipedia editors, 87.5% of participants reported increased confidence in identifying inconsistencies when using CLAIRE, and they discovered on average 64.7% more inconsistencies in the same amount of time. Our results demonstrate that LLM-based tools can effectively assist humans in detecting inconsistencies in large-scale corpora.
