Human fact-checking is too slow to meet current demands, making automatic fact-checking systems an essential alternative. Evaluating such systems is challenging, as existing benchmark datasets suffer either from leakage or from evidence incompleteness, which limits the realism of current evaluations. We present Politi-Fact-Only (PFO), a 5-class benchmark dataset of 2,982 political claims from politifact.com, where all post-claim analysis and annotator cues have been manually removed from the evidence articles. After this filtering, the evidence contains only information that was available prior to the claim's verification. Evaluating on PFO, we observe an average performance drop of 11.39% in macro-F1 compared to PFO's unfiltered version. Based on the challenges identified in existing LLM-based fact-checking systems, we propose RAV (Recon-Answer-Verify), an agentic framework with three agents that iteratively generates and answers sub-questions to verify different aspects of the claim before finally producing the label. Unlike prior work, we reduce follow-up question complexity by leveraging two types of structured questions, which either validate a fact or inquire about a fact. RAV generalizes across both domains and label granularities, outperforming state-of-the-art methods by 57.5% on PFO (political, 5-class) and by 3.05% on the widely used HOVER dataset (encyclopedic, 2-class).
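To make the loop concrete, here is a minimal Python sketch of one way a Recon-Answer-Verify iteration could be wired up. The agent interfaces, function names, round budget, and forcing behaviour are all illustrative assumptions, not the authors' implementation; only the two structured question types (validate vs. inquire) come from the abstract above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class QuestionType(Enum):
    VALIDATE = "validate"  # yes/no check of a fact the claim asserts
    INQUIRE = "inquire"    # open question asking for a missing fact

@dataclass
class SubQuestion:
    text: str
    qtype: QuestionType

# QA pairs accumulated across rounds: (sub-question, evidence-grounded answer).
QAHistory = list[tuple[SubQuestion, str]]

# The three agents are modeled as plain callables; in practice each would wrap
# an LLM prompt. All names and signatures here are illustrative assumptions.
ReconAgent = Callable[[str, QAHistory], list[SubQuestion]]     # claim, history -> new questions
AnswerAgent = Callable[[SubQuestion, str], str]                # question, evidence -> answer
VerifyAgent = Callable[[str, QAHistory, bool], Optional[str]]  # claim, history, force -> label or None

def rav_fact_check(
    claim: str,
    evidence: str,
    recon: ReconAgent,
    answer: AnswerAgent,
    verify: VerifyAgent,
    max_rounds: int = 3,  # round budget is a made-up knob, not from the paper
) -> str:
    """Iteratively generate and answer structured sub-questions, then label."""
    qa_history: QAHistory = []
    for round_idx in range(max_rounds):
        # Recon: propose VALIDATE/INQUIRE sub-questions for aspects of the
        # claim that the QA history does not yet cover.
        for question in recon(claim, qa_history):
            # Answer: ground each sub-question in the pre-claim evidence only.
            qa_history.append((question, answer(question, evidence)))
        # Verify: return a label once the claim is fully covered; on the last
        # round it is forced to commit rather than request more questions.
        force = round_idx == max_rounds - 1
        verdict = verify(claim, qa_history, force)
        if verdict is not None:
            return verdict
    raise RuntimeError("verify agent never committed to a verdict")
```

Restricting the Recon agent to the two structured question types keeps each follow-up question simple enough for the Answer agent to ground reliably in the filtered, pre-claim evidence.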