Recently, LLMs have faced increasing demands to selectively remove specific information through Machine Unlearning. While evaluating unlearning effectiveness is crucial, existing benchmarks suffer from fundamental limitations in audit dataset generation from unstructured corpora. We identify two critical challenges: ensuring audit adequacy and handling knowledge redundancy between forget and retain datasets. Current approaches rely on ad-hoc question generation from unstructured text, leading to unpredictable coverage gaps and evaluation blind spots. Knowledge redundancy between forget and retain corpora further obscures evaluation, making it difficult to distinguish genuine unlearning failures from legitimately retained knowledge. To bring clarity to this challenge, we propose LUCID, an automated framework that leverages knowledge graphs to achieve comprehensive audit dataset generation with fine-grained coverage and systematic redundancy elimination. By converting unstructured corpora into structured knowledge representations, it transforms the ad-hoc audit dataset generation process into a transparent and automated generation pipeline that ensures both adequacy and non-redundancy. Applying LUCID to the MUSE benchmark, we generated over 69,000 and 111,000 audit cases for the News and Books datasets respectively, identifying thousands of previously undetected knowledge memorization instances. Our analysis reveals that knowledge redundancy significantly skews metrics, artificially inflating ROUGE from 19.7% to 26.1% and Entailment Scores from 32.4% to 35.2%, highlighting the necessity of deduplication for accurate assessment.
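The following is a minimal sketch of the kind of pipeline the abstract describes, assuming triples have already been extracted from the forget and retain corpora (e.g., by an upstream LLM- or IE-based extraction step). All names and the question template here are illustrative assumptions, not LUCID's actual API.

```python
# Hypothetical sketch: knowledge-graph-based audit generation with
# redundancy elimination between forget and retain sets.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def build_audit_cases(forget_kg: set[Triple], retain_kg: set[Triple]) -> list[dict]:
    # Redundancy elimination: knowledge that also appears in the retain set
    # is legitimately retained, so it is excluded from the forget audit.
    unique_forget = forget_kg - retain_kg

    # One audit case per remaining triple gives fine-grained, inspectable
    # coverage of the forget set.
    return [
        {"question": f"What is the {t.relation} of {t.subject}?", "answer": t.obj}
        for t in unique_forget
    ]


# Toy usage: the fact shared with the retain corpus is filtered out.
forget = {Triple("Alice", "employer", "Acme"), Triple("Alice", "birthplace", "Oslo")}
retain = {Triple("Alice", "employer", "Acme")}
print(build_audit_cases(forget, retain))
# -> [{'question': 'What is the birthplace of Alice?', 'answer': 'Oslo'}]
```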
