Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

Wenbo Zhu

AAAI 2025 • March 01, 2025 • Philadelphia, United States

keywords: ml, transparency, ethics, fairness, bias, privacy

Text-to-image diffusion models have achieved remarkable success in generating photorealistic images. However, the inclusion of sensitive information during pre-training poses significant risks, such as copyright infringement and privacy violations. Machine Unlearning (MU) offers a promising solution to eliminate sensitive concepts from these models. Despite its potential, existing MU methods face two main challenges: 1) limited generalization, where concept erasure is effective only within the unlearned set, failing to prevent sensitive concept generation from out-of-set prompts; and 2) utility degradation, where removing target concepts significantly impacts the model's overall performance. To address these issues, we propose a novel concept domain correction framework named DoCoPreG (Domain Correction and Preserving Gradient). By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures comprehensive unlearning of target concepts. Additionally, we introduce a concept-preserving gradient surgery technique that mitigates conflicting gradient components, thereby preserving the model's utility while unlearning specific concepts. Extensive experiments across various instances, styles, and offensive concepts demonstrate the effectiveness of our method in unlearning targeted concepts with minimal impact on related concepts, outperforming existing state-of-the-art approaches even for strongly related out-of-distribution prompts.
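
The abstract does not give the exact rule behind the concept-preserving gradient surgery, so the following is only a minimal sketch of the general idea of removing a conflicting gradient component. It assumes a projection in the style of PCGrad, which is a swapped-in technique and not necessarily the authors' formulation. The names `project_conflict`, `g_unlearn`, and `g_preserve` are hypothetical, and gradients are represented as flat lists of floats for illustration.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two flat gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflict(g_unlearn: List[float], g_preserve: List[float]) -> List[float]:
    """Return an unlearning gradient that no longer opposes the preserving one.

    Assumption (not from the abstract): a conflict is a negative inner
    product, and it is resolved by subtracting the projection of
    g_unlearn onto g_preserve, as in PCGrad-style gradient surgery.
    """
    d = dot(g_unlearn, g_preserve)
    norm_sq = dot(g_preserve, g_preserve)
    # No conflict, or a zero preserving gradient: leave the gradient unchanged.
    if d >= 0 or norm_sq == 0:
        return list(g_unlearn)
    scale = d / norm_sq
    # Subtract the component that points against g_preserve.
    return [u - scale * p for u, p in zip(g_unlearn, g_preserve)]


# Hypothetical toy gradients: the unlearning step would hurt the preserved concept.
g_unlearn = [1.0, -1.0]
g_preserve = [0.0, 1.0]
g_fixed = project_conflict(g_unlearn, g_preserve)
```

In this toy case the corrected gradient is orthogonal to `g_preserve`, so to first order the unlearning update no longer increases the loss on the preserved concept, while the non-conflicting component of the unlearning direction is kept.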