Certified defenses aim to provide provable robustness against attacks. Models certified against adversarial attacks provide $l_p$-bounded guarantees that the predicted label cannot be altered by adversarial manipulation of the input within that bound. We study the potential for malicious exploitation of certification frameworks to better understand the limits of these guarantees. The objective is not only to mislead a classifier but also to manipulate the certification process into generating a robustness certificate for an adversarial input (certificate spoofing). A recent ICLR study demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class. Our study investigates whether perturbations that cause a misclassification, and yet coax a certified model into issuing deceptive, large robustness radii, can still be imperceptible. We explore the idea of region-focused adversarial examples to demonstrate imperceptible perturbations capable of spoofing certificates and achieving certification radii larger than those of the source class (ghost certificates). Extensive evaluations on ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses. Our work raises new questions regarding the safe deployment of systems with certified defenses and current robustness verification methods, while underscoring the need to better understand and appreciate the robustness guarantees of certified models.
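As background, and assuming the standard setting for ImageNet-scale certified defenses (the abstract does not name the mechanism): such certificates are typically produced by randomized smoothing (Cohen et al., 2019), where the certified $l_2$ radius grows with the smoothed classifier's confidence in its top class. The sketch below shows that standard radius computation, to make concrete what a spoofed "large robustness radius" inflates; the function name and parameter values are illustrative, not from the paper.

```python
# Minimal sketch of the randomized-smoothing certificate (Cohen et al., 2019).
# A smoothed classifier g(x) = argmax_c P(f(x + noise) = c), noise ~ N(0, sigma^2 I),
# is certifiably robust within an l2 radius R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB)),
# where pA lower-bounds the top-class probability and pB upper-bounds the runner-up.
from scipy.stats import norm


def certified_l2_radius(p_a: float, p_b: float, sigma: float) -> float:
    """Certified l2 radius for a smoothed classifier.

    p_a:   lower bound on the probability of the predicted (top) class under noise.
    p_b:   upper bound on the probability of the runner-up class under noise.
    sigma: standard deviation of the Gaussian smoothing noise.
    """
    if p_a <= p_b:
        return 0.0  # abstain: top class is not confidently separated, no certificate
    return (sigma / 2.0) * (norm.ppf(p_a) - norm.ppf(p_b))


# A spoofed ("ghost") certificate corresponds to an imperceptible perturbation that
# drives p_a high for the *wrong* class, so the model reports a large radius
# despite the misclassification.
print(certified_l2_radius(p_a=0.99, p_b=0.01, sigma=0.5))  # ~1.16
```

Because the radius depends only on the class-probability gap under noise, any attack that concentrates the smoothed prediction on the adversarial class will also enlarge the reported certificate, which is the failure mode the abstract describes.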