Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information

dc.contributor.author: Zeng, Yi
dc.contributor.author: Pan, Minzhou
dc.contributor.author: Just, Hoang Anh
dc.contributor.author: Lyu, Lingjuan
dc.contributor.author: Qiu, Meikang
dc.contributor.author: Jia, Ruoxi
dc.date.accessioned: 2023-12-04T18:14:34Z
dc.date.available: 2023-12-04T18:14:34Z
dc.date.issued: 2023-11-15
dc.date.updated: 2023-12-01T08:51:40Z
dc.description.abstract: Backdoor attacks introduce manipulated data into a machine learning model's training set, causing the model to misclassify inputs bearing a trigger at test time and thereby produce the attacker's desired outcome. For backdoor attacks to bypass human inspection, it is essential that the injected data appear correctly labeled. Attacks with this property are often referred to as "clean-label attacks." The success of current clean-label backdoor methods largely depends on access to the complete training set. Yet accessing the complete dataset is often challenging or infeasible, since it frequently comes from varied, independent sources, such as images from distinct users. This raises the question of whether clean-label backdoor attacks still pose a real threat. In this paper, we answer this question affirmatively by designing an algorithm that launches clean-label backdoor attacks using only samples from the target class and public out-of-distribution data. By inserting carefully crafted malicious examples totaling less than 0.5% of the target-class size and 0.05% of the full training-set size, we can manipulate the model into misclassifying arbitrary inputs into the target class whenever they contain the backdoor trigger. Importantly, the poisoned model retains high accuracy on regular test samples without the trigger, as if it were trained on untainted data. Our technique is consistently effective across various datasets and models, and even when the trigger is injected into the physical world. We explore the space of defenses and find that Narcissus can evade the latest state-of-the-art defenses in their vanilla form or after a simple adaptation. We analyze why the attack is effective: the synthesized Narcissus trigger contains durable features as persistent as the original target-class features, so attempts to remove the trigger inevitably hurt model accuracy first.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1145/3576915.3616617
dc.identifier.uri: https://hdl.handle.net/10919/116733
dc.language.iso: en
dc.publisher: ACM
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.holder: The author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information
dc.type: Article - Refereed
dc.type.dcmitype: Text
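
To make the clean-label setup described in the abstract concrete, the following is a minimal, hypothetical NumPy sketch of trigger-based clean-label poisoning: a small, norm-bounded trigger is blended into a tiny fraction of target-class training images while their labels stay unchanged, and the same trigger is later stamped onto arbitrary test inputs. The array names, poisoning ratio, and random trigger pattern are illustrative assumptions only; this is not the Narcissus trigger-synthesis procedure, which optimizes the trigger from target-class samples and public out-of-distribution data.

```python
# Hypothetical sketch of clean-label trigger poisoning (not the Narcissus
# synthesis algorithm): a bounded additive trigger is blended into a small
# fraction of target-class images, leaving their labels untouched.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a dataset: 32x32 RGB images in [0, 1], integer labels.
images = rng.random((1000, 32, 32, 3)).astype(np.float32)
labels = rng.integers(0, 10, size=1000)

target_class = 2          # class the attacker wants triggered inputs mapped to
poison_ratio = 0.005      # roughly 0.5% of the target class, per the abstract
epsilon = 16 / 255        # L-infinity bound keeping the trigger inconspicuous

# A fixed additive trigger pattern; in Narcissus this pattern is optimized,
# here it is random purely for illustration.
trigger = rng.uniform(-epsilon, epsilon, size=(32, 32, 3)).astype(np.float32)

def apply_trigger(x: np.ndarray) -> np.ndarray:
    """Add the bounded trigger and clip back to the valid pixel range."""
    return np.clip(x + trigger, 0.0, 1.0)

# Poison only target-class samples; their labels remain correct (clean-label).
target_idx = np.flatnonzero(labels == target_class)
n_poison = max(1, int(poison_ratio * len(target_idx)))
poison_idx = rng.choice(target_idx, size=n_poison, replace=False)
images[poison_idx] = apply_trigger(images[poison_idx])

# At test time the attacker stamps the trigger onto any input; a successfully
# backdoored model would classify it as `target_class`.
test_input = rng.random((32, 32, 3)).astype(np.float32)
triggered_test_input = apply_trigger(test_input)
print(n_poison, "poisoned samples; labels unchanged:",
      bool(np.all(labels[poison_idx] == target_class)))
```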

Files

Original bundle
Name: 3576915.3616617.pdf
Size: 6.8 MB
Format: Adobe Portable Document Format
Description: Published version

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission