Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information

dc.contributor.author: Zeng, Yi
dc.contributor.author: Pan, Minzhou
dc.contributor.author: Just, Hoang Anh
dc.contributor.author: Lyu, Lingjuan
dc.contributor.author: Qiu, Meikang
dc.contributor.author: Jia, Ruoxi
dc.date.accessioned: 2023-12-04T18:14:34Z
dc.date.available: 2023-12-04T18:14:34Z
dc.date.issued: 2023-11-15
dc.date.updated: 2023-12-01T08:51:40Z
dc.description.abstract: Backdoor attacks introduce manipulated data into a machine learning model's training set, causing the model to misclassify inputs bearing a trigger at test time and thereby produce the attacker's desired outcome. For backdoor attacks to bypass human inspection, it is essential that the injected data appear correctly labeled. Attacks with this property are often referred to as "clean-label attacks." The success of current clean-label backdoor methods largely depends on access to the complete training set. Yet accessing the complete dataset is often challenging or infeasible, since it frequently comes from varied, independent sources, such as images from distinct users. This raises the question of whether clean-label backdoor attacks still pose a real threat. In this paper, we answer this question affirmatively by designing an algorithm that launches clean-label backdoor attacks using only samples from the target class and public out-of-distribution data. By inserting carefully crafted malicious examples totaling less than 0.5% of the target-class size and 0.05% of the full training-set size, we can manipulate the model into misclassifying arbitrary inputs into the target class whenever they contain the backdoor trigger. Importantly, the poisoned model retains high accuracy on regular test samples without the trigger, as if it were trained on untainted data. Our technique is consistently effective across various datasets and models, and even when the trigger is injected into the physical world. We explore the space of defenses and find that Narcissus can evade the latest state-of-the-art defenses in their vanilla form or after a simple adaptation. We analyze why the attack is effective: the synthesized Narcissus trigger contains durable features as persistent as the original target-class features, so attempts to remove the trigger inevitably hurt model accuracy first.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1145/3576915.3616617
dc.identifier.uri: https://hdl.handle.net/10919/116733
dc.language.iso: en
dc.publisher: ACM
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.holder: The author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information
dc.type: Article - Refereed
dc.type.dcmitype: Text
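
To make the clean-label setup described in the abstract concrete, the following is a minimal, hypothetical NumPy sketch of trigger-based clean-label poisoning: a small, norm-bounded trigger is blended into a tiny fraction of target-class training images while their labels stay unchanged, and the same trigger is later stamped onto arbitrary test inputs. The array names, poisoning ratio, and random trigger pattern are illustrative assumptions only; this is not the Narcissus trigger-synthesis procedure, which optimizes the trigger from target-class samples and public out-of-distribution data.

```python
# Hypothetical sketch of clean-label trigger poisoning (not the Narcissus
# synthesis algorithm): a bounded additive trigger is blended into a small
# fraction of target-class images, leaving their labels untouched.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a dataset: 32x32 RGB images in [0, 1], integer labels.
images = rng.random((1000, 32, 32, 3)).astype(np.float32)
labels = rng.integers(0, 10, size=1000)

target_class = 2          # class the attacker wants triggered inputs mapped to
poison_ratio = 0.005      # roughly 0.5% of the target class, per the abstract
epsilon = 16 / 255        # L-infinity bound keeping the trigger inconspicuous

# A fixed additive trigger pattern; in Narcissus this pattern is optimized,
# here it is random purely for illustration.
trigger = rng.uniform(-epsilon, epsilon, size=(32, 32, 3)).astype(np.float32)

def apply_trigger(x: np.ndarray) -> np.ndarray:
    """Add the bounded trigger and clip back to the valid pixel range."""
    return np.clip(x + trigger, 0.0, 1.0)

# Poison only target-class samples; their labels remain correct (clean-label).
target_idx = np.flatnonzero(labels == target_class)
n_poison = max(1, int(poison_ratio * len(target_idx)))
poison_idx = rng.choice(target_idx, size=n_poison, replace=False)
images[poison_idx] = apply_trigger(images[poison_idx])

# At test time the attacker stamps the trigger onto any input; a successfully
# backdoored model would classify it as `target_class`.
test_input = rng.random((32, 32, 3)).astype(np.float32)
triggered_test_input = apply_trigger(test_input)
print(n_poison, "poisoned samples; labels unchanged:",
      bool(np.all(labels[poison_idx] == target_class)))
```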

Files

Original bundle
Name: 3576915.3616617.pdf
Size: 6.8 MB
Format: Adobe Portable Document Format
Description: Published version

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission