Understanding and Mitigating Data-Centric Vulnerabilities in Modern AI Systems
dc.contributor.author | Zeng, Yi | en |
dc.contributor.committeechair | Jia, Ruoxi | en |
dc.contributor.committeemember | Ramakrishnan, Narendran | en |
dc.contributor.committeemember | Abbott, Amos L. | en |
dc.contributor.committeemember | Jin, Ming | en |
dc.contributor.committeemember | Li, Bo | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2025-04-19T08:00:20Z | en |
dc.date.available | 2025-04-19T08:00:20Z | en |
dc.date.issued | 2025-04-18 | en |
dc.description.abstract | Modern artificial intelligence (AI) systems, trained on vast internet-scale datasets, demonstrate remarkable performance and emergent capabilities. However, this reliance on large datasets that are expensive or difficult to quality-control exposes AI systems to critical vulnerabilities, including data poisoning, backdoor attacks, and subtle human-exploitation vectors. This thesis addresses these challenges through a comprehensive data-centric perspective on AI security. First, we examine backdoor attacks in the frequency domain, revealing that many triggers exhibit characteristic high-frequency artifacts that can be leveraged for detection and that inform the design of more effective defenses. However, we also show that high-frequency signatures are not a necessary property of successful backdoor attacks, which motivates a deeper investigation into their fundamental mechanisms. Building on the insight that all effective backdoor attacks, regardless of design, divert models from their correct outputs, we formulate backdoor removal as a minimax optimization problem (a schematic form is sketched after this record) and develop I-BAU (Implicit Backdoor Adversarial Unlearning), an efficient algorithm that outperforms existing defenses across diverse attack settings. As AI systems evolve toward large foundation models, so too must our security approaches. We therefore extend our focus to safety backdoors in large language models and introduce BEEAR (Backdoor Embedding Entrapment and Adversarial Removal), which mitigates such vulnerabilities by identifying and counteracting universal embedding patterns associated with backdoor behavior. Beyond technical vulnerabilities such as backdoor attacks and data poisoning, we discover that even safety-aligned models exhibit an emergent susceptibility to human persuasion techniques. This finding prompts us to explore how social influence strategies can be weaponized to manipulate AI systems, and we develop a taxonomy of persuasion-based vulnerabilities that bridges technical security and human-computer interaction. Collectively, these contributions advance our understanding of data-centric security risks and provide practical mitigation strategies applicable across the AI development pipeline. By addressing both technical vulnerabilities and human-centered attack vectors, this work aims to facilitate the development of more robust and trustworthy AI systems suitable for deployment in critical applications. | en |
dc.description.abstractgeneral | Modern artificial intelligence systems achieve impressive results by learning from enormous amounts of internet data. However, this reliance on vast datasets, which are often difficult to monitor for quality, creates serious security vulnerabilities. This thesis examines these data-related security risks and develops practical solutions to address them. First, we analyze how attackers can secretly manipulate AI systems by hiding "triggers" and associated targeted model behaviors in training data. We discover that many of these hidden triggers leave distinctive patterns that can be detected through frequency analysis, similar to how different sound frequencies can be separated in audio processing. While this finding enables faster and easier detection methods, we also show that sophisticated attackers could design triggers without these telltale signs, highlighting the need for more advanced defenses. To address this challenge, we develop I-BAU, a new method that can effectively "unlearn" the unwanted associations between trigger patterns and attacker-specified behaviors using only a small amount of trusted data, making AI systems significantly more resistant to manipulation. As AI technology advances toward more powerful language models like ChatGPT, we extend our security approaches with BEEAR, a technique that identifies and neutralizes hidden vulnerabilities in these systems by focusing on the internal representation patterns associated with harmful behaviors. Finally, we discover that even AI systems designed to be safe and helpful can be manipulated through persuasive communication techniques commonly used by humans. By studying how social influence strategies affect AI behavior, we develop a systematic framework for understanding and protecting against these human-centered vulnerabilities. Together, these contributions provide a comprehensive approach to AI security that addresses both technical weaknesses and human interaction risks. This research aims to help build more trustworthy AI systems that can be safely deployed in important applications across society. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42825 | en |
dc.identifier.uri | https://hdl.handle.net/10919/125218 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution-ShareAlike 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en |
dc.subject | Data Poisoning | en |
dc.subject | Backdoor Attacks | en |
dc.subject | AI Safety | en |
dc.subject | AI Security | en |
dc.title | Understanding and Mitigating Data-Centric Vulnerabilities in Modern AI Systems | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |
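
A minimal sketch of the minimax formulation referenced in the abstract, written in LaTeX. The notation (model parameters \theta, a candidate universal perturbation \delta bounded by \epsilon, and a small trusted clean set D_c) is the editor's assumption for illustration and may differ from the dissertation's own presentation:

\min_{\theta} \; \max_{\|\delta\|_{\infty} \le \epsilon} \; \frac{1}{|D_c|} \sum_{(x,\, y) \in D_c} \mathcal{L}\bigl(f_{\theta}(x + \delta),\, y\bigr)

Here the inner maximization searches for the perturbation that most strongly diverts the model from its correct outputs, serving as a surrogate for the unknown trigger, and the outer minimization updates the model to neutralize that diversion, consistent with the abstract's observation that all effective backdoor attacks divert models from their correct outputs.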