Understanding and Mitigating Data-Centric Vulnerabilities in Modern AI Systems
dc.contributor.author | Zeng, Yi | en |
dc.contributor.committeechair | Jia, Ruoxi | en |
dc.contributor.committeemember | Ramakrishnan, Narendran | en |
dc.contributor.committeemember | Abbott, Amos L. | en |
dc.contributor.committeemember | Jin, Ming | en |
dc.contributor.committeemember | Li, Bo | en |
dc.contributor.department | Electrical and Computer Engineering | en |
dc.date.accessioned | 2025-04-19T08:00:20Z | en |
dc.date.available | 2025-04-19T08:00:20Z | en |
dc.date.issued | 2025-04-18 | en |
dc.description.abstract | Modern artificial intelligence (AI) systems, trained on vast internet-scale datasets, demonstrate remarkable performance and emergent capabilities. However, this reliance on large datasets that are expensive or difficult to quality-control exposes AI systems to critical vulnerabilities, including data poisoning, backdoor attacks, and subtle human-exploitation vectors. This thesis addresses these challenges through a comprehensive data-centric perspective on AI security. First, we examine backdoor attacks in the frequency domain, revealing that many triggers exhibit characteristic high-frequency artifacts that can be leveraged for detection and that inform the design of more effective defenses. However, we also show that high-frequency signatures are not a necessary property of successful backdoor attacks, which motivates a deeper investigation into their fundamental mechanisms. Building on the insight that all effective backdoor attacks, regardless of design, divert models from their correct outputs, we formulate backdoor removal as a minimax optimization problem (a schematic form is sketched after this record) and develop I-BAU (Implicit Backdoor Adversarial Unlearning), an efficient algorithm that outperforms existing defenses across diverse attack settings. As AI systems evolve toward large foundation models, so too must our security approaches. We therefore extend our focus to safety backdoors in large language models and introduce BEEAR (Backdoor Embedding Entrapment and Adversarial Removal), which mitigates such vulnerabilities by identifying and counteracting universal embedding patterns associated with backdoor behavior. Beyond technical vulnerabilities such as backdoor attacks and data poisoning, we discover that even safety-aligned models exhibit an emergent susceptibility to human persuasion techniques. This finding prompts us to explore how social influence strategies can be weaponized to manipulate AI systems, and we develop a taxonomy of persuasion-based vulnerabilities that bridges technical security and human-computer interaction. Collectively, these contributions advance our understanding of data-centric security risks and provide practical mitigation strategies applicable across the AI development pipeline. By addressing both technical vulnerabilities and human-centered attack vectors, this work aims to facilitate the development of more robust and trustworthy AI systems suitable for deployment in critical applications. | en |
dc.description.abstractgeneral | Modern artificial intelligence systems achieve impressive results by learning from enormous amounts of internet data. However, this reliance on vast datasets, which are often difficult to monitor for quality, creates serious security vulnerabilities. This thesis examines these data-related security risks and develops practical solutions to address them. First, we analyze how attackers can secretly manipulate AI systems by hiding "triggers" and associated targeted model behaviors in training data. We discover that many of these hidden triggers leave distinctive patterns that can be detected through frequency analysis, similar to how different sound frequencies can be separated in audio processing. While this finding enables faster and easier detection methods, we also show that sophisticated attackers could design triggers without these telltale signs, highlighting the need for more advanced defenses. To address this challenge, we develop I-BAU, a new method that can effectively "unlearn" the unwanted associations between trigger patterns and attacker-specified behaviors using only a small amount of trusted data, making AI systems significantly more resistant to manipulation. As AI technology advances toward more powerful language models like ChatGPT, we extend our security approaches with BEEAR, a technique that identifies and neutralizes hidden vulnerabilities in these systems by focusing on the internal representation patterns associated with harmful behaviors. Finally, we discover that even AI systems designed to be safe and helpful can be manipulated through persuasive communication techniques commonly used by humans. By studying how social influence strategies affect AI behavior, we develop a systematic framework for understanding and protecting against these human-centered vulnerabilities. Together, these contributions provide a comprehensive approach to AI security that addresses both technical weaknesses and human interaction risks. This research aims to help build more trustworthy AI systems that can be safely deployed in important applications across society. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42825 | en |
dc.identifier.uri | https://hdl.handle.net/10919/125218 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution-ShareAlike 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | en |
dc.subject | Data Poisoning | en |
dc.subject | Backdoor Attacks | en |
dc.subject | AI Safety | en |
dc.subject | AI Security | en |
dc.title | Understanding and Mitigating Data-Centric Vulnerabilities in Modern AI Systems | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |
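
A minimal sketch of the minimax formulation referenced in the abstract, written in LaTeX. The notation (model parameters \theta, a candidate universal perturbation \delta bounded by \epsilon, and a small trusted clean set D_c) is the editor's assumption for illustration and may differ from the dissertation's own presentation:

\min_{\theta} \; \max_{\|\delta\|_{\infty} \le \epsilon} \; \frac{1}{|D_c|} \sum_{(x,\, y) \in D_c} \mathcal{L}\bigl(f_{\theta}(x + \delta),\, y\bigr)

Here the inner maximization searches for the perturbation that most strongly diverts the model from its correct outputs, serving as a surrogate for the unknown trigger, and the outer minimization updates the model to neutralize that diversion, consistent with the abstract's observation that all effective backdoor attacks divert models from their correct outputs.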