Title: Understanding and Mitigating Data-Centric Vulnerabilities in Modern AI Systems
Author: Zeng, Yi
Type: Dissertation (ETD)
Language: en
Date issued: 2025-04-18
Date accessioned: 2025-04-19
Date available: 2025-04-19
Identifier: vt_gsexam:42825
URI: https://hdl.handle.net/10919/125218
Rights: Creative Commons Attribution-ShareAlike 4.0 International
Subjects: Data Poisoning; Backdoor Attacks; AI Safety; AI Security

Abstract:

Modern artificial intelligence (AI) systems, trained on vast internet-scale datasets, demonstrate remarkable performance and emergent capabilities. However, this reliance on large datasets that are expensive or difficult to quality-control exposes AI systems to critical vulnerabilities, including data poisoning, backdoor attacks, and subtle attack vectors rooted in human persuasion. This thesis addresses these challenges from a comprehensive data-centric perspective on AI security.

First, we examine backdoor attacks in the frequency domain, revealing that many triggers exhibit characteristic high-frequency artifacts that can be leveraged for detection and that inform the design of more effective defenses. We also show, however, that high-frequency signatures are not a necessary property of successful backdoor attacks, which motivates a deeper investigation into their fundamental mechanisms.

Building on the insight that all effective backdoor attacks, regardless of design, divert models from their correct outputs, we formulate backdoor removal as a minimax optimization problem and develop I-BAU (Implicit Backdoor Adversarial Unlearning), an efficient algorithm that outperforms existing defenses across diverse attack settings.

As AI systems evolve toward large foundation models, security approaches must evolve with them. We therefore extend our focus to safety backdoors in large language models and introduce BEEAR (Backdoor Embedding Entrapment and Adversarial Removal), which mitigates such vulnerabilities by identifying and counteracting universal embedding patterns associated with backdoor behavior.

Beyond technical vulnerabilities such as backdoor attacks and data poisoning, we discover that even safety-aligned models exhibit an emergent susceptibility to human persuasion techniques. We explore how social influence strategies can be weaponized to manipulate AI systems, and we develop a taxonomy of persuasion-based vulnerabilities that bridges technical security and human-computer interaction.

Collectively, these contributions advance our understanding of data-centric security risks and provide practical mitigation strategies applicable across the AI development pipeline. By addressing both technical vulnerabilities and human-centered attack vectors, this work aims to facilitate the development of more robust and trustworthy AI systems suitable for deployment in critical applications.
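As an illustration of the frequency-domain observation above, the short Python sketch below uses the 2-D DCT to compare the high-frequency energy of a smooth image against the same image carrying a small patch trigger. This is not code from the dissertation: the helper name high_freq_energy_ratio, the diagonal cutoff, and the toy images are our own assumptions.

    # Hedged sketch: a 2-D DCT view of patch-style triggers. Sharp-edged
    # patches concentrate extra energy in high spatial frequencies.
    import numpy as np
    from scipy.fft import dctn

    def high_freq_energy_ratio(img, cutoff=0.5):
        """Fraction of DCT energy beyond a diagonal frequency cutoff."""
        coeffs = dctn(img.astype(np.float64), norm="ortho")
        h, w = coeffs.shape
        i, j = np.ogrid[0:h, 0:w]
        high = (i + j) > cutoff * (h + w)   # crude high-frequency band
        energy = coeffs ** 2
        return float(energy[high].sum() / energy.sum())

    # Smooth, low-frequency image vs. the same image with a sharp 4x4 patch.
    t = np.linspace(0.0, np.pi, 32)
    clean = np.outer(np.sin(t), np.sin(t))
    triggered = clean.copy()
    triggered[-4:, -4:] = 1.0               # stand-in for a patch trigger

    print(high_freq_energy_ratio(clean))      # close to 0
    print(high_freq_energy_ratio(triggered))  # noticeably larger

The sketch also reflects the chapter's caveat: a trigger designed to be spectrally smooth would pass such a check, which is why high-frequency artifacts are a useful but not necessary signature.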
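The minimax formulation mentioned for I-BAU can be sketched as follows, in our notation; the dissertation's exact objective, constraint set, and solver details may differ. The inner maximization searches, over clean samples (x_i, y_i), for a universal perturbation delta that behaves like a putative trigger, and the outer minimization updates the model parameters theta so that no such perturbation retains its effect:

    \min_{\theta} \; \max_{\|\delta\| \le \epsilon} \; \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\!\left( f_{\theta}(x_i + \delta),\; y_i \right)

Here f_theta is the model, L is the task loss, and epsilon bounds the perturbation. The "implicit" in I-BAU refers to differentiating the outer loss through the inner solution via implicit (hyper)gradients rather than unrolling the inner loop, which is what makes the procedure efficient.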
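BEEAR's description, entrapping a universal embedding pattern and then removing it adversarially, suggests a bi-level loop. The PyTorch toy below is a heavily simplified sketch under our own assumptions: a mean-pooled embedding classifier stands in for an LLM, "unsafe" is a class label rather than generated text, and all names (forward, eps, the optimizers) are illustrative rather than the dissertation's.

    # Hedged toy sketch of a BEEAR-style bi-level loop (not the real method).
    import torch
    from torch import nn

    torch.manual_seed(0)
    vocab, dim, seq_len, n = 100, 16, 8, 64

    emb = nn.Embedding(vocab, dim)          # stand-in for an LLM's embeddings
    head = nn.Linear(dim, 2)                # stand-in for downstream behavior
    opt = torch.optim.Adam(list(emb.parameters()) + list(head.parameters()), lr=1e-2)

    tokens = torch.randint(0, vocab, (n, seq_len))
    safe = torch.zeros(n, dtype=torch.long)     # desired behavior
    unsafe = torch.ones(n, dtype=torch.long)    # backdoored behavior
    ce = nn.CrossEntropyLoss()

    def forward(delta):
        e = emb(tokens) + delta             # universal embedding perturbation
        return head(e.mean(dim=1))          # mean-pool, then classify

    eps = 0.5
    for step in range(50):
        # Inner loop: "entrap" a perturbation that elicits the unsafe behavior.
        delta = torch.zeros(dim, requires_grad=True)
        inner_opt = torch.optim.SGD([delta], lr=0.1)
        for _ in range(5):
            inner_opt.zero_grad()
            ce(forward(delta), unsafe).backward()
            inner_opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)     # keep the perturbation bounded
        # Outer loop: fine-tune the model to stay safe under that perturbation.
        opt.zero_grad()
        ce(forward(delta.detach()), safe).backward()
        opt.step()

In the dissertation's setting, the perturbation would live in the embedding space of an instruction-tuned LLM and the outer objective would restore safe generation behavior; the toy only conveys the alternating entrapment-and-removal structure.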