Evolving Threats and Defenses in Machine Learning: Focus on Model Inversion and Beyond
Abstract
Machine learning (ML) models are increasingly integrated into critical real-world applications, raising concerns about security, privacy, and trustworthiness. Among emerging threats, model inversion (MI) attacks stand out for their potential to compromise the confidentiality of training data. This dissertation investigates evolving threats in ML, centering on model inversion and its implications across image classification and natural language processing. We first present an advanced model inversion attack algorithm that leverages knowledge-enriched distributional strategies under white-box conditions, effectively reconstructing private training data from image classifiers. To counter such threats, we develop a novel data-centric defense that uses targeted data augmentation to reshape the model's loss landscape, reducing its vulnerability to MI attacks. Recognizing the dual nature of threats and defenses, we further demonstrate how MI attacks, conventionally viewed as harmful, can be repurposed to enhance model security: we show that MI can detect and neutralize backdoor attacks in image classifiers, enabling an effective defense that requires no clean data. Broadening the scope beyond vision tasks, this dissertation introduces a proactive red-teaming framework for large language models (LLMs). By combining global strategy formation with local adaptive learning, our red-teaming agent systematically identifies vulnerabilities, improving robustness under adaptive adversarial conditions. Finally, addressing the critical issue of hallucination in language models, we propose FASTTRACK, a reliable fact-tracing framework that integrates recursive clustering with LLM-driven validation, significantly surpassing existing methods in both accuracy and computational efficiency. Collectively, these works form a coherent narrative, from understanding foundational threats to devising versatile, robust defenses, advancing the ongoing effort toward secure, privacy-preserving, and trustworthy machine learning systems.
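To give a concrete sense of the white-box attack setting the abstract describes, the sketch below shows a minimal gradient-based model inversion loop in PyTorch: starting from noise, it optimizes an input to maximize a classifier's confidence for a target class, with a total-variation prior for smoothness. This is a generic illustration of the attack family, not the knowledge-enriched distributional method developed in the dissertation; the function name `invert_class` and all hyperparameters are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def invert_class(model, target_class, input_shape=(1, 3, 64, 64),
                 steps=500, lr=0.1, tv_weight=1e-4):
    """Gradient-ascent reconstruction of a representative input for
    `target_class` from a white-box classifier (illustrative sketch)."""
    model.eval()
    # Start from random noise and treat the input itself as the parameter.
    x = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # Maximize target-class confidence by minimizing cross-entropy.
        cls_loss = F.cross_entropy(logits, torch.tensor([target_class]))
        # Total-variation prior penalizes high-frequency noise so the
        # reconstruction stays image-like.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
           + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        (cls_loss + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```

Such a loop recovers class-representative features rather than exact training samples; the dissertation's contributions, as summarized above, lie in strengthening this attack with distributional knowledge and, conversely, in defenses that reshape the loss landscape this optimization descends.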