Evolving Threats and Defenses in Machine Learning: Focus on Model Inversion and Beyond
Abstract
Machine learning (ML) models are increasingly integrated into critical real-world applications, raising concerns about security, privacy, and trustworthiness. Among emerging threats, model inversion (MI) attacks stand out for their potential to compromise the confidentiality of training data. This dissertation investigates evolving threats in ML, centering on model inversion and its implications across image classification and natural language processing. We first present an advanced model inversion attack algorithm that leverages knowledge-enriched distributional strategies under white-box conditions, effectively reconstructing private training data from image classifiers. To counter such threats, we develop a novel data-centric defense that uses targeted data augmentation to reshape the model's loss landscape, reducing its vulnerability to MI attacks. Recognizing the dual nature of threats and defenses, we further demonstrate how MI attacks, conventionally viewed as harmful, can be repurposed to enhance model security: we show that MI can detect and neutralize backdoor attacks in image classifiers, enabling an effective defense that requires no clean data. Broadening the scope beyond vision tasks, this dissertation introduces a proactive red-teaming framework for large language models (LLMs). By combining global strategy formation with local adaptive learning, our red-teaming agent systematically identifies vulnerabilities, improving robustness under adaptive adversarial conditions. Finally, addressing the critical issue of hallucination in language models, we propose FASTTRACK, a reliable fact-tracing framework that integrates recursive clustering with LLM-driven validation, significantly surpassing existing methods in both accuracy and computational efficiency. Collectively, these works form a coherent narrative, from understanding foundational threats to devising versatile, robust defenses, advancing the ongoing effort toward secure, privacy-preserving, and trustworthy machine learning systems.
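To give a concrete sense of the white-box attack setting the abstract describes, the sketch below shows a minimal gradient-based model inversion loop in PyTorch: starting from noise, it optimizes an input to maximize a classifier's confidence for a target class, with a total-variation prior for smoothness. This is a generic illustration of the attack family, not the knowledge-enriched distributional method developed in the dissertation; the function name `invert_class` and all hyperparameters are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def invert_class(model, target_class, input_shape=(1, 3, 64, 64),
                 steps=500, lr=0.1, tv_weight=1e-4):
    """Gradient-ascent reconstruction of a representative input for
    `target_class` from a white-box classifier (illustrative sketch)."""
    model.eval()
    # Start from random noise and treat the input itself as the parameter.
    x = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # Maximize target-class confidence by minimizing cross-entropy.
        cls_loss = F.cross_entropy(logits, torch.tensor([target_class]))
        # Total-variation prior penalizes high-frequency noise so the
        # reconstruction stays image-like.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() \
           + (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        (cls_loss + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```

Such a loop recovers class-representative features rather than exact training samples; the dissertation's contributions, as summarized above, lie in strengthening this attack with distributional knowledge and, conversely, in defenses that reshape the loss landscape this optimization descends.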