Multimodal Foundation Models through the Lens of Security: Robust Deepfake Detection and Adversarial Resilience
Abstract
Generative AI plays a crucial role in processing and interpreting information, making its reliability more important than ever. Multimodal Foundation Models (MFMs), which drive the latest innovations in generative AI, have a significant impact on our daily lives. These models can process multiple types of data, such as text, images, and video, as both input and output, enabling seamless interaction across modalities. Examples include Text-to-Image (T2I) generation models such as DALL-E and Stable Diffusion, which create highly realistic images from simple text prompts, and Multimodal Large Language Models (MLLMs) such as LLaMA and ChatGPT, which integrate visual and textual data to generate informative responses. Vision Foundation Models (VFMs) such as OpenAI CLIP further extend these capabilities by efficiently encoding image and text data for tasks like zero-shot image classification and image understanding.

However, studying the security of these models is crucial to safeguarding their integrity and preventing their misuse. MFMs are often exploited to generate highly realistic deepfake images and are also vulnerable to adversarial attacks that degrade their performance. These threats contribute to the spread of misinformation and the manipulation of AI systems, raising serious concerns about their security and reliability. This thesis explores robust detection methods for deepfakes and strategies to strengthen MFMs against deceptive manipulations, enhancing their security and trustworthiness. We investigate MFMs through the lens of security along the following two principal threats: (1) Understanding the threat posed by misuse of MFMs and developing methods for its mitigation. T2I models can generate highly convincing deepfake media that can be misused to spread misinformation, raising concerns about the authenticity of digital content and the potential for large-scale manipulation. Addressing this requires robust detection methods that can accurately identify such synthetic media and mitigate the risks it poses. (2) Attackers violating the integrity of MFMs. Adversarially perturbed images can significantly degrade the performance of MLLMs, causing them to miscaption images and elicit toxic responses. Mitigating such adversarial threats is essential to preserving the performance and reliability of these models.
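As an illustration of the VFM capability mentioned above, the following minimal sketch shows zero-shot image classification with OpenAI CLIP through the Hugging Face transformers library. The checkpoint name, candidate labels, and image path are illustrative assumptions, not artifacts of this thesis.

# Minimal sketch: zero-shot image classification with OpenAI CLIP
# (Hugging Face transformers). Checkpoint, labels, and image path are
# illustrative assumptions, not artifacts of this thesis.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                         # hypothetical input image
labels = ["a real photograph", "an AI-generated image"]   # illustrative label set

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into probabilities over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")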
The following are our contributions addressing these two threat directions: (1) Assessing the real-world applicability of state-of-the-art (SOTA) deepfake defenses and developing robust detection methods. We evaluate the effectiveness of eight SOTA deepfake image detectors against advances in MFM customization and semantically meaningful adversarial attacks. Our findings reveal that most defenses degrade significantly in this evolving threat landscape. We also identify key features and build defenses for highly generalized and robust deepfake detection. (2) Defending MFMs against perturbation-based adversarial attacks using advances in off-the-shelf Generative AI (GenAI) image translation models and the reasoning capabilities of MFMs. Image perturbation-based adversarial attacks can severely degrade the utility of MFMs by causing them to harm benign users. We study methods that leverage GenAI image translation models to defend MFMs against such attacks through adversarial purification, and we explore how the inference-time reasoning capabilities of MFMs can be used to self-defend against such attacks.
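To make the adversarial-purification idea concrete, the sketch below shows one common pattern: passing a possibly perturbed image through an off-the-shelf image-to-image diffusion model at low strength, so that the denoising process washes out small adversarial perturbations before the image reaches an MLLM. The pipeline class, model id, prompt, and strength value are illustrative assumptions and are not presented as the defenses developed in this thesis.

# Minimal sketch: adversarial purification with an off-the-shelf
# image-to-image diffusion model (Hugging Face diffusers). Model id,
# prompt, and strength are illustrative assumptions, not the thesis's
# actual defense.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical adversarially perturbed input image.
adv_image = Image.open("adversarial_input.png").convert("RGB").resize((512, 512))

# A low strength keeps the image content largely intact while the
# denoising steps remove high-frequency adversarial perturbations.
purified = pipe(
    prompt="a photo",        # generic prompt; content is driven by the input image
    image=adv_image,
    strength=0.3,
    guidance_scale=5.0,
).images[0]

purified.save("purified_input.png")  # feed this to the MLLM instead of the raw input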