Multimodal Foundation Models through the Lens of Security: Robust Deepfake Detection and Adversarial Resilience

dc.contributor.author: Abdullah, Sifat Muhammad
dc.contributor.committeechair: Viswanath, Bimal
dc.contributor.committeemember: Gao, Peng
dc.contributor.committeemember: Jadliwala, Murtuza
dc.contributor.committeemember: Chung, Taejoong Tijay
dc.contributor.committeemember: Yao, Danfeng
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-12-18T09:00:25Z
dc.date.available: 2025-12-18T09:00:25Z
dc.date.issued: 2025-12-17
dc.description.abstract: Generative AI plays a crucial role in processing and interpreting information, making its reliability more important than ever. Multimodal Foundation Models (MFMs), which drive the latest innovations in generative AI, have a significant impact on our daily lives. These models can process multiple types of data, such as text, images, and video, as both input and output, enabling seamless interaction across modalities. Examples include Text-to-Image (T2I) generation models such as DALL-E and Stable Diffusion, which create highly realistic images from simple text prompts; Multimodal Large Language Models (MLLMs) such as LLaMA and ChatGPT, which integrate visual and textual data to generate informative responses; and Vision Foundation Models (VFMs) such as OpenAI's CLIP, which efficiently encode image and text data for tasks like zero-shot image classification and image understanding. However, studying the security of these models is crucial to safeguarding their integrity and preventing misuse. MFMs are often exploited to generate highly realistic deepfake images and are also vulnerable to adversarial attacks that degrade their performance. These threats contribute to the spread of misinformation and the manipulation of AI systems, raising serious concerns about their security and reliability. This thesis explores robust detection methods for deepfakes and strategies to strengthen MFMs against deceptive manipulations, enhancing their security and trustworthiness. We investigate MFMs through the lens of security along two principal threat directions: (1) Understanding the threat from misuse of MFMs and developing methods for its mitigation. T2I models can generate highly convincing deepfake media that can be misused to spread misinformation, raising concerns about the authenticity of digital content and the potential for large-scale manipulation. To address this, it is crucial to develop robust detection methods that can accurately identify and mitigate the risks posed by such synthetic media. (2) Attackers violating the integrity of MFMs. Adversarially perturbed images can significantly deteriorate the performance of MLLMs, causing them to miscaption images and elicit toxic responses. Mitigating such adversarial threats is essential to ensure the optimal performance of these models and to maintain their reliability. We make the following contributions along these two threat directions: (1) Assessing the real-world applicability of state-of-the-art (SOTA) deepfake defenses and developing robust detection methods. We evaluate the effectiveness of eight SOTA deepfake image detectors against advances in MFM customization and semantically meaningful adversarial attacks. Our findings reveal that most defenses degrade significantly in this evolving threat landscape. We also identify key features and build defenses for highly generalized and robust deepfake detection. (2) Defending MFMs against perturbation-based adversarial attacks using off-the-shelf Generative AI (GenAI) image translation models and the reasoning capabilities of MFMs. Image perturbation-based adversarial attacks can severely degrade the utility of MFMs by causing them to harm benign users. We study methods that leverage advances in GenAI image translation models to defend MFMs against such attacks through adversarial purification, and explore the use of inference-time reasoning capabilities of MFMs to self-defend against such attacks.
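The abstract's second contribution mentions adversarial purification: cleaning perturbed images before they reach the model. The dissertation uses GenAI image-to-image translation models for this; as a stand-in, the toy sketch below uses a simple mean filter (an assumption for illustration only, not the thesis's method) to show the underlying principle that purification removes high-frequency adversarial noise and moves an input back toward its clean counterpart.

```python
# Toy illustration of adversarial purification. The mean filter here is a
# hypothetical stand-in for the learned image-translation purifiers the
# thesis actually studies; only the principle (suppressing high-frequency
# perturbations) carries over.
import numpy as np

def purify(img, k=5):
    """Purify a 2-D grayscale image with a k x k mean filter."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 32 * 32).reshape(32, 32)      # smooth "image"
adv = clean + rng.normal(0, 0.08, clean.shape)          # high-frequency perturbation
purified = purify(adv)

dist_before = np.linalg.norm(adv - clean)
dist_after = np.linalg.norm(purified - clean)
assert dist_after < dist_before  # purification moves the input toward clean
```

Real purifiers must preserve semantic content while removing the perturbation, which is why the thesis turns to generative image-translation models rather than fixed filters like the one above.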
dc.description.abstractgeneral: Generative AI (GenAI) is increasingly shaping how we create and interact with information, making reliability and safety critical as these systems grow more capable. Much of this progress is driven by multimodal foundation models that jointly process text, images, and video. Examples include text-to-image systems like DALL-E and Stable Diffusion, multimodal LLMs such as ChatGPT and LLaMA, and models like CLIP that align vision and language for tasks such as image-text matching and captioning. Alongside their benefits, these models introduce important security concerns. They can be misused to create highly realistic deepfake content, and they can be fooled by small, intentional changes to images, which can cause them to make incorrect or even harmful decisions. These weaknesses can lead to misinformation, deception, and the manipulation of AI systems, motivating a deeper study of their security and trustworthiness. This thesis examines two main risks. The first is the misuse of generative models to create deepfake content. To address this, we study the limits of current detection tools and develop new strategies that are more reliable across real-world scenarios. The second is the loss of model integrity when an attacker introduces subtle changes to an image to mislead a model. Such altered images can cause multimodal models to describe a scene incorrectly or produce unsafe and toxic responses. We study new ways to defend against these attacks by using modern image-to-image translation tools to clean adversarial inputs and by using the reasoning capabilities of these models to help them protect themselves.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:45108
dc.identifier.uri: https://hdl.handle.net/10919/140018
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: FoundationModels
dc.subject: Security
dc.subject: Multimodal
dc.subject: AI
dc.subject: ML
dc.title: Multimodal Foundation Models through the Lens of Security: Robust Deepfake Detection and Adversarial Resilience
dc.type: Dissertation
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Abdullah_S_D_2025.pdf
Size: 23.38 MB
Format: Adobe Portable Document Format