Multimodal Foundation Models through the Lens of Security: Robust Deepfake Detection and Adversarial Resilience

dc.contributor.author: Abdullah, Sifat Muhammad
dc.contributor.committeechair: Viswanath, Bimal
dc.contributor.committeemember: Gao, Peng
dc.contributor.committeemember: Jadliwala, Murtuza
dc.contributor.committeemember: Chung, Taejoong Tijay
dc.contributor.committeemember: Yao, Danfeng
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-12-18T09:00:25Z
dc.date.available: 2025-12-18T09:00:25Z
dc.date.issued: 2025-12-17
dc.description.abstract: Generative AI plays a crucial role in processing and interpreting information, making its reliability more important than ever. Multimodal Foundation Models (MFMs), which drive the latest innovations in generative AI, have a significant impact on our daily lives. These models can process multiple types of data, such as text, images, and video, as both input and output, enabling seamless interaction across modalities. Examples include Text-to-Image (T2I) generation models such as DALL-E and Stable Diffusion, which create highly realistic images from simple text prompts; Multimodal Large Language Models (MLLMs) such as LLaMA and ChatGPT, which integrate visual and textual data to generate informative responses; and Vision Foundation Models (VFMs) such as OpenAI's CLIP, which efficiently encode image and text data for tasks like zero-shot image classification and image understanding. However, studying the security of these models is crucial to safeguarding their integrity and preventing misuse. MFMs are often exploited to generate highly realistic deepfake images and are also vulnerable to adversarial attacks that degrade their performance. These threats contribute to the spread of misinformation and the manipulation of AI systems, raising serious concerns about their security and reliability. This thesis explores robust detection methods for deepfakes and strategies to strengthen MFMs against deceptive manipulations, enhancing their security and trustworthiness. We investigate MFMs through the lens of security along two principal threat directions: (1) Understanding the threat from misuse of MFMs and developing methods for its mitigation. T2I models can generate highly convincing deepfake media that can be misused to spread misinformation, raising concerns about the authenticity of digital content and the potential for large-scale manipulation. To address this, it is crucial to develop robust detection methods that can accurately identify and mitigate the risks posed by such synthetic media. (2) Attackers violating the integrity of MFMs. Adversarially perturbed images can significantly deteriorate the performance of MLLMs, causing them to miscaption images and elicit toxic responses. Mitigating such adversarial threats is essential to ensure the optimal performance of these models and to maintain their reliability. We make the following contributions along these two threat directions: (1) Assessing the real-world applicability of state-of-the-art (SOTA) deepfake defenses and developing robust detection methods. We evaluate the effectiveness of eight SOTA deepfake image detectors against advances in MFM customization and semantically meaningful adversarial attacks. Our findings reveal that most defenses degrade significantly in this evolving threat landscape. We also identify key features and build defenses for highly generalized and robust deepfake detection. (2) Defending MFMs against perturbation-based adversarial attacks using off-the-shelf Generative AI (GenAI) image translation models and the reasoning capabilities of MFMs. Image perturbation-based adversarial attacks can severely degrade the utility of MFMs by causing them to harm benign users. We study methods that leverage advances in GenAI image translation models to defend MFMs against such attacks through adversarial purification, and explore the use of inference-time reasoning capabilities of MFMs to self-defend against such attacks.
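The abstract's second contribution mentions adversarial purification: cleaning perturbed images before they reach the model. The dissertation uses GenAI image-to-image translation models for this; as a stand-in, the toy sketch below uses a simple mean filter (an assumption for illustration only, not the thesis's method) to show the underlying principle that purification removes high-frequency adversarial noise and moves an input back toward its clean counterpart.

```python
# Toy illustration of adversarial purification. The mean filter here is a
# hypothetical stand-in for the learned image-translation purifiers the
# thesis actually studies; only the principle (suppressing high-frequency
# perturbations) carries over.
import numpy as np

def purify(img, k=5):
    """Purify a 2-D grayscale image with a k x k mean filter."""
    h, w = img.shape
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 32 * 32).reshape(32, 32)      # smooth "image"
adv = clean + rng.normal(0, 0.08, clean.shape)          # high-frequency perturbation
purified = purify(adv)

dist_before = np.linalg.norm(adv - clean)
dist_after = np.linalg.norm(purified - clean)
assert dist_after < dist_before  # purification moves the input toward clean
```

Real purifiers must preserve semantic content while removing the perturbation, which is why the thesis turns to generative image-translation models rather than fixed filters like the one above.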
dc.description.abstractgeneral: Generative AI (GenAI) is increasingly shaping how we create and interact with information, making reliability and safety critical as these systems grow more capable. Much of this progress is driven by multimodal foundation models that jointly process text, images, and video. Examples include text-to-image systems like DALL-E and Stable Diffusion, multimodal LLMs such as ChatGPT and LLaMA, and models like CLIP that align vision and language for tasks such as image-text matching and captioning. Alongside their benefits, these models introduce important security concerns. They can be misused to create highly realistic deepfake content, and they can be fooled by small, intentional changes to images, which can cause them to make incorrect or even harmful decisions. These weaknesses can lead to misinformation, deception, and the manipulation of AI systems, motivating a deeper study of their security and trustworthiness. This thesis examines two main risks. The first is the misuse of generative models to create deepfake content. To address this, we study the limits of current detection tools and develop new strategies that are more reliable across real-world scenarios. The second is the loss of model integrity when an attacker introduces subtle changes to an image to mislead a model. Such altered images can cause multimodal models to describe a scene incorrectly or produce unsafe and toxic responses. We study new ways to defend against these attacks by using modern image-to-image translation tools to clean adversarial inputs and by using the reasoning capabilities of these models to help them protect themselves.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:45108
dc.identifier.uri: https://hdl.handle.net/10919/140018
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: FoundationModels
dc.subject: Security
dc.subject: Multimodal
dc.subject: AI
dc.subject: ML
dc.title: Multimodal Foundation Models through the Lens of Security: Robust Deepfake Detection and Adversarial Resilience
dc.type: Dissertation
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Abdullah_S_D_2025.pdf
Size: 23.38 MB
Format: Adobe Portable Document Format