A Submodular Approach to Find Interpretable Directions in Text-to-Image Models
dc.contributor.author | Allada, Ritika | en |
dc.contributor.committeechair | Yanardag Delul, Pinar | en |
dc.contributor.committeemember | Eldardiry, Hoda Mohamed | en |
dc.contributor.committeemember | Thomas, Christopher Lee | en |
dc.contributor.committeemember | North, Christopher L. | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-06-11T08:04:26Z | en |
dc.date.available | 2025-06-11T08:04:26Z | en |
dc.date.issued | 2025-06-10 | en |
dc.description.abstract | Text-to-image models have significantly advanced the field of image editing. However, identifying which attributes a model can actually edit remains a challenge. This thesis addresses the problem by leveraging a multimodal vision-language model (MMVLM) to propose a list of candidate attributes for editing an image, using Flux and ControlNet to generate edits from those keywords, and then applying a submodular ranking method to determine which edits actually succeed. The experiments in this thesis demonstrate the robustness of the approach and its ability to produce high-quality edits across various domains, such as dresses and living rooms. | en |
dc.description.abstractgeneral | In today's world, generative AI models can edit images based on user-specified text prompts. However, finding attributes that a model can actually edit is a time-consuming process. This thesis addresses the problem with a submodular ranking function that presents users with the top attributes a model can use to edit a particular image. Compared to existing editing methods, this approach finds more meaningful attributes and produces high-quality edits across various domains, including fashion and interior design. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:44224 | en |
dc.identifier.uri | https://hdl.handle.net/10919/135475 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | Diffusion Models | en |
dc.subject | Interpretability | en |
dc.subject | Image Editing | en |
dc.subject | Explainable AI | en |
dc.subject | Recommendation Systems | en |
dc.title | A Submodular Approach to Find Interpretable Directions in Text-to-Image Models | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |