A Submodular Approach to Find Interpretable Directions in Text-to-Image Models
dc.contributor.author | Allada, Ritika | en |
dc.contributor.committeechair | Yanardag Delul, Pinar | en |
dc.contributor.committeemember | Eldardiry, Hoda Mohamed | en |
dc.contributor.committeemember | Thomas, Christopher Lee | en |
dc.contributor.committeemember | North, Christopher L. | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-06-11T08:04:26Z | en |
dc.date.available | 2025-06-11T08:04:26Z | en |
dc.date.issued | 2025-06-10 | en |
dc.description.abstract | Text-to-image models have significantly advanced the field of image editing. However, identifying which attributes a model can actually edit remains a challenge. This thesis addresses the problem by leveraging a multimodal vision-language model (MMVLM) to propose a list of candidate attributes for editing an image, using Flux and ControlNet to generate edits from those keywords, and then applying a submodular ranking method to determine which edits actually succeed. The experiments in this thesis demonstrate the robustness of the approach and its ability to produce high-quality edits across various domains, such as dresses and living rooms. | en |
dc.description.abstractgeneral | In today's world, generative AI models can edit images based on user-specified text prompts. However, finding attributes that a model can actually edit is a time-consuming process. This thesis addresses the problem with a submodular ranking function that presents users with the top attributes a model can use to edit a particular image. Compared to existing editing methods, this approach finds more meaningful attributes and produces high-quality edits across various domains, including fashion and interior design. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:44224 | en |
dc.identifier.uri | https://hdl.handle.net/10919/135475 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | Diffusion Models | en |
dc.subject | Interpretability | en |
dc.subject | Image Editing | en |
dc.subject | Explainable AI | en |
dc.subject | Recommendation Systems | en |
dc.title | A Submodular Approach to Find Interpretable Directions in Text-to-Image Models | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |