A Submodular Approach to Find Interpretable Directions in Text-to-Image Models

dc.contributor.author: Allada, Ritika (en)
dc.contributor.committeechair: Yanardag Delul, Pinar (en)
dc.contributor.committeemember: Eldardiry, Hoda Mohamed (en)
dc.contributor.committeemember: Thomas, Christopher Lee (en)
dc.contributor.committeemember: North, Christopher L. (en)
dc.contributor.department: Computer Science & Applications (en)
dc.date.accessioned: 2025-06-11T08:04:26Z (en)
dc.date.available: 2025-06-11T08:04:26Z (en)
dc.date.issued: 2025-06-10 (en)
dc.description.abstract: Text-to-image models have significantly advanced the field of image editing. However, finding attributes that a model can actually edit remains a challenge. This thesis addresses this problem by leveraging a multimodal vision-language model (MMVLM) to produce a list of candidate attributes for editing an image, using Flux and ControlNet to generate edits from those attribute keywords, and then applying a submodular ranking method to identify which edits actually work. The experiments in this thesis demonstrate the robustness of this approach and its ability to produce high-quality edits across various domains, such as dresses and living rooms. (en)
dc.description.abstractgeneral: In today's world, generative AI models are capable of editing images based on user-specified text prompts. However, finding attributes that a model can actually edit is a time-consuming process. This thesis addresses this problem with a submodular ranking function that provides users with a list of the top attributes that a model can actually edit in a particular image. Compared to existing editing methods, this method finds more meaningful attributes and produces high-quality edits across various domains, including fashion and interior design. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:44224 (en)
dc.identifier.uri: https://hdl.handle.net/10919/135475 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: Creative Commons Attribution 4.0 International (en)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ (en)
dc.subject: Diffusion Models (en)
dc.subject: Interpretability (en)
dc.subject: Image Editing (en)
dc.subject: Explainable AI (en)
dc.subject: Recommendation Systems (en)
dc.title: A Submodular Approach to Find Interpretable Directions in Text-to-Image Models (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science & Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Allada_R_T_2025.pdf
Size: 10.54 MB
Format: Adobe Portable Document Format
