A Submodular Approach to Find Interpretable Directions in Text-to-Image Models

Date

2025-06-10

Publisher

Virginia Tech

Abstract

Text-to-image models have significantly advanced the field of image editing. However, identifying which attributes a model can actually edit remains a challenge. This thesis addresses the problem with a three-stage pipeline: a multimodal vision-language model (MMVLM) proposes a list of candidate attributes for editing an image, Flux and ControlNet generate edits from those keywords, and a submodular ranking method identifies the edits that actually succeed. The experiments in this thesis demonstrate the robustness of this approach and its ability to produce high-quality edits across domains such as dresses and living rooms.
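
To make the ranking step concrete, below is a minimal sketch of greedy submodular selection over candidate edits. The thesis's exact objective is not reproduced here; a facility-location function over pairwise edit similarities is assumed as a stand-in monotone submodular objective, and all names (facility_location_gain, select_edits, the similarity matrix) are hypothetical.

```python
# Minimal sketch: greedy submodular ranking of candidate edits.
# Assumption: a facility-location objective f(S) = sum_i max_{j in S} sim(i, j),
# which is monotone submodular; the thesis may use a different objective.
import numpy as np


def facility_location_gain(similarity: np.ndarray, selected: list[int],
                           candidate: int) -> float:
    """Marginal gain of adding `candidate` to `selected` under
    f(S) = sum_i max_{j in S} sim(i, j)."""
    if not selected:
        return float(similarity[:, candidate].sum())
    current_best = similarity[:, selected].max(axis=1)
    with_candidate = np.maximum(current_best, similarity[:, candidate])
    return float((with_candidate - current_best).sum())


def select_edits(similarity: np.ndarray, k: int) -> list[int]:
    """Greedily pick k edits; for monotone submodular objectives the
    greedy solution is within (1 - 1/e) of optimal."""
    selected: list[int] = []
    candidates = set(range(similarity.shape[1]))
    for _ in range(k):
        best = max(candidates,
                   key=lambda c: facility_location_gain(similarity, selected, c))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy usage: similarity[i, j] = how well candidate edit j works on image i
# (hypothetical data; in practice this could come from MMVLM or CLIP scores).
rng = np.random.default_rng(0)
sim = rng.random((8, 5))  # 8 images x 5 candidate edits
print(select_edits(sim, k=3))
```

Greedy selection is the standard choice here because exact maximization of a submodular set function is NP-hard, while the greedy algorithm is cheap and carries a provable approximation guarantee.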

Keywords

Diffusion Models, Interpretability, Image Editing, Explainable AI, Recommendation Systems
