Quantitative and Qualitative Analysis of Text-to-Image models

Masrourisaadat, Nila

Files

Masrourisaadat_N_T_2023.pdf (4.48 MB)

Date

2023-08-30

Authors

Masrourisaadat, Nila

Publisher

Virginia Tech

Abstract

Image synthesis has progressed significantly in recent years, driven by generative models such as Generative Adversarial Networks (GANs), Diffusion Models, and Transformers.

These models have shown they can create high-quality images from a variety of text prompts. However, a comprehensive analysis that examines both their performance and possible biases is often missing from existing research.

In this thesis, I examine several leading text-to-image models: Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG. I assess how accurately they generate images of human faces, groups, and specified numbers of objects, using Fréchet Inception Distance (FID) and R-precision as evaluation metrics. I also investigate the gender and social biases these models may exhibit.
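For readers unfamiliar with R-precision, the following is a minimal sketch of how the metric is commonly computed for text-to-image models, not the thesis's own code. It assumes each generated image comes with a list of candidate caption embeddings whose first entry is the ground-truth prompt and whose remaining entries are mismatched distractors. The function names `cosine` and `r_precision`, the toy embeddings, and the choice of cosine similarity are illustrative assumptions; the exact protocol used in the thesis is not stated in this abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def r_precision(image_embs, caption_embs_per_image, r=1):
    """Fraction of images whose ground-truth caption ranks in the top r.

    Assumption: caption_embs_per_image[i][0] is the ground-truth caption
    embedding for image i; the other entries are distractor captions.
    """
    if not image_embs:
        raise ValueError("need at least one image embedding")
    hits = 0
    for img, caps in zip(image_embs, caption_embs_per_image):
        sims = [cosine(img, cap) for cap in caps]
        # Rank candidate captions from most to least similar to the image.
        ranked = sorted(range(len(caps)), key=lambda j: sims[j], reverse=True)
        if 0 in ranked[:r]:
            hits += 1
    return hits / len(image_embs)


# Toy two-dimensional embeddings, purely hypothetical.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [
    [[1.0, 0.1], [0.0, 1.0]],  # ground truth is the closest caption
    [[1.0, 0.0], [0.2, 1.0]],  # a distractor is closer than the ground truth
]
score = r_precision(images, captions, r=1)
```

In this toy case the first image retrieves its own caption and the second does not, so `score` is 0.5. In practice the embeddings would come from a pretrained image-text encoder, and a higher score indicates that generated images match their prompts more closely.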

My research reveals a noticeable bias in these models, which show a tendency towards generating images of white males, thus under-representing minorities in their output of human faces. This finding contributes to the broader dialogue on ethics in AI and sets the stage for further research aimed at developing more equitable AI systems.

Furthermore, based on the metrics I used for evaluation, the Stable Diffusion model outperforms the others in generating images from text prompts. This information could be particularly useful for researchers and practitioners trying to choose the most effective model for their future projects.

To facilitate further research in this field, I have made my findings, the related data, and the source code publicly available.

Keywords

Text to Image, Deep Learning, Transformers, Bias Analysis, Quantitative Analysis, Qualitative Analysis, R-Precision, FID, DALL-E, LAFITE, Stable Diffusion, ERNIE

Persistent link

http://hdl.handle.net/10919/116173

Collections

Masters Theses
