Quantitative and Qualitative Analysis of Text-to-Image models

Masrourisaadat, Nila

dc.contributor.author	Masrourisaadat, Nila	en
dc.contributor.committeechair	Fox, Edward A.	en
dc.contributor.committeemember	Jones, Creed F. III	en
dc.contributor.committeemember	Lourentzou, Ismini	en
dc.contributor.department	Electrical and Computer Engineering	en
dc.date.accessioned	2023-08-31T08:00:41Z	en
dc.date.available	2023-08-31T08:00:41Z	en
dc.date.issued	2023-08-30	en
dc.description.abstract	The field of image synthesis has seen significant progress recently, including great strides with generative models like Generative Adversarial Networks (GANs), Diffusion Models, and Transformers. These models have shown they can create high-quality images from a variety of text prompts. However, a comprehensive analysis that examines both their performance and possible biases is often missing from existing research. In this thesis, I undertake a thorough examination of several leading text-to-image models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG. I assess their performance in generating accurate images of human faces, groups, and specified numbers of objects, using both Fréchet Inception Distance (FID) scores and R-precision as my evaluation metrics. Moreover, I uncover inherent gender or social biases these models may possess. My research reveals a noticeable bias in these models, which show a tendency towards generating images of white males, thus under-representing minorities in their output of human faces. This finding contributes to the broader dialogue on ethics in AI and sets the stage for further research aimed at developing more equitable AI systems. Furthermore, based on the metrics I used for evaluation, the Stable Diffusion model outperforms the others in generating images from text prompts. This information could be particularly useful for researchers and practitioners trying to choose the most effective model for their future projects. To facilitate further research in this field, I have made my findings, the related data, and the source code publicly available.	en
dc.description.abstractgeneral	In my research, I explored how cutting-edge computer models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG, can create images from text descriptions, a process that holds exciting possibilities for the future. However, these technologies aren't without their challenges. An important finding from my study is that these models exhibit bias, e.g., they often generate images of white males more than they do of other races and genders. This suggests they're not representing our diverse society fairly. Among these models, Stable Diffusion outperforms the others at creating images from text prompts, which is valuable information for anyone choosing a model for their projects. To help others learn from my work and build upon it, I've made all my data, findings, and the code I used in this study publicly available. By sharing this work, I hope to contribute to improving this technology, making it even better and fairer for everyone in the future.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:38331	en
dc.identifier.uri	http://hdl.handle.net/10919/116173	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Text to Image	en
dc.subject	Deep Learning	en
dc.subject	Transformers	en
dc.subject	Bias Analysis	en
dc.subject	Quantitative Analysis	en
dc.subject	Qualitative Analysis	en
dc.subject	R-Precision	en
dc.subject	FID	en
dc.subject	DALL-E	en
dc.subject	LAFITE	en
dc.subject	Stable Diffusion	en
dc.subject	ERNIE	en
dc.title	Quantitative and Qualitative Analysis of Text-to-Image models	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Masrourisaadat_N_T_2023.pdf
Size: 4.48 MB
Format: Adobe Portable Document Format

Collections

Masters Theses