Quantitative and Qualitative Analysis of Text-to-Image models

dc.contributor.author: Masrourisaadat, Nila [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Jones, Creed F. III [en]
dc.contributor.committeemember: Lourentzou, Ismini [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2023-08-31T08:00:41Z [en]
dc.date.available: 2023-08-31T08:00:41Z [en]
dc.date.issued: 2023-08-30 [en]
dc.description.abstract: The field of image synthesis has seen significant progress recently, including great strides with generative models like Generative Adversarial Networks (GANs), Diffusion Models, and Transformers. These models have shown they can create high-quality images from a variety of text prompts. However, a comprehensive analysis that examines both their performance and possible biases is often missing from existing research. In this thesis, I undertake a thorough examination of several leading text-to-image models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG. I assess their performance in generating accurate images of human faces, groups, and specified numbers of objects, using both Frechet Inception Distance (FID) scores and R-precision as my evaluation metrics. Moreover, I uncover inherent gender or social biases these models may possess. My research reveals a noticeable bias in these models, which show a tendency towards generating images of white males, thus under-representing minorities in their output of human faces. This finding contributes to the broader dialogue on ethics in AI and sets the stage for further research aimed at developing more equitable AI systems. Furthermore, based on the metrics I used for evaluation, the Stable Diffusion model outperforms the others in generating images from text prompts. This information could be particularly useful for researchers and practitioners trying to choose the most effective model for their future projects. To facilitate further research in this field, I have made my findings, the related data, and the source code publicly available. [en]
dc.description.abstractgeneral: In my research, I explored how cutting-edge computer models, namely Stable Diffusion, DALL-E Mini, Lafite, and Ernie-ViLG, can create images from text descriptions, a process that holds exciting possibilities for the future. However, these technologies aren't without their challenges. An important finding from my study is that these models exhibit bias, e.g., they often generate images of white males more than they do of other races and genders. This suggests they're not representing our diverse society fairly. Among these models, Stable Diffusion outperforms the others at creating images from text prompts, which is valuable information for anyone choosing a model for their projects. To help others learn from my work and build upon it, I've made all my data, findings, and the code I used in this study publicly available. By sharing this work, I hope to contribute to improving this technology, making it even better and fairer for everyone in the future. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:38331 [en]
dc.identifier.uri: http://hdl.handle.net/10919/116173 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Text to Image [en]
dc.subject: Deep Learning [en]
dc.subject: Transformers [en]
dc.subject: Bias Analysis [en]
dc.subject: Quantitative Analysis [en]
dc.subject: Qualitative Analysis [en]
dc.subject: R-Precision [en]
dc.subject: FID [en]
dc.subject: DALL-E [en]
dc.subject: LAFITE [en]
dc.subject: Stable Diffusion [en]
dc.subject: ERNIE [en]
dc.title: Quantitative and Qualitative Analysis of Text-to-Image models [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Masrourisaadat_N_T_2023.pdf
Size: 4.48 MB
Format: Adobe Portable Document Format
