Visual Question Answering in the Medical Domain

Sharma, Dhruv

Visual Question Answering in the Medical Domain

dc.contributor.author	Sharma, Dhruv	en
dc.contributor.committeechair	Reddy, Chandan K.	en
dc.contributor.committeemember	Viswanath, Bimal	en
dc.contributor.committeemember	Jiang, Jiepu	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2022-01-13T07:00:06Z	en
dc.date.available	2022-01-13T07:00:06Z	en
dc.date.issued	2020-07-21	en
dc.description.abstract	Medical images are extremely complicated to comprehend for a person without expertise. The limited number of practitioners across the globe often face the issue of fatigue due to the high number of cases. This fatigue, physical and mental, can induce human-errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system which can provide a "second opinion" on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, the VQA system for medical images needs to consider a limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images taking the associated challenges into account. Our MedFuseNet system aims at maximizing the learning with minimal complexity by breaking the problem statement into simpler tasks and weaving everything together to predict the answer. We tackle two types of answer prediction - categorization and generation. We conduct an extensive set of both quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results conclude that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.	en
dc.description.abstractgeneral	Medical images are extremely complicated to comprehend for a person without expertise. The limited number of practitioners across the globe often face the issue of fatigue due to the high number of cases. This fatigue, physical and mental, can induce human-errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system which can provide a "second opinion" on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. In this thesis, we propose an end-to-end deep learning-based system, MedFuseNet, for predicting the answer for the input query associated with the image. We cater to close-ended as well as open-ended type question-answer pairs. We conduct an extensive analysis to evaluate the performance of MedFuseNet. Our results conclude that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:26935	en
dc.identifier.uri	http://hdl.handle.net/10919/107586	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Visual Question Answering	en
dc.subject	deep learning	en
dc.subject	medical images	en
dc.title	Visual Question Answering in the Medical Domain	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Sharma_D_T_2020.pdf
Size:: 4.99 MB
Format:: Adobe Portable Document Format

Download

Collections

Masters Theses