Visual Question Answering in the Medical Domain

dc.contributor.authorSharma, Dhruven
dc.contributor.committeechairReddy, Chandan K.en
dc.contributor.committeememberViswanath, Bimalen
dc.contributor.committeememberJiang, Jiepuen
dc.contributor.departmentComputer Scienceen
dc.date.accessioned2022-01-13T07:00:06Zen
dc.date.available2022-01-13T07:00:06Zen
dc.date.issued2020-07-21en
dc.description.abstractMedical images are extremely complicated to comprehend for a person without expertise. The limited number of practitioners across the globe often face the issue of fatigue due to the high number of cases. This fatigue, physical and mental, can induce human-errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system which can provide a "second opinion" on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, the VQA system for medical images needs to consider a limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images taking the associated challenges into account. Our MedFuseNet system aims at maximizing the learning with minimal complexity by breaking the problem statement into simpler tasks and weaving everything together to predict the answer. We tackle two types of answer prediction - categorization and generation. We conduct an extensive set of both quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results conclude that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.en
dc.description.abstractgeneralMedical images are extremely complicated to comprehend for a person without expertise. The limited number of practitioners across the globe often face the issue of fatigue due to the high number of cases. This fatigue, physical and mental, can induce human-errors during the diagnosis. In such scenarios, having an additional opinion can be helpful in boosting the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system which can provide a "second opinion" on medical cases. However, most of the VQA systems that work today cater to real-world problems and are not specifically tailored for handling medical images. In this thesis, we propose an end-to-end deep learning-based system, MedFuseNet, for predicting the answer for the input query associated with the image. We cater to close-ended as well as open-ended type question-answer pairs. We conduct an extensive analysis to evaluate the performance of MedFuseNet. Our results conclude that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.en
dc.description.degreeMaster of Scienceen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:26935en
dc.identifier.urihttp://hdl.handle.net/10919/107586en
dc.publisherVirginia Techen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectVisual Question Answeringen
dc.subjectdeep learningen
dc.subjectmedical imagesen
dc.titleVisual Question Answering in the Medical Domainen
dc.typeThesisen
thesis.degree.disciplineComputer Science and Applicationsen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameMaster of Scienceen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Sharma_D_T_2020.pdf
Size:
4.99 MB
Format:
Adobe Portable Document Format

Collections