Role of Premises in Visual Question Answering

dc.contributor.author: Mahendru, Aroma [en]
dc.contributor.committeechair: Batra, Dhruv [en]
dc.contributor.committeemember: Huang, Bert [en]
dc.contributor.committeemember: Parikh, Devi [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2017-06-13T08:00:43Z [en]
dc.date.available: 2017-06-13T08:00:43Z [en]
dc.date.issued: 2017-06-12 [en]
dc.description.abstract: In this work, we make a simple but important observation: questions about images often contain premises (objects and relationships implied by the question), and reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models still answer based purely on learned language biases, producing nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e., not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant-question detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning. [en]
dc.description.abstractgeneral: There has been substantial recent work on the Visual Question Answering (VQA) problem, in which an automated agent is tasked with answering questions about images posed in natural language. In this work, we make a simple but important observation: questions about images often contain premises (objects and relationships implied by the question), and reasoning about premises can help VQA models respond more intelligently to irrelevant or previously unseen questions. When presented with a question that is irrelevant to an image, state-of-the-art VQA models still answer based purely on learned language biases, producing nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e., not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant-question detection models and show that models that reason about premises consistently outperform models that do not. We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:10003 [en]
dc.identifier.uri: http://hdl.handle.net/10919/78030 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Machine learning [en]
dc.subject: Natural Language Processing [en]
dc.subject: Computer Vision [en]
dc.subject: Artificial Intelligence [en]
dc.title: Role of Premises in Visual Question Answering [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]
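
The relevance rule stated in the abstract (a visual question is irrelevant to an image if at least one of its premises is false) can be illustrated with a minimal Python sketch. It assumes that premises have already been extracted from the question as tuples, either an object such as ("dog",) or a relationship such as ("dog", "on", "couch"), and that an image's contents are available as a set of such tuples. Both encodings, the helper names check_relevance and find_irrelevant_image, and the toy data are hypothetical and are not taken from the thesis. The false premises returned by the check double as an explanation, and the search helper is a simplified take on building irrelevant question-image pairs by searching for false premises.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: a premise is a tuple of strings, either an object
# ("dog",) or a relationship ("dog", "on", "couch"). Premise extraction from
# the question text is assumed to have happened already and is not shown.
Premise = Tuple[str, ...]


def check_relevance(
    premises: List[Premise], image_facts: FrozenSet[Premise]
) -> Tuple[bool, List[Premise]]:
    """Return (is_relevant, false_premises).

    A premise counts as false when it is not depicted in the image, which this
    sketch approximates as absence from image_facts. The question is relevant
    only if no premise is false; the false premises serve as the explanation.
    """
    false_premises = [p for p in premises if p not in image_facts]
    return len(false_premises) == 0, false_premises


def find_irrelevant_image(
    premises: List[Premise],
    images: Dict[str, FrozenSet[Premise]],
    source_id: str,
) -> Optional[Tuple[str, Premise]]:
    """Search the other images for one where exactly one premise is false.

    Returns (image_id, false_premise) for the first match in sorted id order,
    or None if no such image exists. Requiring exactly one false premise is a
    hypothetical design choice made for illustration.
    """
    for image_id, facts in sorted(images.items()):
        if image_id == source_id:
            continue  # skip the image the question was originally asked about
        _, false_premises = check_relevance(premises, facts)
        if len(false_premises) == 1:
            return image_id, false_premises[0]
    return None


# Toy data (hypothetical): premises for "What color is the dog on the couch?"
images = {
    "img1": frozenset({("dog",), ("couch",), ("dog", "on", "couch")}),
    "img2": frozenset({("cat",), ("couch",)}),
    "img3": frozenset({("dog",), ("couch",)}),
}
question_premises = [("dog",), ("couch",), ("dog", "on", "couch")]

print(check_relevance(question_premises, images["img1"]))  # (True, [])
print(find_irrelevant_image(question_premises, images, "img1"))
```

Returning the list of false premises rather than only a boolean keeps the relevance decision and its explanation together, mirroring the pairing of prediction and explanation in the QRPE name. In the toy data, img2 has two false premises and is skipped, while img3 lacks only the ("dog", "on", "couch") relationship and is returned.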

Files

Original bundle
Name: Mahendru_A_T_2017.pdf
Size: 9.3 MB
Format: Adobe Portable Document Format
