Role of Premises in Visual Question Answering

Date
2017-06-12
Publisher
Virginia Tech
Abstract

In this work, we make a simple but important observation: questions about images often contain premises -- objects and relationships implied by the question -- and reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions.

When presented with a question that is irrelevant to an image, state-of-the-art VQA models will still answer based purely on learned language biases, resulting in nonsensical or even misleading answers. We note that a visual question is irrelevant to an image if at least one of its premises is false (i.e. not depicted in the image). We leverage this observation to construct a dataset for Question Relevance Prediction and Explanation (QRPE) by searching for false premises. We train novel irrelevant question detection models and show that models that reason about premises consistently outperform models that do not.
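The irrelevance criterion above can be sketched minimally as follows. This is an illustrative sketch, not the authors' implementation: the premise sets and detected-object sets are hypothetical stand-ins for real premise extraction (from question parses) and visual grounding (from object detectors).

```python
def is_irrelevant(premises, image_objects):
    """A question is irrelevant to an image if at least one of its
    premises is false, i.e. not depicted in the image."""
    return any(p not in image_objects for p in premises)

# Hypothetical example: "What color is the dog's collar?"
# implies the premises {dog, collar}.
premises = {"dog", "collar"}

print(is_irrelevant(premises, {"cat", "sofa"}))    # irrelevant: no dog depicted
print(is_irrelevant(premises, {"dog", "collar"}))  # relevant: all premises hold
```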

We also find that forcing standard VQA models to reason about premises during training can lead to improvements on tasks requiring compositional reasoning.

Keywords
Machine learning, Natural Language Processing, Computer Vision, Artificial Intelligence