Towards Interpretable Vision Systems

dc.contributor.author: Zhang, Peng (en)
dc.contributor.committeechair: Parikh, Devi (en)
dc.contributor.committeemember: Huang, Jia-Bin (en)
dc.contributor.committeemember: Huang, Bert (en)
dc.contributor.committeemember: Dhillon, Harpreet Singh (en)
dc.contributor.committeemember: Summers-Stay, Douglas (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2017-12-07T09:00:46Z (en)
dc.date.available: 2017-12-07T09:00:46Z (en)
dc.date.issued: 2017-12-06 (en)
dc.description.abstract: Artificial intelligence (AI) systems are booming today, and they are used to solve new tasks or to improve performance on existing ones. However, most AI systems work in a black-box fashion, which prevents users from accessing the inner modules. This leads to two major problems: (i) users have no idea when the underlying system will fail, so it could fail abruptly without any warning or explanation, and (ii) users' limited understanding of the system could hinder efforts to advance the state of the art in AI. In this work, we address these problems in the following directions. First, we develop a failure prediction system that acts as an input filter: it raises a flag when the system is likely to fail on the given input. Second, we develop a portfolio computer vision system, which predicts which of the candidate computer vision systems will perform best on the input. Both systems have the benefit of looking only at the inputs, without running the underlying vision systems, and they are applicable to any vision system. By equipping different applications with such systems, we confirm improved performance. Finally, instead of identifying errors, we develop more interpretable AI systems, which reveal the inner modules directly. We take two tasks as examples: word semantic matching and Visual Question Answering (VQA). In VQA, we start with binary questions on abstract scenes and then extend to all question types on real images. In both cases, we treat attention as an important intermediate output. By explicitly forcing the systems to attend to the correct regions, we ensure the correctness of the systems. For semantic matching, we build a neural network that learns the matching directly, instead of using the relation similarity between words. Across all the above directions, we show that by diagnosing errors and building more interpretable systems, we are able to improve the performance of current models. (en)
dc.description.abstractgeneral: Researchers have made rapid progress in artificial intelligence (AI). For example, AI systems have reached new state-of-the-art performance on the object detection task in computer vision, and AI systems such as AlphaGo have been able to play games on their own, which had never happened before. However, most AI systems work in a black-box fashion, which prevents users from accessing the inner modules. This can result in two problems. On one hand, users do not know when the underlying systems will fail. For example, in the object detection task, users have no idea when the system will fail to recognize a cat in a cat image or when it will recognize a dog as a cat. On the other hand, users have no access to how the system works, so it is hard for them to find the bottleneck and improve the overall performance. In this work, we tackle the above problems in two broad directions: diagnosing errors and building interpretable systems. The first can be addressed in two ways: identifying the erroneous inputs and identifying the erroneous systems. Thus, we build a failure prediction system and a portfolio computer vision system, respectively. The failure prediction system raises a warning when the input is not reliable, while the portfolio system picks the predicted best-performing approach from the candidates. Finally, we focus on developing more interpretable AI systems, which reveal the inner modules directly. We take two tasks as examples: word semantic matching and Visual Question Answering (VQA). A VQA system produces an answer given an image and a question. We take attention as the important intermediate output, which mimics how humans solve this task. In semantic matching, we build a system that learns the semantic matching between words, instead of using the relation similarity between them. In both directions, we show improved performance in a variety of applications. (en)
dc.description.degree: Ph. D. (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:13402 (en)
dc.identifier.uri: http://hdl.handle.net/10919/81074 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: failure prediction (en)
dc.subject: portfolio vision system (en)
dc.subject: interpretable vision systems (en)
dc.title: Towards Interpretable Vision Systems (en)
dc.type: Dissertation (en)
thesis.degree.discipline: Computer Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: doctoral (en)
thesis.degree.name: Ph. D. (en)

Files

Original bundle
Name: Zhang_P_D_2017.pdf
Size: 15.6 MB
Format: Adobe Portable Document Format

Name: Zhang_P_D_2017_support_2.zip
Size: 52.67 MB
Format:
Description: Supporting documents