Data Augmentation with Seq2Seq Models

dc.contributor.author: Granstedt, Jason Louis (en)
dc.contributor.committeechair: Batra, Dhruv (en)
dc.contributor.committeemember: Baumann, William T. (en)
dc.contributor.committeemember: Huang, Bert (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2017-07-07T08:00:35Z (en)
dc.date.available: 2017-07-07T08:00:35Z (en)
dc.date.issued: 2017-07-06 (en)
dc.description.abstract: Paraphrase sparsity is an issue that complicates the training process of question answering systems: syntactically diverse but semantically equivalent sentences can have significant disparities in predicted output probabilities. We propose a method for generating an augmented paraphrase corpus for the visual question answering system to make it more robust to paraphrases. This corpus is generated by concatenating two sequence to sequence models. In order to generate diverse paraphrases, we sample the neural network using diverse beam search. We evaluate the results on the standard VQA validation set. Our approach results in a significantly expanded training dataset and vocabulary size, but has slightly worse performance when tested on the validation split. Although not as fruitful as we had hoped, our work highlights additional avenues for investigation into selecting more optimal model parameters and the development of a more sophisticated paraphrase filtering algorithm. The primary contribution of this work is the demonstration that decent paraphrases can be generated from sequence to sequence models and the development of a pipeline for developing an augmented dataset. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:10139 (en)
dc.identifier.uri: http://hdl.handle.net/10919/78315 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Data Augmentation (en)
dc.subject: Seq2Seq (en)
dc.subject: Diverse Beam Search (en)
dc.subject: VQA (en)
dc.title: Data Augmentation with Seq2Seq Models (en)
dc.type: Thesis (en)
thesis.degree.discipline: Electrical Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Granstedt_JL_T_2017.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format
