Unsupervised Learning of Spatiotemporal Features by Video Completion

dc.contributor.author: Nallabolu, Adithya Reddy (en)
dc.contributor.committeechair: Kochersberger, Kevin B. (en)
dc.contributor.committeechair: Huang, Jia-Bin (en)
dc.contributor.committeemember: Dhillon, Harpreet Singh (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2017-10-19T08:00:43Z (en)
dc.date.available: 2017-10-19T08:00:43Z (en)
dc.date.issued: 2017-10-18 (en)
dc.description.abstract: In this work, we present an unsupervised representation learning approach for learning rich spatiotemporal features from videos without supervision from semantic labels. We propose to learn the spatiotemporal features by training a 3D convolutional neural network (CNN) using video completion as a surrogate task. Using a large collection of unlabeled videos, we train the CNN to predict the missing pixels of a spatiotemporal hole given the remaining parts of the video by minimizing a per-pixel reconstruction loss. To reconstruct color videos well, the CNN must understand the scene dynamics to some degree and predict plausible, temporally coherent content. We further explore jointly reconstructing both color frames and flow fields. By exploiting the statistical temporal structure of images, we show that the learned representations capture meaningful spatiotemporal structures from raw videos. We validate the effectiveness of our approach for CNN pre-training on action recognition and action similarity labeling problems. Our quantitative results demonstrate that our method compares favorably with both learning without external data and existing unsupervised learning approaches. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:12668 (en)
dc.identifier.uri: http://hdl.handle.net/10919/79702 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Representation Learning (en)
dc.subject: Supervised (en)
dc.subject: Unsupervised (en)
dc.title: Unsupervised Learning of Spatiotemporal Features by Video Completion (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)
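
The surrogate task described in the abstract (hide a spatiotemporal block of a video, then score a prediction only on the hidden pixels) can be illustrated with a small sketch. This is not the thesis implementation: the single-channel nested-list video, the zero fill value, the squared-error form of the per-pixel loss, and the function names `make_video`, `mask_hole`, and `hole_l2_loss` are all assumptions made for illustration, and no CNN is involved.

```python
import random


def make_video(frames, height, width, seed=0):
    """Build a toy single-channel video as nested lists indexed [t][y][x]."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)]
             for _ in range(height)]
            for _ in range(frames)]


def mask_hole(video, t0, t1, y0, y1, x0, x1, fill=0.0):
    """Return a copy of the video with the block [t0:t1, y0:y1, x0:x1] set to `fill`."""
    return [[[fill if (t0 <= t < t1 and y0 <= y < y1 and x0 <= x < x1) else value
              for x, value in enumerate(row)]
             for y, row in enumerate(frame)]
            for t, frame in enumerate(video)]


def hole_l2_loss(prediction, target, t0, t1, y0, y1, x0, x1):
    """Mean squared error computed over the hole pixels only (assumed loss form)."""
    total = 0.0
    count = 0
    for t in range(t0, t1):
        for y in range(y0, y1):
            for x in range(x0, x1):
                diff = prediction[t][y][x] - target[t][y][x]
                total += diff * diff
                count += 1
    return total / count


video = make_video(frames=8, height=16, width=16)
hole = (2, 6, 4, 12, 4, 12)  # hypothetical hole bounds: t0, t1, y0, y1, x0, x1
masked = mask_hole(video, *hole)

# A perfect prediction has zero loss; leaving the hole at the fill value does not.
perfect_loss = hole_l2_loss(video, video, *hole)
baseline_loss = hole_l2_loss(masked, video, *hole)
```

In a training setup along the lines the abstract describes, the masked video would be the network input and the network output would take the place of `prediction`; the sketch only shows how the hole and the loss relate.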

Files

Original bundle
Name: Nallabolu_A_T_2017.pdf
Size: 17.03 MB
Format: Adobe Portable Document Format
