Im2Vid: Future Video Prediction for Static Image Action Recognition

dc.contributor.author: AlBahar, Badour A Sh A.
dc.contributor.committeechair: Huang, Jia-Bin
dc.contributor.committeemember: Tokekar, Pratap
dc.contributor.committeemember: Abbott, A. Lynn
dc.contributor.department: Electrical and Computer Engineering
dc.date.accessioned: 2018-06-21T08:01:04Z
dc.date.available: 2018-06-21T08:01:04Z
dc.date.issued: 2018-06-20
dc.description.abstract: Static image action recognition aims to identify the action performed in a given image. Most existing static image action recognition approaches use high-level cues present in the image, such as objects, object-human interaction, or human pose, to better capture the action performed. Unlike images, videos contain temporal information that greatly improves action recognition by resolving potential ambiguity. We propose to leverage a large amount of readily available unlabeled videos to transfer temporal information from the video domain to the static image domain and thereby improve static image action recognition. Specifically, we propose a video prediction model that predicts the future video of a static image, and we use this predicted video to improve static image action recognition. Our experimental results on four datasets validate that the idea of transferring temporal information from videos to static images is promising and can enhance static image action recognition performance.
dc.description.abstractgeneral: Static image action recognition is the problem of identifying the action performed in a given image. Most existing approaches use high-level cues present in the image, such as objects, object-human interaction, or human pose, to better capture the action performed. Unlike images, videos contain temporal information that greatly improves action recognition. For example, a static image of a man who is about to sit on a chair might be mistaken for an image of a man who is standing up from the chair; the temporal information in videos resolves such ambiguity. To transfer temporal information and action features from the video domain to the static image domain, and thereby improve static image action recognition, we propose a model that learns a mapping from a static image to its future video by observing a large number of existing images and their future videos. We then use this model to predict the future video of a static image to improve its action recognition. Our experimental results on four datasets show that the idea of transferring temporal information from videos to static images is promising and can enhance static image action recognition performance.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:15485
dc.identifier.uri: http://hdl.handle.net/10919/83602
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Human Action Recognition
dc.subject: Static Image Action Recognition
dc.subject: Video Action Recognition
dc.subject: Future Video Prediction
dc.title: Im2Vid: Future Video Prediction for Static Image Action Recognition
dc.type: Thesis
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science

Files

Original bundle
Name: AlBahar_BA_T_2018.pdf
Size: 3.89 MB
Format: Adobe Portable Document Format
