Ensemble Classification Project

dc.contributor.author: Alabdulhadi, Mohammed H. [en]
dc.contributor.author: Kannan, Vijayasarathy [en]
dc.contributor.author: Soundarapandian, Manikandan [en]
dc.contributor.author: Hamid, Tania [en]
dc.date.accessioned: 2014-05-08T15:58:53Z [en]
dc.date.available: 2014-05-08T15:58:53Z [en]
dc.date.issued: 2014-05-08 [en]
dc.description.abstract: Transfer learning, unlike traditional machine learning, allows the domains, tasks and distributions used in training and testing to differ. Knowledge gained from one domain can be used to learn a completely different domain. The Ensemble computing portal is a digital library that contains resources, communities and technologies to aid in teaching. The major objective of this project is to apply what is learned from the ACM Computing Classification System to classify educational YouTube videos so that they can be included in the Ensemble computing portal. Metadata of technical papers published by ACM are indexed in a Solr server, and we issue REST calls to retrieve the required metadata, namely the title, abstract and general terms, which we use to build the features. We use the classification hierarchy of the 2012 ACM Computing Classification System to train our classifiers. We build classifiers for the level-2 and level-3 categories in the classification tree to help classify the educational YouTube videos. We use the YouTube Data API to search for educational videos on YouTube and retrieve their metadata, including title, description and transcripts. These become the features of our test set. We specifically search for YouTube playlists that contain educational videos, because in our experience neither a regular video search nor a search for videos in channels retrieves relevant educational videos. We evaluate our classifiers using 10-fold cross-validation and present their accuracy in this report. With the classifiers built and trained on ACM metadata, we supply the metadata collected from YouTube as test data and manually evaluate the predictions. The results of our manual evaluation and the accuracy of our classifiers are also discussed.
We found that the ACM Computing Classification System's hierarchy is sometimes ambiguous and that YouTube metadata are not always reliable. These are the major factors that reduce the accuracy of our classifiers. In the future, we hope that more sophisticated natural language processing techniques can be applied to refine the features of both the training and target data, which would help improve performance. We believe that more relevant metadata from YouTube, in the form of transcripts and embedded text, can be collected using voice-to-text conversion and image retrieval algorithms, respectively. This idea of transfer learning can also be extended to classify the presentation slides available on SlideShare (http://www.slideshare.net) and certain educational blogs. [en]
dc.description.sponsorship: The project client is Chen, Yinlin (ylchen@vt.edu), a PhD candidate working in the Digital Library Research Laboratory at Virginia Tech. [en]
dc.identifier.uri: http://hdl.handle.net/10919/47922 [en]
dc.language.iso: en_US [en]
dc.rights: Creative Commons Attribution-NonCommercial 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/us/ [en]
dc.subject: Ensemble Classification [en]
dc.subject: ACM CCS [en]
dc.subject: YouTube Educational Videos [en]
dc.subject: Transfer Learning [en]
dc.subject: Text Classification [en]
dc.title: Ensemble Classification Project [en]
dc.type: Presentation [en]
dc.type: Technical report [en]
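
The abstract states that the classifiers are evaluated with 10-fold cross-validation but does not say which classifier or toolkit was used. The following is a minimal sketch of that evaluation scheme only, using the Python standard library. The word-count classifier (`train_word_counts`, `predict`) is a hypothetical toy stand-in, not the report's model, and the sample texts and labels are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each item is tested exactly once."""
    if n_items < k:
        raise ValueError("need at least k items for k-fold cross-validation")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


def train_word_counts(texts, labels):
    """Toy stand-in classifier (an assumption): word frequencies per label."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap the text the most."""
    words = text.lower().split()
    return max(sorted(counts), key=lambda lab: sum(counts[lab][w] for w in words))


def cross_validate(texts, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, return mean accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(texts), k, seed):
        model = train_word_counts(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(model, texts[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Invented sample data; real features would come from titles and abstracts.
    sample_texts = ["sorting algorithm complexity"] * 10 + ["tcp network routing"] * 10
    sample_labels = ["Theory"] * 10 + ["Networks"] * 10
    print("mean accuracy:", cross_validate(sample_texts, sample_labels, k=10))
```

Shuffling with a fixed seed keeps the folds reproducible between runs. In the setting the abstract describes, this procedure would apply to the ACM training metadata, while the YouTube metadata is held out entirely and judged manually.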

Files

Original bundle
Now showing 1 - 5 of 7
Name: Ensemble_Classification_Mid_Term_Presentation.pdf
Size: 530.42 KB
Format: Adobe Portable Document Format
Description: Ensemble Classification Midterm Presentation (PDF)

Name: Ensemble_Classification_Mid_Term_Presentation.pptx
Size: 308.42 KB
Format: Microsoft Powerpoint XML
Description: Ensemble Classification Midterm Presentation (PowerPoint)

Name: Ensemble_Classification_Final_Presentation.pdf
Size: 931.26 KB
Format: Adobe Portable Document Format
Description: Ensemble Classification Final Presentation (PDF)

Name: Ensemble_Classification_Final_Presentation.pptx
Size: 707.13 KB
Format: Microsoft Powerpoint XML
Description: Ensemble Classification Final Presentation (PowerPoint)

Name: Ensemble_Classification_Final_Report.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
Description: Ensemble Classification Final Report (PDF)
License bundle
Now showing 1 - 1 of 1
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission