Ensemble Classification Project
dc.contributor.author | Alabdulhadi, Mohammed H. | en |
dc.contributor.author | Kannan, Vijayasarathy | en |
dc.contributor.author | Soundarapandian, Manikandan | en |
dc.contributor.author | Hamid, Tania | en |
dc.date.accessioned | 2014-05-08T15:58:53Z | en |
dc.date.available | 2014-05-08T15:58:53Z | en |
dc.date.issued | 2014-05-08 | en |
dc.description.abstract | Transfer learning, unlike traditional machine learning, is a technique that allows the domains, tasks, and distributions used in training and testing to differ. Knowledge gained from one domain can be used to learn a completely different domain. The Ensemble computing portal is a digital library that contains resources, communities, and technologies to aid in teaching. The major objective of this project is to apply what is learned from the ACM Computing Classification System to classify educational YouTube videos so that they can be included in the Ensemble computing portal. Metadata of technical papers published by ACM is indexed in a SOLR server, and we issue REST calls to retrieve the required metadata, namely the title, abstract, and general terms, which we use to build the features. We use the classification hierarchy of the 2012 ACM Computing Classification System to train our classifiers. We build classifiers for the level-2 and level-3 categories in the classification tree to help classify the educational YouTube videos. We use the YouTube Data API to search YouTube for educational videos and retrieve their metadata, including the title, description, and transcript. These become the features of our test set. We specifically search for YouTube playlists that contain educational videos, because in our experience neither a regular video search nor a search for videos in channels retrieves relevant educational videos. We evaluate our classifiers using 10-fold cross-validation and present their accuracy in this report. With the classifiers built and trained on ACM metadata, we give them the metadata collected from YouTube as test data and manually evaluate the predictions. The results of our manual evaluation and the accuracy of our classifiers are also discussed. 
We identified that the ACM Computing Classification System’s hierarchy is sometimes ambiguous and that YouTube metadata is not always reliable. These are the major factors that reduce the accuracy of our classifiers. In the future, we hope sophisticated natural language processing techniques can be applied to refine the features of both the training and target data, which would help improve performance. We believe that more relevant metadata, in the form of transcripts and embedded text, can be collected from YouTube using sophisticated voice-to-text conversion and image retrieval algorithms, respectively. This idea of transfer learning can also be extended to classify the presentation slides available on SlideShare (http://www.slideshare.net) and certain educational blogs. | en |
dc.description.sponsorship | The project client is Chen, Yinlin (ylchen@vt.edu), a PhD candidate working in the Digital Library Research Laboratory at Virginia Tech. | en |
dc.identifier.uri | http://hdl.handle.net/10919/47922 | en |
dc.language.iso | en_US | en |
dc.rights | Creative Commons Attribution-NonCommercial 3.0 United States | en |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | en |
dc.subject | Ensemble Classification | en |
dc.subject | ACM CCS | en |
dc.subject | YouTube Educational Videos | en |
dc.subject | Transfer Learning | en |
dc.subject | Text Classification | en |
dc.title | Ensemble Classification Project | en |
dc.type | Presentation | en |
dc.type | Technical report | en |
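The evaluation and transfer steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say which classifier the authors used, so a small multinomial Naive Bayes stands in for it, and the category labels, toy training titles, and the video title are hypothetical stand-ins for ACM and YouTube metadata. The function names `train_nb`, `predict_nb`, and `cross_validate` are likewise invented for this sketch.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]


def train_nb(docs, labels):
    # Fit a multinomial Naive Bayes model (a stand-in classifier, not
    # necessarily the one used in the project).
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    # Return the label with the highest log-posterior (add-one smoothing).
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k=10, seed=0):
    # k-fold cross-validation: train on k-1 folds, test on the held-out fold,
    # and return the mean accuracy over the k folds.
    if len(docs) < k:
        raise ValueError("need at least k documents")
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Hypothetical toy "source domain" data standing in for ACM paper metadata.
net_words = ["network", "routing", "protocol", "packet", "wireless"]
ml_words = ["learning", "classifier", "training", "neural", "kernel"]
docs, labels = [], []
for i in range(10):
    docs.append(f"{net_words[i % 5]} {net_words[(i + 1) % 5]} analysis")
    labels.append("networks")
    docs.append(f"{ml_words[i % 5]} {ml_words[(i + 1) % 5]} analysis")
    labels.append("machine_learning")

# Step 1: estimate accuracy on the source domain with 10-fold cross-validation.
accuracy = cross_validate(docs, labels, k=10)

# Step 2: train on all source data, then label a hypothetical video title
# from the target domain.
model = train_nb(docs, labels)
predicted = predict_nb(model, "Lecture 3 wireless packet routing basics")
print(accuracy, predicted)
```

The cross-validated accuracy only measures performance within the source domain. Predictions on the target domain (here, the video title) have no labels to score against, which is why the abstract describes evaluating them manually.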
Files
Original bundle
- Name: Ensemble_Classification_Mid_Term_Presentation.pdf
- Size: 530.42 KB
- Format: Adobe Portable Document Format
- Description: Ensemble Classification Midterm Presentation (PDF)
- Name: Ensemble_Classification_Mid_Term_Presentation.pptx
- Size: 308.42 KB
- Format: Microsoft Powerpoint XML
- Description: Ensemble Classification Midterm Presentation (PowerPoint)
- Name: Ensemble_Classification_Final_Presentation.pdf
- Size: 931.26 KB
- Format: Adobe Portable Document Format
- Description: Ensemble Classification Final Presentation (PDF)
- Name: Ensemble_Classification_Final_Presentation.pptx
- Size: 707.13 KB
- Format: Microsoft Powerpoint XML
- Description: Ensemble Classification Final Presentation (PowerPoint)
- Name: Ensemble_Classification_Final_Report.pdf
- Size: 1.5 MB
- Format: Adobe Portable Document Format
- Description: Ensemble Classification Final Report (PDF)
License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed upon to submission