
dc.contributor.author: Azizi, Ahmadreza
dc.contributor.author: Mulchandani, Deepika
dc.contributor.author: Naik, Amit
dc.contributor.author: Ngo, Khai
dc.contributor.author: Patil, Suraj
dc.contributor.author: Vezvaee, Arian
dc.contributor.author: Yang, Robin
dc.date.accessioned: 2018-01-04T02:22:49Z
dc.date.available: 2018-01-04T02:22:49Z
dc.date.issued: 2018-01-03
dc.identifier.uri: http://hdl.handle.net/10919/81512
dc.description.abstract: This project submission includes the work of the 'Classification' team of the CS5604 'Information Storage and Retrieval' course of Fall 2017 towards the GETAR project. Classification of the GETAR data would allow users to analyze, visualize, and explore content related to crises, disasters, human rights, inequality, population growth, shootings, violence, etc. Binary classification models were trained for different events for both tweet and webpage collections. Word2Vec was used as the feature selection technique, and the Word2Vec model was trained on the entire corpus available. Logistic Regression was used as our classification technique. As part of this submission, we detail our classification framework and the experiments that we conducted. We also describe the challenges we faced, how we overcame them, and what we learned in the process. Finally, we provide the code that we implemented and the models that were built to classify 1,562,215 tweets and 4,366 webpages. [en_US]
dc.description.sponsorship: NSF grant IIS-1619028 [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Virginia Tech [en_US]
dc.subject: Classification [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Word2Vec [en_US]
dc.subject: Logistic Regression [en_US]
dc.title: CS5604 Fall 2017 Classification Team Submission [en_US]
dc.type: Dataset [en_US]
dc.type: Presentation [en_US]
dc.type: Report [en_US]
dc.type: Software [en_US]
dc.description.notes: This submission includes the work done for 'Classification' of the GETAR data during the Fall 2017 CS5604 'Information Storage and Retrieval' course conducted at Virginia Tech, Blacksburg, VA 24061. In this submission, we include a presentation, a report, and a zip file for our code. The details of these files are given below:
1. CS5604F2017_Final Presentation_ClassificationTeam: This presentation gives a brief overview of our team's objective, the various challenges we faced, and the results that we obtained while implementing machine learning techniques for Classification. This file is included in two formats: PowerPoint Presentation (.pptx) and Portable Document Format (.pdf).
2. CS5604F2017_Final Report_ClassificationTeam: This report gives a detailed account of our efforts, including our failed and successful attempts at implementing this project. It also presents what we learned, what we implemented, and what we see as future work for this project. This file is in Portable Document Format (.pdf).
3. CS5604F2017_Final Report_ClassificationTeam_Tex: Zip file containing the LaTeX package of the above-mentioned report.
4. ISRProject-cs5604f17_cla: Zip file for the Scala and Spark code implemented for the project, along with the classification models generated and the data used. [en_US]

