CS5604 Fall 2017 Classification Team Submission

dc.contributor.author: Azizi, Ahmadreza (en)
dc.contributor.author: Mulchandani, Deepika (en)
dc.contributor.author: Naik, Amit (en)
dc.contributor.author: Ngo, Khai (en)
dc.contributor.author: Patil, Suraj (en)
dc.contributor.author: Vezvaee, Arian (en)
dc.contributor.author: Yang, Robin (en)
dc.date.accessioned: 2018-01-04T02:22:49Z (en)
dc.date.available: 2018-01-04T02:22:49Z (en)
dc.date.issued: 2018-01-03 (en)
dc.description.abstract: This submission presents the work of the 'Classification' team of the Fall 2017 CS5604 'Information Storage and Retrieval' course towards the GETAR project. Classification of the GETAR data would allow users to analyze, visualize, and explore content related to crises, disasters, human rights, inequality, population growth, shootings, violence, etc. Binary classification models were trained for different events for both the tweet and webpage collections. Word2Vec, trained on the entire available corpus, was used as the feature selection technique, and Logistic Regression was used as the classification technique. As part of this submission, we detail our classification framework and the experiments that we conducted. We also describe the challenges we faced, how we overcame them, and what we learned in the process. Finally, we provide the code that we implemented and the models that were built to classify 1,562,215 tweets and 4,366 webpages. (en)
dc.description.notes: This submission includes the work done for 'Classification' of the GETAR data during the Fall 2017 CS5604 'Information Storage and Retrieval' course conducted at Virginia Tech, Blacksburg, VA 24061. It includes a presentation, a report, and a zip file of our code. The details of these files are given below. 1. CS5604F2017_Final Presentation_ClassificationTeam: This presentation gives a brief overview of our team's objective, the various challenges we faced, and the results that we obtained while implementing machine learning techniques for classification. This file is included in two formats: PowerPoint Presentation (.pptx) and Portable Document Format (.pdf). 2. CS5604F2017_Final Report_ClassificationTeam: This report gives a detailed account of our efforts, including both our failed and successful attempts at implementing this project. It also presents what we learned, what we implemented, and what we see as future work for this project. This file is in Portable Document Format (.pdf). 3. CS5604F2017_Final Report_ClassificationTeam_Tex: Zip file containing the LaTeX package of the above-mentioned report. 4. ISRProject-cs5604f17_cla: Zip file of the Scala and Spark code implemented for the project, along with the classification models generated and the data used. (en)
dc.description.sponsorship: NSF grant IIS-1619028 (en)
dc.identifier.uri: http://hdl.handle.net/10919/81512 (en)
dc.language.iso: en_US (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Classification (en)
dc.subject: Machine Learning (en)
dc.subject: Word2Vec (en)
dc.subject: Logistic Regression (en)
dc.title: CS5604 Fall 2017 Classification Team Submission (en)
dc.type: Dataset (en)
dc.type: Presentation (en)
dc.type: Report (en)
dc.type: Software (en)

Files

Original bundle
Name: CS5604F2017_Final Presentation_ClassificationTeam.pdf
Size: 938.51 KB
Format: Adobe Portable Document Format

Name: CS5604F2017_Final Presentation_ClassificationTeam.pptx
Size: 1.47 MB
Format: Microsoft Powerpoint XML

Name: CS5604F2017_Final Report_ClassificationTeam.pdf
Size: 2.93 MB
Format: Adobe Portable Document Format

Name: CS5604F2017_Final Report_ClassificationTeam_Tex.zip
Size: 5.65 MB
Format:

Name: ISRProject-cs5604f17_cla.zip
Size: 350.24 MB
Format:

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed to upon submission
Description: