CS5604 Fall 2017 Classification Team Submission
dc.contributor.author | Azizi, Ahmadreza | en |
dc.contributor.author | Mulchandani, Deepika | en |
dc.contributor.author | Naik, Amit | en |
dc.contributor.author | Ngo, Khai | en |
dc.contributor.author | Patil, Suraj | en |
dc.contributor.author | Vezvaee, Arian | en |
dc.contributor.author | Yang, Robin | en |
dc.date.accessioned | 2018-01-04T02:22:49Z | en |
dc.date.available | 2018-01-04T02:22:49Z | en |
dc.date.issued | 2018-01-03 | en |
dc.description.abstract | This submission presents the work of the 'Classification' team of the Fall 2017 CS5604 'Information Storage and Retrieval' course for the GETAR project. Classifying the GETAR data would allow users to analyze, visualize, and explore content related to crises, disasters, human rights, inequality, population growth, shootings, violence, etc. Binary classification models were trained for different events for both tweet and webpage collections. Word2Vec was used as the feature selection technique, and the Word2Vec model was trained on the entire available corpus. Logistic Regression was used as the classification technique. As part of this submission, we detail our classification framework and the experiments that we conducted. We also describe the challenges we faced, how we overcame them, and what we learned in the process. Finally, we provide the code that we implemented and the models that were built to classify 1,562,215 tweets and 4,366 webpages. | en |
dc.description.notes | This submission includes the work done for 'Classification' of the GETAR data during the Fall 2017 CS5604 'Information Storage and Retrieval' course conducted at Virginia Tech, Blacksburg, VA 24061. It includes a presentation, a report, and a zip file of our code. The details of these files are given below: 1. CS5604F2017_Final Presentation_ClassificationTeam: This presentation gives a brief overview of our team's objective, the challenges we faced, and the results we obtained while implementing machine learning techniques for classification. This file is included in two formats: PowerPoint Presentation (.pptx) and Portable Document Format (.pdf). 2. CS5604F2017_Final Report_ClassificationTeam: This report gives a detailed account of our efforts, including both failed and successful attempts at implementing this project. It also presents what we learned, what we implemented, and what we see as future work for this project. This file is in Portable Document Format (.pdf). 3. CS5604F2017_Final Report_ClassificationTeam_Tex: Zip file containing the LaTeX package of the above-mentioned report. 4. ISRProject-cs5604f17_cla: Zip file of the Scala and Spark code implemented for the project, along with the classification models generated and the data used. | en |
dc.description.sponsorship | NSF grant IIS-1619028 | en |
dc.identifier.uri | http://hdl.handle.net/10919/81512 | en |
dc.language.iso | en_US | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Classification | en |
dc.subject | Machine Learning | en |
dc.subject | Word2Vec | en |
dc.subject | Logistic Regression | en |
dc.title | CS5604 Fall 2017 Classification Team Submission | en |
dc.type | Dataset | en |
dc.type | Presentation | en |
dc.type | Report | en |
dc.type | Software | en |
Files
Original bundle
1 - 5 of 5
- Name:
- CS5604F2017_Final Presentation_ClassificationTeam.pdf
- Size:
- 938.51 KB
- Format:
- Adobe Portable Document Format
- Name:
- CS5604F2017_Final Presentation_ClassificationTeam.pptx
- Size:
- 1.47 MB
- Format:
- Microsoft PowerPoint XML
- Name:
- CS5604F2017_Final Report_ClassificationTeam.pdf
- Size:
- 2.93 MB
- Format:
- Adobe Portable Document Format
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 1.5 KB
- Format:
- Item-specific license agreed upon to submission
- Description: