CS5604 Fall 2017 Classification Team Submission

This project submission includes the work of the 'Classification' team of the CS5604 'Information Storage and Retrieval' course of Fall 2017 towards the GETAR project. Classification of the GETAR data would allow users to analyze, visualize, and explore content related to crises, disasters, human rights, inequality, population growth, shootings, violence, etc. Binary classification models were trained for different events for both tweet and webpage collections. Word2Vec was used as the feature selection technique and the Word2Vec model was trained on the entire corpus available. Logistic Regression was used as our classification technique. As part of this submission, we detail our classification framework and the experiments that we conducted. We also give an insight into the challenges we faced, how we overcame those challenges, and also what we learned in the process. We also provide the code that we implemented and the models that were built to classify 1,562,215 tweets and 4,366 webpages.

Keywords

Classification, Machine Learning, Word2Vec, Logisitic Regression

Persistent link

http://hdl.handle.net/10919/81512

Collections

CS5604: Information Retrieval

Full item page

CS5604 Fall 2017 Classification Team Submission

Files

TR Number

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

Persistent link

Collections