Crisis Events One-Class Text Classification
Abstract
Analyzing web articles related to crisis events can help social scientists gauge public sentiment and shape public policy on how to respond to such disasters. However, data collection for such tasks is difficult. Manual dataset curation is time-consuming and costly: a user must run search-engine queries, iterate through many web pages, and read each document closely to determine which crisis events it may relate to. Automated processes such as web crawlers, on the other hand, operate primarily via rule-based methods, which may not accurately classify individual documents as related to the crisis event of interest. In our work, we seek to use machine learning and natural language processing techniques to determine whether individual documents are related to a specific crisis event. To accomplish this, we treat the area of interest as a single class and consider all other topics as not being of interest. We hypothesize that natural language processing techniques can be used to classify a particular webpage as being relevant to a certain crisis. A potential motivation for this approach is to guide efficient web crawling using techniques from semantic analysis.
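The abstract does not say which model or features the authors used, so the following is only a minimal sketch of the one-class idea it describes: fit on documents from the crisis of interest alone, then flag a new page as relevant if it looks similar enough to them. It assumes a simple TF-IDF centroid with a cosine-similarity threshold, which is one of several possible one-class methods and not necessarily the authors' choice. The class name `CentroidOneClass`, the `slack` parameter, and the example headlines are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def unit(vec):
    """Scale a sparse vector (dict) to unit length; empty if the norm is zero."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class CentroidOneClass:
    """Hypothetical one-class text classifier: TF-IDF centroid + cosine threshold.

    This is an illustrative assumption, not the method from the report.
    """

    def fit(self, docs, slack=0.5):
        token_lists = [tokenize(d) for d in docs]
        n = len(docs)
        # Document frequency is computed from in-class documents only.
        df = Counter(t for toks in token_lists for t in set(toks))
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        # Tokens never seen in training get the largest possible IDF, so
        # off-topic vocabulary dilutes a page's similarity to the centroid.
        self.unseen_idf = math.log(1 + n) + 1.0
        vecs = [self._vectorize(toks) for toks in token_lists]
        centroid = Counter()
        for vec in vecs:
            for t, w in vec.items():
                centroid[t] += w
        self.centroid = unit(centroid)
        # Accept anything at least `slack` times as similar to the centroid
        # as the least typical training document (an assumed heuristic).
        self.threshold = slack * min(self._similarity(v) for v in vecs)
        return self

    def _vectorize(self, tokens):
        tf = Counter(tokens)
        return unit({t: c * self.idf.get(t, self.unseen_idf) for t, c in tf.items()})

    def _similarity(self, vec):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * self.centroid.get(t, 0.0) for t, w in vec.items())

    def score(self, text):
        return self._similarity(self._vectorize(tokenize(text)))

    def predict(self, text):
        """Return True if the text is judged relevant to the training topic."""
        return self.score(text) >= self.threshold


# Hypothetical in-class examples: pages about a single hurricane event.
crisis_docs = [
    "Hurricane floods coastal towns as residents evacuate",
    "Evacuation ordered ahead of hurricane landfall and storm surge",
    "Storm surge from the hurricane floods homes; rescue teams deployed",
]
clf = CentroidOneClass().fit(crisis_docs)

relevant = "Rescue teams evacuate residents as hurricane floods the coast"
unrelated = "Local bakery wins award for best sourdough bread"
print(clf.predict(relevant), clf.predict(unrelated))
```

In a crawling setting, `score` could serve as a priority signal for which links to follow next, although the abstract offers this only as a potential motivation. A real system would need far more training pages and a threshold tuned on held-out data.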