Crisis Events One-Class Text Classification

Analyzing web articles related to crisis events can help social scientists gauge public sentiment and shape public policy on how to respond to such disasters. However, collecting data for such analysis is difficult. Manual dataset curation is time-consuming and costly: a user must work through many web pages returned by a search engine, reading each document closely to determine which crisis events it may be related to. Automated processes such as web crawlers, on the other hand, operate primarily via rule-based methods, which may not accurately classify individual documents as being related to the crisis event of interest. In our work, we use machine learning and natural language processing techniques to determine whether individual documents are related to a specific crisis event. To accomplish this, we treat the area of interest as a single class and consider all other topics as not being of interest. We hypothesize that natural language processing techniques can be used to classify a particular webpage as being relevant to a certain crisis. A potential motivation for this approach is to guide efficient web crawling using techniques from semantic analysis.
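
To make the one-class framing concrete, the following is a minimal sketch and not the method used in this project, which the abstract does not specify. It assumes a simple centroid-based approach: documents known to be about the crisis are turned into TF-IDF vectors, and a new document is accepted if its cosine similarity to their centroid clears a threshold learned from the training set alone. The class name `CentroidOneClassClassifier`, the `margin` parameter, and the example sentences are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class CentroidOneClassClassifier:
    """Hypothetical one-class model trained only on on-topic documents."""

    def __init__(self, margin=0.5):
        # Assumption: accept documents at least `margin` times as similar
        # to the centroid as the least similar training document.
        self.margin = margin

    def _vectorize(self, tokens):
        # Smoothed IDF; terms unseen in training get the largest weight,
        # so off-topic vocabulary lowers the similarity score.
        counts = Counter(tokens)
        vec = {
            t: c * (math.log((1 + self.n_docs) / (1 + self.df.get(t, 0))) + 1.0)
            for t, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, docs):
        # `docs` must be a non-empty list of on-topic documents.
        tokenized = [tokenize(d) for d in docs]
        self.n_docs = len(tokenized)
        self.df = Counter(t for toks in tokenized for t in set(toks))
        vectors = [self._vectorize(toks) for toks in tokenized]
        centroid = Counter()
        for vec in vectors:
            for t, w in vec.items():
                centroid[t] += w / self.n_docs
        self.centroid = dict(centroid)
        lowest = min(cosine(v, self.centroid) for v in vectors)
        self.threshold = self.margin * lowest
        return self

    def score(self, doc):
        return cosine(self._vectorize(tokenize(doc)), self.centroid)

    def predict(self, doc):
        """Return True if the document looks related to the crisis."""
        return self.score(doc) >= self.threshold


# Made-up training documents about a single crisis event.
crisis_docs = [
    "Hurricane floods coastal towns as residents evacuate",
    "Residents evacuate as hurricane storm surge floods homes",
    "Emergency shelters open after hurricane floods the coast",
]
model = CentroidOneClassClassifier().fit(crisis_docs)
print(model.predict("Hurricane storm surge floods coastal homes and residents evacuate"))
print(model.predict("A simple pasta recipe with tomato sauce and basil"))
```

No negative examples are needed for training, which matches the setting described above where everything outside the crisis of interest is treated as "other". In a crawler, a score like this could be used to decide which pages to keep or follow, though a real system would need far more training text and a more carefully chosen threshold.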

Persistent link

https://hdl.handle.net/10919/117116

Collections

CS4624: Multimedia, Hypertext, and Information Access
