A Cost-Effective Semi-Automated Approach for Comprehensive Event Extraction

TR Number



Journal Title

Journal ISSN

Volume Title


Virginia Tech


Automated event extraction from free text remains an open problem, particularly when the goal is to identify all relevant events. Manual extraction is currently the only alternative for comprehensive and reliable extraction. Therefore, it is required to have a system that can comprehensively extract events reported in news articles (high recall) and is also scalable enough to handle a large number of articles.

In this dissertation, we explore various methods to develop an event extraction system that can mitigate these challenges. We primarily investigate three major problems related to event extraction as follows. (i) What are the strengths and weaknesses of the automated event extractors? A thorough understanding of what can be automated with high success and what leads to common pitfalls is crucial before we could develop a superior event extraction system. (ii) How can we build a hybrid event extraction system that can bridge the gap between manual and automated event extraction? Hybrid extraction is a semi-automated approach that uses an ecosystem of machine learning models along with a carefully designed user interface for extracting events. Since this method is semi-automated it also requires a meticulous understanding of user behavior in order to identify tasks that humans can perform with ease while diverting the more tedious task to the machine learning methods (iii) Finally, we explore methods for displaying extracted events that could simplify the analytical and inference generation processes for an analyst. We particularly aim to develop visualizations that would allow analysts can perform macro and micro level analysis of significant societal events.



Event Extraction, Visual Analytics, News Analytics, Civil Unrest