This project seeks to understand how newspapers shaped public opinion during the 1918 influenza pandemic. Using data mining techniques combined with historical and rhetorical analysis, we explore hundreds of newspaper titles, including those from Chronicling America at the United States Library of Congress and the Peel’s Prairie Provinces collection at the University of Alberta Library, to understand the flow of information about the spread and impact of disease.
For more information, contact Professor E. Thomas Ewing, Principal Investigator and Project Director, Department of History, Virginia Tech, at email@example.com.
We developed a dynamic temporal segmentation algorithm that wraps around topic modeling algorithms to identify change points where significant shifts in topics occur. The segmentation algorithm automatically partitions the total time period spanned by the documents in the collection so that segment boundaries mark important periods of temporal evolution and reorganization. The algorithm moves across the data in time, evaluating two adjacent windows at a given segmentation granularity (e.g., discrete days, weeks, or months). This granularity varies from one application to another and is chosen by domain experts. We evaluate adjacent windows by comparing their underlying topic distributions, quantifying the terms they have in common and those terms' probabilities. We quantify common terms by measuring the overlap between the two windows, which can be captured using a contingency table.
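The adjacent-window comparison can be sketched as follows. This is a minimal illustration, not the project's implementation: the representation of topic distributions as term-probability dictionaries, the top-term cutoff `k`, and the overlap threshold are all assumptions made for the example.

```python
def top_terms(topic_dist, k=5):
    """Return the k highest-probability terms of one topic distribution."""
    return {t for t, _ in sorted(topic_dist.items(), key=lambda x: -x[1])[:k]}

def window_overlap(window_a, window_b, k=5):
    """Overlap (Jaccard similarity) between the top terms of the topics
    in two adjacent windows."""
    terms_a = set().union(*(top_terms(t, k) for t in window_a))
    terms_b = set().union(*(top_terms(t, k) for t in window_b))
    return len(terms_a & terms_b) / len(terms_a | terms_b)

def change_points(windows, threshold=0.3):
    """Indices of windows whose topic overlap with the preceding window
    falls below the threshold, i.e., candidate segment boundaries."""
    return [i + 1 for i in range(len(windows) - 1)
            if window_overlap(windows[i], windows[i + 1]) < threshold]
```

For instance, if two consecutive windows are dominated by influenza-related terms and a third shifts to unrelated vocabulary, the overlap collapses between the second and third windows and a change point is reported there.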
This research report describes the results of four case studies undertaken as part of Virginia Tech’s “An Epidemiology of Information: Data Mining the 1918 Flu Pandemic,” which was funded through the Digging into Data Challenge of the National Endowment for the Humanities.
The goal of tone analysis is to identify tone from text. We focused on four tones: alarmist, warning, reassuring, and explanatory. To detect tones automatically, we used a supervised machine learning approach. This is a classic text classification problem, and a common first step is to examine text chunks using a Multinomial Naïve Bayes classifier based on the bag-of-words model. The classifier applies Bayes’s theorem under the assumption that the features are conditionally independent given the tone. It is first trained on features extracted from manually tagged text; after training, it predicts tones for newly extracted, previously unseen text.
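The training-and-prediction step can be sketched with a minimal bag-of-words Multinomial Naïve Bayes classifier. This is an illustrative sketch only: the training snippets and their tone labels below are invented for the example and are not drawn from the project's manually tagged corpus.

```python
import math
from collections import Counter, defaultdict

def train(tagged_texts):
    """Count tone priors and per-tone term frequencies from (text, tone) pairs."""
    priors = Counter()
    term_counts = defaultdict(Counter)
    for text, tone in tagged_texts:
        priors[tone] += 1
        term_counts[tone].update(text.lower().split())
    return priors, term_counts

def predict(text, priors, term_counts, alpha=1.0):
    """Return the tone maximizing log P(tone) + sum of log P(term | tone),
    with add-alpha smoothing so unseen terms do not zero out a tone."""
    vocab = {t for counts in term_counts.values() for t in counts}
    total_docs = sum(priors.values())
    best_tone, best_score = None, float("-inf")
    for tone in priors:
        score = math.log(priors[tone] / total_docs)
        denom = sum(term_counts[tone].values()) + alpha * len(vocab)
        for term in text.lower().split():
            score += math.log((term_counts[tone][term] + alpha) / denom)
        if score > best_score:
            best_tone, best_score = tone, score
    return best_tone
```

With two hypothetical training snippets labeled "alarmist" and "reassuring", a new fragment sharing vocabulary with the alarmist snippet would be assigned the alarmist tone, mirroring how the trained classifier labels previously unseen text.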