Browsing by Author "Gad, Samah"
- An Epidemiology of Information: Data Mining the 1918 Influenza Epidemic Project Report. Hausman, Bernice L.; Pencek, Bruce; Ramakrishnan, Naren; Eysenbach, Gunther; Ewing, E. Thomas; Kerr, Kathleen; Gad, Samah (2014-04-07). This project research report describes the results of four case studies undertaken as part of Virginia Tech’s “An Epidemiology of Information: Data Mining the 1918 Flu Pandemic,” which was funded through the Digging into Data Challenge of the National Endowment for the Humanities.
- Segmentation Algorithm. Gad, Samah (2014-04-09). We developed a dynamic temporal segmentation algorithm that wraps around topic modeling algorithms to identify change points where significant shifts in topics occur. The main task of the segmentation algorithm is to automatically partition the total time period spanned by the documents in the collection so that segment boundaries mark important periods of temporal evolution and reorganization. The algorithm moves through the data chronologically and evaluates two adjacent windows at a time, assuming a given segmentation granularity (e.g., discrete days, weeks, or months). This granularity varies from one application to another and is decided by domain experts. We evaluate adjacent windows by comparing their underlying topic distributions and quantifying common terms and their probabilities. We quantify common terms by measuring their overlap, which can be captured in a contingency table.
- Tone classifier. Gad, Samah (2014-02-24). The goal of tone analysis is to identify the tone of a text. We focused on the following tones: alarmist, warning, reassuring, and explanatory. To detect tones automatically, we used a supervised machine learning approach. This is a classic text classification problem, and a usual first step for such problems is to examine text chunks using a Multinomial Naïve Bayes classifier (based on the bag-of-words model). The classifier is based on Bayes’s theorem with the assumption that the features are conditionally independent of one another given the tone. The classifier is first trained using the features extracted from manually tagged text. After training, the classifier predicts tones for newly extracted, previously unseen text.
- Understanding the Role of Medical Experts during a Public Health Crisis: Digital Tools and Library Resources for Research on the 1918 Spanish Influenza. Ewing, E. Thomas; Gad, Samah; Ramakrishnan, Naren (IEEE, 2014-10). Humanities scholars, particularly historians of health and disease, can benefit from digitized library collections and tools such as topic modeling. Using a case study from the 1918 Spanish Flu epidemic, this paper explores the application of a big humanities approach to understanding the impact of a public health official on the course of the disease and the response of the public, as documented through digitized newspapers and medical periodicals.
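
The "Segmentation Algorithm" entry describes comparing the topics of two adjacent time windows through term overlap recorded in a contingency table. The Python sketch below illustrates that idea under stated assumptions and is not the project's implementation. The names `contingency_table`, `overlap_score`, and `find_boundaries`, the best-match averaging rule, the `threshold` value, and the sample windows are all hypothetical. For brevity it counts shared terms only and ignores their probabilities.

```python
from typing import Dict, List

Topic = Dict[str, float]   # term -> probability within the topic
Window = List[Topic]       # topics fitted on one time window


def contingency_table(left: Window, right: Window) -> List[List[int]]:
    """Cell (i, j) counts the terms shared by left topic i and right topic j."""
    return [[len(set(a) & set(b)) for b in right] for a in left]


def overlap_score(left: Window, right: Window) -> float:
    """Average, over left topics, of the best shared-term fraction (assumed rule)."""
    table = contingency_table(left, right)
    fractions = [
        max(row, default=0) / len(topic)
        for topic, row in zip(left, table)
        if topic  # skip empty topics to avoid dividing by zero
    ]
    return sum(fractions) / len(fractions) if fractions else 0.0


def find_boundaries(windows: List[Window], threshold: float = 0.5) -> List[int]:
    """Return indices i where window i overlaps too little with window i - 1."""
    return [
        i
        for i in range(1, len(windows))
        if overlap_score(windows[i - 1], windows[i]) < threshold
    ]


# Invented sample data: three windows, the last with entirely new vocabulary.
w0 = [{"influenza": 0.5, "epidemic": 0.3, "hospital": 0.2},
      {"school": 0.6, "closed": 0.4}]
w1 = [{"influenza": 0.4, "epidemic": 0.4, "nurse": 0.2},
      {"school": 0.5, "closed": 0.5}]
w2 = [{"armistice": 0.6, "peace": 0.4},
      {"parade": 0.7, "victory": 0.3}]

boundaries = find_boundaries([w0, w1, w2])
```

Here the first two windows share most of their terms, so no boundary is placed between them, while the third window shares none and is flagged as the start of a new segment. A fuller version would weight each shared term by its probability in both topics, as the abstract suggests.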
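
The "Tone classifier" entry describes a Multinomial Naïve Bayes classifier over a bag-of-words model, trained on manually tagged text. The following is a minimal, standard-library sketch of that technique, not the project's code. The class name `ToneNaiveBayes`, the whitespace tokenizer, the add-one (Laplace) smoothing, and the four training sentences are assumptions invented for illustration. Only the tone labels come from the entry.

```python
import math
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    """Bag-of-words tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class ToneNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed choice)."""

    def fit(self, texts: List[str], tones: List[str]) -> "ToneNaiveBayes":
        self.doc_counts = Counter(tones)         # documents per tone
        self.term_counts = defaultdict(Counter)  # tone -> term frequencies
        for text, tone in zip(texts, tones):
            self.term_counts[tone].update(tokenize(text))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        # Terms never seen in training carry no evidence, so drop them.
        terms = [t for t in tokenize(text) if t in self.vocab]
        best_tone, best_score = "", -math.inf
        for tone, n_docs in self.doc_counts.items():
            counts = self.term_counts[tone]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(tone) + sum of log P(term | tone); terms are treated as
            # conditionally independent of one another given the tone.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in terms)
            if score > best_score:
                best_tone, best_score = tone, score
        return best_tone


# Invented training examples, one per tone named in the entry.
train_texts = [
    "deadly plague spreads panic grips city",
    "avoid crowds and cover coughs",
    "situation under control cases declining",
    "influenza spreads through droplets from coughs",
]
train_tones = ["alarmist", "warning", "reassuring", "explanatory"]

model = ToneNaiveBayes().fit(train_texts, train_tones)
prediction = model.predict("panic as deadly plague spreads")
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen term from driving a tone's probability to zero.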