
dc.contributor.author: Ghosh, Saurav [en_US]
dc.date.accessioned: 2017-11-30T09:00:26Z
dc.date.available: 2017-11-30T09:00:26Z
dc.date.issued: 2017-11-29 [en_US]
dc.identifier.other: vt_gsexam:13231 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/80574
dc.description.abstract: Traditional disease surveillance can be augmented with a wide variety of open sources, such as online news media, Twitter, blogs, and web search records. Rapidly increasing volumes of these open sources are proving to be extremely valuable resources in helping analyze, detect, and forecast outbreaks of infectious diseases, especially new diseases or diseases spreading to new regions. However, these sources are in general unstructured (noisy), and the construction of surveillance tools, ranging from real-time disease outbreak monitoring to the construction of epidemiological line lists, involves considerable human supervision. Intelligent modeling of such sources using text mining methods such as topic models, deep learning, and dependency parsing can lead to automated generation of these surveillance tools. Moreover, the real-time global availability of these open sources from web-based bio-surveillance systems, such as HealthMap and WHO Disease Outbreak News (DONs), can aid in the development of generic tools applicable to a wide range of diseases (rare, endemic, and emerging) across different regions of the world. In this dissertation, we explore various methods of using internet news reports to develop generic surveillance tools that can supplement traditional surveillance systems and aid in early detection of outbreaks. We primarily investigate three major problems related to infectious disease surveillance. (i) Can trends in online news reporting monitor and possibly estimate infectious disease outbreaks? We introduce approaches that use temporal topic models over the HealthMap corpus to detect rare and endemic disease topics and to capture temporal trends (seasonality, abrupt peaks) for each disease topic. The discovery of temporal topic trends is followed by time-series regression techniques to estimate future disease incidence. (ii) In the second problem, we seek to automate the creation of epidemiological line lists for emerging diseases from WHO DONs in a near real-time setting. For this purpose, we formulate Guided Epidemiological Line List (GELL), an approach that combines neural word embeddings with information extracted from dependency parse trees at the sentence level to extract line list features. (iii) Finally, for the third problem, we aim to characterize diseases automatically from the HealthMap corpus using a disease-specific word embedding model, and we evaluate the resulting characterizations for accuracy against human-curated ones. [en_US]
dc.format.medium: ETD [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: This item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). [en_US]
dc.subject: Infectious Disease Surveillance [en_US]
dc.subject: HealthMap [en_US]
dc.subject: WHO DONs [en_US]
dc.subject: Temporal Topic Modeling [en_US]
dc.subject: Guided Epidemiological Line List [en_US]
dc.subject: Word Embeddings [en_US]
dc.title: News Analytics for Global Infectious Disease Surveillance [en_US]
dc.type: Dissertation [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Ph. D. [en_US]
thesis.degree.name: Ph. D. [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science and Applications [en_US]
dc.contributor.committeechair: Ramakrishnan, Narendran [en_US]
dc.contributor.committeemember: Nsoesie, Elaine Okanyene [en_US]
dc.contributor.committeemember: Lewis, Bryan Leroy [en_US]
dc.contributor.committeemember: Lu, Chang Tien [en_US]
dc.contributor.committeemember: Marathe, Madhav Vishnu [en_US]

