dc.contributor.author | Ghosh, Saurav | en |
dc.date.accessioned | 2017-11-30T09:00:26Z | en |
dc.date.available | 2017-11-30T09:00:26Z | en |
dc.date.issued | 2017-11-29 | en |
dc.identifier.other | vt_gsexam:13231 | en |
dc.identifier.uri | http://hdl.handle.net/10919/80574 | en |
dc.description.abstract | Traditional disease surveillance can be augmented with a wide variety of open sources, such
as online news media, Twitter, blogs, and web search records. Rapidly increasing volumes of
these open sources are proving to be extremely valuable resources in helping analyze, detect,
and forecast outbreaks of infectious diseases, especially new diseases or diseases spreading
to new regions. However, these sources are in general unstructured (noisy), and construction
of surveillance tools, ranging from real-time disease outbreak monitoring to construction of
epidemiological line lists, involves considerable human supervision. Intelligent modeling of
such sources using text mining methods such as topic models, deep learning, and dependency
parsing can lead to automated generation of these surveillance tools. Moreover, real-time
global availability of these open sources from web-based bio-surveillance systems, such
as HealthMap and WHO Disease Outbreak News (DONs), can aid in development of generic
tools which will be applicable to a wide range of diseases (rare, endemic and emerging) across
different regions of the world.
In this dissertation, we explore various methods of using internet news reports to develop
generic surveillance tools which can supplement traditional surveillance systems and aid in
early detection of outbreaks. We primarily investigate three major problems related to infectious
disease surveillance as follows. (i) Can trends in online news reporting monitor and
possibly estimate infectious disease outbreaks? We introduce approaches that use temporal
topic models over the HealthMap corpus for detecting rare and endemic disease topics as well as
capturing temporal trends (seasonality, abrupt peaks) for each disease topic. The discovery
of temporal topic trends is followed by time-series regression techniques to estimate future
disease incidence. (ii) In the second problem, we seek to automate the creation of epidemiological
line lists for emerging diseases from WHO DONs in a near real-time setting. For
this purpose, we formulate Guided Epidemiological Line List (GELL), an approach that
combines neural word embeddings with information extracted from dependency parse-trees
at the sentence level to extract line list features. (iii) Finally, for the third problem, we
aim to characterize diseases automatically from the HealthMap corpus using a disease-specific
word embedding model; the resulting characterizations were subsequently evaluated for accuracy
against human-curated ones. | en |
dc.format.medium | ETD | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Infectious Disease Surveillance | en |
dc.subject | HealthMap | en |
dc.subject | WHO DONs | en |
dc.subject | Temporal Topic Modeling | en |
dc.subject | Guided Epidemiological Line List | en |
dc.subject | Word Embeddings | en |
dc.title | News Analytics for Global Infectious Disease Surveillance | en |
dc.type | Dissertation | en |
dc.contributor.department | Computer Science | en |
dc.description.degree | Ph. D. | en |
thesis.degree.name | Ph. D. | en |
thesis.degree.level | doctoral | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.discipline | Computer Science and Applications | en |
dc.contributor.committeechair | Ramakrishnan, Naren | en |
dc.contributor.committeemember | Nsoesie, Elaine Okanyene | en |
dc.contributor.committeemember | Lewis, Bryan L. | en |
dc.contributor.committeemember | Lu, Chang Tien | en |
dc.contributor.committeemember | Marathe, Madhav Vishnu | en |