A Semantic Web-Based Digital Library Infrastructure to Facilitate Computational Epidemiology

dc.contributor.author: Hasan, S. M. Shamimul [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeechair: Marathe, Madhav Vishnu [en]
dc.contributor.committeemember: Gupta, Sandeep [en]
dc.contributor.committeemember: Tilevich, Eli [en]
dc.contributor.committeemember: Leidig, Jonathan P. [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2019-03-10T07:00:39Z [en]
dc.date.available: 2019-03-10T07:00:39Z [en]
dc.date.issued: 2017-09-15 [en]
dc.description.abstract: Computational epidemiology generates and utilizes massive amounts of data. There are two primary categories of datasets: reported and synthetic. Reported data include epidemic data published by organizations (e.g., WHO, CDC, other national ministries and departments of health) during and following actual outbreaks, while synthetic datasets comprise spatially explicit synthetic populations, labeled social contact networks, multi-cell statistical experiments, and output data generated from the execution of computer simulation experiments. The discipline of computational epidemiology encounters numerous challenges because of the volume and dynamic nature of both types of datasets. In this dissertation, we present semantic web-based schemas to organize diverse reported and synthetic computational epidemiology datasets. There are three layers of these schemas: conceptual, logical, and physical. The conceptual layer provides data abstraction by exposing common entities and properties to the end user. The logical layer captures data fragmentation and linking aspects of the datasets. The physical layer covers storage aspects of the datasets. Mapping files can be created from the schemas, which are flexible and can grow. The schemas presented include data linking approaches that can connect large-scale and widely varying epidemic datasets. This linked data leads to an integrated knowledge-base, enabling an epidemiologist to ask complex queries that employ multiple datasets. We demonstrate the utility of our knowledge-base by developing a query bank, which represents typical analyses carried out by an epidemiologist during the course of planning for or responding to an epidemic. By running queries with different data mapping techniques, we demonstrate the performance of various tools.
The empirical results show that leveraging semantic web technology is an effective strategy for reasoning over multiple datasets simultaneously, developing network queries pertinent to epidemic analysis, and conducting realistic studies undertaken in an epidemic investigation. The performance of queries varies according to the choice of hardware, underlying database, and resource description framework (RDF) engine. We provide application programming interfaces (APIs) on top of our linked datasets, which an epidemiologist can use for information retrieval without knowing much about the underlying datasets. The proposed semantic web-based digital library infrastructure can be highly beneficial for epidemiologists as they work to comprehend disease propagation for timely outbreak detection and efficient disease control activities. [en]
dc.description.abstractgeneral: Computational epidemiology generates and utilizes massive amounts of data, and the field faces numerous challenges because of the volume and dynamic nature of the datasets utilized. There are two primary categories of datasets. The first contains epidemic datasets tracking actual outbreaks of disease, which are reported by governments, private companies, and associated parties. The second category is synthetic data created through computer simulation. We present semantic web-based schemas to organize diverse reported and synthetic computational epidemiology datasets. The schemas are flexible in use and scale, and utilize data linking approaches that can connect large-scale and widely varying epidemic datasets. This linked data leads to an integrated knowledge-base, enabling an epidemiologist to ask complex queries that employ multiple datasets. This ability helps epidemiologists better understand disease propagation, for efficient outbreak detection and disease control activities. [en]
dc.description.degree: PHD [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:12719 [en]
dc.identifier.uri: http://hdl.handle.net/10919/88386 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Epidemiology [en]
dc.subject: Information Retrieval [en]
dc.subject: Schema [en]
dc.subject: Semantic Web [en]
dc.title: A Semantic Web-Based Digital Library Infrastructure to Facilitate Computational Epidemiology [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: PHD [en]

Files

Original bundle
Name: Hasan_S_D_2017.pdf
Size: 5.23 MB
Format: Adobe Portable Document Format