Show simple item record

dc.contributor.author  Gupta, Ragini  en_US
dc.date.accessioned  2014-07-25T08:00:10Z
dc.date.available  2014-07-25T08:00:10Z
dc.date.issued  2014-07-24  en_US
dc.identifier.other  vt_gsexam:3405  en_US
dc.identifier.uri  http://hdl.handle.net/10919/49675
dc.description.abstract  Data quality has attracted increasing research interest in recent years due to the rapid emergence of 'big data' processes and applications. This work views the data quality problem from the standpoint of synthetic information. The structure and complexity of synthetic data call for a data quality framework specific to it. This thesis presents that framework, along with implementation details and the results of applying the testing framework to a large synthetic dataset. A formal conceptual framework was designed for assessing the data quality of synthetic information; it involves developing analytical methods and software for that assessment. It includes dimensions of data quality that check the inherent properties of the data as well as evaluate the data in the context of its use. The framework is implemented as a software framework designed around software design principles such as scalability, generality, integrability, and modularity. A data abstraction layer has been introduced between the synthetic data and the tests. This abstraction layer has multiple benefits over having the tests access the data directly. It decouples the tests from the data so that the details of storage and implementation are kept hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The tests and quality measures implemented range from low-level syntactic checks to high-level semantic quality measures. In each case, in addition to the results of the quality measure itself, we also present results on the computational performance (scalability) of the measure.  en_US
dc.format.medium  ETD  en_US
dc.publisher  Virginia Tech  en_US
dc.rights  This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s).  en_US
dc.subject  Data quality  en_US
dc.subject  Synthetic data  en_US
dc.subject  Testing  en_US
dc.title  A Framework for Data Quality for Synthetic Information  en_US
dc.type  Thesis  en_US
dc.contributor.department  Industrial and Systems Engineering  en_US
dc.description.degree  Master of Science  en_US
thesis.degree.name  Master of Science  en_US
thesis.degree.level  masters  en_US
thesis.degree.grantor  Virginia Polytechnic Institute and State University  en_US
thesis.degree.discipline  Industrial and Systems Engineering  en_US
dc.contributor.committeechair  Bish, Douglas R.  en_US
dc.contributor.committeemember  Swarup, Samarth  en_US
dc.contributor.committeemember  Marathe, Madhav Vishnu  en_US
dc.contributor.committeemember  Fraticelli, Barbara M.P.  en_US

