A Framework for Data Quality for Synthetic Information

dc.contributor.author: Gupta, Ragini [en]
dc.contributor.committeechair: Bish, Douglas R. [en]
dc.contributor.committeemember: Swarup, Samarth [en]
dc.contributor.committeemember: Marathe, Madhav Vishnu [en]
dc.contributor.committeemember: Fraticelli, Barbara M. P. [en]
dc.contributor.department: Industrial and Systems Engineering [en]
dc.date.accessioned: 2014-07-25T08:00:10Z [en]
dc.date.available: 2014-07-25T08:00:10Z [en]
dc.date.issued: 2014-07-24 [en]
dc.description.abstract: Data quality has been an area of increasing interest for researchers in recent years due to the rapid emergence of 'big data' processes and applications. In this work, the data quality problem is viewed from the standpoint of synthetic information. The structure and complexity of synthetic data call for a data quality framework specific to it. This thesis presents such a framework, along with implementation details and the results of applying the testing framework to a large synthetic dataset. A formal conceptual framework was designed for assessing the data quality of synthetic information, which involves developing both analytical methods and software. It includes dimensions of data quality that check the inherent properties of the data as well as dimensions that evaluate the data in the context of its use. The framework is implemented as a software framework designed around the principles of scalability, generality, integrability, and modularity. A data abstraction layer has been introduced between the synthetic data and the tests. Compared with direct access to the data by the tests, this layer decouples the tests from the data, so that the details of storage and implementation are kept hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The tests and quality measures implemented range from low-level syntactic checks to high-level semantic quality measures. In each case, in addition to the results of the quality measure itself, we also present results on the computational performance (scalability) of the measure. [en]
dc.description.degree: Master of Science [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:3405 [en]
dc.identifier.uri: http://hdl.handle.net/10919/49675 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Data quality [en]
dc.subject: Synthetic data [en]
dc.subject: Testing [en]
dc.title: A Framework for Data Quality for Synthetic Information [en]
dc.type: Thesis [en]
thesis.degree.discipline: Industrial and Systems Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: Gupta_R_T_2014.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format
