A Framework for Data Quality for Synthetic Information

Gupta, Ragini

dc.contributor.author	Gupta, Ragini	en
dc.contributor.committeechair	Bish, Douglas R.	en
dc.contributor.committeemember	Swarup, Samarth	en
dc.contributor.committeemember	Marathe, Madhav Vishnu	en
dc.contributor.committeemember	Fraticelli, Barbara M. P.	en
dc.contributor.department	Industrial and Systems Engineering	en
dc.date.accessioned	2014-07-25T08:00:10Z	en
dc.date.available	2014-07-25T08:00:10Z	en
dc.date.issued	2014-07-24	en
dc.description.abstract	Data quality has attracted increasing research interest in recent years, driven by the rapid emergence of 'big data' processes and applications. This work views the data quality problem from the standpoint of synthetic information. The structure and complexity of synthetic data call for a data quality framework specific to it. This thesis presents such a framework, along with implementation details and the results of applying the testing framework to a large synthetic dataset. The framework is a formal conceptual framework for assessing the data quality of synthetic information, and it comprises analytical methods and software for that assessment. It includes dimensions of data quality that check the inherent properties of the data as well as dimensions that evaluate the data in the context of its use. It is implemented as a software framework designed around the goals of scalability, generality, integrability, and modularity. A data abstraction layer sits between the synthetic data and the tests. This layer offers multiple benefits over direct access to the data by the tests. It decouples the tests from the data, so that the details of storage and implementation are hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The tests and quality measures implemented range from low-level syntactic checks to high-level semantic quality measures. For each measure, in addition to its results, we also present results on its computational performance (scalability).	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:3405	en
dc.identifier.uri	http://hdl.handle.net/10919/49675	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Data quality	en
dc.subject	Synthetic data	en
dc.subject	Testing	en
dc.title	A Framework for Data Quality for Synthetic Information	en
dc.type	Thesis	en
thesis.degree.discipline	Industrial and Systems Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en
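
The abstract describes a data abstraction layer that decouples quality tests from the underlying storage. The following is a minimal sketch of that idea, not the thesis's implementation. The names DataSource, InMemorySource, completeness, and validity_in_range, as well as the person_id and age fields, are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Mapping, Optional, Sequence

Record = Mapping[str, Optional[object]]


class DataSource(ABC):
    """Hypothetical abstraction layer: tests see records, never the storage."""

    @abstractmethod
    def records(self) -> Iterator[Record]:
        """Yield one record (field name -> value) at a time."""


class InMemorySource(DataSource):
    """Illustrative backend; a database- or file-backed source could replace it."""

    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def records(self) -> Iterator[Record]:
        return iter(self._rows)


def completeness(source: DataSource, fields: Sequence[str]) -> float:
    """Fraction of required field values that are present (not None)."""
    total = 0
    present = 0
    for rec in source.records():
        for name in fields:
            total += 1
            if rec.get(name) is not None:
                present += 1
    # An empty dataset has no missing values, so report it as complete.
    return present / total if total else 1.0


def validity_in_range(source: DataSource, field: str, lo: float, hi: float) -> float:
    """Fraction of non-missing values of `field` that lie within [lo, hi]."""
    checked = 0
    valid = 0
    for rec in source.records():
        value = rec.get(field)
        if value is None:
            continue  # missing values are a completeness issue, not a validity one
        checked += 1
        if lo <= value <= hi:
            valid += 1
    return valid / checked if checked else 1.0


if __name__ == "__main__":
    src = InMemorySource([
        {"person_id": 1, "age": 34},
        {"person_id": 2, "age": None},
        {"person_id": 3, "age": 150},
    ])
    print(completeness(src, ["person_id", "age"]))
    print(validity_in_range(src, "age", 0, 120))
```

Because both measures depend only on the DataSource interface, swapping the storage backend would not require changing the tests, which is the decoupling benefit the abstract attributes to the abstraction layer.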

Files

Original bundle

Name: Gupta_R_T_2014.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format

Collections

Masters Theses