A Framework for Data Quality for Synthetic Information

Gupta, Ragini

A Framework for Data Quality for Synthetic Information

Files

Gupta_R_T_2014.pdf (1.86 MB)

Downloads: 1303

Date

2014-07-24

Authors

Gupta, Ragini

Publisher

Virginia Tech

Abstract

Data quality has been an area of increasing interest for researchers in recent years due to the rapid emergence of 'big data' processes and applications. In this work, the data quality problem is viewed from the standpoint of synthetic information. Based on the structure and complexity of synthetic data, a need to have a data quality framework specific to it was realized. This thesis presents this framework along with implementation details and results of a large synthetic dataset to which the developed testing framework is applied. A formal conceptual framework was designed for assessing data quality of synthetic information. This framework involves developing analytical methods and software for assessing data quality for synthetic information. It includes dimensions of data quality that check the inherent properties of the data as well as evaluate it in the context of its use. The framework developed here is a software framework which is designed considering software design techniques like scalability, generality, integrability and modularity. A data abstraction layer has been introduced between the synthetic data and the tests. This abstraction layer has multiple benefits over direct access of the data by the tests. It decouples the tests from the data so that the details of storage and implementation are kept hidden from the user. We have implemented data quality measures for several quality dimensions: accuracy and precision, reliability, completeness, consistency, and validity. The particular tests and quality measures implemented span a range from low-level syntactic checks to high-level semantic quality measures. In each case, in addition to the results of the quality measure itself, we also present results on the computational performance (scalability) of the measure.

Keywords

Data quality, Synthetic data, Testing

Persistent link

http://hdl.handle.net/10919/49675

Collections

Masters Theses

Full item page

A Framework for Data Quality for Synthetic Information

Files

TR Number

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

Persistent link

Collections