Privacy-Preserving Synthetic Medical Data Generation with Deep Learning

Torfi, Amirsina

dc.contributor.author	Torfi, Amirsina	en
dc.contributor.committeechair	Fox, Edward A.	en
dc.contributor.committeemember	Greene, Casey Stephen	en
dc.contributor.committeemember	Tegge, Allison	en
dc.contributor.committeemember	Reddy, Chandan K.	en
dc.contributor.committeemember	Yao, Danfeng (Daphne)	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2020-08-27T08:00:32Z	en
dc.date.available	2020-08-27T08:00:32Z	en
dc.date.issued	2020-08-26	en
dc.description.abstract	Deep learning models have demonstrated good performance in various domains such as Computer Vision and Natural Language Processing. However, the utilization of data-driven methods in healthcare raises privacy concerns, which limit collaborative research. A remedy to this problem is to generate and employ synthetic data in place of real data. Existing methods for artificial data generation suffer from different limitations, such as being bound to particular use cases. Furthermore, their generalizability to real-world problems is questionable, given the uncertainties in defining and measuring key realistic characteristics. Hence, there is a need to establish insightful metrics (and to measure the validity of synthetic data), as well as quantitative criteria regarding privacy restrictions. We propose the use of Generative Adversarial Networks to help satisfy requirements for realistic characteristics and acceptable values of privacy metrics simultaneously. The present study makes several unique contributions to synthetic data generation in the healthcare domain. First, we propose a novel domain-agnostic metric to evaluate the quality of synthetic data. Second, by utilizing 1-D Convolutional Neural Networks, we devise a new approach to capturing the correlation between adjacent diagnosis records. Third, we employ Convolutional Autoencoders for creating a robust and compact feature space to handle the mixture of discrete and continuous data. Finally, we devise a privacy-preserving framework that enforces Rényi differential privacy as a new notion of differential privacy.	en
dc.description.abstractgeneral	Computer programs have been widely used for clinical diagnosis but are often designed with assumptions that limit their scalability and interoperability. The recent proliferation of abundant health data, significant increases in computer processing power, and superior performance of data-driven methods are enabling a paradigm shift in healthcare technology. This involves the adoption of artificial intelligence methods, such as deep learning, to improve healthcare knowledge and practice. Despite the success of deep learning in many different domains, privacy challenges in the healthcare field make collaborative research difficult, as working with data-driven methods may jeopardize patients' privacy. To overcome these challenges, researchers propose to generate and utilize realistic synthetic data that can be used instead of real private data. Existing methods for artificial data generation are limited by being bound to special use cases. Furthermore, their generalizability to real-world problems is questionable. There is a need to establish valid synthetic data that overcomes privacy restrictions and functions as a real-world analog for training deep learning models in healthcare. We propose the use of Generative Adversarial Networks to simultaneously overcome the realism and privacy challenges associated with healthcare data.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:27189	en
dc.identifier.uri	http://hdl.handle.net/10919/99856	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Deep learning	en
dc.subject	healthcare	en
dc.subject	synthetic data generation	en
dc.subject	generative adversarial networks	en
dc.subject	privacy	en
dc.title	Privacy-Preserving Synthetic Medical Data Generation with Deep Learning	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Name: Torfi_A_D_2020.pdf
Size: 2.69 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations