Privacy-Preserving Synthetic Medical Data Generation with Deep Learning

Torfi, Amirsina

Files

Torfi_A_D_2020.pdf (2.69 MB)

Date

2020-08-26

Authors

Torfi, Amirsina

Publisher

Virginia Tech

Abstract

Deep learning models have demonstrated good performance in various domains, such as Computer Vision and Natural Language Processing. However, the use of data-driven methods in healthcare raises privacy concerns, which limit collaborative research. One remedy is to generate and use synthetic data, which addresses these privacy concerns. Existing methods for synthetic data generation suffer from various limitations, such as being bound to particular use cases. Furthermore, their generalizability to real-world problems is debatable, given the uncertainty in defining and measuring the key characteristics that make data realistic. Hence, there is a need for insightful metrics that measure the validity of synthetic data, as well as quantitative criteria for privacy restrictions. We propose using Generative Adversarial Networks to help satisfy the requirements for realistic characteristics and acceptable values of privacy metrics simultaneously. The present study makes several unique contributions to synthetic data generation in the healthcare domain. First, we propose a novel domain-agnostic metric to evaluate the quality of synthetic data. Second, using 1-D Convolutional Neural Networks, we devise a new approach to capturing the correlation between adjacent diagnosis records. Third, we employ Convolutional Autoencoders to create a robust and compact feature space that handles the mixture of discrete and continuous data. Finally, we devise a privacy-preserving framework that enforces Rényi differential privacy as a new notion of differential privacy.
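As a rough illustration of what Rényi differential privacy (RDP) accounting involves, the Python sketch below uses the textbook Gaussian mechanism. It is not the framework from this dissertation. A single release with L2 sensitivity Δ and noise scale σ satisfies RDP of order α with ε(α) = αΔ²/(2σ²). RDP guarantees compose additively across steps at a fixed order, and the standard conversion ε = ε_RDP(α) + log(1/δ)/(α − 1) yields an (ε, δ)-DP bound. The function names, the grid of orders, and the use of an unsubsampled Gaussian mechanism are assumptions made for illustration. Training a GAN with differentially private SGD would normally also require accounting for minibatch subsampling, which this sketch omits.

```python
import math

# Hypothetical default grid of RDP orders (alpha > 1) to search over.
DEFAULT_ORDERS = [1.25, 1.5, 2, 3, 4, 5, 6, 8, 16, 32, 64, 128, 256]


def gaussian_rdp(alpha: float, sigma: float, sensitivity: float = 1.0) -> float:
    """RDP epsilon of order alpha for one Gaussian-mechanism release."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def composed_rdp(alpha: float, sigma: float, steps: int,
                 sensitivity: float = 1.0) -> float:
    """RDP composes additively: T releases at order alpha cost T * eps(alpha)."""
    return steps * gaussian_rdp(alpha, sigma, sensitivity)


def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """Standard conversion from (alpha, rdp_eps)-RDP to (eps, delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def best_epsilon(sigma: float, steps: int, delta: float,
                 orders=None, sensitivity: float = 1.0):
    """Return the tightest (epsilon, order) pair over the candidate orders."""
    if orders is None:
        orders = DEFAULT_ORDERS
    candidates = [
        (rdp_to_dp(composed_rdp(a, sigma, steps, sensitivity), a, delta), a)
        for a in orders
    ]
    return min(candidates)
```

For example, `best_epsilon(sigma=1.0, steps=1, delta=1e-5)` evaluates every order in the grid and returns the smallest resulting ε together with the order that achieved it. Raising `sigma` lowers ε, while increasing `steps` raises it, which reflects the privacy/utility trade-off a privacy-preserving generator has to balance.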

Keywords

Deep learning, healthcare, synthetic data generation, generative adversarial networks, privacy.

Persistent link

http://hdl.handle.net/10919/99856

Collections

Doctoral Dissertations
