Synthetic Electronic Medical Record Generation using Generative Adversarial Networks

Files
TR Number
Date
2021-08-13
Journal Title
Journal ISSN
Volume Title
Publisher
Virginia Tech
Abstract

It has been a while that computers have replaced our record books, and medical records are no exception. Electronic Health Records (EHR) are digital version of a patient's medical records. EHRs are available to authorized users, and they contain the medical records of the patient, which should help doctors understand a patient's condition quickly. In recent years, Deep Learning models have proved their value and have become state-of-the-art in computer vision, natural language processing, speech and other areas. The private nature of EHR data has prevented public access to EHR datasets. There are many obstacles to create a deep learning model with EHR data. Because EHR data are primarily consisting of huge sparse matrices, these challenges are mostly unique to this field. Due to this, research in this area is limited, and we can improve existing research substantially. In this study, we focus on high-performance synthetic data generation in EHR datasets. Artificial data generation can help reduce privacy leakage for dataset owners as it is proven that de-identification methods are prone to re-identification attacks. We propose a novel approach we call Improved Correlation Capturing Wasserstein Generative Adversarial Network (SCorGAN) to create EHR data. This work, leverages Deep Convolutional Neural Networks to extract and understand spatial dependencies in EHR data. To improve our model's performance, we focus on our Deep Convolutional AutoEncoder to better map our real EHR data to our latent space where we train the Generator. To assess our model's performance, we demonstrate that our generative model can create excellent data that are statistically close to the input dataset. Additionally, we evaluate our synthetic dataset against the original data using our previous work that focused on GAN Performance Evaluation. This work is publicly available at https://github.com/mohibeyki/SCorGAN

Description
Keywords
Deep learning (Machine learning), Healthcare, Generative Adversarial Networks
Citation
Collections