Synthetic Data Generation and Sampling for Online Training of DNN in Manufacturing Supervised Learning Problems

Virginia Tech


The deployment of the Industrial Internet offers abundant passive data from manufacturing systems and networks, which enables data-driven modeling with advanced, data-hungry statistical models such as Deep Neural Networks (DNNs). DNNs have proven remarkably effective for supervised learning in critical manufacturing applications such as AI-enabled automatic inspection and quality modeling. However, DNN models lack performance guarantees, primarily because of class imbalance, distribution shift, and multimodal variables (e.g., time series and images) in the training and testing datasets collected in manufacturing. Moreover, implementing these models on the manufacturing shop floor is difficult due to limitations in human-machine interaction. Inspired by active data generation through Design of Experiments (DoE) and passive observational data collection for manufacturing data analytics, we propose a SynthetIc Data gEneration and Sampling (SIDES) framework with a Graphical User Interface (GUI) named SIDESync. The framework is designed to streamline SIDES execution within manufacturing environments and to provide adequate DNN model performance by improving training data preparation and enhancing human-machine interaction. In the SIDES framework, a bi-level Hierarchical Contextual Bandits method is proposed as a principled way to integrate DoE and observational data sampling, with the goal of optimizing the DNNs' online learning performance. A Multimodality-aligned Variational Autoencoder transforms the multimodal predictors from manufacturing into a shared low-dimensional latent space for controlled data generation from DoE and effective sampling from observational data. The SIDESync GUI, developed using the Streamlit library in Python, simplifies the configuration, monitoring, and analysis of SIDES experiments. This streamlined approach facilitates access to the SIDES framework and enhances human-machine interaction capabilities. The merits of SIDES are evaluated in a real case study of printed electronics with a binary multimodal data classification problem. Results show the advantages of the cost-effective integration of DoE in improving the DNNs' online learning performance.
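To make the bi-level idea concrete, the following is a minimal sketch only, not the SIDES algorithm itself. It assumes a simple epsilon-greedy rule as a stand-in for the Hierarchical Contextual Bandits, a hashable context (for example, a discretised bin of the latent space), and a scalar reward such as validation improvement minus acquisition cost. The class names `EpsilonGreedyBandit` and `BiLevelSampler`, the source labels, and the region indices are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Contextual epsilon-greedy bandit over a fixed set of arms.

    Assumption: contexts are hashable, e.g. a discretised latent-space bin.
    """

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> number of pulls
        self.values = defaultdict(float)  # (context, arm) -> mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.values[key] += (reward - self.values[key]) / self.counts[key]


class BiLevelSampler:
    """Upper level picks the data source; lower level picks a latent region."""

    def __init__(self, sources, regions, epsilon=0.1, seed=0):
        self.upper = EpsilonGreedyBandit(sources, epsilon, seed)
        # One lower-level bandit per source, each with its own random stream.
        self.lower = {
            s: EpsilonGreedyBandit(regions, epsilon, seed + i + 1)
            for i, s in enumerate(sources)
        }

    def select(self, context):
        source = self.upper.select(context)
        region = self.lower[source].select(context)
        return source, region

    def update(self, context, source, region, reward):
        # The same reward is credited to both levels of the hierarchy.
        self.upper.update(context, source, reward)
        self.lower[source].update(context, region, reward)


if __name__ == "__main__":
    sampler = BiLevelSampler(["observational", "doe"], regions=[0, 1, 2])
    toy_rng = random.Random(42)
    for step in range(200):
        context = "bin_a"  # hypothetical latent-space bin
        source, region = sampler.select(context)
        # Toy reward (assumption): DoE samples in region 2 help the most.
        gain = 1.0 if (source == "doe" and region == 2) else 0.3
        reward = gain + toy_rng.gauss(0.0, 0.05)
        sampler.update(context, source, region, reward)
    print(sampler.select("bin_a"))
```

In this sketch the upper level decides whether the next training sample comes from a designed experiment or from the observational pool, and the lower level decides where in the latent space to generate or draw it. A practical version would replace the tabular running means with a model that generalises across continuous contexts.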



Data Generation, Data Sampling, Deep Neural Networks, Industrial Internet