Data Exchange for Artificial Intelligence Incubation in Manufacturing Industrial Internet
Abstract
Industrial Cyber-Physical Systems (ICPSs) connect industrial equipment and manufacturing processes through ubiquitous sensors, actuators, and computing units, forming the Manufacturing Industrial Internet (MII). Using the data generated in the MII, Artificial Intelligence (AI) has greatly advanced data-driven decision-making for manufacturing efficiency, quality improvement, and cost reduction. However, poor-quality data pose significant challenges to the incubation (i.e., training, validation, and deployment) of AI models. In the offline training phase, poor-quality training data result in inaccurate AI models. In the online training and deployment phases, high-volume but information-poor data cause discrepancies in AI modeling performance across phases. They also impose heavy communication and computation workloads and raise the cost of data acquisition and storage. When AI models are incubated for multiple manufacturing stages or systems, exchanging and sharing datasets can significantly improve the efficiency of data collection for a single manufacturing enterprise and improve the quality of training datasets. However, inaccurate estimation of dataset value can lead to ineffective dataset exchange and hamper the scaling up of AI systems. High-quality, high-value data not only enhance modeling performance during AI incubation but also enable effective data exchange for potential synergistic intelligence in the MII. Therefore, it is important to assess and ensure data quality in terms of its value to AI models. The ultimate goal of this dissertation is to establish a data exchange paradigm that provides high-quality, high-value data for AI incubation in the MII.
To achieve this goal, three research tasks are proposed for different phases of AI incubation: (1) a prediction-oriented data generation method that actively generates highly informative data in the offline training phase to achieve high prediction performance (Chapter 2); (2) an ensemble active learning framework based on contextual bandits that acquires and evaluates passively collected online data, enabling continuous improvement and resilient modeling performance during the online training and deployment phases (Chapter 3); and (3) a context-aware, performance-oriented, and privacy-preserving dataset-sharing framework that efficiently shares and exchanges small-but-high-quality datasets among trusted stakeholders for on-demand use (Chapter 4). All proposed methodologies have been evaluated and validated through simulation studies and real manufacturing case studies. Chapter 5 summarizes the contributions of this work and proposes directions for future research.