Data Analytics and Machine Learning Applications in Fermentation Processes and Molecular Property Prediction
Abstract
Multivariate data analytics (MVDA) and machine learning (ML) play a crucial role in bioprocessing and molecular property prediction. This study covers three projects: 1) analyzing the occurrence of foaming in batch fermentation processes using multiway partial least squares (MPLS) approaches; 2) applying hyperparameter optimization methods in deep learning to improve molecular property prediction; and 3) using machine learning models to predict and reduce contamination risk.
In the first project, MPLS methods are used to develop interpretable correlation models that monitor foaming occurrence and thereby improve batch fermentation. The exhaust differential pressure is chosen as the quality variable to quantify foaming. The analysis considers three-dimensional datasets spanning different batches, process variables, and measurements. Batch-wise unfolding (BWU) and observation-wise unfolding (OWU) of plant datasets are combined with standard, dynamic, and kernel PLS methods. The BWU approach is useful for analyzing the differences between batches and identifying abnormalities and outliers, while the OWU approach quantifies the variation within a batch. The results show that dynamic PLS (DPLS) with OWU and one unit of time lag in the quality variable is the most effective, accurate, and easy to implement. With both BWU and OWU, the quantitative effects of process variables on the quality variable are identified and then used to guide improvements in fermentation performance.
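The two unfolding schemes and the one-step time lag can be illustrated with a minimal sketch in plain Python. It assumes the data are stored as `data[batch][time][variable]`, that all batches have the same length, and that the quality variable is a separate series per batch. The function names and the toy numbers are hypothetical and are not taken from the plant datasets; the column ordering chosen for BWU is one of several conventions in use.

```python
def unfold_batchwise(data):
    """BWU: one row per batch; columns hold every variable at every time point."""
    return [[value for time_slice in batch for value in time_slice] for batch in data]


def unfold_observationwise(data):
    """OWU: one row per (batch, time point); columns are the process variables."""
    return [list(time_slice) for batch in data for time_slice in batch]


def add_lagged_quality(data, quality, lag=1):
    """OWU rows augmented with the quality variable measured `lag` steps earlier.

    The first `lag` time points of each batch are dropped so that a lagged
    value is never borrowed from the previous batch.
    """
    rows, targets = [], []
    for batch, y in zip(data, quality):
        for t in range(lag, len(batch)):
            rows.append(list(batch[t]) + [y[t - lag]])
            targets.append(y[t])
    return rows, targets


# Toy example (hypothetical values): 2 batches, 3 time points, 2 process variables.
data = [
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    [[4.0, 40.0], [5.0, 50.0], [6.0, 60.0]],
]
quality = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

bwu = unfold_batchwise(data)        # 2 rows x 6 columns
owu = unfold_observationwise(data)  # 6 rows x 2 columns
lagged_rows, lagged_targets = add_lagged_quality(data, quality, lag=1)
```

The BWU matrix has one row per batch, which is why it suits batch-to-batch comparison and outlier detection, while the OWU matrix has one row per observation and exposes within-batch variation. The lagged matrix is the kind of regressor a dynamic PLS model would be fitted to; the PLS fit itself is omitted here.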
The second project presents a methodology for hyperparameter optimization (HPO) in deep neural networks for accurate and efficient molecular property prediction (MPP). Most prior applications of deep neural networks to MPP have paid little or no attention to HPO, resulting in suboptimal predicted property values. To improve the efficiency and accuracy of deep learning models for MPP, we must optimize as many hyperparameters as possible and choose a software platform that enables parallel execution of HPO. This project compares the random search, Bayesian optimization, and hyperband algorithms, together with the Bayesian-hyperband combination, within the software packages KerasTuner and Optuna. We conclude that the hyperband algorithm, which has not been used in previous MPP studies, is the most computationally efficient and gives MPP results that are optimal or nearly optimal in prediction accuracy. Based on two case studies, we recommend the Python library KerasTuner for HPO.
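The computational efficiency of hyperband comes from successive halving, the subroutine it runs repeatedly with different starting budgets: many configurations are trained briefly, and only the best fraction receives more training. The sketch below shows one successive-halving bracket in plain Python. It is an illustration under stated assumptions, not the KerasTuner or Optuna implementation: `toy_validation_loss` is a hypothetical stand-in for training a network, and the candidate learning rates are invented for the example.

```python
import math


def toy_validation_loss(learning_rate, epochs):
    """Hypothetical stand-in for model training: lowest loss near lr = 1e-2."""
    return (math.log10(learning_rate) + 2.0) ** 2 + 1.0 / epochs


def successive_halving(configs, min_epochs=1, eta=3):
    """Keep the best 1/eta of the configurations each round, with eta times the budget."""
    survivors = list(configs)
    epochs = min_epochs
    history = []  # (epochs per configuration, number of configurations) for each round
    while True:
        ranked = sorted(survivors, key=lambda lr: toy_validation_loss(lr, epochs))
        history.append((epochs, len(ranked)))
        if len(ranked) == 1:
            return ranked[0], history
        survivors = ranked[: max(1, len(ranked) // eta)]
        epochs *= eta


# Nine candidate learning rates from 10^-0.5 down to 10^-4.5 (hypothetical grid).
candidate_learning_rates = [10 ** (-k / 2) for k in range(1, 10)]
best_lr, history = successive_halving(candidate_learning_rates)
```

With nine candidates and `eta=3`, the bracket trains nine configurations for one epoch, three for three epochs, and one for nine epochs, so most of the budget is spent on the promising configurations. Full hyperband additionally runs several such brackets that trade off the number of configurations against the starting budget.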
The third project demonstrates an accurate and efficient methodology for detecting and reducing fermentation contamination using machine learning. We apply two machine learning methods, the one-class support vector machine (OCSVM) and autoencoders (AEs), optimize as many hyperparameters as possible, and use Optuna, an open, user-friendly, and powerful Python platform that enables parallel execution of hyperparameter optimization (HPO). We recommend combining Bayesian optimization with the hyperband algorithm to carry out comprehensive HPO. The results show that contaminated fermentation batches can be predicted with recall up to 1.0 without sacrificing the precision and specificity of non-contaminated batches, which reach up to 0.958 and 0.996, respectively. OCSVM outperforms AEs in precision and specificity, although both achieve a recall of 1.0. Lastly, we identify important independent variables contributing to the contaminated batches and recommend how to regulate them to reduce the likelihood of contamination.
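The three reported metrics can be made concrete with a short sketch. It assumes that a contaminated batch is the positive class (label 1) and a non-contaminated batch is the negative class (label 0), which is one reading of the abstract. The function name and the labels below are hypothetical and are not results from the study.

```python
def contamination_metrics(actual, predicted):
    """Recall, precision, and specificity with 'contaminated' (1) as the positive class.

    Assumes at least one actual positive, one actual negative, and one
    predicted positive, so that no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # contaminated, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # contaminated, missed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # clean, falsely flagged
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # clean, passed
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Hypothetical labels for eight batches: two contaminated, six clean.
actual_labels = [1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = contamination_metrics(actual_labels, predicted_labels)
```

In this toy case every contaminated batch is caught, so recall is 1.0, while the single false alarm lowers precision and specificity. This is the trade-off the reported figures describe: a recall of 1.0 is only useful if few clean batches are flagged along the way.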