Machine Learning and Multivariate Statistics for Optimizing Bioprocessing and Polyolefin Manufacturing
Chemical engineers have routinely used computational tools to model, optimize, and debottleneck chemical processes. Owing to advances in computational science over the past decade, multivariate statistics and machine learning have become an integral part of the computerization of chemical processes. In this research, we investigate multivariate statistics, machine learning tools, and their combinations through a series of case studies, including one in which machine learning models for fermentation were successfully deployed in industry. We use both a commercially available software tool, Aspen ProMV, and the open-source programming language Python to demonstrate the feasibility of these computational tools.
This work demonstrates a novel application of ensemble-based machine learning methods in bioprocessing, particularly for predicting the fermenter type in a fermentation process (to enable successful data integration) and for predicting the onset of foaming. We apply two ensemble frameworks, Extreme Gradient Boosting (XGBoost) and Random Forest (RF), to build classification and regression models. Excessive foaming can interfere with the mixing of reactants and cause problems such as reduced effective reactor volume, microbial contamination, product loss, and increased reaction time. Physical modeling of foaming is arduous because it requires estimating the foam height, which changes over time and varies from process to process.
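The foaming-onset classification task can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the models built in this work: the process variables (airflow, agitation, batch age, exhaust differential pressure), the synthetic foaming label, and all numeric settings are hypothetical. Samples are grouped by batch and the train/test split holds out whole batches; this is an assumed good practice for fermentation data, since a random row-level split would put samples from the same batch on both sides. XGBoost's `xgboost.XGBClassifier` follows the same `fit`/`predict` interface and could be swapped in for `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in for fermentation data: 40 batches, 50 time points each.
# All variables and relationships below are hypothetical.
n_batches, n_per_batch = 40, 50
batch_id = np.repeat(np.arange(n_batches), n_per_batch)
airflow = np.repeat(rng.uniform(0.5, 1.5, n_batches), n_per_batch)    # vvm
agitation = np.repeat(rng.uniform(100, 400, n_batches), n_per_batch)  # rpm
batch_age = np.tile(np.linspace(0, 120, n_per_batch), n_batches)      # h

# Hidden "foam tendency"; the exhaust dP sensor observes it with noise
foam_tendency = 0.02 * agitation * airflow + 0.05 * batch_age
exhaust_dp = foam_tendency + rng.normal(0.0, 0.5, foam_tendency.size)
X = np.column_stack([airflow, agitation, batch_age, exhaust_dp])
y = (foam_tendency > 10.0).astype(int)  # 1 = foaming onset (synthetic label)

# Hold out whole batches so the test set contains only unseen batches
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=batch_id))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[train_idx], y[train_idx])
acc = accuracy_score(y[test_idx], clf.predict(X[test_idx]))

feature_names = ["airflow", "agitation", "batch_age", "exhaust_dp"]
for name, imp in zip(feature_names, clf.feature_importances_):
    print(f"{name:>10s}: {imp:.3f}")
print(f"held-out batch accuracy: {acc:.3f}")
```

Fermenter-type prediction follows the same pattern, with the binary label replaced by a multi-class fermenter label.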
In addition to predicting foaming, we extend our work to controlling and preventing it through data-driven, ad hoc addition of antifoam, using the exhaust differential pressure as an indicator of foaming. We use large-scale, real fermentation data for six different types of sporulating microorganisms to predict foaming across multiple strains and to build exploratory, time-series-driven antifoam profiles for four different fermenter types. To predict antifoam addition from this large multivariate dataset (about half a million instances across 163 batches), we use TPOT (Tree-based Pipeline Optimization Tool), an automated machine learning tool based on genetic programming, to select the best pipeline among 600 candidate pipelines. Our antifoam profiles reduce the hourly volume retention by over 53% for one of the fermenters. A decrease in hourly volume retention leads to an increase in fermentation product yield.
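TPOT searches over scikit-learn pipelines with genetic programming. In the classic TPOT 0.x API, the number of pipelines evaluated is `population_size + generations × offspring_size` (with `offspring_size` defaulting to `population_size`), so, for example, `population_size=100` and `generations=5` would give 600 pipelines; the settings actually used in this work are not stated here. The sketch below does not run TPOT. As a much simpler stand-in, it shows only the underlying selection step: a few hand-written candidate pipelines are scored by cross-validation and the best one is refit on all data. The predictors, the synthetic antifoam target, and the candidate list are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 400
# Hypothetical predictors (e.g., exhaust dP, airflow, ...) and a synthetic,
# nonlinear target standing in for the amount of antifoam to add
X = rng.normal(size=(n, 6))
y = 2.0 * np.maximum(X[:, 0], 0.0) + 0.3 * X[:, 1] + rng.normal(0.0, 0.1, n)

# A tiny, fixed search space; TPOT would instead evolve pipelines from a far larger one
candidates = {
    "scaler+ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "scaler+pca+ridge": make_pipeline(StandardScaler(), PCA(n_components=3), Ridge(alpha=1.0)),
    "random_forest": make_pipeline(RandomForestRegressor(n_estimators=100, random_state=0)),
    "extra_trees": make_pipeline(ExtraTreesRegressor(n_estimators=100, random_state=0)),
}

# Score every candidate with the same 5-fold cross-validation
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(pipe, X, y, cv=cv, scoring="r2").mean()
          for name, pipe in candidates.items()}
best_name = max(scores, key=scores.get)
best_pipeline = candidates[best_name].fit(X, y)  # refit the winner on all data

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16s}: mean CV R^2 = {score:.3f}")
```

For real fermentation data, the folds would more sensibly be grouped by batch (for example with scikit-learn's `GroupKFold`) so that the selected pipeline is judged on unseen batches.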
We also study two cases related to polyolefin manufacturing, specifically LDPE (low-density polyethylene) and HDPE (high-density polyethylene). Through these cases, we show how machine learning and multivariate statistical tools can improve process understanding and enhance predictive capability for process optimization.
Using indirect measurements such as temperature profiles, we demonstrate their viability for predicting polyolefin quality parameters, detecting anomalies, and statistically monitoring and controlling the chemical processes in an LDPE plant. We use dimensionality reduction, visualization tools, and regression analysis to achieve these goals. With advanced analytical tools and a combination of algorithms such as PCA (Principal Component Analysis), PLS (Partial Least Squares), and Random Forest, we identify predictive models that can be used to build inferential schemes.
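For the statistical monitoring described above, a common PCA-based approach tracks two statistics per sample: Hotelling's T², which measures variation within the PCA model plane, and the squared prediction error (SPE, or Q), which measures how far a sample falls outside it. The NumPy sketch below assumes a hypothetical reactor temperature profile from 10 thermocouple zones driven by two latent factors, uses empirical 99th-percentile control limits from in-control reference data (F-distribution limits for T² and chi-squared approximations for SPE are common alternatives), and injects a synthetic +5 °C fault at a single zone that breaks the normal correlation structure. None of these settings come from the LDPE case.

```python
import numpy as np

def fit_pca_monitor(X_ref, n_components):
    """Fit a PCA model on in-control reference data (rows = samples)."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mean) / std                        # autoscale each sensor
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loadings, shape (m, a)
    lam = s[:n_components] ** 2 / (len(Z) - 1)      # variance of each score
    return mean, std, P, lam

def monitor(X, mean, std, P, lam):
    """Return Hotelling's T^2 and SPE (Q) for each row of X."""
    Z = (X - mean) / std
    T = Z @ P                                       # scores
    t2 = np.sum(T ** 2 / lam, axis=1)
    residual = Z - T @ P.T                          # part not explained by the model
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

rng = np.random.default_rng(2)
n_zones = 10
# Hypothetical profile shapes: an overall level/ramp factor and a mid-reactor hot-spot factor
profile_shapes = np.vstack([np.linspace(1.0, 2.0, n_zones),
                            np.sin(np.linspace(0.0, np.pi, n_zones))])

def simulate(n):
    """Synthetic zone temperatures (deg C) around 200 with small sensor noise."""
    factors = rng.normal(size=(n, 2))
    return 200.0 + factors @ profile_shapes + rng.normal(0.0, 0.1, (n, n_zones))

X_ref = simulate(500)
model = fit_pca_monitor(X_ref, n_components=2)
t2_ref, spe_ref = monitor(X_ref, *model)
t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)

t2_new, spe_new = monitor(simulate(200), *model)    # new in-control data

X_fault = simulate(20)
X_fault[:, 6] += 5.0                                # single-zone sensor or hot-spot fault
t2_fault, spe_fault = monitor(X_fault, *model)

print(f"in-control SPE alarms: {np.mean(spe_new > spe_lim):.1%}")
print(f"faulty SPE alarms:     {np.mean(spe_fault > spe_lim):.1%}")
```

The fault barely moves T² because it does not resemble normal profile variation, but it is caught by SPE, which is why both statistics are usually charted together.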
Soft sensors are widely used for online monitoring and real-time prediction of process variables. In one of our cases, we use advanced machine learning algorithms to predict the polymer melt index, which is crucial in determining polymer product quality. We use real industrial data from one of the leading chemical engineering companies in the Asia-Pacific region to build a predictive model for an HDPE plant. Lastly, we present an end-to-end deep learning workflow for both industrial and simulated polyolefin datasets.
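As a minimal illustration of the soft-sensor idea, the sketch below fits a PLS1 model (NIPALS algorithm, written out in NumPy) that predicts the base-10 logarithm of the melt index from routinely measured process variables. The algorithms used for the HDPE case are not named here, so PLS serves only as a simple, transparent baseline, not as the method of this work. Modeling log(MI) rather than MI is an assumption, chosen because melt index can span orders of magnitude. The process variables (hydrogen-to-ethylene ratio, reactor temperature, comonomer ratio), their redundant sensor copies, and the coefficients linking them to log(MI) are hypothetical; scikit-learn's `PLSRegression` is a maintained alternative to this hand-written version.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """NIPALS PLS1 on autoscaled X and mean-centred y."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    E = (X - x_mean) / x_std
    f = y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # score vector
        tt = t @ t
        p = E.T @ t / tt                # X loading
        q_a = (f @ t) / tt              # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - q_a * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients for autoscaled X
    return x_mean, x_std, y_mean, B

def pls1_predict(model, X):
    x_mean, x_std, y_mean, B = model
    return ((X - x_mean) / x_std) @ B + y_mean

rng = np.random.default_rng(3)
n = 300
log_h2_c2 = np.log10(rng.uniform(0.1, 1.0, n))  # hypothetical H2/C2 ratio (log scale)
temp = rng.normal(85.0, 1.5, n)                  # hypothetical reactor temperature, deg C
c4_c2 = rng.uniform(0.0, 0.1, n)                 # hypothetical comonomer ratio
# Redundant, noisy sensor copies mimic correlated plant measurements
X = np.column_stack([log_h2_c2, log_h2_c2 + rng.normal(0, 0.02, n),
                     temp, temp + rng.normal(0, 0.2, n), c4_c2])
log_mi = 1.0 * log_h2_c2 + 0.05 * (temp - 85.0) + 2.0 * c4_c2 + rng.normal(0, 0.03, n)

# Train on earlier samples and test on later ones, as a deployed soft sensor would be
model = pls1_fit(X[:240], log_mi[:240], n_components=3)
pred = pls1_predict(model, X[240:])
rmse = np.sqrt(np.mean((pred - log_mi[240:]) ** 2))
print(f"test RMSE in log10(MI): {rmse:.3f}")
```

PLS handles the strongly collinear sensor copies by projecting them onto a few latent components, which is one reason it is a common first baseline before trying more flexible learners.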
Through these five cases, we explore the use of advanced machine learning and multivariate statistical techniques to optimize chemical and biochemical processes. Recent advances in computational hardware allow engineers to build such data-driven models, enhancing their ability to monitor and control processes effectively and efficiently. We show that even chemical engineers who are not machine learning experts can readily implement such algorithms using open-source or commercially available software tools.