Data Analytics and Machine Learning Applications in Fermentation Processes and Molecular Property Prediction

Nguyen, Xuan Dung

dc.contributor.author	Nguyen, Xuan Dung	en
dc.contributor.committeechair	Liu, Yih-An	en
dc.contributor.committeechair	Deshmukh, Sanket A.	en
dc.contributor.committeemember	McDowell, Christopher Carroll	en
dc.contributor.committeemember	Wrenn, Steven Parker	en
dc.contributor.department	Chemical Engineering	en
dc.date.accessioned	2025-05-24T08:03:35Z	en
dc.date.available	2025-05-24T08:03:35Z	en
dc.date.issued	2025-05-23	en
dc.description.abstract	Multivariate data analytics (MVDA) and machine learning (ML) play a crucial role in bioprocesses and molecular property prediction. Our study encompasses three main aspects: 1) using multiway partial least squares (MPLS) approaches to analyze the occurrence of foaming in batch fermentation processes; 2) using hyperparameter optimization methods in deep learning to improve molecular property prediction; and 3) using machine learning models to predict and reduce contamination risk. For the first project, MPLS methods are used to develop interpretative correlation models that monitor the occurrence of foaming and, hence, improve batch fermentation. The exhaust differential pressure is chosen as the quality variable to quantify the occurrence of foaming, and the analysis considers three-dimensional datasets of different batches, process variables, and measurements. Batch-wise unfolding (BWU) and observation-wise unfolding (OWU) of plant datasets are integrated with standard, dynamic, and kernel PLS methods. The BWU approach is useful for analyzing the differences between batches and identifying abnormalities and outliers, while the OWU approach quantifies the variation within a batch. The results show that dynamic PLS (DPLS) with OWU and one unit of time lag in the quality variable is the most efficient and accurate method and the easiest to implement. With both BWU and OWU, the quantitative effects of process variables on the quality variable are identified and then used to guide improvements in fermentation performance. The second project presents a methodology for hyperparameter optimization (HPO) in deep neural networks for accurate and efficient molecular property prediction (MPP). Most prior applications of deep neural networks for MPP have paid limited or no attention to HPO, resulting in suboptimal values of predicted properties. To improve the efficiency and accuracy of deep learning models for MPP, we must optimize as many hyperparameters as possible and choose a software platform that enables the parallel execution of HPO. This project compares the random search, Bayesian optimization, and hyperband algorithms, together with the Bayesian-hyperband combination, within the software packages KerasTuner and Optuna for HPO. We conclude that the hyperband algorithm, which has not been used in previous MPP studies, is the most computationally efficient; it gives MPP results that are optimal or nearly optimal in terms of prediction accuracy. Based on two case studies, we recommend the Python library KerasTuner for HPO. The third project demonstrates an accurate and efficient methodology for detecting and reducing fermentation contamination using machine learning methods. We select two machine learning methods, one-class support vector machine (OCSVM) and autoencoders (AEs), optimize as many hyperparameters as possible, and use Optuna, an open, user-friendly, and powerful Python platform that enables the parallel execution of HPO. We recommend combining Bayesian optimization with the hyperband algorithm to carry out comprehensive HPO. Results show that we are able to predict contaminated fermentation batches with recall up to 1.0 without sacrificing the precision and specificity of non-contaminated batches, which reach up to 0.958 and 0.996, respectively. OCSVM outperforms AEs in terms of precision and specificity, although both achieve a recall of 1.0. Lastly, we identify the important independent variables contributing to contaminated batches and give recommendations on how to regulate them to reduce the likelihood of contamination.	en
dc.description.abstractgeneral	Biotechnology processes, like fermentation used in producing medicines or food, can be complex and sensitive to problems like contamination or foaming. In our research, we used advanced data analysis and machine learning to make these processes more reliable and efficient. Our work focused on three main goals. First, we studied how to detect and prevent excessive foaming during fermentation, which can disrupt production. By analyzing data collected during the process, we built models that help predict when foaming will happen and which conditions cause it so that manufacturers can take action before problems arise. Second, we worked on improving how computers predict the behavior of molecules, which is important in drug discovery and other chemical industries. We found that a smart way of tuning the settings in deep learning models – using a method called hyperband – gave faster and more accurate results than older approaches. Finally, we tackled the problem of contamination in fermentation. Contamination can ruin entire batches, so early detection is critical. Using machine learning, we were able to identify contaminated batches with very high accuracy. We recommend specific methods, like autoencoders and support vector machines, that work especially well with the complex nature of fermentation. Our models also revealed which factors are most likely to lead to contamination, providing helpful guidance for preventing it in the future. Overall, our study shows how modern data science tools can solve real-world problems in biotechnology, leading to safer, cleaner, and more productive processes.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:43799	en
dc.identifier.uri	https://hdl.handle.net/10919/134219	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	en
dc.subject	multivariate statistics	en
dc.subject	machine learning	en
dc.subject	fermentation	en
dc.title	Data Analytics and Machine Learning Applications in Fermentation Processes and Molecular Property Prediction	en
dc.type	Dissertation	en
thesis.degree.discipline	Chemical Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en