Data Analytics and Machine Learning Applications in Fermentation Processes and Molecular Property Prediction

dc.contributor.author: Nguyen, Xuan Dung
dc.contributor.committeechair: Liu, Yih-An
dc.contributor.committeechair: Deshmukh, Sanket A.
dc.contributor.committeemember: McDowell, Christopher Carroll
dc.contributor.committeemember: Wrenn, Steven Parker
dc.contributor.department: Chemical Engineering
dc.date.accessioned: 2025-05-24T08:03:35Z
dc.date.available: 2025-05-24T08:03:35Z
dc.date.issued: 2025-05-23
dc.description.abstract: Multivariate data analytics (MVDA) and machine learning (ML) play a crucial role in bioprocesses and molecular property prediction. Our study encompasses three main aspects: 1) using data analytics to analyze the occurrence of foaming in batch fermentation processes with multiway partial least squares (MPLS) approaches; 2) using hyperparameter optimization methods in deep learning to improve molecular property prediction; and 3) using machine learning models to predict and reduce contamination risk. For the first project, MPLS methods are used to develop interpretable correlation models that monitor foaming occurrence and, hence, improve batch fermentation. The exhaust differential pressure is chosen as the quality variable to quantify foaming occurrence, and the analysis considers three-dimensional datasets of batches, process variables, and measurements. Batch-wise unfolding (BWU) and observation-wise unfolding (OWU) of the plant datasets are integrated with standard, dynamic, and kernel PLS methods. The BWU approach is useful for analyzing differences between batches and identifying abnormalities and outliers, while OWU quantifies the variation within a batch. The results show that, with OWU, dynamic PLS (DPLS) with a time lag of one unit in the quality variable is the most effective, accurate, and easy-to-implement method. With both BWU and OWU, the quantitative effects of process variables on the quality variable are identified and then used to guide improvements in fermentation performance. The second project presents a methodology for hyperparameter optimization (HPO) in deep neural networks for accurate and efficient molecular property prediction (MPP). Most prior applications of deep neural networks for MPP have paid limited or no attention to HPO, resulting in suboptimal property predictions. To improve the efficiency and accuracy of deep learning models for MPP, it is important to optimize as many hyperparameters as possible and to choose a software platform that enables parallel execution of HPO. This project compares the random search, Bayesian optimization, and hyperband algorithms, together with the Bayesian-hyperband combination, within the software packages Keras Tuner and Optuna. We conclude that the hyperband algorithm, which had not been used in previous MPP studies, is the most computationally efficient and gives MPP results that are optimal or nearly optimal in prediction accuracy. Based on two case studies, we recommend the Python library Keras Tuner for HPO. Finally, the third project demonstrates an accurate and efficient methodology for detecting and reducing fermentation contamination using machine learning. We apply two machine learning methods, one-class support vector machine (OCSVM) and autoencoders (AEs), optimize as many hyperparameters as possible, and use Optuna, an open, user-friendly, and powerful Python platform that enables parallel execution of HPO. We recommend combining Bayesian optimization with the hyperband algorithm to carry out comprehensive HPO. Results show that contaminated fermentation batches can be predicted with a recall of up to 1.0 without sacrificing precision or specificity (the correct identification of non-contaminated batches), which reach up to 0.958 and 0.996, respectively. Although both methods achieve an outstanding recall of 1.0, OCSVM outperforms AEs in precision and specificity. Lastly, we identify the important independent variables contributing to contaminated batches and give recommendations on how to regulate them to reduce the likelihood of contamination.
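The contrast drawn above between batch-wise unfolding (BWU) and observation-wise unfolding (OWU) of a three-dimensional batch dataset, and the use of a one-unit time lag in the quality variable, can be illustrated with a small sketch. It assumes the data are stored as `data[i][k][j]` (batch, time point, process variable), and that the lagged DPLS input is formed by appending the previous quality value y(k - 1) to the process variables x(k); the helper names `batch_wise_unfold`, `observation_wise_unfold`, and `add_lagged_quality` are hypothetical, and the dissertation's exact lag construction may differ. The PLS fit itself is not shown (scikit-learn's `PLSRegression` is one possible choice).

```python
from typing import List, Tuple

# data[i][k][j]: batch i, time point k, process variable j (an I x K x J array)
Array3D = List[List[List[float]]]


def batch_wise_unfold(data: Array3D) -> List[List[float]]:
    """BWU: one row per batch (I x KJ); columns are all J variables at k = 0, then k = 1, ..."""
    return [[value for time_slice in batch for value in time_slice] for batch in data]


def observation_wise_unfold(data: Array3D) -> List[List[float]]:
    """OWU: one row per (batch, time point) observation (IK x J)."""
    return [list(time_slice) for batch in data for time_slice in batch]


def add_lagged_quality(
    x_batch: List[List[float]], y_batch: List[float], lag: int = 1
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical DPLS input builder for ONE batch: regress y(k) on [x(k), y(k - lag)].

    Applied batch by batch so that lags never cross batch boundaries; the first
    `lag` time points have no lagged quality value and are dropped.
    """
    inputs = [list(x_batch[k]) + [y_batch[k - lag]] for k in range(lag, len(y_batch))]
    targets = list(y_batch[lag:])
    return inputs, targets
```

For two batches with three time points and two variables, BWU yields a 2 x 6 matrix suited to comparing whole batches and spotting outlying ones, while OWU yields a 6 x 2 matrix whose rows are individual observations, which is what a within-batch regression needs. The lagged rows built for each batch would then be stacked before fitting the PLS model.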
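The abstract credits hyperband with the best computational efficiency. Hyperband's efficiency comes from its bracket schedule: many configurations are first trained on a small budget, and only the best 1/eta of them are promoted to larger budgets. The sketch below is a minimal from-scratch version of the original Hyperband schedule and successive halving, assuming a maximum resource `max_resource` (for example, training epochs) and a reduction factor `eta = 3`. It is not how Keras Tuner (`keras_tuner.Hyperband`) or Optuna (`optuna.pruners.HyperbandPruner`) implement the algorithm internally, and the function names and toy objective are hypothetical.

```python
import random
from typing import List, Tuple

Round = Tuple[int, float]  # (number of configurations, resource per configuration)


def hyperband_schedule(max_resource: int, eta: int = 3) -> List[List[Round]]:
    """Bracket schedule of the original Hyperband algorithm.

    Bracket s starts n configurations on r = R / eta**s resources and runs
    successive halving, keeping the best 1/eta of them at every round.
    """
    # s_max = floor(log_eta(R)), computed with integers to avoid rounding error
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) / (s + 1) * eta**s), via integer ceiling division
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        brackets.append([(n // eta ** i, max_resource / eta ** (s - i)) for i in range(s + 1)])
    return brackets


def successive_halving(configs, evaluate, rounds, eta=3):
    """Evaluate survivors on growing budgets; return (best_config, best_loss) of the last round."""
    survivors = list(configs)
    best_loss, best_config = float("inf"), None
    for n_i, r_i in rounds:
        # Sort by loss only, so configurations themselves never need to be comparable
        scored = sorted(((evaluate(c, r_i), c) for c in survivors), key=lambda pair: pair[0])
        best_loss, best_config = scored[0]
        # Keep the top floor(n_i / eta) configurations (at least one)
        survivors = [c for _, c in scored[: max(1, n_i // eta)]]
    return best_config, best_loss


def hyperband(sample_config, evaluate, max_resource, eta=3, seed=0):
    """Run every bracket with freshly sampled configurations; return the overall best."""
    rng = random.Random(seed)
    best = (None, float("inf"))
    for rounds in hyperband_schedule(max_resource, eta):
        configs = [sample_config(rng) for _ in range(rounds[0][0])]
        candidate = successive_halving(configs, evaluate, rounds, eta)
        if candidate[1] < best[1]:
            best = candidate
    return best
```

With `max_resource=81` and `eta=3`, the five brackets start 81, 34, 15, 8, and 5 configurations on budgets of 1, 3, 9, 27, and 81, respectively. Most of the configurations are therefore discarded after cheap, short training runs, which is where the savings over random search come from.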
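The contamination results above are reported as recall, precision, and specificity, with contaminated batches treated as the positive class. The sketch below shows how those three figures follow from a confusion matrix. It also includes two common ways of turning detector outputs into labels. The first is scikit-learn's `OneClassSVM.predict` convention of +1 for inliers and -1 for outliers, with outliers assumed here to mean "contaminated". The second is an assumed autoencoder rule that flags a batch when its reconstruction error exceeds a threshold. The helper names and the threshold are hypothetical, not the dissertation's code.

```python
from typing import Dict, List, Sequence


def ocsvm_to_labels(decisions: Sequence[int]) -> List[int]:
    """Map OCSVM-style outputs (+1 inlier, -1 outlier) to 1 = contaminated, 0 = clean."""
    return [1 if d == -1 else 0 for d in decisions]


def flag_by_reconstruction_error(errors: Sequence[float], threshold: float) -> List[int]:
    """Assumed autoencoder rule: flag a batch (1) when its reconstruction error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision, and specificity with contaminated batches (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # contaminated, caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # contaminated, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, false alarm
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, correctly passed
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

For example, with true labels `[1, 1, 0, 0, 0, 0]` and predictions `[1, 1, 1, 0, 0, 0]`, recall is 1.0, precision is 2/3, and specificity is 0.75. The single false alarm lowers precision and specificity but leaves recall untouched, which is why a recall of 1.0 is only meaningful when reported together with the other two figures.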
dc.description.abstractgeneral: Biotechnology processes, such as the fermentation used to produce medicines or food, can be complex and sensitive to problems like contamination or foaming. In our research, we used advanced data analysis and machine learning to make these processes more reliable and efficient. Our work focused on three main goals. First, we studied how to detect and prevent excessive foaming during fermentation, which can disrupt production. By analyzing data collected during the process, we built models that help predict when foaming will happen and which conditions cause it, so that manufacturers can take action before problems arise. Second, we worked on improving how computers predict the behavior of molecules, which is important in drug discovery and other chemical industries. We found that a smart way of tuning the settings of deep learning models, a method called hyperband, gave results faster than older approaches while matching or nearly matching their accuracy. Finally, we tackled the problem of contamination in fermentation. Contamination can ruin entire batches, so early detection is critical. Using machine learning, we were able to identify contaminated batches with very high accuracy. We recommend specific methods, such as autoencoders and support vector machines, that work especially well with complex fermentation data. Our models also revealed which factors are most likely to lead to contamination, providing helpful guidance for preventing it in the future. Overall, our study shows how modern data science tools can solve real-world problems in biotechnology, leading to safer, cleaner, and more productive processes.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:43799
dc.identifier.uri: https://hdl.handle.net/10919/134219
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: multivariate statistics
dc.subject: machine learning
dc.subject: fermentation
dc.title: Data Analytics and Machine Learning Applications in Fermentation Processes and Molecular Property Prediction
dc.type: Dissertation
thesis.degree.discipline: Chemical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Nguyen_X_D_2025.pdf; Size: 6.25 MB; Format: Adobe Portable Document Format
Name: Nguyen_X_D_2025_support_3.pdf; Size: 244.93 KB; Format: Adobe Portable Document Format; Description: Supporting documents
Name: Nguyen_X_D_2025_support_1.pdf; Size: 456.37 KB; Format: Adobe Portable Document Format; Description: Supporting documents