Science Guided Machine Learning: Incorporating Scientific Domain Knowledge for Learning Under Data Paucity and Noisy Contexts

dc.contributor.author | Muralidhar, Nikhil | en
dc.contributor.committeechair | Ramakrishnan, Narendran | en
dc.contributor.committeemember | Tafti, Danesh K. | en
dc.contributor.committeemember | Lu, Chang Tien | en
dc.contributor.committeemember | Karpatne, Anuj | en
dc.contributor.committeemember | Ermon, Stefano | en
dc.contributor.department | Computer Science and Applications | en
dc.date.accessioned | 2022-08-19T08:00:29Z | en
dc.date.available | 2022-08-19T08:00:29Z | en
dc.date.issued | 2022-08-18 | en
dc.description.abstract | In recent years, the availability of large amounts of labeled data has pushed machine learning (ML) research toward purely data-driven, end-to-end pipelines, e.g., in deep neural network research. However, in many situations, data is limited and of poor quality. Traditional ML pipelines are known to be susceptible to various issues when trained on low volumes of non-representative, noisy data. We investigate whether prior domain knowledge about the problem being modeled can be employed within the ML pipeline to improve model performance under data paucity and in noisy contexts. This report presents recent developments, as well as novel contributions, in incorporating prior domain knowledge into various data-driven modeling (i.e., ML) pipelines, particularly those geared toward scientific applications. Such domain knowledge exists in various forms and can be incorporated into the ML pipeline using different implicit and explicit methods, which we collectively term science-guided machine learning (SGML). All the novel techniques proposed in this report are presented in the context of developing SGML to model fluid dynamics applications, but they can be easily generalized to other applications. Specifically, we present SGML pipelines to (i) incorporate prior domain knowledge into the ML model architecture, (ii) incorporate knowledge about the distribution of the target process as statistical priors for improved prediction performance, (iii) leverage prior knowledge to quantify the consistency of ML decisions with scientific principles, (iv) explicitly incorporate known mathematical relationships of scientific phenomena to influence the ML pipeline, and (v) develop science-guided transfer learning to improve performance under data paucity.
Each technique presented has been designed with a focus on simplicity and minimal implementation cost, with the goal of yielding significant improvements in model performance, especially under low data volumes or noisy data conditions. In each application, we demonstrate through rigorous qualitative and quantitative experiments that our SGML pipelines achieve significant improvements in performance and interpretability over corresponding purely data-driven models that are agnostic to scientific knowledge. | en
dc.description.abstractgeneral | In this work, we present techniques for incorporating scientific knowledge into machine learning (ML) pipelines. We demonstrate these techniques with ML models trained on low data volumes as well as on non-representative, noisy datasets. In both cases, we demonstrate through rigorous experimentation that incorporating scientific domain knowledge into the ML pipeline using our proposed science-guided machine learning (SGML) techniques leads to significant performance improvements. | en
dc.description.degree | Doctor of Philosophy | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:35411 | en
dc.identifier.uri | http://hdl.handle.net/10919/111558 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Machine Learning | en
dc.subject | Physics-Guided Machine Learning | en
dc.subject | Neural Networks | en
dc.subject | Data Paucity | en
dc.subject | Computational Fluid Dynamics | en
dc.title | Science Guided Machine Learning: Incorporating Scientific Domain Knowledge for Learning Under Data Paucity and Noisy Contexts | en
dc.type | Dissertation | en
thesis.degree.discipline | Computer Science and Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Doctor of Philosophy | en
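Item (iv) of the abstract, explicitly incorporating known mathematical relationships into the ML pipeline, is commonly realized in the broader physics-guided ML literature by adding a penalty for violating the relationship to the training loss. The sketch below is a minimal, hypothetical illustration of that general idea, not the dissertation's own method. It assumes a linear model, a made-up "known" relationship y = 2x (`physics_target`), unlabeled inputs `x_unlab` on which only the relationship is enforced, and a weight `lam` on the penalty; all of these names and values are illustrative.

```python
import numpy as np

def physics_target(x):
    # HYPOTHETICAL "known" relationship, assumed for illustration only: y = 2 * x.
    return 2.0 * x

def sgml_loss(w, b, x_lab, y_lab, x_unlab, lam):
    """Data MSE on labeled points + lam * mean squared violation of the assumed relationship on unlabeled points."""
    data_term = np.mean((w * x_lab + b - y_lab) ** 2)
    phys_term = np.mean((w * x_unlab + b - physics_target(x_unlab)) ** 2)
    return data_term + lam * phys_term

def fit_sgml_linear(x_lab, y_lab, x_unlab, lam):
    """Minimize sgml_loss exactly for a linear model y = w * x + b via weighted least squares."""
    n, m = len(x_lab), len(x_unlab)
    # Scale rows so the least-squares objective equals sgml_loss term by term.
    a_data = np.column_stack([x_lab, np.ones(n)]) / np.sqrt(n)
    t_data = y_lab / np.sqrt(n)
    a_phys = np.column_stack([x_unlab, np.ones(m)]) * np.sqrt(lam / m)
    t_phys = physics_target(x_unlab) * np.sqrt(lam / m)
    a = np.vstack([a_data, a_phys])
    t = np.concatenate([t_data, t_phys])
    (w, b), *_ = np.linalg.lstsq(a, t, rcond=None)
    return w, b

# Three noisy labeled points (toy data) and unlabeled inputs where only the relationship is enforced.
x_lab = np.array([0.0, 1.0, 2.0])
y_lab = np.array([1.0, 1.5, 3.0])
x_unlab = np.linspace(0.0, 5.0, 20)

w_data, b_data = fit_sgml_linear(x_lab, y_lab, x_unlab, lam=0.0)  # purely data-driven fit
w_pg, b_pg = fit_sgml_linear(x_lab, y_lab, x_unlab, lam=100.0)    # knowledge-regularized fit
print(f"data-only: w={w_data:.3f}, b={b_data:.3f}")
print(f"guided:    w={w_pg:.3f}, b={b_pg:.3f}")
```

Setting `lam=0.0` recovers the ordinary least-squares fit to the three noisy points, while a large `lam` pulls the solution toward the assumed relationship, which is the intended effect when labels are scarce and noisy. The closed-form solve is used only because the toy model is linear; a neural network would typically minimize the same kind of composite loss by gradient descent.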

Files

Original bundle
Name: Muralidhar_N_D_2022.pdf
Size: 27.78 MB
Format: Adobe Portable Document Format