Science Guided Machine Learning: Incorporating Scientific Domain Knowledge for Learning Under Data Paucity and Noisy Contexts

Date
2022-08-18
Publisher
Virginia Tech
Abstract

In recent years, the availability of large volumes of labeled data has steered machine learning (ML) research toward purely data-driven, end-to-end pipelines, e.g., in deep neural network research. In many situations, however, data is limited and of poor quality, and traditional ML pipelines are known to be susceptible to various issues when trained on low volumes of non-representative, noisy data. We investigate whether prior domain knowledge about the process being modeled can be employed within the ML pipeline to improve model performance under data paucity and in noisy contexts. This report presents recent developments and details novel contributions on incorporating prior domain knowledge into various data-driven modeling (i.e., machine learning, ML) pipelines, particularly geared towards scientific applications. Such domain knowledge exists in various forms and can be incorporated into the ML pipeline using different implicit and explicit methods, an approach termed science-guided machine learning (SGML). All the novel techniques proposed in this report are presented in the context of developing SGML models for fluid dynamics applications, but they generalize readily to other applications. Specifically, we present SGML pipelines to (i) incorporate prior domain knowledge into the ML model architecture, (ii) incorporate knowledge about the distribution of the target process as statistical priors for improved prediction performance, (iii) leverage prior knowledge to quantify the consistency of ML decisions with scientific principles, (iv) explicitly incorporate known mathematical relationships of scientific phenomena to influence the ML pipeline, and (v) develop science-guided transfer learning to improve performance under data paucity. Each technique is designed for simplicity and minimal implementation cost, with the goal of yielding significant improvements in model performance, especially under low data volumes or noisy data conditions. In each application, we demonstrate through rigorous qualitative and quantitative experiments that our SGML pipelines achieve significant improvements in performance and interpretability over corresponding purely data-driven models that are agnostic to scientific knowledge.
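
As an illustration of item (iv) above, the following is a minimal, hypothetical sketch of one common way a known mathematical relationship can influence an ML pipeline: adding a penalty for violating the relation to the usual data-fit loss. The names (science_guided_loss, physics_residual_fn) and the weighting factor lam are illustrative assumptions, not the report's actual implementation.

```python
import torch
import torch.nn.functional as F

def science_guided_loss(y_pred, y_true, x, physics_residual_fn, lam=0.1):
    """Hypothetical combined objective: data misfit plus a science-guided penalty.

    physics_residual_fn(x, y_pred) is assumed to return the residual of a known
    governing relation (zero when the prediction satisfies it exactly).
    """
    data_loss = F.mse_loss(y_pred, y_true)                        # standard supervised term
    physics_loss = physics_residual_fn(x, y_pred).pow(2).mean()   # penalty for violating the known relation
    return data_loss + lam * physics_loss                         # lam trades off data fit vs. prior knowledge

# Toy usage with a made-up linear relation y = 2x standing in for the "known" science:
residual = lambda x, y: y - 2.0 * x
x = torch.randn(32, 1)
y_true = 2.0 * x + 0.05 * torch.randn(32, 1)
y_pred = torch.randn(32, 1, requires_grad=True)
loss = science_guided_loss(y_pred, y_true, x, residual)
loss.backward()  # gradients now reflect both the data and the prior relation
```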

Keywords
Machine Learning, Physics-Guided Machine Learning, Neural Networks, Data Paucity, Computational Fluid Dynamics