Achieving More with Less: Learning Generalizable Neural Networks With Less Labeled Data and Computational Overheads
Recent advancements in deep learning have demonstrated its remarkable ability to learn generalizable patterns and relationships automatically from data in a number of mainstream applications. However, the generalization power of deep learning methods comes largely at the cost of working with very large datasets and using highly compute-intensive models. Many applications cannot afford the costs needed to ensure the generalizability of deep learning models. For instance, obtaining labeled data can be costly in scientific applications, and using large models may not be feasible in resource-constrained environments involving portable devices. This dissertation aims to improve efficiency in machine learning by exploring different ways to learn generalizable neural networks that require less labeled data and fewer computational resources. We demonstrate that using physics supervision in scientific problems can reduce the need for labeled data, thereby improving data efficiency without compromising model generalizability. Additionally, we investigate the potential of transformer-based transfer learning in scientific applications as a promising direction for further improving data efficiency. On the computational efficiency side, we present two efforts to increase the parameter efficiency of neural networks: novel architectures and structured network pruning.
General Audience Abstract
Deep learning is a powerful technique that can help us solve complex problems, but it often requires a lot of data and computing resources. This research aims to make deep learning more efficient so that it can be applied in more situations. We propose ways to make deep learning models require less data and less computing power. For example, we use the laws of physics as additional information when training a neural network, so that it can learn from less labeled data, and we use a technique called transfer learning to draw on knowledge from data that comes from a different distribution. Transfer learning may allow us to further reduce the need for labeled data in scientific applications. We also look at ways to make deep learning models use fewer computational resources by reducing their size, either through novel architectures or by pruning redundant structures.