Bilevel Optimization in the Deep Learning Era: Methods and Applications

Date

2024-01-05

Publisher

Virginia Tech

Abstract

Neural networks, together with their associated optimization algorithms, have demonstrated remarkable efficacy and versatility across a wide range of tasks, including image recognition, speech recognition, object detection, and sentiment analysis. Their strength lies in their ability to automatically learn intricate representations that map input data to output labels. Nevertheless, not all tasks fit neatly within an end-to-end learning paradigm; the complexity and diversity of real-world challenges call for specialized architectures and optimization strategies tailored to the demands of specific tasks.

Bilevel optimization is a distinctive form of optimization in which one problem is embedded, or nested, within another, and it remains highly relevant in the deep learning era. A notable application is hyperparameter optimization. While the weights of a neural network are trained automatically through backpropagation, hyperparameters such as the learning rate and the number of layers must be set in advance and cannot be optimized through the chain rule that backpropagation employs. Bilevel optimization offers a principled way to tune these hyperparameters and thereby improve overall model performance, and deep learning continues to offer fertile ground for such advances in optimization.

Within this thesis, we study significant bilevel optimization problems and apply these techniques to real-world tasks. Because bilevel optimization entails two levels of optimization, we explore scenarios in which neural networks appear at the upper level, at the lower level, or at both. Specifically, we systematically investigate four tasks: optimizing neural networks toward optimizing neural networks, optimizing attractors toward optimizing neural networks, optimizing graph structures toward optimizing neural network performance, and optimizing architectures toward optimizing neural networks. For each task, we formulate the problem mathematically as a bilevel optimization, introduce more efficient optimization strategies, and evaluate the performance and efficiency of the proposed techniques. Importantly, our methodologies and insights extend beyond bilevel optimization and apply broadly to deep learning models, offering valuable perspectives and tools for advancing optimization in the broader landscape of deep learning.
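The abstract describes the nested structure of bilevel problems without giving notation. For reference, hyperparameter optimization is commonly written as the bilevel program below, where λ denotes the hyperparameters, w the network weights, and L_val, L_train the validation and training losses; this is generic notation, not taken from the thesis itself:

\min_{\lambda} \; L_{\mathrm{val}}\bigl(\lambda,\, w^{*}(\lambda)\bigr)
\quad \text{subject to} \quad
w^{*}(\lambda) \in \operatorname*{arg\,min}_{w} \; L_{\mathrm{train}}(\lambda, w)

The outer objective depends on λ only through the inner solution w*(λ), which is why the ordinary chain rule of backpropagation does not apply directly; practical methods approximate the hypergradient dL_val/dλ by truncating and unrolling the inner optimizer or by implicit differentiation. The following is a minimal sketch, assuming PyTorch, of one-step unrolled differentiation for a scalar L2-penalty hyperparameter; the names (lam, inner_lr) and the toy data are illustrative, not from the thesis:

import torch

# Toy regression data; the hyperparameter lam is an L2 penalty weight.
torch.manual_seed(0)
X_tr, y_tr = torch.randn(32, 5), torch.randn(32, 1)
X_val, y_val = torch.randn(32, 5), torch.randn(32, 1)

w = torch.randn(5, 1, requires_grad=True)    # inner variable: model weights
lam = torch.tensor(0.1, requires_grad=True)  # outer variable: hyperparameter
inner_lr = 0.1

def train_loss(w, lam):
    return ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()

def val_loss(w):
    return ((X_val @ w - y_val) ** 2).mean()

# One unrolled inner gradient step; create_graph=True keeps the step
# differentiable, so w1 carries its dependence on lam.
g = torch.autograd.grad(train_loss(w, lam), w, create_graph=True)[0]
w1 = w - inner_lr * g

# Hypergradient dL_val/dlam, obtained by backpropagating through the update.
hypergrad = torch.autograd.grad(val_loss(w1), lam)[0]
print(hypergrad)

In practice many inner steps are unrolled, or the inner solve is handled via the implicit function theorem, and lam is then updated by gradient descent on the hypergradient.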

Keywords

NAS, Graph, GNN, Architecture
