Bilevel Optimization in the Deep Learning Era: Methods and Applications

dc.contributor.author: Zhang, Lei
dc.contributor.committeechair: Lu, Chang Tien
dc.contributor.committeemember: Ramakrishnan, Narendran
dc.contributor.committeemember: Cho, Jin-Hee
dc.contributor.committeemember: Wu, Lingfei
dc.contributor.committeemember: Prakash, Bodicherla Aditya
dc.contributor.department: Computer Science and Applications
dc.date.accessioned: 2024-01-06T09:00:28Z
dc.date.available: 2024-01-06T09:00:28Z
dc.date.issued: 2024-01-05
dc.description.abstract: Neural networks, together with their optimization algorithms, have proven remarkably effective and versatile across a wide range of tasks, including image recognition, speech recognition, object detection, and sentiment analysis. Their strength lies in automatically learning intricate representations that map input data to output labels. Nevertheless, not every task fits neatly into an end-to-end learning paradigm; the complexity and diversity of real-world problems call for specialized architectures and optimization strategies tailored to the particular structure of each task. Bilevel optimization is a distinctive form of optimization in which one problem is embedded, or nested, within another, and it remains highly relevant in the deep learning era. A prominent example is hyperparameter optimization: while a network's weights are trained automatically through backpropagation, hyperparameters such as the learning rate and the number of layers must be set in advance and cannot be optimized through the chain rule used in backpropagation. Bilevel optimization offers a principled way to tune such hyperparameters and thereby improve overall model performance, and deep learning more broadly remains fertile ground for further advances in optimization. In this thesis, we study significant bilevel optimization problems and apply the resulting techniques to real-world tasks. Because bilevel optimization involves two layers of optimization, we consider settings in which neural networks appear in the upper level, the lower level, or both. Specifically, we investigate four tasks: optimizing neural networks toward optimizing neural networks, optimizing attractors toward optimizing neural networks, optimizing graph structures toward optimizing neural network performance, and optimizing architectures toward optimizing neural networks. For each task, we formulate the problem mathematically as a bilevel optimization, introduce more efficient optimization strategies, and carefully evaluate the performance and efficiency of the proposed techniques. Our methodologies and insights extend beyond bilevel optimization and apply broadly to a variety of deep learning models, offering valuable perspectives and tools for advancing optimization across the wider deep learning landscape.
dc.description.abstractgeneral: Bilevel optimization is a valuable technique across many applications. Mathematically, it optimizes an objective at the upper level while simultaneously solving another optimization problem at the lower level. The key challenge is to find optimal solutions at both levels at once, since the decisions made at each level depend on one another. The difficulty grows when bilevel optimization is combined with deep learning. First, deep learning models are trained iteratively, which makes it hard to streamline training within a bilevel framework. Second, the bilevel setting itself makes end-to-end optimization of deep learning models difficult. This thesis studies the bilevel optimization problem through four approaches that incorporate deep learning, spanning several areas of machine learning: neural architecture search, graph structure learning, implicit models, and causal inference. The proposed methods not only address specific classes of bilevel optimization problems but also come with theoretical guarantees. The insights and methodologies presented here can help practitioners solve problems that involve such nested, higher-order decisions.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:38775
dc.identifier.uri: https://hdl.handle.net/10919/117311
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: NAS
dc.subject: Graph
dc.subject: GNN
dc.subject: Architecture
dc.title: Bilevel Optimization in the Deep Learning Era: Methods and Applications
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy
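
For readers unfamiliar with the structure described in the abstracts above, the hyperparameter-optimization example can be written as a bilevel program. The sketch below is a generic textbook formulation under assumed notation (lambda for hyperparameters such as the learning rate, w for network weights, and separate training and validation losses); it is not reproduced from the dissertation itself.

\begin{aligned}
\min_{\lambda}\; & \mathcal{L}_{\mathrm{val}}\!\left(w^{*}(\lambda),\, \lambda\right)
    && \text{(upper level: choose hyperparameters on validation data)} \\
\text{s.t.}\; & w^{*}(\lambda) \in \arg\min_{w}\; \mathcal{L}_{\mathrm{train}}(w,\, \lambda)
    && \text{(lower level: train the weights by backpropagation)}
\end{aligned}

The nesting is the point the abstract makes: the upper-level objective depends on the hyperparameters only through the trained weights w*(lambda), so a plain application of the chain rule during backpropagation cannot tune them, which is why dedicated bilevel optimization strategies are needed.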

Files

Original bundle
Name: Zhang_L_D_2024.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format