Bilevel Optimization in the Deep Learning Era: Methods and Applications

dc.contributor.author: Zhang, Lei
dc.contributor.committeechair: Lu, Chang Tien
dc.contributor.committeemember: Ramakrishnan, Narendran
dc.contributor.committeemember: Cho, Jin-Hee
dc.contributor.committeemember: Wu, Lingfei
dc.contributor.committeemember: Prakash, Bodicherla Aditya
dc.contributor.department: Computer Science and Applications
dc.date.accessioned: 2024-01-06T09:00:28Z
dc.date.available: 2024-01-06T09:00:28Z
dc.date.issued: 2024-01-05
dc.description.abstract: Neural networks, together with their optimization algorithms, have proven remarkably effective and versatile across a wide range of tasks, including image recognition, speech recognition, object detection, and sentiment analysis. Their strength lies in automatically learning intricate representations that map input data to output labels. Nevertheless, not every task fits neatly into an end-to-end learning paradigm; the complexity and diversity of real-world problems call for specialized architectures and optimization strategies tailored to the particular structure of each task. Bilevel optimization is a distinctive form of optimization in which one problem is embedded, or nested, within another, and it remains highly relevant in the deep learning era. A prominent example is hyperparameter optimization: while a network's weights are trained automatically through backpropagation, hyperparameters such as the learning rate and the number of layers must be set in advance and cannot be optimized through the chain rule used in backpropagation. Bilevel optimization offers a principled way to tune such hyperparameters and thereby improve overall model performance, and deep learning more broadly remains fertile ground for further advances in optimization. In this thesis, we study significant bilevel optimization problems and apply the resulting techniques to real-world tasks. Because bilevel optimization involves two layers of optimization, we consider settings in which neural networks appear in the upper level, the lower level, or both. Specifically, we investigate four tasks: optimizing neural networks toward optimizing neural networks, optimizing attractors toward optimizing neural networks, optimizing graph structures toward optimizing neural network performance, and optimizing architectures toward optimizing neural networks. For each task, we formulate the problem mathematically as a bilevel optimization, introduce more efficient optimization strategies, and carefully evaluate the performance and efficiency of the proposed techniques. Our methodologies and insights extend beyond bilevel optimization and apply broadly to a variety of deep learning models, offering valuable perspectives and tools for advancing optimization across the wider deep learning landscape.
dc.description.abstractgeneral: Bilevel optimization is a valuable technique across many applications. Mathematically, it optimizes an objective at the upper level while simultaneously solving another optimization problem at the lower level. The key challenge is to find optimal solutions at both levels at once, since the decisions made at each level depend on one another. The difficulty grows when bilevel optimization is combined with deep learning. First, deep learning models are trained iteratively, which makes it hard to streamline training within a bilevel framework. Second, the bilevel setting itself makes end-to-end optimization of deep learning models difficult. This thesis studies the bilevel optimization problem through four approaches that incorporate deep learning, spanning several areas of machine learning: neural architecture search, graph structure learning, implicit models, and causal inference. The proposed methods not only address specific classes of bilevel optimization problems but also come with theoretical guarantees. The insights and methodologies presented here can help practitioners solve problems that involve such nested, higher-order decisions.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:38775
dc.identifier.uri: https://hdl.handle.net/10919/117311
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: NAS
dc.subject: Graph
dc.subject: GNN
dc.subject: Architecture
dc.title: Bilevel Optimization in the Deep Learning Era: Methods and Applications
dc.type: Dissertation
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy
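
For readers unfamiliar with the structure described in the abstracts above, the hyperparameter-optimization example can be written as a bilevel program. The sketch below is a generic textbook formulation under assumed notation (lambda for hyperparameters such as the learning rate, w for network weights, and separate training and validation losses); it is not reproduced from the dissertation itself.

\begin{aligned}
\min_{\lambda}\; & \mathcal{L}_{\mathrm{val}}\!\left(w^{*}(\lambda),\, \lambda\right)
    && \text{(upper level: choose hyperparameters on validation data)} \\
\text{s.t.}\; & w^{*}(\lambda) \in \arg\min_{w}\; \mathcal{L}_{\mathrm{train}}(w,\, \lambda)
    && \text{(lower level: train the weights by backpropagation)}
\end{aligned}

The nesting is the point the abstract makes: the upper-level objective depends on the hyperparameters only through the trained weights w*(lambda), so a plain application of the chain rule during backpropagation cannot tune them, which is why dedicated bilevel optimization strategies are needed.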

Files

Original bundle
Name: Zhang_L_D_2024.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format