Deep Reinforcement Learning for Next Generation Wireless Networks with Echo State Networks

dc.contributor.author: Chang, Hao-Hsuan
dc.contributor.committeechair: Liu, Lingjia
dc.contributor.committeemember: Buehrer, Richard M.
dc.contributor.committeemember: Yi, Yang
dc.contributor.committeemember: Reed, Jeffrey H.
dc.contributor.committeemember: Kong, Zhenyu
dc.contributor.department: Electrical Engineering
dc.date.accessioned: 2023-02-18T07:00:06Z
dc.date.available: 2023-02-18T07:00:06Z
dc.date.issued: 2021-08-26
dc.description.abstract: This dissertation considers deep reinforcement learning (DRL) under the practical challenges of real-world wireless communication systems. Non-stationary and partially observable wireless environments make the learning and convergence of a DRL agent challenging. One way to facilitate learning in partially observable environments is to combine a recurrent neural network (RNN) with DRL to capture temporal information inherent in the system, an approach referred to as the deep recurrent Q-network (DRQN). However, training a DRQN is known to be difficult, requiring a large amount of training data to achieve convergence. In many targeted applications in 5G and future 6G wireless networks, the available training data is very limited. It is therefore important to develop DRL strategies that capture the temporal correlation of the dynamic environment while requiring only limited training overhead. In this dissertation, we design efficient DRL frameworks based on the echo state network (ESN), a special type of RNN in which only the output weights are trained. Specifically, we first introduce the deep echo state Q-network (DEQN), which adopts ESNs as the kernel of deep Q-networks (an illustrative code sketch follows the metadata fields below). Next, we introduce the federated ESN-based policy gradient (Fed-EPG) approach, which enables multiple agents to collaboratively learn a shared policy toward the system goal. We design computationally efficient training algorithms that exploit the special structure of ESNs and can therefore learn a good policy in a short time with little training data. Theoretical analyses of DEQN and Fed-EPG establish their convergence properties and provide guidance for hyperparameter tuning. Furthermore, we evaluate performance under the dynamic spectrum sharing (DSS) scenario, a key enabling technology for utilizing scarce spectrum resources more efficiently. Whereas a conventional spectrum management policy usually grants a fixed spectrum band to a single system for exclusive access, DSS allows a secondary system to dynamically share the spectrum with the primary system. Our work sheds light on the real-world deployment of DRL techniques in next generation wireless systems.
dc.description.abstractgeneral: Model-free reinforcement learning (RL) algorithms such as Q-learning are widely used because they learn a policy directly through interactions with the environment, without estimating a model of the environment; this is useful when the underlying system is complex. However, Q-learning performs poorly at large scale because training has to update every element of a large Q-table, which makes learning slow or even infeasible. Deep reinforcement learning (DRL) therefore uses a powerful deep neural network to approximate the Q-table. Furthermore, the deep recurrent Q-network (DRQN) was introduced to facilitate learning in partially observable environments. However, DRQN training requires a large amount of training data and a long training time to achieve convergence, which is impractical in wireless systems with non-stationary environments and limited training data. In this dissertation, we therefore introduce two efficient DRL approaches: the deep echo state Q-network (DEQN) and the federated ESN-based policy gradient (Fed-EPG) approach (an illustrative Fed-EPG-style sketch also follows the metadata fields below). Theoretical analyses of DEQN and Fed-EPG establish their convergence properties and provide guidelines for designing hyperparameters. We evaluate and demonstrate the performance benefits of DEQN and Fed-EPG under the dynamic spectrum sharing (DSS) scenario, a critical technology for efficiently utilizing scarce spectrum resources in 5G and future 6G wireless networks.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:32309
dc.identifier.uri: http://hdl.handle.net/10919/113863
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Deep Reinforcement Learning
dc.subject: Echo State Network
dc.subject: Dynamic Spectrum Sharing
dc.title: Deep Reinforcement Learning for Next Generation Wireless Networks with Echo State Networks
dc.type: Dissertation
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy
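
Illustrative sketch (ESN as a Q-network kernel): to make concrete the abstract's description of an ESN — an RNN whose input and reservoir weights stay fixed after random initialization while only the linear readout is trained — here is a minimal Python sketch. The class name, sizes, and the simple temporal-difference update are assumptions for illustration, not the dissertation's actual DEQN algorithm.

import numpy as np

class EchoStateQNetwork:
    """Minimal ESN used as a Q-network kernel (illustrative, not the
    dissertation's DEQN): input and reservoir weights are fixed after
    random initialization; only the output weights are trained."""

    def __init__(self, n_inputs, n_reservoir, n_actions,
                 spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the reservoir's spectral radius is below 1, the
        # usual in-practice condition for the echo state property.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W_res = W
        self.W_out = np.zeros((n_actions, n_reservoir))  # the only trained weights
        self.state = np.zeros(n_reservoir)

    def step(self, obs):
        # Fixed random recurrence summarizes the observation history,
        # which is what lets the agent cope with partial observability.
        self.state = np.tanh(self.W_in @ obs + self.W_res @ self.state)
        return self.W_out @ self.state  # one Q-value estimate per action

    def td_update(self, reservoir_state, action, td_target, lr=1e-2):
        # Because the readout is linear, a TD step is a cheap rank-1 update
        # of W_out; the reservoir itself is never retrained.
        q = self.W_out @ reservoir_state
        self.W_out[action] += lr * (td_target - q[action]) * reservoir_state

Training only W_out is what keeps the data and compute requirements low relative to a DRQN, whose recurrent weights must all be learned by backpropagation through time.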
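Illustrative sketch (federated ESN policy gradient): the abstracts describe Fed-EPG as multiple agents collaboratively learning a shared policy. The sketch below pairs a plain REINFORCE-style update of each agent's ESN output weights with a server-side average that produces the shared policy. The function names, the REINFORCE rule, and the unweighted averaging are assumptions for illustration, not the dissertation's exact Fed-EPG update.

import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def local_policy_gradient(W_out, episode, lr=1e-2):
    # One REINFORCE-style pass over an episode of
    # (reservoir_state, action, return) tuples; only the ESN's
    # linear readout W_out is updated.
    W = W_out.copy()
    for state, action, G in episode:
        probs = softmax(W @ state)
        grad_log = -np.outer(probs, state)  # d log pi / d W, softmax part
        grad_log[action] += state           # plus the chosen action's row
        W += lr * G * grad_log
    return W

def server_aggregate(local_weights):
    # Federated step: average the agents' locally trained output weights
    # into a shared policy, then broadcast it back to every agent.
    return np.mean(np.stack(local_weights), axis=0)

Because each agent only ever transmits its small readout matrix, the communication cost per federated round stays modest compared with exchanging a full recurrent network.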

Files

Original bundle
Name: Chang_H_D_2021.pdf
Size: 8.35 MB
Format: Adobe Portable Document Format