Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations

dc.contributor.author: Xu, Yue
dc.contributor.committeechair: Buehrer, Richard M.
dc.contributor.committeemember: Headley, William C.
dc.contributor.committeemember: Dhillon, Harpreet Singh
dc.contributor.committeemember: Liu, Lingjia
dc.contributor.committeemember: Wang, Yue J.
dc.contributor.department: Electrical Engineering
dc.date.accessioned: 2022-09-24T08:00:14Z
dc.date.available: 2022-09-24T08:00:14Z
dc.date.issued: 2022-09-23
dc.description.abstract: Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Unlike traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or prohibitively large matrix computations. Further, since many learning approaches cannot solve the resulting online Partially Observable Markov Decision Process (POMDP), Deep Recurrent Q-Networks (DRQNs) have been proposed to determine the optimal channel access policy via online learning. The fundamental goal of this dissertation is to develop DRL-based solutions to this POMDP-DSA problem. We consider three aspects of the problem: (1) optimal transmission strategies, (2) combined intelligent sensing and transmission strategies, and (3) learning efficiency, i.e., online convergence speed. Four key challenges arise: (1) the proposed DRQN-based node does not know the other nodes' behavior patterns a priori and must predict the future channel state from previous observations; (2) the impact on primary-user throughput, both during and after learning, must be limited; (3) sensing/observation can waste resources; and (4) convergence speed must be improved without sacrificing performance. We demonstrate in this dissertation that the proposed DRQN can learn: (1) the optimal transmission strategy in a variety of environments under partial observations; (2) a sensing strategy that provides near-optimal throughput in different environments while dramatically reducing the required sensing resources; (3) robustness to imperfect observations; (4) a sufficiently flexible approach that accommodates dynamic environments, multi-channel transmission, and the presence of multiple agents; and (5) all of the above in an accelerated fashion, utilizing one of three different approaches.
dc.description.abstractgeneral: With the development of wireless communication technologies such as 5G, global mobile data traffic has grown tremendously, making spectrum resources even more critical for future networks. However, spectrum is a scarce and expensive resource. Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Compared with traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or extensive matrix computations. Furthermore, since many learning methods cannot solve the resulting online partially observable Markov decision process (POMDP), a deep recurrent Q-network (DRQN) is proposed to determine the optimal channel access policy through online learning. The basic goal of this dissertation is to develop a DRL-based solution to this POMDP-DSA problem. The work focuses on improving performance in three directions: (1) finding the optimal (or near-optimal) channel access strategy under a fixed partial-observation mode; (2) building on (1), developing a more intelligent way to dynamically and efficiently find a more efficient sensing/observation policy and the corresponding channel access strategy; and (3) while preserving performance, using different machine learning algorithms or structures to improve learning efficiency, so that users need not wait too long to reach the expected performance. Through research in these three directions, we arrive at an efficient and versatile solution: DRQN-based technology.
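To make the architecture named in the abstracts concrete, the following is a minimal, illustrative PyTorch sketch of a DRQN-style agent for channel access under partial observations. It is not the author's implementation: the channel count, hidden size, observation encoding, and action set are assumptions made purely for illustration.

import torch
import torch.nn as nn

NUM_CHANNELS = 4   # assumed number of shared channels (hypothetical)
HIDDEN_SIZE = 64   # assumed recurrent state width (hypothetical)

class DRQN(nn.Module):
    """A recurrent Q-network: the LSTM summarizes the history of
    partial observations into a belief-like hidden state, and the
    linear head maps it to one Q-value per action (transmit on one
    of NUM_CHANNELS channels, or stay idle)."""
    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, HIDDEN_SIZE, batch_first=True)
        self.q_head = nn.Linear(HIDDEN_SIZE, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of partial observations
        out, hidden = self.lstm(obs_seq, hidden)
        return self.q_head(out), hidden

# Usage: act greedily on the latest observation while carrying the
# recurrent state across time steps, so that past observations inform
# the current channel-access decision despite partial observability.
net = DRQN(obs_dim=NUM_CHANNELS, num_actions=NUM_CHANNELS + 1)
obs = torch.zeros(1, 1, NUM_CHANNELS)    # e.g., sensed busy/idle flags
hidden = None
q_values, hidden = net(obs, hidden)
action = int(q_values[0, -1].argmax())   # index NUM_CHANNELS = stay idle

The recurrence is the essential piece for the POMDP setting: a feed-forward DQN conditioned only on the current observation could not distinguish channel states that look identical at one time step but differ in history.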
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:35609
dc.identifier.uri: http://hdl.handle.net/10919/111994
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Dynamic Spectrum Access
dc.subject: Partial Knowledge
dc.subject: Deep Recurrent Neural Networks
dc.subject: Parallel Learning
dc.subject: Transfer Learning
dc.subject: Meta-Learning
dc.subject: Sensing Prediction
dc.subject: Imperfect System Feedback
dc.subject: Multi-Rate and Multi-Agent
dc.subject: Dynamic Environments
dc.subject: Cache
dc.title: Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations
dc.type: Dissertation
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Xu_Y_D_2022.pdf
Size: 3.59 MB
Format: Adobe Portable Document Format