Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations

dc.contributor.author: Xu, Yue
dc.contributor.committeechair: Buehrer, Richard M.
dc.contributor.committeemember: Headley, William C.
dc.contributor.committeemember: Dhillon, Harpreet Singh
dc.contributor.committeemember: Liu, Lingjia
dc.contributor.committeemember: Wang, Yue J.
dc.contributor.department: Electrical Engineering
dc.date.accessioned: 2022-09-24T08:00:14Z
dc.date.available: 2022-09-24T08:00:14Z
dc.date.issued: 2022-09-23
dc.description.abstract: Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Unlike traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or prohibitively large matrix computations. Further, since many learning approaches cannot solve the resulting online Partially Observable Markov Decision Process (POMDP), Deep Recurrent Q-Networks (DRQNs) have been proposed to determine the optimal channel access policy via online learning. The fundamental goal of this dissertation is to develop DRL-based solutions to this POMDP-DSA problem. We consider three aspects of the problem: (1) optimal transmission strategies, (2) combined intelligent sensing and transmission strategies, and (3) learning efficiency, i.e., online convergence speed. Four key challenges arise: (1) the proposed DRQN-based node does not know the other nodes' behavior patterns a priori and must predict the future channel state from previous observations; (2) the impact on primary-user throughput, both during and after learning, must be limited; (3) sensing/observation can waste resources; and (4) convergence speed must be improved without sacrificing performance. We demonstrate in this dissertation that the proposed DRQN can learn: (1) the optimal transmission strategy in a variety of environments under partial observations; (2) a sensing strategy that provides near-optimal throughput in different environments while dramatically reducing the required sensing resources; (3) robustness to imperfect observations; (4) a sufficiently flexible approach that accommodates dynamic environments, multi-channel transmission, and the presence of multiple agents; and (5) all of the above in an accelerated fashion, utilizing one of three different approaches.
dc.description.abstractgeneral: With the development of wireless communication technologies such as 5G, global mobile data traffic has grown tremendously, making spectrum resources even more critical for future networks. However, spectrum is a scarce and expensive resource. Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. Thus, the combination of sensing with deep reinforcement learning (DRL) has been shown to be a promising alternative to previously proposed simplistic approaches. Compared with traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or extensive matrix computations. Furthermore, since many learning methods cannot solve the resulting online partially observable Markov decision process (POMDP), a deep recurrent Q-network (DRQN) is proposed to determine the optimal channel access policy through online learning. The basic goal of this dissertation is to develop a DRL-based solution to this POMDP-DSA problem. The work focuses on improving performance in three directions: (1) finding the optimal (or near-optimal) channel access strategy under a fixed partial-observation mode; (2) building on (1), developing a more intelligent way to dynamically and efficiently find a more efficient sensing/observation policy and the corresponding channel access strategy; and (3) while preserving performance, using different machine learning algorithms or structures to improve learning efficiency, so that users need not wait too long to reach the expected performance. Through research in these three directions, we arrive at an efficient and versatile solution: DRQN-based technology.
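To make the architecture named in the abstracts concrete, the following is a minimal, illustrative PyTorch sketch of a DRQN-style agent for channel access under partial observations. It is not the author's implementation: the channel count, hidden size, observation encoding, and action set are assumptions made purely for illustration.

import torch
import torch.nn as nn

NUM_CHANNELS = 4   # assumed number of shared channels (hypothetical)
HIDDEN_SIZE = 64   # assumed recurrent state width (hypothetical)

class DRQN(nn.Module):
    """A recurrent Q-network: the LSTM summarizes the history of
    partial observations into a belief-like hidden state, and the
    linear head maps it to one Q-value per action (transmit on one
    of NUM_CHANNELS channels, or stay idle)."""
    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, HIDDEN_SIZE, batch_first=True)
        self.q_head = nn.Linear(HIDDEN_SIZE, num_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of partial observations
        out, hidden = self.lstm(obs_seq, hidden)
        return self.q_head(out), hidden

# Usage: act greedily on the latest observation while carrying the
# recurrent state across time steps, so that past observations inform
# the current channel-access decision despite partial observability.
net = DRQN(obs_dim=NUM_CHANNELS, num_actions=NUM_CHANNELS + 1)
obs = torch.zeros(1, 1, NUM_CHANNELS)    # e.g., sensed busy/idle flags
hidden = None
q_values, hidden = net(obs, hidden)
action = int(q_values[0, -1].argmax())   # index NUM_CHANNELS = stay idle

The recurrence is the essential piece for the POMDP setting: a feed-forward DQN conditioned only on the current observation could not distinguish channel states that look identical at one time step but differ in history.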
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:35609
dc.identifier.uri: http://hdl.handle.net/10919/111994
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Dynamic Spectrum Access
dc.subject: Partial Knowledge
dc.subject: Deep Recurrent Neural Networks
dc.subject: Parallel Learning
dc.subject: Transfer Learning
dc.subject: Meta-Learning
dc.subject: Sensing Prediction
dc.subject: Imperfect System Feedback
dc.subject: Multi-Rate and Multi-Agent
dc.subject: Dynamic Environments
dc.subject: Cache
dc.title: Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations
dc.type: Dissertation
thesis.degree.discipline: Electrical Engineering
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Xu_Y_D_2022.pdf
Size: 3.59 MB
Format: Adobe Portable Document Format