Use of Reinforcement Learning for Interference Avoidance or Efficient Jamming in Wireless Communications

Virginia Tech


We implement reinforcement learning in the context of wireless communications in two very different settings. In the first setting, we study the use of reinforcement learning in an underwater acoustic communications network that adapts its transmission frequencies to avoid interference and potential malicious jammers. To that end, we implement a reinforcement learning algorithm called contextual bandits. The harsh environment of an underwater channel makes this a challenging problem. The channel may induce multipath and time delays that lead to time-varying, frequency-selective attenuation. These factors are also influenced by the distance between the transmitter and receiver, the subbands in which the interference is located, and the power of the transmitter. We show that the agent is able to avoid frequency bands that have degraded channel quality or that contain interference, both of which vary over time. In the second setting, we study the use of reinforcement learning to adapt the modulation and power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we use a linear contextual bandit to learn to jam the victim system. Prior work has shown that linear bandits improve convergence when jamming a single-carrier system using time-domain jamming schemes. However, communications systems today typically employ orthogonal frequency division multiplexing (OFDM) to transmit data, particularly in 4G/5G networks. This work explores the use of linear Thompson Sampling (TS) to jam OFDM-modulated signals. The jammer may select from both time-domain and frequency-domain jamming schemes. We demonstrate that the linear TS algorithm performs better than a traditional reinforcement learning algorithm, upper confidence bound-1 (UCB-1), in terms of maximizing the victim's symbol error rate.
We also draw novel insights by observing the action states to which the reinforcement learning algorithm converges. We then investigate the design and modification of the context vector in the hope of increasing the overall performance of the bandit, for example by shortening the learning period and raising the symbol error rate inflicted on the victim. This includes running experiments on particular features and examining how the bandit weights the importance of the features in the context vector. Lastly, we study how to jam an OFDM-modulated signal that employs forward error correction coding. We extend this to leverage reinforcement learning to jam a 5G-based system implementing some aspects of the 5G protocol. This model is then modified to introduce unreliable reward feedback to the jammer, in the form of ACK/NACK observations, to understand how imperfect observations of errors affect the jammer's ability to learn. We gain insights into the convergence time of the jammer and its ability to jam the victim, into improvements to the algorithm, and into the vulnerabilities of wireless communications to reinforcement-learning-based jamming.
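
As a rough illustration of the UCB-1 baseline named above, the following is a minimal sketch of the standard UCB-1 rule on a toy problem, not the implementation used in this work. It assumes each arm stands for one candidate jamming scheme and that the reward is 1 when a symbol error is observed and 0 otherwise. The names ucb1_select, run_ucb1 and toy_means, and the probabilities in toy_means, are hypothetical and chosen only for this example.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm index chosen by UCB-1 at round t (1-based)."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean reward plus an exploration bonus that shrinks
    # as an arm is pulled more often.
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(arm_means, rounds, seed=0):
    """Simulate UCB-1 on Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    sums = [0.0] * len(arm_means)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        # Assumed reward model: 1.0 if the victim suffers a symbol error.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical per-scheme probabilities of causing a symbol error.
toy_means = [0.10, 0.30, 0.80]
pulls = run_ucb1(toy_means, rounds=2000)
```

UCB-1 ignores any context vector and keeps one independent estimate per arm. A linear contextual bandit such as linear TS instead shares information across arms through the features in the context vector.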



Reinforcement Learning, OFDM, Interference Avoidance, Jamming, Underwater Channel, 5G