The Importance of Data in RF Machine Learning

Files

TR Number

Date

2022-11-17

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

While the toolset known as Machine Learning (ML) is not new, several of the tools available within the toolset have seen revitalization with improved hardware, and have been applied across several domains in the last two decades. Deep Neural Network (DNN) applications have contributed to significant research within Radio Frequency (RF) problems over the last decade, spurred by results in image and audio processing. Machine Learning (ML), and Deep Learning (DL) specifically, are driven by access to relevant data during the training phase of the application due to the learned feature sets that are derived from vast amounts of similar data. Despite this critical reliance on data, the literature provides insufficient answers on how to quantify the data training needs of an application in order to achieve a desired performance.

This dissertation first aims to create a practical definition that bounds the problem space of Radio Frequency Machine Learning (RFML), which we take to mean the application of Machine Learning (ML) as close to the sampled baseband signal directly after digitization as is possible, while allowing for preprocessing when reasonably defined and justified. After constraining the problem to the Radio Frequency Machine Learning (RFML) domain space, an understanding of what kinds of Machine Learning (ML) have been applied as well as the techniques that have shown benefits will be reviewed from the literature. With the problem space defined and the trends in the literature examined, the next goal aims at providing a better understanding for the concept of data quality through quantification. This quantification helps explain how the quality of data: affects Machine Learning (ML) systems with regard to final performance, drives required data observation quantity within that space, and impacts can be generalized and contrasted. With the understanding of how data quality and quantity can affect the performance of a system in the Radio Frequency Machine Learning (RFML) space, an examination of the data generation techniques and realizations from conceptual through real-time hardware implementations are discussed. Consequently, the results of this dissertation provide a foundation for estimating the investment required to realize a performance goal within a Deep Learning (DL) framework as well as a rough order of magnitude for common goals within the Radio Frequency Machine Learning (RFML) problem space.

Description

Keywords

machine learning, rfml, radio frequency machine learning, data generation, data collection, software defined radio

Citation