The Importance of Data in RF Machine Learning
dc.contributor.author | Clark IV, William Henry | en |
dc.contributor.committeechair | Michaels, Alan J. | en |
dc.contributor.committeechair | Buehrer, Richard M. | en |
dc.contributor.committeemember | Plassmann, Paul E. | en |
dc.contributor.committeemember | Clancy, Thomas Charles | en |
dc.contributor.committeemember | Embree, Mark P. | en |
dc.contributor.committeemember | Ernst, Joseph M. | en |
dc.contributor.department | Electrical Engineering | en |
dc.date.accessioned | 2022-11-18T09:00:09Z | en |
dc.date.available | 2022-11-18T09:00:09Z | en |
dc.date.issued | 2022-11-17 | en |
dc.description.abstract | While the toolset known as Machine Learning (ML) is not new, several of its tools have been revitalized by improved hardware and applied across several domains over the last two decades. Spurred by results in image and audio processing, Deep Neural Network (DNN) applications have contributed to significant research on Radio Frequency (RF) problems over the last decade. ML, and Deep Learning (DL) specifically, are driven by access to relevant data during the training phase, because their learned feature sets are derived from vast amounts of similar data. Despite this critical reliance on data, the literature provides insufficient answers on how to quantify the training data needs of an application in order to achieve a desired performance. This dissertation first aims to create a practical definition that bounds the problem space of Radio Frequency Machine Learning (RFML), which we take to mean the application of ML as close as possible to the sampled baseband signal directly after digitization, while allowing for preprocessing when reasonably defined and justified. After constraining the problem to the RFML domain, the literature is reviewed to identify which kinds of ML have been applied and which techniques have shown benefits. With the problem space defined and the trends in the literature examined, the next goal is to provide a better understanding of data quality through quantification. This quantification helps explain how data quality affects the final performance of ML systems, how it drives the required quantity of data observations within that space, and how these impacts can be generalized and contrasted.
With an understanding of how data quality and quantity can affect the performance of a system in the RFML space, data generation techniques and their realizations, from conceptual designs through real-time hardware implementations, are then examined. Taken together, the results of this dissertation provide a foundation for estimating the investment required to realize a performance goal within a DL framework, as well as a rough order of magnitude for common goals within the RFML problem space. | en |
dc.description.abstractgeneral | Machine Learning (ML) is a powerful toolset capable of solving difficult problems across many domains. A fundamental part of this toolset is the representative data used to train a system. Unlike image and audio processing, for which datasets are constantly being developed thanks to usage agreements with entities such as Facebook, Google, and Amazon, ML within the Radio Frequency (RF) domain, or Radio Frequency Machine Learning (RFML), does not have access to such crowdsourced means of creating labeled datasets. Therefore, data within the RFML problem space must be intentionally cultivated to address the target problem. This dissertation explains the RFML problem space and then quantifies the effect of data quality on the training of RFML systems. Taking this one step further, the work provides a means of estimating the data quantity needed to achieve high levels of performance with the Deep Learning (DL) approach currently used to solve the problem, which in turn can guide refinement of that approach when real-world data quantity requirements exceed practical acquisition levels. Finally, the problem of data generation is examined, providing context for the difficulties of procuring high-quality data for problems in the RFML space. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:35822 | en |
dc.identifier.uri | http://hdl.handle.net/10919/112668 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | machine learning | en |
dc.subject | rfml | en |
dc.subject | radio frequency machine learning | en |
dc.subject | data generation | en |
dc.subject | data collection | en |
dc.subject | software defined radio | en |
dc.title | The Importance of Data in RF Machine Learning | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Electrical Engineering | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |