Scholarly Works, Virginia Tech National Security Institute

Recent Submissions

  • Training from Zero: Forecasting of Radio Frequency Machine Learning Data Quantity
    Clark, William H.; Michaels, Alan J. (MDPI, 2024-07-18)
    The data used during training in any given application space are directly tied to the performance of the system once deployed. While there are many other factors that are attributed to producing high-performance models based on the Neural Scaling Law within Machine Learning, there is no doubt that the data used to train a system provide the foundation from which to build. One of the underlying heuristics used within the Machine Learning space is that having more data leads to better models, but there is no easy answer to the question, “How much data is needed to achieve the desired level of performance?” This work examines a modulation classification problem in the Radio Frequency domain space, attempting to answer the question of how many training data are required to achieve a desired level of performance, but the procedure readily applies to classification problems across modalities. The ultimate goal is to determine an approach that requires the lowest amount of data collection to better inform a more thorough collection effort to achieve the desired performance metric. By focusing on forecasting the performance of the model rather than the loss value, this approach allows for a greater intuitive understanding of data volume requirements. While this approach will require an initial dataset, the goal is to allow for the initial data collection to be orders of magnitude smaller than what is required for delivering a system that achieves the desired performance. An additional benefit of the techniques presented here is that the quality of different datasets can be numerically evaluated and tied together with the quantity of data, and ultimately, the performance of the architecture in the problem domain.
  • Assessing the Value of Transfer Learning Metrics for Radio Frequency Domain Adaptation
    Wong, Lauren J.; Muller, Braeden P.; McPherson, Sean; Michaels, Alan J. (MDPI, 2024-07-25)
    The use of transfer learning (TL) techniques has become common practice in fields such as computer vision (CV) and natural language processing (NLP). Leveraging prior knowledge gained from data with different distributions, TL offers higher performance and reduced training time, but has yet to be fully utilized in applications of machine learning (ML) and deep learning (DL) techniques and applications related to wireless communications, a field loosely termed radio frequency machine learning (RFML). This work examines whether existing transferability metrics, used in other modalities, might be useful in the context of RFML. Results show that the two existing metrics tested, Log Expected Empirical Prediction (LEEP) and Logarithm of Maximum Evidence (LogME), correlate well with post-transfer accuracy and can therefore be used to select source models for radio frequency (RF) domain adaptation and to predict post-transfer accuracy.
  • Use and Abuse of Personal Information, Part I: Design of a Scalable OSINT Collection Engine
    Rheault, Elliott; Nerayo, Mary; Leonard, Jaden; Kolenbrander, Jack; Henshaw, Christopher; Boswell, Madison; Michaels, Alan J. (MDPI, 2024-08-13)
    In most open-source intelligence (OSINT) research efforts, the collection of information is performed in an entirely passive manner as an observer to third-party communication streams. This paper describes ongoing work that seeks to insert itself into that communication loop, fusing openly available data with requested content that is representative of what is sent to second parties. The mechanism for performing this is based on the sharing of falsified personal information through one-time online transactions that facilitate signup for newsletters, establish online accounts, or otherwise interact with resources on the Internet. The work has resulted in the real-time Use and Abuse of Personal Information OSINT collection engine that can ingest email, SMS text, and voicemail content at an enterprise scale. Foundations of this OSINT collection infrastructure are also laid to incorporate an artificial intelligence (AI)-driven interaction engine that shifts collection from a passive process to one that can effectively engage with different classes of content for improved real-world privacy experimentation and quantitative social science research.
  • Use & Abuse of Personal Information, Part II: Robust Generation of Fake IDs for Privacy Experimentation
    Kolenbrander, Jack; Husmann, Ethan; Henshaw, Christopher; Rheault, Elliott; Boswell, Madison; Michaels, Alan J. (MDPI, 2024-08-11)
    When personal information is shared across the Internet, we have limited confidence that the designated second party will safeguard it as we would prefer. Privacy policies offer insight into the best practices and intent of the organization, yet most are written so loosely that sharing with undefined third parties is to be anticipated. Tracking these sharing behaviors and identifying the source of unwanted content is exceedingly difficult when personal information is shared with multiple such second parties. This paper formulates a model for realistic fake identities, constructs a robust fake identity generator, and outlines management methods targeted towards online transactions (email, phone, text) that pass both cursory machine and human examination for use in personal privacy experimentation. This fake ID generator and a custom account signup engine are the core front-end components of our larger Use and Abuse of Personal Information system that performs one-time transactions that, similar to a cryptographic one-time pad, ensure that we can attribute the sharing back to the single one-time transaction and/or specific second party. The flexibility and richness of the fake IDs also serve as a foundational set of control variables for a wide range of social science research questions revolving around personal information. Collectively, these fake identity models address multiple inter-disciplinary areas of common interest and serve as a foundation for eliciting and quantifying personal information-sharing behaviors.
  • Multi-Hop User Equipment (UE) to UE Relays for MANET/Mesh Leveraging 5G NR Sidelink
    Shyy, DJ; Luu, Cuong; Xu, John D.; Liu, Lingjia; Erpek, Tugba; Gabay, David; Bate, David (ACM, 2023-12-06)
    This paper provides use cases to adapt 5G sidelink technology to enable multi-hop User Equipment (UE)-to-UE (U2U) and UE-to-Network relaying in 3GPP standards. Such a capability could enable groups of users to communicate with each other when operating at the periphery or outside a network’s coverage area, with commercial and public safety benefits. This paper compares routing protocols to enable sidelink with U2U relay to support a Mobile Ad hoc Network (MANET). A gap analysis of current 3rd Generation Partnership Project (3GPP) Release 18 (R-18) specifications is performed to determine the missing procedures to enable multi-hop U2U relaying, along with a proposed candidate protocol to fill the gap. The candidate protocol can be submitted as a contribution to 3GPP TSG Service and System Aspects (SA) Working Group 2 (WG2) as proposed changes to the 5G architecture in 3GPP Release 19 (R-19).
  • A Combinatorial Approach to Hyperparameter Optimization
    Khadka, Krishna; Chandrasekaran, Jaganmohan; Lei, Yu; Kacker, Raghu N.; Kuhn, D. Richard (ACM, 2024-04-14)
    In machine learning, hyperparameter optimization (HPO) is essential for effective model training and significantly impacts model performance. Hyperparameters are predefined model settings which fine-tune the model’s behavior and are critical to modeling complex data patterns. Traditional HPO approaches such as Grid Search, Random Search, and Bayesian Optimization have been widely used in this field. However, as datasets grow and models increase in complexity, these approaches often require a significant amount of time and resources for HPO. This research introduces a novel approach using 𝑡-way testing (a combinatorial approach to software testing used for identifying faults with a test set that covers all 𝑡-way interactions) for HPO. 𝑡-way testing substantially narrows the search space and effectively covers parameter interactions. Our experimental results show that our approach reduces the number of necessary model evaluations and significantly cuts computational expenses while still outperforming traditional HPO approaches for the models studied in our experiments.
  • An Analysis of Radio Frequency Transfer Learning Behavior
    Wong, Lauren J.; Muller, Braeden; McPherson, Sean; Michaels, Alan J. (MDPI, 2024-06-03)
    Transfer learning (TL) techniques, which leverage prior knowledge gained from data with different distributions to achieve higher performance and reduced training time, are often used in computer vision (CV) and natural language processing (NLP), but have yet to be fully utilized in the field of radio frequency machine learning (RFML). This work systematically evaluates how the training domain and task, characterized by the transmitter (Tx)/receiver (Rx) hardware and channel environment, impact radio frequency (RF) TL performance for example automatic modulation classification (AMC) and specific emitter identification (SEI) use-cases. Through exhaustive experimentation using carefully curated synthetic and captured datasets with varying signal types, channel types, signal-to-noise ratios (SNRs), carrier/center frequencies (CFs), frequency offsets (FOs), and Tx and Rx devices, actionable and generalized conclusions are drawn regarding how best to use RF TL techniques for domain adaptation and sequential learning. Consistent with trends identified in other modalities, our results show that RF TL performance is highly dependent on the similarity between the source and target domains/tasks, but also on the relative difficulty of the source and target domains/tasks. Results also discuss the impacts of channel environment and hardware variations on RF TL performance and compare RF TL performance using head re-training and model fine-tuning methods.
  • Transferring Learned Behaviors between Similar and Different Radios
    Muller, Braeden P.; Olds, Brennan E.; Wong, Lauren J.; Michaels, Alan J. (MDPI, 2024-06-01)
    Transfer learning (TL) techniques have proven useful in a wide variety of applications traditionally dominated by machine learning (ML), such as natural language processing, computer vision, and computer-aided design. Recent extrapolations of TL to the radio frequency (RF) domain are being used to increase the potential applicability of RFML algorithms, seeking to improve the portability of models for spectrum situational awareness and transmission source identification. Unlike most of the computer vision and natural language processing applications of TL, applications within the RF modality must contend with inherent hardware distortions and channel condition variations. This paper seeks to evaluate the feasibility and performance trade-offs when transferring learned behaviors from functional RFML classification algorithms, specifically those designed for automatic modulation classification (AMC) and specific emitter identification (SEI), between homogeneous radios of similar construction and quality and heterogeneous radios of different construction and quality. Results derived from both synthetic data and over-the-air experimental collection show promising performance benefits from the application of TL to the RFML algorithms of SEI and AMC.
  • Generative AI tools can enhance climate literacy but must be checked for biases and inaccuracies
    Atkins, Carmen; Girgente, Gina; Shirzaei, Manoochehr; Kim, Junghwan (Springer Nature, 2024-04)
    In the face of climate change, climate literacy is becoming increasingly important. With wide access to generative AI tools, such as OpenAI’s ChatGPT, we explore the potential of AI platforms for ordinary citizens asking climate literacy questions. Here, we focus on a global scale and collect responses from ChatGPT (GPT-3.5 and GPT-4) on climate change-related hazard prompts over multiple iterations by utilizing OpenAI’s API and comparing the results with credible hazard risk indices. We find a general sense of agreement in comparisons and consistency in ChatGPT over the iterations. GPT-4 displayed fewer errors than GPT-3.5. Generative AI tools may be used in climate literacy, a timely topic of importance, but must be scrutinized for potential biases and inaccuracies moving forward and considered in a social context. Future work should identify and disseminate best practices for optimal use across various generative AI tools.
  • Low-Latency Wireless Network Extension for Industrial Internet of Things
    Fletcher, Michael; Paulz, Eric; Ridge, Devin; Michaels, Alan J. (MDPI, 2024-03-26)
    The timely delivery of critical messages in real-time environments is an increasing requirement for industrial Internet of Things (IIoT) networks. Similar to wired time-sensitive networking (TSN) techniques, which bifurcate traffic flows based on priority, the proposed wireless method aims to ensure that critical traffic arrives rapidly across multiple hops to enable numerous IIoT use cases. IIoT architectures are migrating toward wirelessly connected edges, creating a desire to extend TSN-like functionality to a wireless format. Existing protocols possess inherent challenges to achieving this prioritized low-latency communication, ranging from rigidly scheduled time division transmissions, scalability/jitter of carrier-sense multiple access (CSMA) protocols, and encryption-induced latency. This paper presents a hardware-validated low-latency technique built upon receiver-assigned code division multiple access (RA-CDMA) techniques to implement a secure wireless TSN-like extension suitable for the IIoT. Results from our hardware prototype, constructed on the IntelFPGA Arria 10 platform, show that (sub-)millisecond single-hop latencies can be achieved for each of the available message types, ranging from 12 bits up to 224 bits of payload. By achieving one-way transmission of under 1 ms, a reliable wireless TSN extension with comparable timelines to 802.1Q and/or 5G is achievable and proven in concept through our hardware prototype.
  • Disappearing cities on US coasts
    Ohenhen, Leonard O.; Shirzaei, Manoochehr; Ojha, Chandrakanta; Sherpa, Sonam F.; Nicholls, Robert J. (Nature Research, 2024-03-06)
    The sea level along the US coastlines is projected to rise by 0.25–0.3 m by 2050, increasing the probability of more destructive flooding and inundation in major cities. However, these impacts may be exacerbated by coastal subsidence (the sinking of coastal land areas), a factor that is often underrepresented in coastal-management policies and long-term urban planning. In this study, we combine high-resolution vertical land motion (that is, raising or lowering of land) and elevation datasets with projections of sea-level rise to quantify the potential inundated areas in 32 major US coastal cities. Here we show that, even when considering the current coastal-defence structures, further land area of between 1,006 and 1,389 km² is threatened by relative sea-level rise by 2050, posing a threat to a population of 55,000–273,000 people and 31,000–171,000 properties. Our analysis shows that not accounting for spatially variable land subsidence within the cities may lead to inaccurate projections of expected exposure. These potential consequences show the scale of the adaptation challenge, which is not appreciated in most US coastal cities.
  • Slowly but surely: Exposure of communities and infrastructure to subsidence on the US east coast
    Ohenhen, Leonard; Shirzaei, Manoochehr; Barnard, Patrick L. (Oxford University Press, 2024-01-02)
    Coastal communities are vulnerable to multihazards, which are exacerbated by land subsidence. On the US east coast, the high density of population and assets amplifies the region's exposure to coastal hazards. We utilized measurements of vertical land motion rates obtained from analysis of radar datasets to evaluate the subsidence-hazard exposure to population, assets, and infrastructure systems/facilities along the US east coast. Here, we show that 2,000 to 74,000 km² land area, 1.2 to 14 million people, 476,000 to 6.3 million properties, and >50% of infrastructures in major cities such as New York, Baltimore, and Norfolk are exposed to subsidence rates between 1 and 2 mm per year. Additionally, our analysis indicates a notable trend: as subsidence rates increase, the extent of area exposed to these hazards correspondingly decreases. Our analysis has far-reaching implications for community and infrastructure resilience planning, emphasizing the need for a targeted approach in transitioning from reactive to proactive hazard mitigation strategies in the era of climate change.
  • A statistical framework for domain shape estimation in Stokes flows
    Borggaard, Jeffrey T.; Glatt-Holtz, Nathan E.; Krometis, Justin (IOP, 2023-08-01)
    We develop and implement a Bayesian approach for the estimation of the shape of a two dimensional annular domain enclosing a Stokes flow from sparse and noisy observations of the enclosed fluid. Our setup includes the case of direct observations of the flow field as well as the measurement of concentrations of a solute passively advected by and diffusing within the flow. Adopting a statistical approach provides estimates of uncertainty in the shape due both to the non-invertibility of the forward map and to error in the measurements. When the shape represents a design problem of attempting to match desired target outcomes, this ‘uncertainty’ can be interpreted as identifying remaining degrees of freedom available to the designer. We demonstrate the viability of our framework on three concrete test problems. These problems illustrate the promise of our framework for applications while providing a collection of test cases for recently developed Markov chain Monte Carlo algorithms designed to resolve infinite-dimensional statistical quantities.
  • Deep-Learning-Based Digitization of Protein-Self-Assembly to Print Biodegradable Physically Unclonable Labels for Device Security
    Pradhan, Sayantan; Rajagopala, Abhi D.; Meno, Emma; Adams, Stephen; Elks, Carl R.; Beling, Peter A.; Yadavalli, Vamsi K. (MDPI, 2023-08-28)
    The increasingly pervasive problem of counterfeiting affects both individuals and industry. In particular, public health and medical fields face threats to device authenticity and patient privacy, especially in the post-pandemic era. Physical unclonable functions (PUFs) present a modern solution using counterfeit-proof security labels to securely authenticate and identify physical objects. PUFs harness innately entropic information generators to create a unique fingerprint for an authentication protocol. This paper proposes a facile protein self-assembly process as an entropy generator for a unique biological PUF. The posited image digitization process applies a deep learning model to extract a feature vector from the self-assembly image. This is then binarized and debiased to produce a cryptographic key. The NIST SP 800-22 Statistical Test Suite was used to evaluate the randomness of the generated keys, which proved sufficiently stochastic. To facilitate deployment on physical objects, the PUF images were printed on flexible silk-fibroin-based biodegradable labels using functional protein bioinks. Images from the labels were captured using a cellphone camera and referenced against the source image for error rate comparison. The deep-learning-based biological PUF has potential as a low-cost, scalable, highly randomized strategy for anti-counterfeiting technology.
  • How Can the Adversary Effectively Identify Cellular IoT Devices Using LSTM Networks?
    Luo, Zhengping Jay; Pitera, Will; Zhao, Shangqing; Lu, Zhuo; Sagduyu, Yalin (ACM, 2023-06-01)
    The Internet of Things (IoT) has become a key enabler for connecting edge devices with each other and the internet. Massive IoT services provided by cellular networks offer various applications such as smart metering and smart cities. Security of the massive IoT devices working alongside traditional devices such as smartphones and laptops has become a major concern. Protecting these IoT devices from being identified by malicious attackers is often the first line of defense for cellular IoT devices. In this paper, we provide an effective attacking method for identifying cellular IoT devices from cellular networks. Inspired by the characteristics of Long Short-Term Memory (LSTM) networks, we have developed a method that can not only capture context information but also adapt to the dynamic changes of the environment over time. Experimental validation shows a high detection rate with fewer than 10 epochs of training on public datasets.
  • Disruptive Role of Vertical Land Motion in Future Assessments of Climate Change-Driven Sea-Level Rise and Coastal Flooding Hazards in the Chesapeake Bay
    Sherpa, Sonam Futi; Shirzaei, Manoochehr; Ojha, Chandrakanta (American Geophysical Union, 2023-04)
    Future projections of sea-level rise (SLR) used to assess coastal flooding hazards and exposure throughout the 21st century and devise risk mitigation efforts often lack an accurate estimate of coastal vertical land motion (VLM) rate, driven by anthropogenic or non-climate factors in addition to climatic factors. The Chesapeake Bay (CB) region of the United States is experiencing one of the fastest rates of relative sea-level rise on the Atlantic coast of the United States. This study uses a combination of space-borne Interferometric Synthetic Aperture Radar (InSAR), Global Navigation Satellite System (GNSS), Light Detecting and Ranging (LiDAR) data sets, available National Oceanic and Atmospheric Administration (NOAA) long-term tide gauge data, and SLR projections from the Intergovernmental Panel on Climate Change (IPCC), AR6 WG1 to quantify the regional rate of relative SLR and future flooding hazards for the years 2030, 2050, and 2100. By the year 2100, the total inundated areas from SLR and subsidence are projected to be 454 (316–549) to 600 (535–690) km² for Shared Socioeconomic Pathways (SSPs) 1–1.9 to 5–8.5, respectively, and 342 (132–552) to 627 (526–735) km² only from SLR. The effect of storm surges based on Hurricane Isabel can increase the inundated area to 849 (832–867) to 1,117 (1,054–1,205) km² under different VLM and SLR scenarios. We suggest that accurate estimates of VLM rate, such as those obtained here, are essential to revise IPCC projections and obtain accurate maps of coastal flooding and inundation hazards. The results provided here inform policymakers when assessing hazards associated with global climate changes and local factors in CB, required for developing risk management and disaster resilience plans.
  • How to Attack and Defend NextG Radio Access Network Slicing With Reinforcement Learning
    Shi, Yi; Sagduyu, Yalin E.; Erpek, Tugba; Gursoy, M. Cenk (IEEE, 2023)
    In this paper, reinforcement learning (RL) for network slicing is considered in next generation (NextG) radio access networks, where the base station (gNodeB) allocates resource blocks (RBs) to the requests of user equipments and aims to maximize the total reward of accepted requests over time. Based on adversarial machine learning, a novel over-the-air attack is introduced to manipulate the RL algorithm and disrupt NextG network slicing. The adversary observes the spectrum and builds its own RL based surrogate model that selects which RBs to jam subject to an energy budget with the objective of maximizing the number of failed requests due to jammed RBs. By jamming the RBs, the adversary reduces the RL algorithm's reward. As this reward is used as the input to update the RL algorithm, the performance does not recover even after the adversary stops jamming. This attack is evaluated in terms of both the recovery time and the (maximum and total) reward loss, and it is shown to be much more effective than benchmark (random and myopic) jamming attacks. Different reactive and proactive defense schemes such as suspending the RL algorithm's update once an attack is detected, introducing randomness to the decision process in RL to mislead the learning process of the adversary, or manipulating the feedback (NACK) mechanism such that the adversary may not obtain reliable information are introduced to show that it is viable to defend NextG network slicing against this attack, in terms of improving the RL algorithm's reward.
  • Collaborative Multi-Robot Multi-Human Teams in Search and Rescue
    Williams, Ryan K.; Abaid, Nicole; McClure, James; Lau, Nathan; Heintzman, Larkin; Hashimoto, Amanda; Wang, Tianzi; Patnayak, Chinmaya; Kumar, Akshay (2022-04-30)
    Robots such as unmanned aerial vehicles (UAVs) deployed for search and rescue (SAR) can explore areas where human searchers cannot easily go and gather information on scales that can transform SAR strategy. Multi-UAV teams therefore have the potential to transform SAR by augmenting the capabilities of human teams and providing information that would otherwise be inaccessible. Our research aims to develop new theory and technologies for field deploying autonomous UAVs and managing multi-UAV teams working in concert with multi-human teams for SAR. Specifically, in this paper we summarize our work in progress towards these goals, including: (1) a multi-UAV search path planner that adapts to human behavior; (2) an in-field distributed computing prototype that supports multi-UAV computation and communication; (3) behavioral modeling that yields spatially localized predictions of lost person location; and (4) an interface between human searchers and UAVs that facilitates human-UAV interaction over a wide range of autonomy.
  • Adversarial Machine Learning for NextG Covert Communications Using Multiple Antennas
    Kim, Brian; Sagduyu, Yalin; Davaslioglu, Kemal; Erpek, Tugba; Ulukus, Sennur (MDPI, 2022-07-29)
    This paper studies the privacy of wireless communications from an eavesdropper that employs a deep learning (DL) classifier to detect transmissions of interest. There exists one transmitter that transmits to its receiver in the presence of an eavesdropper. In the meantime, a cooperative jammer (CJ) with multiple antennas transmits carefully crafted adversarial perturbations over the air to fool the eavesdropper into classifying the received superposition of signals as noise. While generating the adversarial perturbation at the CJ, multiple antennas are utilized to improve the attack performance in terms of fooling the eavesdropper. Two main points are considered while exploiting the multiple antennas at the adversary, namely the power allocation among antennas and the utilization of channel diversity. To limit the impact on the bit error rate (BER) at the receiver, the CJ puts an upper bound on the strength of the perturbation signal. Performance results show that this adversarial perturbation causes the eavesdropper to misclassify the received signals as noise with a high probability while increasing the BER at the legitimate receiver only slightly. Furthermore, the adversarial perturbation is shown to become more effective when multiple antennas are utilized.