Browsing by Author "Freeman, Laura J."
Now showing 1 - 15 of 15
- Active Learning with Combinatorial Coverage. Katragadda, Sai Prathyush (Virginia Tech, 2022-08-04). Active learning is a practical subfield of machine learning because labeling data, and deciding which data to label, can be time-consuming and inefficient. Active learning automates the selection of which data to label, but current methods are heavily model-reliant. This has led to sampled data that transfer poorly to new models, as well as to issues with sampling bias. Both issues are of crucial concern in machine learning deployment. We propose active learning methods that utilize combinatorial coverage to overcome these issues. The proposed methods are data-centric, and our experiments show that including coverage in active learning leads to sampled data that tend to transfer best to different models, with sampling bias competitive with benchmark methods.
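The data-centric selection idea can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes 2-way (pairwise) combinatorial coverage over already-discretized features, and the function names are hypothetical.

```python
from itertools import combinations

def pairwise_interactions(sample):
    """All 2-way (feature-index, value) interactions appearing in one sample."""
    indexed = list(enumerate(sample))
    return set(combinations(indexed, 2))

def select_by_coverage(pool, labeled, budget):
    """Greedily pick pool samples whose 2-way interactions are least covered
    by the already-labeled set (data-centric: no model in the loop)."""
    covered = set()
    for s in labeled:
        covered |= pairwise_interactions(s)
    pool, chosen = list(pool), []
    for _ in range(budget):
        # Score each candidate by how many uncovered interactions it adds.
        best = max(pool, key=lambda s: len(pairwise_interactions(s) - covered))
        pool.remove(best)
        covered |= pairwise_interactions(best)
        chosen.append(best)
    return chosen

labeled = [(0, 0, 0)]
pool = [(0, 0, 1), (1, 1, 1), (0, 0, 0)]
picked = select_by_coverage(pool, labeled, budget=1)  # (1, 1, 1) adds the most
```

Because the score depends only on the data, the same selected set can be reused to train any downstream model, which is the transferability argument above.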
- Advancements on the Interface of Computer Experiments and Survival Analysis. Wang, Yueyao (Virginia Tech, 2022-07-20). Design and analysis of computer experiments is an area focusing on efficient data collection (e.g., space-filling designs), surrogate modeling (e.g., Gaussian process models), and uncertainty quantification. Survival analysis focuses on modeling the period of time until a certain event happens. Data collection, prediction, and uncertainty quantification are also fundamental in survival models. In this dissertation, the proposed methods are motivated by a wide range of real-world applications, including high-performance computing (HPC) variability data, jet engine reliability data, Titan GPU lifetime data, and pine tree survival data. This dissertation explores the interface between computer experiments and survival analysis through these applications. Chapter 1 provides a general introduction to computer experiments and survival analysis. Chapter 2 focuses on the HPC variability management application. We investigate the applicability of space-filling designs and statistical surrogates in the HPC variability management setting in terms of design efficiency, prediction accuracy, and scalability. A comprehensive comparison of the design strategies and predictive methods is conducted to study each combination's prediction accuracy. Chapter 3 focuses on the reliability prediction application. With multi-channel sensor data available, a single degradation index is needed for compatibility with most existing models. We propose a flexible framework for multi-sensor data that models the nonlinear relationship between sensors and the degradation process. We also incorporate automatic variable selection to exclude sensors that have no effect on the underlying degradation process. Chapter 4 investigates inference approaches for spatial survival analysis under the Bayesian framework. The performance of Markov chain Monte Carlo (MCMC) approaches and variational inference is studied for two survival models: the cumulative exposure model and the proportional hazards (PH) model. The Titan GPU data and pine tree survival data illustrate the capability of variational inference for spatial survival models. Chapter 5 provides some general conclusions.
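As a flavor of the space-filling designs the dissertation builds on, here is a minimal pure-Python Latin hypercube sample. This is a generic sketch, not the dissertation's code, and the function name is hypothetical.

```python
import random

def latin_hypercube(n, d, seed=0):
    """n points in [0, 1]^d with exactly one point per axis-aligned stratum:
    each dimension is cut into n equal strata and each stratum is hit once."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        strata = list(range(n))   # one stratum index per design point
        rng.shuffle(strata)
        # Jitter each point uniformly within its assigned stratum.
        cols.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in cols) for i in range(n)]

design = latin_hypercube(n=5, d=2)  # 5 space-filling points in the unit square
```

Such designs spread the runs evenly in each input dimension, which is why they pair well with Gaussian process surrogates for prediction.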
- Analysis of Reliability Experiments with Random Blocks and Subsampling. Kensler, Jennifer Lin Karam (Virginia Tech, 2012-07-20). Reliability experiments provide important information regarding the life of a product, including how various factors may affect product life. Current analyses of reliability data usually assume a completely randomized design. However, reliability experiments frequently contain subsampling, which is a restriction on randomization. A typical experiment involves applying treatments to test stands, with several items placed on each test stand. In addition, raw materials used in experiments are often produced in batches. In some cases one batch may not be large enough to provide materials for the entire experiment and more than one batch must be used. These batches lead to a design involving blocks. This dissertation proposes two methods for analyzing reliability experiments with random blocks and subsampling. The first method is a two-stage method which can be implemented in software used by most practitioners, but has some limitations. Therefore, a more rigorous nonlinear mixed model method is proposed.
- Building trustworthy machine learning systems in adversarial environments. Wang, Ning (Virginia Tech, 2023-05-26). Modern AI systems, particularly with the rise of big data and deep learning in the last decade, have greatly improved our daily life and at the same time created a long list of controversies. AI systems are often subject to malicious and stealthy subversion that jeopardizes their efficacy. Many of these issues stem from the data-driven nature of machine learning. While big data and deep models significantly boost the accuracy of machine learning models, they also create opportunities for adversaries to tamper with models or extract sensitive data. Malicious data providers can compromise machine learning systems by supplying false data and intermediate computation results. Even a well-trained model can be deceived into misbehaving by an adversary who provides carefully designed inputs. Furthermore, curious parties can derive sensitive information about the training data by interacting with a machine learning model. These adversarial scenarios, known as poisoning attacks, adversarial example attacks, and inference attacks, have demonstrated that security, privacy, and robustness have become more important than ever for AI to gain wider adoption and societal trust. To address these problems, we propose the following solutions: (1) FLARE, which detects and mitigates stealthy poisoning attacks by leveraging latent space representations; (2) MANDA, which detects adversarial examples by utilizing evaluations from diverse sources, i.e., model-based prediction and data-based evaluation; (3) FeCo, which enhances the robustness of machine learning-based network intrusion detection systems by introducing a novel representation learning method; and (4) DP-FedMeta, which preserves data privacy and improves the privacy-accuracy trade-off in machine learning systems through a novel adaptive clipping mechanism.
- Contributions to the Interface between Experimental Design and Machine Learning. Lian, Jiayi (Virginia Tech, 2023-07-31). In data science, machine learning methods, such as deep learning and other AI algorithms, have been widely used in many applications. These methods often have complicated model structures with a large number of model parameters and a set of hyperparameters, and they are data-driven in nature. Thus, it is not easy to comprehensively evaluate their performance with respect to data quality and the algorithms' hyperparameters. In the statistical literature, design of experiments (DoE) is a set of systematic methods for effectively investigating the effects of input factors on complex systems. Although an AI algorithm is naturally a complex system, few works focus on using DoE methodology to evaluate the quality assurance of AI algorithms. An understanding of the quality of Artificial Intelligence (AI) algorithms is important for confidently deploying them in real applications such as cybersecurity, healthcare, and autonomous driving. In this dissertation, I aim to develop a set of novel methods on the interface between experimental design and machine learning, providing a systematic framework for using DoE methodology on AI algorithms. The dissertation contains six chapters. Chapter 1 provides a general introduction to design of experiments, machine learning, and surrogate modeling. Chapter 2 investigates the robustness of AI classification algorithms by conducting a comprehensive set of mixture experiments. Chapter 3 proposes a so-called Do-AIQ framework for using DoE to evaluate an AI algorithm's quality assurance. I establish a design-of-experiments framework to construct an efficient space-filling design in a high-dimensional constrained space and develop an effective surrogate model using an additive Gaussian process to enable the quality assessment of AI algorithms. Chapter 4 introduces a framework to generate continual learning (CL) datasets for cybersecurity applications. Chapter 5 presents a variable selection method under the cumulative exposure model for time-to-event data with time-varying covariates. Chapter 6 summarizes the entire dissertation.
- Enabling Artificial Intelligence Adoption through Assurance. Freeman, Laura J.; Rahman, Abdul; Batarseh, Feras A. (MDPI, 2021-08-25). The wide-scale adoption of Artificial Intelligence (AI) will require that AI engineers and developers can provide assurances to the user base that an algorithm will perform as intended and without failure. Assurance is the safety valve for reliable, dependable, explainable, and fair intelligent systems. AI assurance provides the necessary tools to enable AI adoption into applications, software, hardware, and complex systems. It involves quantifying capabilities and associating risks across deployments, including data quality (and its inherent biases), algorithm performance, statistical errors, and algorithm trustworthiness and security. Data, algorithmic, and context/domain-specific factors may change over time and impact the ability of AI systems to deliver accurate outcomes. In this paper, we discuss the importance and different angles of AI assurance, and present a general framework that addresses its challenges.
- Explainable Neural Claim Verification Using Rationalization. Gurrapu, Sai Charan (Virginia Tech, 2022-06-15). The dependence on Natural Language Processing (NLP) systems has grown significantly in the last decade. Recent advances in deep learning have enabled language models to generate text at the same level of quality as human-written text. If this growth continues, it can potentially lead to increased misinformation, which is a significant challenge. Although claim verification techniques exist, they lack proper explainability. Numerical scores such as attention and LIME values, and visualization techniques such as saliency heat maps, are insufficient because they require specialized knowledge; black-box NLP systems remain inaccessible and challenging for non-experts to understand. We propose a novel approach called ExClaim for explainable claim verification using NLP rationalization. We demonstrate that our approach can not only predict a verdict for a claim but also justify and rationalize its output as a natural language explanation (NLE). We extensively evaluate the system using statistical and Explainable AI (XAI) metrics to ensure the outcomes are valid, verified, and trustworthy, helping to reinforce human-AI trust. We propose a new subfield of XAI called Rational AI (RAI) to advance research on rationalization and NLE-based explainability techniques. Ensuring that claim verification systems are assured and explainable is a step towards trustworthy AI systems and ultimately helps mitigate misinformation.
- Improving Deep Learning for Maritime Remote Sensing through Data Augmentation and Latent Space. Sobien, Daniel; Higgins, Erik; Krometis, Justin; Kauffman, Justin; Freeman, Laura J. (MDPI, 2022-07-07). Training deep learning models requires having the right data for the problem and understanding both your data and the models' performance on that data. Training deep learning models is difficult when data are limited, so in this paper, we seek to answer the following question: how can we train a deep learning model to increase its performance on a targeted area with limited data? We do this by applying rotation data augmentations to a simulated synthetic aperture radar (SAR) image dataset. We use the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction technique to understand the effects of augmentations on the data in latent space. Using this latent space representation, we can understand the data and choose specific training samples aimed at boosting model performance in targeted under-performing regions without the need to increase training set sizes. Results show that using latent space to choose training data significantly improves model performance in some cases; however, there are other cases where no improvements are made. We show that linking patterns in latent space is a possible predictor of model performance, but results require some experimentation and domain knowledge to determine the best options.
- Latent Walking Techniques for Conditioning GAN-Generated Music. Eisenbeiser, Logan Ryan (Virginia Tech, 2020-09-21). Artificial music generation is a rapidly developing field focused on the complex task of creating neural networks that can produce realistic-sounding music. Generating music is very difficult; components like long- and short-term temporal structure are difficult for neural networks to capture, and the acoustics of musical features like harmonies and chords, as well as timbre and instrumentation, require complex representations for a network to generate them accurately. Various techniques for both music representation and network architecture have been used in the past decade to address these challenges. The focus of this thesis extends beyond generating music to the challenge of controlling and/or conditioning that generation. Conditional generation involves one or more additional pieces of information that are input to the generator and constrain aspects of the results. Conditioning can be used to specify a tempo for the generated song, increase the density of notes, or even change the genre. Latent walking is one of the most popular techniques in conditional image generation, but its effectiveness on music-domain generation is largely unexplored. This thesis focuses on latent walking techniques for conditioning the music generation network MuseGAN and examines the impact of this conditioning on the generated music.
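In its simplest form, a latent walk is a linear interpolation between two latent vectors, with each intermediate vector fed to the generator. A minimal generic sketch (not MuseGAN-specific; the function name is hypothetical):

```python
def latent_walk(z_start, z_end, steps):
    """Linearly interpolate between two latent vectors. Feeding each
    intermediate vector to a trained generator yields a gradual
    transition between the two generated outputs."""
    walk = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at z_start, 1.0 at z_end
        walk.append(tuple((1 - t) * a + t * b
                          for a, b in zip(z_start, z_end)))
    return walk

points = latent_walk((0.0, 0.0), (1.0, 2.0), steps=5)
```

Conditioning then amounts to walking along latent directions associated with an attribute (tempo, note density, genre) rather than between arbitrary points.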
- Machine Learning and Data Fusion of Simulated Remote Sensing Data. Higgins, Erik Tracy (Virginia Tech, 2023-07-27). Modeling and simulation tools are described and implemented in a single workflow for simulating a ship wake and then generating synthetic aperture radar (SAR) and infra-red (IR) images of these ship wakes. A parametric study across several different ocean environments and simulated remote sensing platforms is conducted to generate a preliminary data set that is used for training and testing neural network-based ship wake detection models. Several different model architectures are trained and tested, which are able to provide a high degree of accuracy in classifying whether input SAR images contain a persistent ship wake. Several data fusion models are explored to understand how fusing data from different SAR bands may improve ship wake detection, with some combinations of neural networks and data fusion models achieving perfect or near-perfect performance. Finally, an outline for a future study into multi-physics data fusion across multiple sensor modalities is created and discussed.
- Review: Is design data collection still relevant in the big data era? With extensions to machine learning. Freeman, Laura J. (Wiley, 2023-06)
- Statistical Methods for Improving and Maintaining Product Reliability. Dickinson, Rebecca (Virginia Tech, 2014-09-17). When a reliability experiment is used, practitioners can better understand what lifetimes to expect of a product under different operating conditions and what factors are important to designing reliability into a product. Reliability experiments, however, can be very challenging to analyze because the reliability or lifetime data tend to follow distinctly non-normal distributions and the experiments typically involve censoring. Time and cost constraints may also lead to reliability experiments with experimental protocols that are not completely randomized. In many industrial experiments, for example, the split-plot structure arises when the randomization of the experimental runs is restricted. Additionally, for many reliability experiments, it is often cost-effective to apply a treatment combination to a stand with multiple units on it, as opposed to each unit individually, which introduces subsampling. The analysis of lifetime data assuming a completely randomized design has been well studied, but until recently analysis methodologies for more complex experimental designs with multiple error terms have not been a focus of the reliability field. This dissertation provides two methods for analyzing right-censored Weibull-distributed lifetime data from a split-plot experiment with subsampling. We evaluate the proposed methods through a simulation study. Companies also routinely perform life tests on their products to ensure that products meet requirements. Each of these life tests typically involves testing several units simultaneously, with interest in the times to failure. Again, the fact that lifetime data tend to be non-normally distributed and censored makes the development of a control charting procedure more demanding. In this dissertation, one-sided lower and upper likelihood-ratio-based cumulative sum (CUSUM) control charting procedures are developed for right-censored Weibull lifetime data to monitor changes in the scale parameter, also known as the characteristic life, for a fixed value of the Weibull shape parameter. Because a decrease in the characteristic life indicates a decrease in the mean lifetime of a product, a one-sided lower CUSUM chart is the main focus. We illustrate the development and implementation of the chart and evaluate its properties through a simulation study.
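A lower CUSUM of this kind can be sketched using the standard reduction that, with the shape parameter beta fixed, t**beta is exponentially distributed with mean eta**beta. This is a hedged illustration, not the dissertation's procedure: the parameter values, threshold, and function name are hypothetical, and right-censored observations contribute only their survival-function term.

```python
import math

def weibull_lower_cusum(times, censored, beta, eta0, eta1, h):
    """One-sided lower CUSUM for the Weibull characteristic life.

    Each observation adds a log-likelihood-ratio increment for
    H1: eta = eta1 (< eta0, degraded) versus H0: eta = eta0 (in control).
    The chart signals when the cumulative statistic exceeds h.
    """
    lam0, lam1 = eta0 ** beta, eta1 ** beta   # exponential means under H0, H1
    c, path = 0.0, []
    for t, cens in zip(times, censored):
        x = t ** beta
        w = x * (1.0 / lam0 - 1.0 / lam1)     # survival-function part
        if not cens:                          # observed failure: add density part
            w += math.log(lam0 / lam1)
        c = max(0.0, c + w)                   # one-sided CUSUM recursion
        path.append(c)
        if c > h:                             # signal: characteristic life dropped
            break
    return path

# Early failures (short lifetimes) push the statistic up toward the threshold.
path = weibull_lower_cusum(
    times=[120.0, 95.0, 30.0, 25.0, 20.0],
    censored=[False, True, False, False, False],
    beta=2.0, eta0=100.0, eta1=60.0, h=3.0,
)
```

Long lifetimes give negative increments that are clipped at zero, so the chart stays at rest while the process is in control and accumulates evidence only when lifetimes shorten.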
- Statistical Methods for Reliability Data from Designed Experiments. Freeman, Laura J. (Virginia Tech, 2010-05-06). Product reliability is an important characteristic for all manufacturers, engineers, and consumers. Industrial statisticians have been planning experiments for years to improve product quality and reliability. However, experts in the field of reliability rarely have expertise in design of experiments (DOE) and the implications that experimental protocol has on data analysis. Additionally, statisticians who focus on DOE rarely work with reliability data. As a result, analysis methods for lifetime data from experimental designs more complex than a completely randomized design are extremely limited. This dissertation provides two new analysis methods for reliability data from life tests. We focus on data from a sub-sampling experimental design. The new analysis methods are illustrated on a popular reliability data set, which contains sub-sampling. Monte Carlo simulation studies evaluate the capabilities of the new modeling methods and highlight the principles of experimental design in a reliability context. The dissertation provides multiple methods for statistical inference for the new analysis methods. Finally, implications for the reliability field are discussed, especially future applications of the new analysis methods.
- A survey on artificial intelligence assurance. Batarseh, Feras A.; Freeman, Laura J.; Huang, Chih-Hao (2021-04-26). Artificial Intelligence (AI) algorithms are increasingly providing decision making and operational support across multiple domains. AI includes a wide (and growing) library of algorithms that can be applied to different problems. One important notion for the adoption of AI algorithms into operational decision processes is the concept of assurance. The literature on assurance, unfortunately, conceals its outcomes within a tangled landscape of conflicting approaches, driven by contradicting motivations, assumptions, and intuitions. Accordingly, although this is a rising and novel area, this manuscript provides a systematic review of research works relevant to AI assurance between 1985 and 2021, and aims to provide a structured alternative to that landscape. A new AI assurance definition is adopted and presented, and assurance methods are contrasted and tabulated. Additionally, a ten-metric scoring system is developed and introduced to evaluate and compare existing methods. Lastly, we provide foundational insights, discussions, future directions, a roadmap, and applicable recommendations for the development and deployment of AI assurance.
- Technical Report on the Evaluation of Median Rank Regression and Maximum Likelihood Estimation Techniques for a Two-Parameter Weibull Distribution. Olteanu, Denisa; Freeman, Laura J. (Virginia Tech, 2008). Practitioners frequently model failure times in reliability analysis via the Weibull distribution. Often risk managers must make decisions after only a few failures. Thus, an important question is how to estimate the parameters of this distribution for small sample sizes. This study evaluates two methods: maximum likelihood estimation and median rank regression. Asymptotically, we know that maximum likelihood estimation has superior properties; however, this study seeks to evaluate these two methods for small numbers of failures and high degrees of censoring. Specifically, this paper compares the two estimation methods based on their ability to estimate the individual parameters, and their ability to predict future failures. The last section of the paper provides recommendations on which method to use based on sample size, the parameter values, and the degree of censoring present in the data.
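The median rank regression side of the comparison can be sketched as follows, for a complete (uncensored) sample using Bernard's median-rank approximation; the MLE counterpart and the censoring handling studied in the report are omitted, and all names are hypothetical.

```python
import math

def median_rank_regression(times):
    """Fit a two-parameter Weibull by regressing ln(-ln(1 - F_i)) on
    ln(t_i), with Bernard's median-rank approximation
    F_i = (i - 0.3) / (n + 0.4). Complete (uncensored) sample only."""
    t = sorted(times)
    n = len(t)
    xs = [math.log(v) for v in t]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    shape = slope                            # Weibull shape (beta)
    scale = math.exp(-intercept / slope)     # Weibull scale (eta)
    return shape, scale

# Noiseless check: data placed exactly at the Weibull(2, 100) quantiles of
# the median ranks, so the fit should recover shape 2 and scale 100.
n = 20
ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
data = [100.0 * (-math.log(1.0 - F)) ** 0.5 for F in ranks]
shape_hat, scale_hat = median_rank_regression(data)
```

On real small, censored samples the two estimators can disagree noticeably, which is exactly the regime the report evaluates.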