Browsing by Author "Hong, Yili"
Now showing 1 - 20 of 37
- Accelerated Life Test Modeling Using Median Rank Regression. Rhodes, Austin James (Virginia Tech, 2016-11-01). Accelerated life tests (ALT) are appealing to practitioners seeking to maximize information gleaned from reliability studies while navigating resource constraints due to time and specimen costs. A popular approach to accelerated life testing is to design test regimes such that experimental specimens are exposed to variable stress levels across time. Such ALT experiments allow the practitioner to observe lifetime behavior across various stress levels and infer product life at use conditions using a greater number of failures than would otherwise be observed with a constant-stress experiment. The downside to accelerated life tests, however, particularly for those that utilize non-constant stress levels across time on test, is that the corresponding lifetime models are largely dependent upon assumptions pertaining to the varying stress. Although these assumptions drive inference at product use conditions, little to no statistical methodology exists for assessing their validity. One popular assumption that is prevalent in both literature and practice is the cumulative exposure model, which assumes that, at a given time on test, specimen life is solely driven by the integrated stress history and that current lifetime behavior is path independent of the stress trajectory. This dissertation challenges such black-box ALT modeling procedures and focuses on the cumulative exposure model in particular. For a simple step-stress accelerated life test, using two constant stress levels across time on test, we propose a four-parameter Weibull lifetime model that utilizes a threshold parameter to account for the stress transition. To circumvent regularity conditions imposed by maximum likelihood procedures, we use median rank regression to fit and assess our lifetime model. We improve the model fit using a novel incorporation of desirability functions and ultimately evaluate our proposed methods using an extensive simulation study. Finally, we provide an illustrative example to highlight the implementation of our method, comparing it to a corresponding Bayesian analysis.
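As a rough illustration of the median rank regression idea, the sketch below fits a plain two-parameter Weibull to complete failure data by regressing the linearized CDF on log time, using Benard's approximation for the median ranks. It is not the dissertation's four-parameter step-stress model, and the data are simulated.

```python
import numpy as np

def weibull_mrr(failure_times):
    """Fit a two-parameter Weibull by median rank regression (MRR).

    Illustrative sketch only; the dissertation's four-parameter step-stress
    model adds a threshold parameter not shown here.
    """
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)
    # Benard's approximation to the median ranks
    F = (ranks - 0.3) / (n + 0.4)
    # Linearized Weibull CDF: ln(-ln(1-F)) = beta*ln(t) - beta*ln(eta)
    x = np.log(t)
    y = np.log(-np.log(1.0 - F))
    beta, intercept = np.polyfit(x, y, 1)
    eta = np.exp(-intercept / beta)
    return beta, eta  # shape, scale

# Example with simulated data (true shape 2, scale 100)
rng = np.random.default_rng(1)
times = rng.weibull(2.0, size=30) * 100.0
print(weibull_mrr(times))
```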
- Advancements in Degradation Modeling, Uncertainty Quantification and Spatial Variable Selection. Xie, Yimeng (Virginia Tech, 2016-06-30). This dissertation focuses on three research projects: 1) construction of simultaneous prediction intervals/bounds for at least k out of m future observations; 2) a semi-parametric degradation model for accelerated destructive degradation test (ADDT) data; and 3) spatial variable selection and application to Lyme disease data in Virginia. Following the general introduction in Chapter 1, the rest of the dissertation consists of three main chapters. Chapter 2 presents the construction of two-sided simultaneous prediction intervals (SPIs) or one-sided simultaneous prediction bounds (SPBs) to contain at least k out of m future observations, based on complete or right-censored data from the (log)-location-scale family of distributions. An SPI/SPB calculated by the proposed procedure has exact coverage probability for complete and Type II censored data. In the Type I censoring case, it has asymptotically correct coverage probability and reasonably good results for small samples. The proposed procedures can be extended to multiply-censored data or randomly censored data. Chapter 3 focuses on the analysis of ADDT data. We use a general degradation path model with correlated covariance structure to describe ADDT data. Monotone B-splines are used to model the underlying degradation process. A likelihood-based iterative procedure for parameter estimation is developed. The confidence intervals of parameters are calculated using the nonparametric bootstrap procedure. Both simulated data and real datasets are used to compare the semi-parametric model with the existing parametric models. Chapter 4 studies the Lyme disease emergence in Virginia. The objective is to find important environmental and demographic covariates that are associated with Lyme disease emergence. To address the high-dimensional integral problem in the loglikelihood function, we consider the penalized quasi-loglikelihood and the approximated loglikelihood based on Laplace approximation. We impose the adaptive elastic net penalty to obtain sparse estimation of parameters and thus to achieve variable selection of important variables. The proposed methods are investigated in simulation studies. We also apply the proposed methods to Lyme disease data in Virginia. Finally, Chapter 5 contains general conclusions and discussions for future work.
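For a sense of how a simultaneous prediction interval factor can be calibrated, here is a minimal Monte Carlo sketch for complete normal samples: it finds c so that x̄ ± c·s contains at least k of m future observations with the target probability. The general (log)-location-scale and censored-data procedures of Chapter 2 are more involved; the function name `spi_factor` and all settings are illustrative.

```python
import numpy as np

def spi_factor(n, m, k, alpha=0.05, nrep=20_000, seed=0):
    """Monte Carlo factor c so that xbar +/- c*s contains at least k of m
    future normal observations with probability 1 - alpha.

    Simplified sketch for complete normal samples; the dissertation covers
    the (log)-location-scale family and censored data.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(nrep)
    for r in range(nrep):
        x = rng.standard_normal(n)
        xbar, s = x.mean(), x.std(ddof=1)
        y = rng.standard_normal(m)
        d = np.sort(np.abs(y - xbar) / s)
        stats[r] = d[k - 1]          # k-th smallest scaled distance
    return np.quantile(stats, 1 - alpha)

print(spi_factor(n=20, m=5, k=4))
```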
- Advancements on the Interface of Computer Experiments and Survival Analysis. Wang, Yueyao (Virginia Tech, 2022-07-20). Design and analysis of computer experiments is an area focusing on efficient data collection (e.g., space-filling designs), surrogate modeling (e.g., Gaussian process models), and uncertainty quantification. Survival analysis focuses on modeling the period of time until a certain event happens. Data collection, prediction, and uncertainty quantification are also fundamental in survival models. In this dissertation, the proposed methods are motivated by a wide range of real-world applications, including high-performance computing (HPC) variability data, jet engine reliability data, Titan GPU lifetime data, and pine tree survival data. This dissertation explores interfaces between computer experiments and survival analysis through the above applications. Chapter 1 provides a general introduction to computer experiments and survival analysis. Chapter 2 focuses on the HPC variability management application. We investigate the applicability of space-filling designs and statistical surrogates in the HPC variability management setting, in terms of design efficiency, prediction accuracy, and scalability. A comprehensive comparison of the design strategies and predictive methods is conducted to study the combinations' performance in prediction accuracy. Chapter 3 focuses on the reliability prediction application. With the availability of multi-channel sensor data, a single degradation index is needed for compatibility with most existing models. We propose a flexible framework with multi-sensory data to model the nonlinear relationship between sensors and the degradation process. We also incorporate automatic variable selection to exclude sensors that have no effect on the underlying degradation process. Chapter 4 investigates inference approaches for spatial survival analysis under the Bayesian framework. The performance of Markov chain Monte Carlo (MCMC) approaches and variational inference is studied for two survival models, the cumulative exposure model and the proportional hazards (PH) model. The Titan GPU data and pine tree survival data are used to illustrate the capability of variational inference on spatial survival models. Chapter 5 provides some general conclusions.
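The surrogate-modeling ingredient mentioned for the HPC variability study can be illustrated with a bare-bones Gaussian process predictor; the RBF kernel, fixed hyperparameters, and toy 2-D design below are assumptions for illustration only, not the dissertation's models.

```python
import numpy as np

def rbf_kernel(A, B, length=0.2, var=1.0):
    # Squared-exponential kernel between rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, Xnew, noise=1e-6):
    """Plain Gaussian-process surrogate prediction with an RBF kernel.

    Minimal sketch of the kind of surrogate used for HPC variability
    prediction; kernel and hyperparameters here are illustrative.
    """
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xnew, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    var = rbf_kernel(Xnew, Xnew).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy use: predict a scalar response on a 2-D design
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))          # stand-in for a space-filling design
y = np.sin(6 * X[:, 0]) + X[:, 1]
Xnew = rng.uniform(size=(5, 2))
print(gp_predict(X, y, Xnew)[0])
```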
- Analysis of Reliability Experiments with Random Blocks and Subsampling. Kensler, Jennifer Lin Karam (Virginia Tech, 2012-07-20). Reliability experiments provide important information regarding the life of a product, including how various factors may affect product life. Current analyses of reliability data usually assume a completely randomized design. However, reliability experiments frequently contain subsampling, which is a restriction on randomization. A typical experiment involves applying treatments to test stands, with several items placed on each test stand. In addition, raw materials used in experiments are often produced in batches. In some cases one batch may not be large enough to provide materials for the entire experiment and more than one batch must be used. These batches lead to a design involving blocks. This dissertation proposes two methods for analyzing reliability experiments with random blocks and subsampling. The first method is a two-stage method which can be implemented in software used by most practitioners, but has some limitations. Therefore, a more rigorous nonlinear mixed model method is proposed.
- Bridging Machine Learning and Experimental Design for Enhanced Data Analysis and Optimization. Guo, Qing (Virginia Tech, 2024-07-19). Experimental design is a powerful tool for gathering highly informative observations using a small number of experiments. The demand for smart data collection strategies is increasing due to the need to save time and budget, especially in online experiments and machine learning. However, traditional experimental design methods fall short in systematically assessing the effects of changing variables. Specifically within Artificial Intelligence (AI), the challenge lies in assessing the impacts of model structures and training strategies on task performance with a limited number of trials. This shortfall underscores the necessity for the development of novel approaches. On the other hand, the optimal design criterion has typically been model-based in the classic design literature, which restricts the flexibility of experimental design strategies. However, machine learning's inherent flexibility can empower the efficient estimation of metrics using nonparametric and optimization techniques, thereby broadening the horizons of experimental design possibilities. In this dissertation, the aim is to develop a set of novel methods that bridge the merits of these two domains: 1) applying ideas from statistical experimental design to enhance data efficiency in machine learning, and 2) leveraging powerful deep neural networks to optimize experimental design strategies. This dissertation consists of 5 chapters. Chapter 1 provides a general introduction to mutual information, fractional factorial design, hyper-parameter tuning, multi-modality, etc. In Chapter 2, I propose a new mutual information estimator, FLO, by integrating techniques from variational inference (VAE), contrastive learning, and convex optimization. I apply FLO to broad data science applications, such as efficient data collection, transfer learning, fair learning, etc. Chapter 3 introduces a new design strategy called multi-layer sliced design (MLSD) with an application to AI assurance. It focuses on exploring the effects of hyper-parameters under different models and optimization strategies. Chapter 4 investigates classic vision challenges via multimodal large language models by implicitly optimizing mutual information and thoroughly exploring training strategies. Chapter 5 concludes this proposal and discusses several future research topics.
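FLO itself is not reproduced here, but the contrastive flavor of such mutual information estimators can be sketched with an InfoNCE-style lower bound. The closed-form Gaussian critic below is an assumption chosen so the bound can be checked against the true MI; in practice the critic would be learned.

```python
import numpy as np

def infonce_bound(x, y, critic):
    """InfoNCE-style contrastive lower bound on mutual information.

    Hypothetical sketch in the spirit of contrastive MI estimators such as
    FLO; here the critic is supplied rather than learned.
    """
    scores = critic(x[:, None], y[None, :])        # n x n matrix f(x_i, y_j)
    log_num = np.diag(scores)                      # positive pairs
    log_den = np.log(np.mean(np.exp(scores), axis=1))  # contrastive normalizer
    return np.mean(log_num - log_den)

rho = 0.8
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(2000)
# Critic = log density ratio log p(y|x)/p(y) (up to a constant) for this Gaussian pair
critic = lambda a, b: -(b - rho * a) ** 2 / (2 * (1 - rho**2)) + b**2 / 2
print(infonce_bound(x, y, critic), -0.5 * np.log(1 - rho**2))  # bound vs. true MI
```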
- Bridging the Gap: Selected Problems in Model Specification, Estimation, and Optimal Design from Reliability and Lifetime Data Analysis. King, Caleb B. (Virginia Tech, 2015-04-13). Understanding the lifetime behavior of their products is crucial to the success of any company in the manufacturing and engineering industries. Statistical methods for lifetime data are a key component to achieving this level of understanding. Sometimes a statistical procedure must be updated to be adequate for modeling specific data as is discussed in Chapter 2. However, there are cases in which the methods used in industrial standards are themselves inadequate. This is distressing as more appropriate statistical methods are available but remain unused. The research in Chapter 4 deals with such a situation. The research in Chapter 3 serves as a combination of both scenarios and represents how both statisticians and engineers from the industry can join together to yield beautiful results. After introducing basic concepts and notation in Chapter 1, Chapter 2 focuses on lifetime prediction for a product consisting of multiple components. During the production period, some components may be upgraded or replaced, resulting in a new "generation" of component. Incorporating this information into a competing risks model can greatly improve the accuracy of lifetime prediction. A generalized competing risks model is proposed and simulation is used to assess its performance. In Chapter 3, optimal and compromise test plans are proposed for constant amplitude fatigue testing. These test plans are based on a nonlinear physical model from the fatigue literature that is able to better capture the nonlinear behavior of fatigue life and account for effects from the testing environment. Sensitivity to the design parameters and modeling assumptions are investigated and suggestions for planning strategies are proposed. Chapter 4 considers the analysis of ADDT data for the purposes of estimating a thermal index. The current industry standards use a two-step procedure involving least squares regression in each step. The methodology preferred in the statistical literature is the maximum likelihood procedure. A comparison of the procedures is performed and two published datasets are used as motivating examples. The maximum likelihood procedure is presented as a more viable alternative to the two-step procedure due to its ability to quantify uncertainty in data inference and modeling flexibility.
- Change in Reports of Unmet Need For Help with ADL or Mobility Disabilities. Sands, Laura P.; Yuan, Miao; Xie, Yimeng; Hong, Yili (Virginia Tech, 2015). Self-care (SC) and Mobility (MO) disabled older adults require the help of others to successfully complete daily tasks. Thirty percent of respondents to the 2011 NHATS survey reported unmet need for one or more SC or MO disabilities. Reports of unmet need for disabilities are associated with future hospitalization¹, readmission², Emergency Department use³, and mortality⁴. Little is known about patterns of unmet need over time, especially the degree to which unmet need resolves, varies, or begins. Determination of predictors of change in unmet need status would inform the development of interventions to reduce unmet need.
- Contributions to Data Reduction and Statistical Model of Data with Complex Structures. Wei, Yanran (Virginia Tech, 2022-08-30). With advanced technology and information explosion, the data of interest often have complex structures, with large size and dimension in the form of continuous or discrete features. There is an emerging need for data reduction, efficient modeling, and model inference. For example, data can contain millions of observations with thousands of features. Traditional methods, such as linear regression or LASSO regression, cannot effectively deal with such a large dataset directly. This dissertation aims to develop several techniques to effectively analyze large datasets with complex structures in observational, experimental, and time series data. In Chapter 2, I focus on data reduction for model estimation of sparse regression. The commonly used subdata selection methods often consider sampling or feature screening. For data with both a large number of observations and a large number of predictors, we propose a filtering approach for model estimation (FAME) to reduce both the number of data points and the number of features. The proposed algorithm can be easily extended for data with discrete responses or discrete predictors. Through simulations and case studies, the proposed method provides good performance for parameter estimation with efficient computation. In Chapter 3, I focus on modeling experimental data with a quantitative-sequence (QS) factor. Here the QS factor concerns both the quantities and the sequence orders of several components in the experiment. Existing methods usually focus only on the sequence orders or the quantities of the multiple components. To fill this gap, we propose a QS transformation to transform the QS factor into a generalized permutation matrix, and consequently develop a simple Gaussian process approach to model experimental data with QS factors. In Chapter 4, I focus on forecasting multivariate time series data by leveraging autoregression and clustering. Existing time series forecasting methods treat each series independently and ignore their inherent correlation. To fill this gap, I propose a clustering method based on autoregression and control the sparsity of the transition matrix estimation by adaptive lasso and a clustering coefficient. The clustering-based cross prediction can outperform conventional time series forecasting methods. Moreover, the clustering result can also enhance the forecasting accuracy of other forecasting methods. The proposed method can be applied to practical data, such as stock forecasting and topic trend detection.
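The adaptive lasso penalty referenced in the forecasting chapter can be illustrated generically. The sketch below implements it for a plain sparse regression by rescaling features with OLS-based weights (scikit-learn assumed available); it is not the dissertation's clustering-based transition-matrix estimator.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
    """Adaptive lasso via feature rescaling.

    Generic sketch of the adaptive penalty; not the dissertation's exact
    algorithm for sparse transition-matrix estimation.
    """
    ols = LinearRegression().fit(X, y)
    w = 1.0 / (np.abs(ols.coef_) ** gamma + 1e-8)   # adaptive weights
    Xs = X / w                                      # rescale columns
    fit = Lasso(alpha=alpha).fit(Xs, y)
    return fit.coef_ / w                            # undo the rescaling

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
beta = np.array([2.0, 0, 0, -1.5, 0, 0, 0, 0.8, 0, 0])
y = X @ beta + 0.5 * rng.standard_normal(200)
print(np.round(adaptive_lasso(X, y), 2))   # near-zero estimates for inactive predictors
```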
- Contributions to the Interface between Experimental Design and Machine Learning. Lian, Jiayi (Virginia Tech, 2023-07-31). In data science, machine learning methods, such as deep learning and other AI algorithms, have been widely used in many applications. These machine learning methods often have complicated model structures with a large number of model parameters and a set of hyper-parameters. Moreover, these machine learning methods are data-driven in nature. Thus, it is not easy to provide a comprehensive evaluation of the performance of these machine learning methods with respect to the data quality and the hyper-parameters of the algorithms. In the statistical literature, design of experiments (DoE) is a set of systematic methods to effectively investigate the effects of input factors in complex systems. There are few works focusing on the use of DoE methodology for evaluating the quality assurance of AI algorithms, even though an AI algorithm is naturally a complex system. An understanding of the quality of Artificial Intelligence (AI) algorithms is important for confidently deploying them in real applications such as cybersecurity, healthcare, and autonomous driving. In this proposal, I aim to develop a set of novel methods on the interface between experimental design and machine learning, providing a systematic framework for using DoE methodology for AI algorithms. This proposal contains six chapters. Chapter 1 provides a general introduction to design of experiments, machine learning, and surrogate modeling. Chapter 2 focuses on investigating the robustness of AI classification algorithms by conducting a comprehensive set of mixture experiments. Chapter 3 proposes a so-called Do-AIQ framework of using DoE for evaluating the AI algorithm's quality assurance. I establish a design-of-experiment framework to construct an efficient space-filling design in a high-dimensional constrained space and develop an effective surrogate model using an additive Gaussian process to enable the quality assessment of AI algorithms. Chapter 4 introduces a framework to generate continual learning (CL) datasets for cybersecurity applications. Chapter 5 presents a variable selection method under the cumulative exposure model for time-to-event data with time-varying covariates. Chapter 6 provides the summary of the entire dissertation.
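To make the space-filling-design idea concrete, here is a minimal Latin hypercube sketch for probing a hyper-parameter space. The design size, dimensions, and hyper-parameter ranges are hypothetical, and the Do-AIQ constrained design is considerably more elaborate.

```python
import numpy as np

def latin_hypercube(n, d, seed=0):
    """Simple Latin hypercube design on [0, 1]^d.

    Illustrative space-filling design of the kind used to probe AI
    hyper-parameter spaces; the Do-AIQ constrained design is more involved.
    """
    rng = np.random.default_rng(seed)
    # One stratified draw per (dimension, point), then shuffle within each dimension
    cut = (np.arange(n) + rng.uniform(size=(d, n))) / n
    for row in cut:
        rng.shuffle(row)
    return cut.T                                   # n x d design

design = latin_hypercube(n=16, d=3)
# Map unit-cube columns to hypothetical hyper-parameter ranges
lr = 10 ** (-4 + 3 * design[:, 0])                 # learning rate, 1e-4 .. 1e-1
depth = (2 + 6 * design[:, 1]).astype(int)         # network depth, 2 .. 8
print(lr[:4], depth[:4])
```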
- Corporate Default Predictions and Methods for Uncertainty Quantifications. Yuan, Miao (Virginia Tech, 2016-08-01). Regarding quantifying uncertainties in prediction, two projects with different perspectives and application backgrounds are presented in this dissertation. The goal of the first project is to predict corporate default risks based on large-scale time-to-event and covariate data in the context of controlling credit risks. Specifically, we propose a competing risks model to incorporate exits of companies due to default and other reasons. Because of the stochastic and dynamic nature of corporate risks, we incorporate both company-level and market-level covariate processes into the event intensities. We propose a parsimonious Markovian time series model and a dynamic factor model (DFM) to efficiently capture the mean and correlation structure of the high-dimensional covariate dynamics. For estimating parameters in the DFM, we derive an expectation-maximization (EM) algorithm in explicit forms under the necessary constraints. For multi-period default risks, we consider both corporate-level and market-level predictions. We also develop prediction interval (PI) procedures that synthetically take into account uncertainties in the future observations, parameter estimation, and the future covariate processes. In the second project, to quantify the uncertainties in the maximum likelihood (ML) estimators and compute exact tolerance interval (TI) factors at the nominal confidence level, we propose algorithms for two-sided control-the-center and control-both-tails TIs for complete or Type II censored data following the (log)-location-scale family of distributions. Our approaches are based on pivotal properties of the ML estimators of parameters for the (log)-location-scale family and utilize Monte Carlo simulations. For Type I censored data, only approximate pivotal quantities exist, so an adjusted procedure is developed to compute approximate factors. The observed coverage probability is shown to be asymptotically accurate by our simulation study. Our proposed methods are illustrated using real-data examples.
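The Monte Carlo calibration of a tolerance interval factor can be sketched for the simplest case: a control-the-center two-sided TI for complete normal data, using the pivotal quantities (x̄ − μ)/σ and s/σ. The dissertation's procedures cover the full (log)-location-scale family and censored data; the function below and its settings are illustrative only.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def ti_factor_normal(n, content=0.90, conf=0.95, nrep=10_000, seed=0):
    """Monte Carlo control-the-center tolerance factor for a normal sample.

    Finds c so that xbar +/- c*s covers at least `content` of the population
    with confidence `conf`. Sketch for complete normal data only.
    """
    rng = np.random.default_rng(seed)
    needed = np.empty(nrep)
    for r in range(nrep):
        x = rng.standard_normal(n)                 # pivotal: mu=0, sigma=1
        xbar, s = x.mean(), x.std(ddof=1)
        g = lambda c: norm.cdf(xbar + c * s) - norm.cdf(xbar - c * s) - content
        needed[r] = brentq(g, 1e-6, 100.0)         # smallest c reaching the content
    return np.quantile(needed, conf)

print(ti_factor_normal(n=20))   # compare with published two-sided k-factor tables
```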
- Cure Rate Models with Nonparametric Form of Covariate Effects. Chen, Tianlei (Virginia Tech, 2015-06-02). This thesis focuses on the development of spline-based hazard estimation models for cure rate data. Such data can be found in survival studies with long-term survivors. Consequently, the population consists of the susceptible and non-susceptible sub-populations, with the latter termed "cured". The modeling of both the cure probability and the hazard function of the susceptible sub-population is of practical interest. Here we propose two smoothing-spline based models falling respectively into the popular classes of two-component mixture cure rate models and promotion time cure rate models. Under the framework of the two-component mixture cure rate model, Wang, Du and Liang (2012) have developed a nonparametric model where the covariate effects on both the cure probability and the hazard component are estimated by smoothing splines. Our first development falls under the same framework but estimates the hazard component based on the accelerated failure time model, instead of the proportional hazards model in Wang, Du and Liang (2012). Our new model has better interpretation in practice. The promotion time cure rate model, motivated by a simplified biological interpretation of cancer metastasis, was first proposed only a few decades ago. Nonetheless, it has quickly become a competitor to the mixture models. Our second development aims to provide a nonparametric alternative to the existing parametric or semiparametric promotion time models.
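The structure of a two-component mixture cure rate model can be seen in a small parametric sketch: a logistic model for the cure probability and a Weibull density for the susceptible group, fit by maximum likelihood on simulated data. The thesis replaces these parametric pieces with smoothing splines and an accelerated failure time formulation; everything below is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, t, delta, x):
    # params: logistic cure coefficients (b0, b1) and log Weibull shape/scale
    b0, b1, log_k, log_lam = params
    pi_cure = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))    # P(cured | x)
    k, lam = np.exp(log_k), np.exp(log_lam)
    S_u = np.exp(-(t / lam) ** k)                      # susceptible survival
    f_u = (k / lam) * (t / lam) ** (k - 1) * S_u       # susceptible density
    lik = np.where(delta == 1,
                   (1 - pi_cure) * f_u,                # observed failures
                   pi_cure + (1 - pi_cure) * S_u)      # censored: cured or still at risk
    return -np.sum(np.log(lik + 1e-300))

# Simulate: cure status from a logistic model, Weibull event times, censoring at tau
rng = np.random.default_rng(1)
n, tau = 800, 4.0
x = rng.standard_normal(n)
cured = rng.uniform(size=n) < 1 / (1 + np.exp(-(0.3 - 0.8 * x)))
t_event = np.where(cured, np.inf, rng.weibull(1.5, n) * 2.0)
t = np.minimum(t_event, tau)
delta = (t_event <= tau).astype(int)

fit = minimize(neg_loglik, x0=np.zeros(4), args=(t, delta, x),
               method="Nelder-Mead", options={"maxiter": 2000})
print(fit.x)
```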
- The Design of GLR Control Charts for Process Monitoring. Xu, Liaosa (Virginia Tech, 2013-02-27). Generalized likelihood ratio (GLR) control charts are investigated for two types of statistical process control (SPC) problems. The first part of this dissertation considers the problem of monitoring a normally distributed process variable when a special cause may produce a time-varying linear drift in the mean. The design and application of a GLR control chart for drift detection is investigated. The GLR drift chart does not require specification of any tuning parameters by the practitioner, and has the advantage that, at the time of the signal, estimates of both the change point and the drift rate are immediately available. An equation is provided to accurately approximate the control limit. The performance of the GLR drift chart is compared to other control charts such as a standard CUSUM chart and a CUSCORE chart designed for drift detection. We also compare the GLR chart designed for drift detection to the GLR chart designed for sustained shift detection, since both of them require only a control limit to be specified. In terms of the expected time for detection and in terms of the bias and mean squared error of the change-point estimators, the GLR drift chart has better performance for a wide range of drift rates relative to the GLR shift chart when the out-of-control process truly follows a linear drift. The second part of the dissertation considers the problem of monitoring a linear functional relationship between a response variable and one or more explanatory variables (a linear profile). The design and application of GLR control charts for this problem are investigated. The likelihood ratio test of the GLR chart is generalized over the regression coefficients, the variance of the error term, and the possible change point. The performance of the GLR chart is compared to various existing control charts. We show that the overall performance of the GLR chart is much better than other options in detecting a wide range of shift sizes. The existing control charts designed for certain shifts that may be of particular interest have several chart parameters that need to be specified by the user, which makes the design of such control charts more difficult. The GLR chart is very simple to design, as it is invariant to the choice of design matrix and the values of the in-control parameters, so there is only one design parameter (the control limit) that needs to be specified. In particular, the GLR chart can be constructed with a sample size of n=1 at each sampling point, whereas the other charts cannot be applied in that case. Another advantage of the GLR chart is its built-in diagnostic aids that provide estimates of both the change point and the values of the linear profile parameters.
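The basic GLR statistic for a sustained shift in a normal mean (with known in-control parameters) gives a feel for how these charts work; the drift and profile-monitoring charts studied here generalize it. The window size and control limit below are arbitrary illustrative choices.

```python
import numpy as np

def glr_mean_shift(x, mu0=0.0, sigma=1.0, window=200):
    """GLR statistic for a sustained shift in a normal mean (known sigma).

    Minimal sketch of the classic GLR shift chart: maximize the log
    likelihood ratio over candidate change points within a moving window.
    """
    stats = []
    for t in range(1, len(x) + 1):
        lo = max(0, t - window)                 # limit the change-point search
        best, csum = 0.0, 0.0
        for k in range(t - 1, lo - 1, -1):      # candidate change points
            csum += x[k] - mu0                  # sum of deviations after the change
            m = t - k                           # observations after the change
            best = max(best, csum**2 / (2 * sigma**2 * m))
        stats.append(best)
    return np.array(stats)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(0.8, 1, 50)])  # shift at t=100
glr = glr_mean_shift(x)
print(np.argmax(glr > 12.0))   # first signal time for an illustrative control limit
```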
- Dynamic Probability Control Limits for Risk-Adjusted Bernoulli Cumulative Sum Charts. Zhang, Xiang (Virginia Tech, 2015-12-12). The risk-adjusted Bernoulli cumulative sum (CUSUM) chart developed by Steiner et al. (2000) is an increasingly popular tool for monitoring clinical and surgical performance. In practice, however, use of a fixed control limit for the chart leads to quite variable in-control average run length (ARL) performance for patient populations with different risk score distributions. To overcome this problem, simulation-based dynamic probability control limits (DPCLs) are determined patient-by-patient for the risk-adjusted Bernoulli CUSUM chart in this study. By maintaining the probability of a false alarm at a constant level, conditional on no false alarm for previous observations, the risk-adjusted CUSUM charts with DPCLs have consistent in-control performance at the desired level, with approximately geometrically distributed run lengths. Simulation results demonstrate that the proposed method does not rely on any information or assumptions about the patients' risk distributions. The use of DPCLs for risk-adjusted Bernoulli CUSUM charts allows each chart to be designed for the corresponding particular sequence of patients for a surgeon or hospital. The effect of estimation error on the performance of the risk-adjusted Bernoulli CUSUM chart with DPCLs is also examined. Our simulation results show that the in-control performance of the risk-adjusted Bernoulli CUSUM chart with DPCLs is affected by estimation error. The most influential factors are the specified desired in-control average run length, the Phase I sample size, and the overall adverse event rate. However, the effect of estimation error is uniformly smaller for the risk-adjusted Bernoulli CUSUM chart with DPCLs than for the corresponding chart with a constant control limit under various realistic scenarios. In addition, there is a substantial reduction in the standard deviation of the in-control run length when DPCLs are used. Therefore, use of DPCLs has yet another advantage when designing a risk-adjusted Bernoulli CUSUM chart. This research is the result of joint work with Dr. William H. Woodall (Department of Statistics, Virginia Tech). Moreover, DPCLs are adapted to design the risk-adjusted CUSUM charts for multiresponses developed by Tang et al. (2015). It is shown that the in-control performance of the charts with DPCLs can be controlled for different patient populations because these limits are determined for each specific sequence of patients. Thus, the risk-adjusted CUSUM chart for multiresponses with DPCLs is more practical and should be applied to effectively monitor surgical performance by hospitals and healthcare practitioners. This part of the research is the result of joint work with Dr. William H. Woodall (Department of Statistics, Virginia Tech) and Mr. Justin Loda (Department of Statistics, Virginia Tech).
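A minimal sketch of the risk-adjusted Bernoulli CUSUM scores in the spirit of Steiner et al. (2000) is shown below, with the control limit left to the user. The simulated risk scores, outcomes, and doubled odds-ratio alternative are assumptions, and the simulation-based DPCL computation itself is not implemented here.

```python
import numpy as np

def risk_adjusted_cusum(y, p, odds_ratio=2.0):
    """Risk-adjusted Bernoulli CUSUM scores in the spirit of Steiner et al. (2000).

    y: 0/1 adverse-event outcomes; p: patient-specific predicted risks.
    Sketch only; the dissertation adds simulation-based dynamic probability
    control limits (DPCLs), which are not implemented here.
    """
    s, path = 0.0, []
    for yt, pt in zip(y, p):
        denom = 1.0 - pt + odds_ratio * pt               # in-control vs. shifted odds
        w = np.log(odds_ratio / denom) if yt == 1 else np.log(1.0 / denom)
        s = max(0.0, s + w)                              # reflected CUSUM update
        path.append(s)
    return np.array(path)

rng = np.random.default_rng(2)
p = rng.uniform(0.02, 0.3, size=300)                     # risk scores from a Phase I model
y = rng.binomial(1, np.minimum(1.0, 2.0 * p))            # inflated event rates, for illustration
print(risk_adjusted_cusum(y, p)[-5:])
```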
- Evaluating Time-varying Effect in Single-type and Multi-type Semi-parametric Recurrent Event Models. Chen, Chen (Virginia Tech, 2015-12-11). This dissertation aims to develop statistical methodologies for estimating the effects of time-fixed and time-varying factors in a recurrent events modeling context. The research is motivated by the traffic safety research question of evaluating the influence of crashes on driving risk and driver behavior. The methodologies developed, however, are general and can be applied to other fields. Four alternative approaches based on various data settings are elaborated and applied to the 100-Car Naturalistic Driving Study in the following chapters. Chapter 1 provides a general introduction and background of each method, with a sketch of the 100-Car Naturalistic Driving Study. In Chapter 2, I assess the impact of crashes on driving behavior by comparing the frequency of distraction events in pre-defined windows. A count-based approach based on mixed-effect binomial regression models is used. In Chapter 3, I introduce intensity-based recurrent event models by treating the number of Safety Critical Incidents and Near Crashes over time as a counting process. Recurrent event models fit the natural generation scheme of the data in this study. Four semi-parametric models are explored: the Andersen-Gill model, the Andersen-Gill model with stratified baseline functions, the frailty model, and the frailty model with stratified baseline functions. I derive the model estimation procedure and conduct model comparison via simulation and application. The recurrent event models in Chapter 3 are all based on the proportional assumption, where effects are constant. However, the change of effects over time is often of primary interest. In Chapter 4, I develop a time-varying coefficient model using penalized B-spline functions to approximate the varying coefficients. Shared frailty terms are used to incorporate correlation within subjects. Inference and statistical tests are also provided. A frailty representation is proposed to link the time-varying coefficient model with the regular frailty model. In Chapter 5, I further extend the framework to accommodate multi-type recurrent events with time-varying coefficients. Two types of recurrent event models are developed. These models incorporate correlation among intensity functions from different types of events by correlated frailty terms. Chapter 6 gives a general review of the contributions of this dissertation and a discussion of future research directions.
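For intuition, when the baseline intensity is constant the Andersen-Gill model reduces to a Poisson regression of event counts on covariates with a log-exposure offset. The sketch below shows that special case on simulated driving data (statsmodels assumed available) and omits the semi-parametric baselines and frailty terms developed in the dissertation.

```python
import numpy as np
import statsmodels.api as sm

# With a constant baseline intensity, recurrent-event counts follow a Poisson
# model with rate lambda0 * exp(beta * x) per unit of exposure time.
rng = np.random.default_rng(3)
n = 400
x = rng.binomial(1, 0.5, n)                 # e.g., a hypothetical post-crash indicator
exposure = rng.uniform(10, 100, n)          # hours of driving observed
rate = 0.05 * np.exp(0.7 * x)               # events per hour
counts = rng.poisson(rate * exposure)

X = sm.add_constant(x)
fit = sm.GLM(counts, X, family=sm.families.Poisson(),
             offset=np.log(exposure)).fit()
print(fit.params)                           # intercept near log(0.05), slope near 0.7
```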
- Frequentist-Bayesian Hybrid Tests in Semi-parametric and Non-parametric Models with Low/High-Dimensional Covariate. Xu, Yangyi (Virginia Tech, 2014-12-03). We provide a Frequentist-Bayesian hybrid test statistic in this dissertation for two testing problems. The first is to design a test for significant differences between non-parametric functions, and the second is to design a test allowing any departure of the effects of high-dimensional predictors X from constant. The implementation, including construction of the proposed test statistics, is given for both problems. For the first testing problem, the statistical difference among massive outcomes or signals is of interest in many diverse fields including neurophysiology, imaging, engineering, and other related fields. However, such data often arise from nonlinear systems, exhibit row/column patterns and non-normal distributions, and have other hard-to-identify internal relationships, which lead to difficulties in testing the significance of differences between them under both unknown relationships and high dimensionality. In this dissertation, we propose an Adaptive Bayes Sum Test capable of testing the significance of the difference between two nonlinear systems based on universal non-parametric mathematical decomposition/smoothing components. Our approach is developed by adapting the Bayes sum test statistic of Hart (2009). Any internal pattern is treated through a Fourier transformation. Resampling techniques are applied to construct the empirical distribution of the test statistic to reduce the effect of non-normal distributions. A simulation study suggests our approach performs better than the alternative method, the Adaptive Neyman Test of Fan and Lin (1998). The usefulness of our approach is demonstrated with an application to the identification of electronic chips as well as an application to test the change of pattern of precipitations. For the second testing problem, numerous statistical methods have been developed for analyzing high-dimensional data. These methods mainly focus on variable selection, are limited for the purpose of testing with high-dimensional data, and often require explicit derivative likelihood functions. In this dissertation, we propose a "Hybrid Omnibus Test" for high-dimensional data testing with much weaker requirements. Our Hybrid Omnibus Test is developed under a semi-parametric framework where a likelihood function is no longer necessary. It is a Frequentist-Bayesian hybrid score-type test for a functional generalized partial linear single index model, whose link is a functional of the predictors through a generalized partially linear single index. We propose an efficient score based on estimating equations to address the mathematical difficulty in likelihood derivation and use it to construct our Hybrid Omnibus Test. We compare our approach with an empirical likelihood ratio test and with Bayesian inference based on the Bayes factor, using a simulation study in terms of false positive rate and true positive rate. Our simulation results suggest that our approach outperforms the alternatives in terms of false positive rate, true positive rate, and computation cost in both high-dimensional and low-dimensional cases. The advantage of our approach is also demonstrated by published biological results with an application to a genetic pathway dataset of type II diabetes.
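A generic resampling test comparing two noisy signals through their leading Fourier coefficients conveys the ingredients mentioned above (a Fourier treatment of internal patterns plus an empirical null built by resampling). It is not the Adaptive Bayes Sum Test itself; the statistic, swap scheme, and signal model are illustrative assumptions.

```python
import numpy as np

def fourier_diff_test(y1, y2, n_coef=10, n_perm=2000, seed=0):
    """Permutation test for a difference between two noisy signals,
    comparing their leading Fourier coefficients.

    Generic resampling sketch; not the dissertation's test statistic.
    """
    rng = np.random.default_rng(seed)

    def stat(a, b):
        d = np.fft.rfft(a - b)[:n_coef]
        return np.sum(np.abs(d) ** 2) / len(a)

    obs = stat(y1, y2)
    perm_stats = []
    for _ in range(n_perm):
        swap = rng.random(len(y1)) < 0.5        # swap curve labels pointwise at random
        a = np.where(swap, y2, y1)
        b = np.where(swap, y1, y2)
        perm_stats.append(stat(a, b))
    return obs, np.mean(np.array(perm_stats) >= obs)   # statistic, permutation p-value

t = np.linspace(0, 1, 256)
rng = np.random.default_rng(1)
y1 = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)
y2 = np.sin(2 * np.pi * t) + 0.1 * np.cos(6 * np.pi * t) + 0.3 * rng.standard_normal(t.size)
print(fourier_diff_test(y1, y2))
```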
- Functional Data Models for Raman Spectral Data and Degradation Analysis. Do, Quyen Ngoc (Virginia Tech, 2022-08-16). Functional data analysis (FDA) studies data in the form of measurements over a domain as whole entities. Our first focus is on post-hoc analysis with pairwise and contrast comparisons for the popular functional ANOVA model comparing groups of functional data. Existing contrast tests assume independent functional observations within a group. In reality, this assumption may not be satisfactory, since functional data are often collected continually over time on a subject. In this work, we introduce a new linear contrast test that accounts for time dependency among functional group members. For a significant contrast test, it can be beneficial to identify the region of significant difference. In the second part, we propose a non-parametric regression procedure to obtain a locally sparse estimate of the functional contrast. Our work is motivated by a biomedical study using Raman spectroscopy to monitor hemodialysis treatment in near real-time. With the contrast test and sparse estimation, practitioners can monitor the progress of the hemodialysis within a session and identify important chemicals for dialysis adequacy monitoring. In the third part, we propose a functional data model for degradation analysis of functional data. Motivated by the degradation analysis of rechargeable Li-ion batteries, we combine state-of-the-art functional linear models to produce fully functional predictions for curves on heterogeneous domains. Simulation studies and data analysis demonstrate the advantage of the proposed method over an existing aggregation-based method in predicting the degradation measure.
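To illustrate what a functional contrast comparison looks like, the sketch below computes a linear contrast of group mean curves with a pointwise bootstrap band. It resamples subjects independently within groups and therefore ignores the within-subject time dependence the proposed test is designed to handle; the curves and contrast are simulated.

```python
import numpy as np

def contrast_band(groups, contrast, n_boot=2000, alpha=0.05, seed=0):
    """Pointwise bootstrap band for a linear contrast of functional group means.

    groups: list of (n_i x T) arrays of curves; contrast: coefficients c_i.
    Simplified sketch that resamples subjects independently within groups.
    """
    rng = np.random.default_rng(seed)
    est = sum(c * g.mean(axis=0) for c, g in zip(contrast, groups))
    boot = np.empty((n_boot, groups[0].shape[1]))
    for b in range(n_boot):
        boot[b] = sum(c * g[rng.integers(0, len(g), len(g))].mean(axis=0)
                      for c, g in zip(contrast, groups))
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return est, lo, hi

t = np.linspace(0, 1, 50)
rng = np.random.default_rng(1)
g1 = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((30, 50))
g2 = np.sin(2 * np.pi * t) + 0.1 * t + 0.2 * rng.standard_normal((30, 50))
print(contrast_band([g1, g2], [1, -1])[0][:5])   # first few values of the contrast estimate
```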
- Future Lyme disease risk in the south-eastern United States based on projected land cover. Stevens, Logan K.; Kolivras, Korine N.; Hong, Yili; Thomas, Valerie A.; Campbell, James B. Jr.; Prisley, Stephen P. (Page Press, 2019-03-11). Lyme disease is the most significant vector-borne disease in the United States, and its southward advance over several decades has been quantified. Previous research has examined the potential role of climate change in the disease's expansion, but no studies have considered the role of future land cover upon its distribution. This research examines Lyme disease risk in the south-eastern U.S. based on projected land cover developed under four Intergovernmental Panel on Climate Change scenarios: A1B, A2, B1, and B2. Land cover types and edge indices significantly associated with Lyme disease in Virginia were incorporated into a spatial Poisson regression model to quantify potential land cover suitability for Lyme disease in the south-eastern U.S. under each scenario. Our results indicate an intensification of potential land cover suitability for Lyme disease under the A scenarios and a decrease of potential land cover suitability under the B scenarios. The decrease under the B scenarios is a critical result, indicating that Lyme disease risk can be decreased by making different land cover choices. Additionally, health officials can focus efforts in projected high-incidence areas.
- GLR Control Charts for Process Monitoring with Sequential Sampling. Peng, Yiming (Virginia Tech, 2014-11-06). The objective of this dissertation is to investigate GLR control charts based on a sequential sampling scheme (SS GLR charts). Phase II monitoring is considered, and the goal is to quickly detect a wide range of changes in the univariate normal process mean parameter and/or the variance parameter. The performance of the SS GLR charts is evaluated, and design guidelines for SS GLR charts are provided so that practitioners can easily apply the SS GLR charts in applications. More specifically, the structure of this dissertation is as follows: We first develop a two-sided SS GLR chart for monitoring the mean μ of a normal process. The performance of the SS GLR chart is evaluated and compared with other control charts. The SS GLR chart has much better performance than that of the fixed sampling rate GLR chart. It is also shown that the overall performance of the SS GLR chart is better than that of the variable sampling interval (VSI) GLR chart and the variable sampling rate (VSR) CUSUM chart. The SS GLR chart has the additional advantage that it requires fewer parameters to be specified than other VSR charts. The optimal parameter choices are given, and regression equations are provided to find the limits for the SS GLR chart. If detecting one-sided shifts in μ is of interest, the above SS GLR chart can be modified to be a one-sided chart. The performance of this modified SS GLR chart is investigated. Next we develop an SS GLR chart for simultaneously monitoring the mean μ and the variance σ² of a normal process. The performance and properties of this chart are evaluated. The design methodology and some illustrative examples are provided so that the SS GLR chart can be easily used in applications. The optimal parameter choices are given, and the performance of the SS GLR chart remains very good as long as the parameter choices are not too far away from the optimized choices.
- Interpolants, Error Bounds, and Mathematical Software for Modeling and Predicting Variability in Computer Systems. Lux, Thomas Christian Hansen (Virginia Tech, 2020-09-23). Function approximation is an important problem. This work presents applications of interpolants to modeling random variables. Specifically, this work studies the prediction of distributions of random variables applied to computer system throughput variability. Existing approximation methods, including multivariate adaptive regression splines, support vector regressors, multilayer perceptrons, Shepard variants, and the Delaunay mesh, are investigated in the context of computer variability modeling. New methods of approximation using Box splines, Voronoi cells, and the Delaunay mesh for interpolating distributions of data with moderately high dimension are presented and compared with existing approaches. Novel theoretical error bounds are constructed for piecewise linear interpolants over functions with a Lipschitz continuous gradient. Finally, mathematical software that constructs monotone quintic spline interpolants for distribution approximation from data samples is proposed.
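Among the approximation methods listed, the Shepard (inverse-distance-weighted) interpolant is simple enough to sketch in a few lines; the toy response surface and configuration inputs below are assumptions for illustration, not the dissertation's benchmark.

```python
import numpy as np

def shepard_interpolate(X, y, Xnew, power=2.0, eps=1e-12):
    """Inverse-distance-weighted (Shepard) interpolant.

    Minimal sketch of one of the simpler approximation methods compared
    in the dissertation, for predicting a scalar response.
    """
    d = np.sqrt(((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    w = 1.0 / (d**power + eps)                 # inverse-distance weights
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(5)
X = rng.uniform(size=(50, 3))                  # e.g., system configuration parameters
y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2]
Xnew = rng.uniform(size=(4, 3))
print(shepard_interpolate(X, y, Xnew))
```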
- MOANA: Modeling and Analyzing I/O Variability in Parallel System Experimental Design. Cameron, Kirk W.; Anwar, Ali; Cheng, Yue; Xu, Li; Li, Bo; Ananth, Uday; Lux, Thomas; Hong, Yili; Watson, Layne T.; Butt, Ali R. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2018-04-19). Exponential increases in complexity and scale make variability a growing threat to sustaining HPC performance at exascale. Performance variability in HPC I/O is common, acute, and formidable. We take the first step towards comprehensively studying linear and nonlinear approaches to modeling HPC I/O system variability. We create a modeling and analysis approach (MOANA) that predicts HPC I/O variability for thousands of software and hardware configurations on highly parallel shared-memory systems. Our findings indicate nonlinear approaches to I/O variability prediction are an order of magnitude more accurate than linear regression techniques. We demonstrate the use of MOANA to accurately predict the confidence intervals of unmeasured I/O system configurations for a given number of repeat runs, enabling users to quantitatively balance experiment duration with statistical confidence.
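The linear-versus-nonlinear comparison can be mimicked on synthetic data: fit a linear regression and a nonlinear learner to a variability response with interaction structure and compare held-out error (scikit-learn assumed available). This is only a stand-in for the MOANA study, not its actual models or benchmark data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic variability response with interactions between hypothetical
# configuration parameters; compare linear vs. nonlinear held-out error.
rng = np.random.default_rng(6)
n = 2000
cpu_freq = rng.uniform(1.2, 3.5, n)
io_threads = rng.integers(1, 65, n)
record_size = 2.0 ** rng.integers(2, 14, n)     # KB
variability = (0.5 * cpu_freq + 0.02 * io_threads * np.log2(record_size)
               + 0.3 * np.sin(cpu_freq * io_threads / 10)
               + 0.1 * rng.standard_normal(n))

X = np.column_stack([cpu_freq, io_threads, np.log2(record_size)])
Xtr, Xte, ytr, yte = train_test_split(X, variability, random_state=0)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    err = np.mean((model.fit(Xtr, ytr).predict(Xte) - yte) ** 2)
    print(type(model).__name__, round(err, 4))   # nonlinear model should fit the interactions better
```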