Browsing by Author "Birch, Jeffrey B."
Now showing 1 - 20 of 94
- Adapting Response Surface Methods for the Optimization of Black-Box Systems
  Zielinski, Jacob Jonathan (Virginia Tech, 2010-08-16)
  Complex mathematical models are often built to describe a physical process that would otherwise be extremely difficult, too costly, or sometimes impossible to analyze. Generally, these models require solutions to many partial differential equations. As a result, the computer codes may take a considerable amount of time to complete a single evaluation. A time-tested method of analysis for such models is Monte Carlo simulation. These simulations, however, often require many model evaluations, making this approach too computationally expensive. To limit the number of experimental runs, it is common practice to model the departure as a Gaussian stochastic process (GaSP) to develop an emulator of the computer model. One advantage of using an emulator is that once a GaSP is fit to realized outcomes, the computer model is easy to predict in unsampled regions of the input space. This is an attempt to 'characterize' the overall model of the computer code. Most of the historical work on design and analysis of computer experiments focuses on the characterization of the computer model over a large region of interest. However, many practitioners seek other objectives, such as input screening (Welch et al., 1992), mapping a response surface, or optimization (Jones et al., 1998). Only recently have researchers begun to consider these topics in the design and analysis of computer experiments. In this dissertation, we explore a more traditional response surface approach (Myers, Montgomery, and Anderson-Cook, 2009) in conjunction with traditional computer experiment methods to search for the optimum response of a process. For global optimization, Jones, Schonlau, and Welch's (1998) Efficient Global Optimization (EGO) algorithm remains a benchmark for subsequent research on computer experiments.
We compare the proposed method in this paper to this leading benchmark. Our goal is to show that response surface methods can be an effective means of estimating an optimum response in the computer experiment framework.
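The EGO algorithm mentioned above selects the next run by maximizing expected improvement over the GaSP emulator's prediction. A minimal sketch of that criterion is below, using only the published formula from Jones, Schonlau, and Welch (1998); the function name and scalar interface are ours, not from the dissertation.

```python
# Expected improvement (EI) for minimization, as used by EGO:
# EI(x) = (f_min - mu) * Phi(z) + se * phi(z), with z = (f_min - mu) / se,
# where mu and se are the emulator's predictive mean and standard error
# at a candidate point, and f_min is the best response observed so far.
import math


def expected_improvement(mu, se, f_min):
    if se <= 0.0:
        return max(f_min - mu, 0.0)  # no predictive uncertainty left
    z = (f_min - mu) / se
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return (f_min - mu) * cdf + se * pdf
```

EGO samples next wherever EI is largest, trading off a low predicted mean (exploitation) against high predictive uncertainty (exploration).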
- Algorithm XXX: SHEPPACK: Modified Shepard Algorithm for Interpolation of Scattered Multivariate Data
  Thacker, William I.; Zhang, Jingwei; Watson, Layne T.; Birch, Jeffrey B.; Iyer, Manjula A.; Berry, Michael W. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009)
  Scattered data interpolation problems arise in many applications. Shepard's method for constructing a global interpolant by blending local interpolants using local-support weight functions usually creates reasonable approximations. SHEPPACK is a Fortran 95 package containing five versions of the modified Shepard algorithm: quadratic (Fortran 95 translations of Algorithms 660, 661, and 798), cubic (Fortran 95 translation of Algorithm 791), and linear variations of the original Shepard algorithm. An option to the linear Shepard code is a statistically robust fit, intended to be used when the data are known to contain outliers. SHEPPACK also includes a hybrid robust piecewise linear estimation algorithm, RIPPLE (residual initiated polynomial-time piecewise linear estimation), intended for data from piecewise linear functions in arbitrary dimension m. The main goal of SHEPPACK is to provide users with a single consistent package containing most existing polynomial variations of Shepard's algorithm. The algorithms target data of different dimensions. The linear Shepard algorithm, robust linear Shepard algorithm, and RIPPLE are the only algorithms in the package that are applicable to arbitrary dimensional data.
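The classical Shepard interpolant underlying all of SHEPPACK's variants is simple enough to sketch directly; this is the original inverse-distance-weighted form, not the Fortran 95 package's modified local-support algorithms, and the function name is ours.

```python
def shepard(x, nodes, values, p=2):
    """Classical Shepard interpolant in R^m: a weighted average of the
    data values with weights w_i = 1 / d(x, x_i)^p. SHEPPACK's modified
    variants replace the constants values[i] with local linear, quadratic,
    or cubic fits and use weight functions with local support."""
    num, den = 0.0, 0.0
    for xi, fi in zip(nodes, values):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        if d2 == 0.0:
            return fi  # interpolation: exact at the data sites
        w = d2 ** (-p / 2.0)
        num += w * fi
        den += w
    return num / den
```

Because the weights blow up near the data sites, the surface passes through every data point; between points it is a smooth blend, which is why outliers must be handled separately (the robust option the abstract mentions).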
- Applications of Control Charts in Medicine and Epidemiology
  Sego, Landon Hugh (Virginia Tech, 2006-04-05)
  We consider two applications of control charts in health care. The first involves the comparison of four methods designed to detect an increase in the incidence rate of a rare health event, such as a congenital malformation. A number of methods have been proposed: among these are the Sets method, two modifications of the Sets method, and the CUSUM method based on the Poisson distribution. Many of the previously published comparisons of these methods used unrealistic assumptions or ignored implicit assumptions which led to misleading conclusions. We consider the situation where data are observed as a sequence of Bernoulli trials and propose the Bernoulli CUSUM chart as a desirable method for the surveillance of rare health events. We compare the steady-state average run length performance of the Sets method and its modifications to the Bernoulli CUSUM chart under a wide variety of circumstances. Except in a very few instances we find that the Bernoulli CUSUM chart performs better than the Sets method and its modifications for the extensive number of cases considered. The second application area involves monitoring clinical outcomes, which requires accounting for the fact that each patient has a different risk of death prior to undergoing a health care procedure. We propose a risk-adjusted survival time CUSUM chart (RAST CUSUM) for monitoring clinical outcomes where the primary endpoint is a continuous, time-to-event variable that is right censored. Risk adjustment is accomplished using accelerated failure time regression models. We compare the average run length performance of the RAST CUSUM chart to the risk-adjusted Bernoulli CUSUM chart, using data from cardiac surgeries to motivate the details of the comparison.
The comparisons show that the RAST CUSUM chart is more efficient at detecting deterioration in the quality of a clinical procedure than the risk-adjusted Bernoulli CUSUM chart, especially when the fraction of censored observations is not too high. We address details regarding the implementation of a prospective monitoring scheme using the RAST CUSUM chart.
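The Bernoulli CUSUM proposed above accumulates a log-likelihood-ratio score per trial and signals when the statistic crosses a control limit. The sketch below shows the standard upper-sided recursion; the function name, return convention, and parameter names are ours, and the abstract's steady-state run-length analysis is not reproduced here.

```python
import math


def bernoulli_cusum(trials, p0, p1, h):
    """Upper Bernoulli CUSUM for detecting an increase from rate p0 to p1.

    Each trial is 0/1. The chart accumulates the log-likelihood-ratio
    score for each trial, resets at zero from below, and signals at the
    first index where the statistic reaches the control limit h.
    Returns (signal_index_or_None, list_of_statistics)."""
    s, path = 0.0, []
    for t, x in enumerate(trials):
        score = math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        s = max(0.0, s + score)
        path.append(s)
        if s >= h:
            return t, path
    return None, path
```

With rare events, non-events contribute a small negative score and events a large positive one, which is what makes the chart sensitive to clusters of incidents.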
- Asymptotic Results for Model Robust Regression
  Starnes, Brett Alden (Virginia Tech, 1999-12-14)
  Since the mid-1980s many statisticians have studied methods for combining parametric and nonparametric estimates to improve the quality of fits in a regression problem. Notably in 1987, Einsporn and Birch proposed the Model Robust Regression estimate (MRR1) in which estimates of the parametric function, ƒ, and the nonparametric function, 𝑔, were combined in a straightforward fashion via the use of a mixing parameter, λ. This technique was studied extensively for small samples and was shown to be quite effective at modeling various unusual functions. In 1995, Mays and Birch developed the MRR2 estimate as an alternative to MRR1. This model involved first forming the parametric fit to the data, and then adding in an estimate of 𝑔 according to the lack of fit demonstrated by the error terms. Using small samples, they illustrated the superiority of MRR2 to MRR1 in most situations. In this dissertation we have developed asymptotic convergence rates for both MRR1 and MRR2 in OLS and GLS (maximum likelihood) settings. In many of these settings, it is demonstrated that the user of MRR1 or MRR2 achieves the best convergence rates available regardless of whether or not the model is properly specified. This is the "Golden Result of Model Robust Regression". It turns out that the selection of the mixing parameter is paramount in determining whether or not this result is attained.
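The MRR1 idea above is a convex combination (1 − λ)·parametric + λ·nonparametric. The sketch below illustrates it with stand-in components (simple-linear OLS and a fixed-bandwidth Nadaraya-Watson smoother); Einsporn and Birch's data-driven choice of λ is not reproduced, and all function names are ours.

```python
import math


def ols_line(xs, ys):
    """Simple-linear OLS fit; stands in for the parametric estimate of f."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    return lambda x: b0 + b1 * x


def kernel_smooth(xs, ys, bw):
    """Nadaraya-Watson smoother; stands in for the nonparametric g-hat."""
    def g(x):
        w = [math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in xs]
        return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return g


def mrr1(xs, ys, lam, bw=0.5):
    """MRR1-style mix: (1 - lam) * parametric + lam * nonparametric."""
    f, g = ols_line(xs, ys), kernel_smooth(xs, ys, bw)
    return lambda x: (1.0 - lam) * f(x) + lam * g(x)
```

When the parametric model is correct, λ near 0 recovers the efficient parametric fit; when it is misspecified, λ near 1 lets the nonparametric component absorb the lack of fit, which is the trade-off the "Golden Result" concerns.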
- Bandwidth Selection Concerns for Jump Point Discontinuity Preservation in the Regression Setting Using M-smoothers and the Extension to Hypothesis Testing
  Burt, David Allan (Virginia Tech, 2000-03-23)
  Most traditional parametric and nonparametric regression methods operate under the assumption that the true function is continuous over the design space. For methods such as ordinary least squares polynomial regression and local polynomial regression the functional estimates are constrained to be continuous. Fitting a function that is not continuous with a continuous estimate will have practical scientific implications as well as important model misspecification effects. Scientifically, breaks in the continuity of the underlying mean function may correspond to specific physical phenomena that will be hidden from the researcher by a continuous regression estimate. Statistically, misspecifying a mean function as continuous when it is not will result in an increased bias in the estimate. One recently developed nonparametric regression technique that does not constrain the fit to be continuous is the jump preserving M-smooth procedure of Chu, Glad, Godtliebsen, and Marron (1998), 'Edge-preserving smoothers for image processing', Journal of the American Statistical Association 93(442), 526-541. Chu et al.'s (1998) M-smoother is defined in such a way that the noise about the mean function is smoothed out while jumps in the mean function are preserved. Before the jump preserving M-smoother can be used in practice the choice of the bandwidth parameters must be addressed. The jump preserving M-smoother requires two bandwidth parameters h and g. These two parameters determine the amount of noise that is smoothed out as well as the size of the jumps which are preserved. If these parameters are chosen haphazardly the resulting fit could exhibit worse bias properties than traditional regression methods which assume a continuous mean function.
Currently there are no automatic bandwidth selection procedures available for the jump preserving M-smoother of Chu et al. (1998). One of the main objectives of this dissertation is to develop an automatic data-driven bandwidth selection procedure for Chu et al.'s (1998) M-smoother. We present two bandwidth selection procedures. The first is a crude rule-of-thumb method and the second is a more sophisticated direct plug-in method. Our bandwidth selection procedures are modeled after the methods of Chu et al. (1998) with two significant modifications which make the methods robust to possible jump points. Another objective of this dissertation is to provide a nonparametric hypothesis test, based on Chu et al.'s (1998) M-smoother, to test for a break in the continuity of an underlying regression mean function. Our proposed hypothesis test is nonparametric in the sense that the mean function away from the jump point(s) is not required to follow a specific parametric model. In addition the test does not require the user to specify the number, position, or size of the jump points in the alternative hypothesis as do many current methods. Thus the null and alternative hypotheses for our test are: H0: The mean function is continuous (i.e. no jump points) vs. HA: The mean function is not continuous (i.e. there is at least one jump point). Our testing procedure takes the form of a critical bandwidth hypothesis test. The test statistic is essentially the largest bandwidth that allows Chu et al.'s (1998) M-smoother to satisfy the null hypothesis. The significance of the test is then calculated via a bootstrap method. This test is currently in the experimental stage of its development. In this dissertation we outline the steps required to calculate the test as well as assess the power based on a small simulation study. Future work such as a faster calculation algorithm is required before the testing procedure will be practical for the general user.
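The roles of the two bandwidths h and g described above can be illustrated with a small jump-preserving smoother in the spirit of Chu et al. (1998): h is a spatial bandwidth governing how much noise is averaged out, and g is a range bandwidth governing how large a jump in the response is preserved rather than blurred. This fixed-point sketch is our own simplification, not the authors' exact algorithm, and all names are ours.

```python
import math


def m_smooth(x, xs, ys, h, g, iters=25):
    """One fitted value of a jump-preserving M-smoother (illustrative).

    The local M-estimate is computed by iteratively reweighted averaging:
    each weight combines a spatial kernel in x (bandwidth h) with a range
    kernel in y (bandwidth g), so observations across a jump larger than
    a few multiples of g get essentially zero weight."""
    # initialize at the response of the nearest design point
    i0 = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    theta = ys[i0]
    for _ in range(iters):
        w = [math.exp(-0.5 * ((xi - x) / h) ** 2)
             * math.exp(-0.5 * ((yi - theta) / g) ** 2)
             for xi, yi in zip(xs, ys)]
        theta = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return theta
```

On data with a step discontinuity, the fitted curve stays flat on each side of the jump instead of smoothing through it; chosen haphazardly (e.g. g much larger than the jump size), the estimator degenerates toward an ordinary kernel smoother, which is the bias hazard the abstract warns about.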
- Bayesian Hierarchical Latent Model for Gene Set Analysis
  Chao, Yi (Virginia Tech, 2009-04-29)
  A pathway is a predefined set of genes that serves a particular cellular or physiological function. Ranking pathways relevant to a particular phenotype can help researchers focus on a few sets of genes in pathways. In this thesis, a Bayesian hierarchical latent model was proposed using a generalized linear random effects model. The advantage of the approach was that it can easily incorporate prior knowledge when the sample size was small and the number of genes was large. For the covariance matrix of a set of random variables, two Gaussian random processes were considered to construct the dependencies among genes in a pathway. One was based on the polynomial kernel and the other was based on the Gaussian kernel. Then these two kernels were compared with a constant covariance matrix of the random effect by using the ratio, which was based on the joint posterior distribution with respect to each model. For mixture models, log-likelihood values were computed at different values of the mixture proportion, compared among mixtures of selected kernels and point-mass density (or constant covariance matrix). The approach was applied to a data set (Mootha et al., 2003) containing the expression profiles of type II diabetes where the motivation was to identify pathways that can discriminate between normal patients and patients with type II diabetes.
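The two kernels named above each induce a covariance (Gram) matrix over gene profiles. The sketch below shows one common parameterization of each kernel and the resulting matrix; the thesis's exact hyperparameterization is not specified in the abstract, so the forms and names here are illustrative.

```python
import math


def gaussian_kernel(u, v, s=1.0):
    """Gaussian (RBF) kernel: k(u, v) = exp(-||u - v||^2 / (2 s^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * s * s))


def polynomial_kernel(u, v, degree=2, c=1.0):
    """Polynomial kernel: k(u, v) = (c + <u, v>)^degree."""
    return (c + sum(a * b for a, b in zip(u, v))) ** degree


def kernel_cov_matrix(points, kernel):
    """Gram matrix K[i][j] = kernel(x_i, x_j); symmetric and positive
    semidefinite, so it is a valid covariance for the random effects."""
    return [[kernel(u, v) for v in points] for u in points]
```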
- Bayesian Hierarchical Methods and the Use of Ecological Thresholds and Changepoints for Habitat Selection Models
  Pooler, Penelope S. (Virginia Tech, 2005-12-02)
  Modeling the complex relationships between habitat characteristics and a species' habitat preferences poses many difficult problems for ecological researchers. These problems are complicated further when information is collected over a range of time or space. Additionally, the variety of factors affecting these choices is difficult to understand and even more difficult to accurately collect information about. In light of these concerns, we evaluate the performance of current standard habitat preference models that are based on Bayesian methods and then present some extensions and supplements to those methods that prove to be very useful. More specifically, we demonstrate the value of extending the standard Bayesian hierarchical model using finite mixture model methods. Additionally, we demonstrate that an extension of the Bayesian hierarchical changepoint model to allow for estimating multiple changepoints simultaneously can be very informative when applied to data about multiple habitat locations or species. These models allow the researcher to compare the sites or species with respect to a very specific ecological question and consequently provide definitive answers that are often not available with more commonly used models containing many explanatory factors. Throughout our work we use a complex data set containing information about horseshoe crab spawning habitat preferences in the Delaware Bay over a five-year period. These data epitomize some of the difficult issues inherent to studying habitat preferences. The data are collected over time at many sites, have missing observations, and include explanatory variables that, at best, only provide surrogate information for what researchers feel is important in explaining spawning preferences throughout the bay.
We also looked at a smaller data set of freshwater mussel habitat selection preferences in relation to bridge construction on the Kennerdell River in Western Pennsylvania. Together, these two data sets provided us with insight in developing and refining the methods we present. They also help illustrate the strengths and weaknesses of the methods we discuss by assessing their performance in real situations where data are inevitably complex and relationships are difficult to discern.
- Bayesian Model Averaging and Variable Selection in Multivariate Ecological Models
  Lipkovich, Ilya A. (Virginia Tech, 2002-04-09)
  Bayesian Model Averaging (BMA) is a new area in modern applied statistics that provides data analysts with an efficient tool for discovering promising models and obtaining estimates of their posterior probabilities via Markov chain Monte Carlo (MCMC). These probabilities can be further used as weights for model averaged predictions and estimates of the parameters of interest. As a result, variance components due to model selection are estimated and accounted for, contrary to the practice of conventional data analysis (such as, for example, stepwise model selection). In addition, variable activation probabilities can be obtained for each variable of interest. This dissertation is aimed at connecting BMA and various ramifications of the multivariate technique called Reduced-Rank Regression (RRR). In particular, we are concerned with Canonical Correspondence Analysis (CCA) in ecological applications where the data are represented by a site by species abundance matrix with site-specific covariates. Our goal is to incorporate the multivariate techniques, such as Redundancy Analysis and Canonical Correspondence Analysis, into the general machinery of BMA, taking into account such complicating phenomena as outliers and clustering of observations within a single data-analysis strategy. Traditional implementations of model averaging are concerned with selection of variables. We extend the methodology of BMA to selection of subgroups of observations and implement several approaches to cluster and outlier analysis in the context of the multivariate regression model. The proposed algorithm of cluster analysis can accommodate restrictions on the resulting partition of observations when some of them form sub-clusters that have to be preserved when larger clusters are formed.
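The model-averaged prediction described above is a posterior-probability-weighted sum over candidate models. The dissertation obtains those probabilities via MCMC; the sketch below instead uses the common closed-form BIC approximation p(M_k | D) ∝ exp(−BIC_k / 2) under equal prior model probabilities, purely for illustration, with function names of our own.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    p(M_k | D) proportional to exp(-BIC_k / 2), assuming equal priors."""
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]  # shift for stability
    s = sum(raw)
    return [r / s for r in raw]


def bma_predict(preds, bics):
    """Model-averaged prediction: sum_k p(M_k | D) * yhat_k."""
    return sum(w * p for w, p in zip(bma_weights(bics), preds))
```

Because the weights sum to one, the averaged prediction also carries between-model spread, which is the model-selection variance component the abstract says conventional stepwise selection ignores.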
- Canopy light environment influences apple leaf physiology and fruit quality
  Campbell, Richard J. (Virginia Tech, 1991-04-11)
  Several experiments were conducted to determine: the influence of canopy position, girdling, and defoliation on nectar production; whether instantaneous light measurements yield reliable estimates of cumulative seasonal light levels within the canopy; and the effect of the canopy light environment on spur leaf physiology and fruit quality. Defoliation of nongirdled flowering spurs had no effect on nectar production or composition, while defoliation of girdled spurs reduced nectar sugar concentration by 24%. Canopy position had no influence on nectar production or composition. At full bloom there were differences in photosynthetic potential of spur leaves from different canopy positions. Exterior leaves had a greater maximum photosynthetic rate and a unique photosynthetic light response curve compared to the intermediate and interior leaves. Differences among positions persisted throughout the season. Stomatal conductance, specific leaf weight, dark respiration, and light levels were greater for the exterior leaves throughout the season. Instantaneous light measurements made on a single uniformly overcast day after the canopy was fully developed (average of four times during the day) provided reliable estimates (predictive R² > 0.90, n = 30) of total cumulative seasonal photosynthetic photon density (PPD). There was a 1-to-1 relationship between instantaneous and cumulative PPD after canopy development was complete providing both measures were expressed as a percentage. The relationships were equal over multiple dates for two consecutive years. Cloudless conditions provided poor estimates (predictive R² = 0.49 to 0.80, n = 30). Light environment and harvest date influenced fruit quality characteristics within the canopy.
Fruit red color, intensity of red color, and soluble solids concentration were all positively related to light level, with the highest R² on the early harvest dates. Fruit weight, firmness, length/diameter ratio, starch index, and seed number were not consistently influenced by the light environment. The number of hours above an average photosynthetic photon flux density threshold of 250 μmol·m⁻²·s⁻¹ explained slightly more of the variation in fruit quality characteristics than any other expression of light.
- Causal Gene Network Inference from Genetical Genomics Experiments via Structural Equation Modeling
  Liu, Bing (Virginia Tech, 2006-09-11)
  The goal of this research is to construct causal gene networks for genetical genomics experiments using expression Quantitative Trait Loci (eQTL) mapping and Structural Equation Modeling (SEM). Unlike Bayesian Networks, this approach is able to construct cyclic networks, while cyclic relationships are expected to be common in gene networks. Reconstruction of gene networks provides important knowledge about the molecular basis of complex human diseases and generally about living systems. In genetical genomics, a segregating population is expression profiled and DNA marker genotyped. An Encompassing Directed Network (EDN) of causal regulatory relationships among genes can be constructed with eQTL mapping and selection of candidate causal regulators. Several eQTL mapping approaches and local structural models were evaluated in their ability to construct an EDN. The edges in an EDN correspond to either direct or indirect causal relationships, and the EDN is likely to contain cycles or feedback loops. We implemented SEM with genetic algorithms to produce sub-models of the EDN containing fewer edges and being well supported by the data. The EDN construction and sparsification methods were tested on a yeast genetical genomics data set, as well as on simulated data. For the simulated networks, the SEM approach has an average detection power of around ninety percent, and an average false discovery rate of around ten percent.
- Cluster-Based Bounded Influence Regression
  Lawrence, David E. (Virginia Tech, 2003-07-17)
  In the field of linear regression analysis, a single outlier can dramatically influence ordinary least squares estimation while low-breakdown procedures such as M regression and bounded influence regression may be unable to combat a small percentage of outliers. A high-breakdown procedure such as least trimmed squares (LTS) regression can accommodate up to 50% of the data (in the limit) being outlying with respect to the general trend. Two available one-step improvement procedures based on LTS are Mallows 1-step (M1S) regression and Schweppe 1-step (S1S) regression (the current state-of-the-art method). Issues with these methods include (1) computational approximations and sub-sampling variability, (2) dramatic coefficient sensitivity with respect to very slight differences in initial values, (3) internal instability when determining the general trend and (4) performance in low-breakdown scenarios. A new high-breakdown regression procedure is introduced that addresses these issues, plus offers an insightful summary regarding the presence and structure of multivariate outliers. This proposed method blends a cluster analysis phase with a controlled bounded influence regression phase, thereby referred to as cluster-based bounded influence regression, or CBI. Representing the data space via a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster "halfset" of observations, with the remaining observations becoming one or more minor clusters. An initial regression estimator arises from the main cluster, with a multiple point addition DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework.
CBI achieves a 50% breakdown point; is regression, scale, and affine equivariant; and is asymptotically normal in distribution. Case studies and Monte Carlo studies demonstrate the performance advantage of CBI over S1S and the other high-breakdown methods regarding coefficient stability, scale estimation and standard errors. A dendrogram of the clustering process is one graphical display available for multivariate outlier detection. Overall, the proposed methodology represents advancement in the field of robust regression, offering a distinct philosophical viewpoint towards data analysis and the marriage of estimation with diagnostic summary.
- Cluster-Based Bounded Influence Regression
  Lawrence, David E.; Birch, Jeffrey B.; Chen, Yajuan (Virginia Tech, 2012)
  A regression methodology is introduced that obtains competitive, robust, efficient, high breakdown regression parameter estimates as well as providing an informative summary regarding possible multiple outlier structure. The proposed method blends a cluster analysis phase with a controlled bounded influence regression phase, thereby referred to as cluster-based bounded influence regression, or CBI. Representing the data space via a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster "half-set" of observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale, and affine equivariant, and is asymptotically normal in distribution. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors. The dendrogram of the clustering process and the weight plot are graphical displays available for multivariate outlier detection. Overall, the proposed methodology represents advancement in the field of robust regression, offering a distinct philosophical viewpoint towards data analysis and the marriage of estimation with diagnostic summary.
- Cluster-Based Profile Monitoring in Phase I Analysis
  Chen, Yajuan; Birch, Jeffrey B. (Virginia Tech, 2012)
  An innovative profile monitoring methodology is introduced for Phase I analysis. The proposed technique, which is referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase to aid in determining if nonconforming profiles are present in the historical data set (HDS). To cluster the profiles, the proposed method first replaces the data for each profile with an estimated profile curve, using some appropriate regression method, and clusters the profiles based on their estimated parameter vectors. This cluster phase then yields a main cluster which contains more than half of the profiles. The initial estimated population average (PA) parameters are obtained by fitting a linear mixed model to those profiles in the main cluster. In-control profiles, determined using the Hotelling's T² statistic, that are not contained in the initial main cluster are iteratively added to the main cluster and the mixed model is used to update the estimated PA parameters. A simulated example and Monte Carlo results demonstrate the performance advantage of this proposed method over a current noncluster-based method with respect to more accurate estimates of the PA parameters and better performance in distinguishing profiles from an in-control process from those from an out-of-control process in Phase I.
- Cluster-Based Profile Monitoring in Phase I Analysis
  Chen, Yajuan (Virginia Tech, 2014-03-26)
  Profile monitoring is a well-known approach used in statistical process control where the quality of the product or process is characterized by a profile or a relationship between a response variable and one or more explanatory variables. Profile monitoring is conducted over two phases, labeled as Phase I and Phase II. In Phase I profile monitoring, regression methods are used to model each profile and to detect the possible presence of out-of-control profiles in the historical data set (HDS). The out-of-control profiles can be detected by using the T² statistic. However, previous methods of calculating the T² statistic are based on using all the data in the HDS including the data from the out-of-control process. Consequently, the effectiveness of this method can be distorted if the HDS contains data from the out-of-control process. This work provides a new profile monitoring methodology for Phase I analysis. The proposed method, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase before calculating the T² statistic. Before introducing our proposed cluster-based method in profile monitoring, this cluster-based method is demonstrated to work efficiently in robust regression, referred to as cluster-based bounded influence regression or CBI. It will be demonstrated that the CBI method provides a robust, efficient and high breakdown regression parameter estimator. The CBI method first represents the data space via a special set of points, referred to as anchor points. Then a collection of single-point-added ordinary least squares regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster containing at least half the observations, with the remaining observations comprising one or more minor clusters.
An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale, and affine equivariant, and is asymptotically normal in distribution. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors. The cluster-based method in Phase I profile monitoring first replaces the data from each sampled unit with an estimated profile, using some appropriate regression method. The estimated parameters for the parametric profiles are obtained from parametric models while the estimated parameters for the nonparametric profiles are obtained from the p-spline model. The cluster phase clusters the profiles based on their estimated parameters and this yields an initial main cluster which contains at least half the profiles. The initial estimated parameters for the population average (PA) profile are obtained by fitting a mixed model (parametric or nonparametric) to those profiles in the main cluster. Profiles that are not contained in the initial main cluster are iteratively added to the main cluster provided their T² statistics are "small" and the mixed model (parametric or nonparametric) is used to update the estimated parameters for the PA profile. Those profiles contained in the final main cluster are considered as resulting from the in-control process while those not included are considered as resulting from an out-of-control process. This cluster-based method has been applied to monitor both parametric and nonparametric profiles.
A simulated example, a Monte Carlo study, and an application to a real data set illustrate the details of the algorithm and demonstrate the performance advantage of this proposed method over a non-cluster-based method with respect to more accurate estimates of the PA parameters and improved classification performance criteria. When the profiles can be represented by vectors, the profile monitoring process is equivalent to the detection of multivariate outliers. For this reason, we also compared our proposed method to a popular method used to identify outliers when dealing with a multivariate response. Our study demonstrated that when the out-of-control process corresponds to a sustained shift, the cluster-based method using the successive difference estimator is clearly the superior method, among those methods we considered, based on all performance criteria. In addition, the influence of accurate Phase I estimates on the performance of Phase II control charts is presented to show the further advantage of the proposed method. A simple example and Monte Carlo results show that more accurate estimates from Phase I would provide more efficient Phase II control charts.
- Contributions to Profile Monitoring and Multivariate Statistical Process Control
  Williams, James Dickson (Virginia Tech, 2004-12-01)
  The content of this dissertation is divided into two main topics: 1) nonlinear profile monitoring and 2) an improved approximate distribution for the T² statistic based on the successive differences covariance matrix estimator.
  Part 1: Nonlinear Profile Monitoring. In an increasing number of cases the quality of a product or process cannot adequately be represented by the distribution of a univariate quality variable or the multivariate distribution of a vector of quality variables. Rather, a series of measurements are taken across some continuum, such as time or space, to create a profile. The profile determines the product quality at that sampling period. We propose Phase I methods to analyze profiles in a baseline dataset where the profiles can be modeled through either a parametric nonlinear regression function or a nonparametric regression function. We illustrate our methods using data from Walker and Wright (2002) and from dose-response data from DuPont Crop Protection.
  Part 2: Approximate Distribution of T². Although the T² statistic based on the successive differences estimator has been shown to be effective in detecting a shift in the mean vector (Sullivan and Woodall (1996) and Vargas (2003)), the exact distribution of this statistic is unknown. An accurate upper control limit (UCL) for the T² chart based on this statistic depends on knowing its distribution. Two approximate distributions have been proposed in the literature. We demonstrate the inadequacy of these two approximations and derive useful properties of this statistic. We give an improved approximate distribution and recommendations for its use.
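The successive-differences covariance estimator behind the T² statistic discussed above uses S_D = (1 / (2(n−1))) Σᵢ dᵢdᵢ' with dᵢ = x_{i+1} − xᵢ, which resists inflation from sustained step shifts in the mean. The sketch below is a bivariate illustration with a hand-coded 2×2 inverse; function names are ours, and a real implementation would use a general matrix library.

```python
def successive_diff_cov(xs):
    """Successive-differences covariance estimator for bivariate data:
    S_D = sum_i d_i d_i' / (2 (n - 1)), with d_i = x_{i+1} - x_i."""
    n = len(xs)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(xs, xs[1:]):
        d = (b[0] - a[0], b[1] - a[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    c = 1.0 / (2.0 * (n - 1))
    return [[c * v for v in row] for row in s]


def t2_statistics(xs):
    """T^2_i = (x_i - xbar)' S_D^{-1} (x_i - xbar), one value per point."""
    n = len(xs)
    xbar = (sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n)
    s = successive_diff_cov(xs)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]  # explicit 2x2 inverse
    t2 = []
    for x in xs:
        e = (x[0] - xbar[0], x[1] - xbar[1])
        t2.append(e[0] * (inv[0][0] * e[0] + inv[0][1] * e[1])
                  + e[1] * (inv[1][0] * e[0] + inv[1][1] * e[1]))
    return t2
```

Because a sustained shift contributes to only one difference dᵢ rather than to every deviation from the grand mean, S_D stays close to the in-control covariance, which is what makes the resulting T² chart sensitive to such shifts.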
- Cumulative Sum Control Charts for Censored Reliability DataOlteanu, Denisa Anca (Virginia Tech, 2010-04-01)Companies routinely perform life tests for their products. Typically, these tests involve running a set of products until the units fail. Most often, the data are censored according to different censoring schemes, depending on the particulars of the test. On occasion, tests are stopped at a predetermined time and the units that are yet to fail are suspended. In other instances, the data are collected through periodic inspection and only upper and lower bounds on the lifetimes are recorded. Reliability professionals use a number of non-normal distributions to model the resulting lifetime data with the Weibull distribution being the most frequently used. If one is interested in monitoring the quality and reliability characteristics of such processes, one needs to account for the challenges imposed by the nature of the data. We propose likelihood ratio based cumulative sum (CUSUM) control charts for censored lifetime data with non-normal distributions. We illustrate the development and implementation of the charts, and we evaluate their properties through simulation studies. We address the problem of interval censoring, and we construct a CUSUM chart for censored ordered categorical data, which we illustrate by a case study at Becton Dickinson (BD). We also address the problem of monitoring both of the parameters of the Weibull distribution for processes with right-censored data.
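A likelihood-ratio CUSUM of the kind described in this abstract accumulates, per observation, the log-likelihood ratio between an out-of-control and an in-control lifetime distribution, handling right censoring by using the survival function for censored units. The sketch below is an illustrative simplification, not the dissertation's chart: it assumes a known Weibull shape and monitors for a decrease in the Weibull scale parameter; all function names and the chart design are hypothetical.

```python
import math

def weibull_loglik(t, failed, shape, scale):
    """Log-likelihood contribution of one lifetime t.
    A failure contributes the log density; a right-censored
    unit contributes the log survival function."""
    z = (t / scale) ** shape
    log_surv = -z
    if not failed:
        return log_surv
    return math.log(shape / scale) + (shape - 1.0) * math.log(t / scale) + log_surv

def weibull_cusum(data, shape, scale0, scale1):
    """One-sided likelihood-ratio CUSUM for a scale decrease (scale1 < scale0).
    `data` is a sequence of (lifetime, failed) pairs; failed=False means
    the unit was right-censored at that time."""
    c, path = 0.0, []
    for t, failed in data:
        lr = (weibull_loglik(t, failed, shape, scale1)
              - weibull_loglik(t, failed, shape, scale0))
        c = max(0.0, c + lr)   # CUSUM recursion, reflected at zero
        path.append(c)
    return path
```

A signal would be raised when the accumulated statistic crosses a control limit chosen (e.g., by simulation, as in the dissertation's studies) to give a desired in-control average run length.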
- Design and analysis for a two level factorial experiment in the presence of dispersion effectsMays, Darcy P. (Virginia Tech, 1993)Standard response surface methodology experimental designs for estimating location models involve the assumption of homogeneous variance throughout the design region. However, with heterogeneity of variance these standard designs are not optimal. Using the D and Q-optimality criteria, this dissertation proposes a two-stage experimental design procedure that gives more efficient designs than the standard designs when heterogeneous variance exists. Several multiple variable location models, with and without interactions, are considered. For each the first stage estimates the heterogeneous variance structure, while the second stage then augments the first stage to produce a D or Q-optimal design for fitting the location model under the estimated variance structure. However, there is a potential instability of the variance estimates in the first stage that can lower the efficiency of the two-stage procedure. This problem can be addressed and the efficiency of the procedure enhanced if certain mild assumptions concerning the variance structure are made and formulated as a prior distribution to produce a Bayes estimator. With homogeneous variance, designs are analyzed using ordinary least squares. However, with heterogeneous variance the correct analysis is to use weighted least squares. This dissertation also examines the effects that analysis by weighted least squares can have and compares this procedure to the proposed two-stage procedure.
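The point that weighted least squares is the correct analysis under heterogeneous variance can be made concrete with a small sketch (illustrative only; the helper name and data are hypothetical): each observation is weighted inversely to its variance, so noisy design points influence the fit less.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x, with weights w
    (typically w_i = 1 / variance_i). Unit weights reduce this to OLS."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1
```

For exactly linear data any positive weights recover the true line; the weights matter precisely when the responses are noisy and the noise level varies across the design region.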
- Dual Model Robust RegressionRobinson, Timothy J. (Virginia Tech, 2004-07-30)In typical normal theory regression, the assumption of homogeneity of variances is often not appropriate. Instead of treating the variances as a nuisance and transforming away the heterogeneity, the structure of the variances may be of interest and it is desirable to model the variances. Aitkin (1987) proposes a parametric dual model in which a log linear dependence of the variances on a set of explanatory variables is assumed. Aitkin's parametric approach is an iterative one providing estimates for the parameters in the mean and variance models through joint maximum likelihood. Estimation of the mean and variance parameters is interrelated, as the responses in the variance model are the squared residuals from the fit to the means model. When one or both of the models (the mean or variance model) are misspecified, parametric dual modeling can lead to faulty inferences. An alternative to parametric dual modeling is to let the data completely determine the form of the true underlying mean and variance functions (nonparametric dual modeling). However, nonparametric techniques often result in estimates which are characterized by high variability, and they ignore important knowledge that the user may have regarding the process. Mays and Birch (1996) have demonstrated an effective semiparametric method in the one regressor, single-model regression setting which is a "hybrid" of parametric and nonparametric fits. Using their techniques, we develop a dual modeling approach which is robust to misspecification in either or both of the two models. Examples will be presented to illustrate the new technique, termed here as Dual Model Robust Regression.
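The iterative interplay the abstract describes (the variance model's responses are the squared residuals from the mean fit, and the variance fit supplies the weights for the next mean fit) can be sketched as follows. This is a deliberate simplification of Aitkin's joint maximum likelihood scheme, not the dissertation's method: it fits the log-linear variance model by ordinary regression of log squared residuals on the regressor rather than by a gamma GLM, and all names are illustrative.

```python
import math

def ols_line(x, y, w=None):
    """(Weighted) least-squares fit of y = b0 + b1*x; unit weights give OLS."""
    w = w or [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def dual_model(x, y, iters=10):
    """Alternate between a WLS mean fit and a log-linear variance fit to the
    squared residuals -- a simplified stand-in for joint maximum likelihood."""
    w = [1.0] * len(x)
    for _ in range(iters):
        b0, b1 = ols_line(x, y, w)                          # mean model fit
        r2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
        # log-linear variance model: log sigma^2 = g0 + g1 * x
        g0, g1 = ols_line(x, [math.log(max(v, 1e-12)) for v in r2])
        w = [math.exp(-(g0 + g1 * xi)) for xi in x]         # new weights = 1/sigma^2
    return (b0, b1), (g0, g1)
```

The faulty-inference risk the abstract raises is visible here: if either linear form is wrong, the squared residuals feed a misspecified variance model, which in turn distorts the weights for the mean model.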
- Effect of heat treatment on dyeability, glass transition temperature, and tensile properties of polyacrylonitrile fibers (orlon 42)Sarmadi, Abdolmajid (Virginia Polytechnic Institute and State University, 1986)Deniers of treated and untreated fibers were determined and the results were used in calculations of tenacity and initial modulus. Tensile properties were measured on a constant-rate-of-extension machine. Shrinkage of treated and untreated fibers was measured after the fibers were boiled in water for 15 min. The glass transition temperatures (Tg) were obtained by differential scanning calorimetry. The ratio of the intensities of the CN/CH stretching bands was determined by infrared spectroscopy, using the KBr method.
- Effect of Phase I Estimation on Phase II Control Chart Performance with Profile DataChen, Yajuan; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2014)This paper illustrates how Phase I estimators in statistical process control (SPC) can affect the performance of Phase II control charts. The deleterious impact of poor Phase I estimators on the performance of Phase II control charts is illustrated in the context of profile monitoring. Two types of Phase I estimators are discussed. One approach uses functional cluster analysis to initially distinguish between estimated profiles from an in-control process and those from an out-of-control process. The second approach does not use clustering to make the distinction. The Phase II control charts are established based on the two resulting types of estimates and compared across varying sizes of sustained shifts in Phase II. A simulated example and a Monte Carlo study show that the performance of the Phase II control charts can be severely distorted when constructed with poor Phase I estimators. The use of clustering leads to much better Phase II performance. We also illustrate that Phase II control charts based on poor Phase I estimators not only produce more false alarms than expected but can also take much longer than expected to detect potential changes to the process.
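The clustering idea behind the first Phase I approach (separate the estimated profiles that look in-control from those that do not before computing baseline estimates) can be sketched crudely. The following is a hypothetical stand-in, not the paper's functional cluster analysis: profiles are summarized by their estimated coefficient vectors, grouped by a simple greedy distance-threshold rule, and the largest cluster is taken as the presumed in-control set for Phase I estimation. All names, the clustering rule, and the radius parameter are illustrative assumptions.

```python
def threshold_clusters(points, radius):
    """Greedy single-pass clustering: each point joins the first cluster whose
    centroid lies within `radius` (Euclidean distance); otherwise it starts
    a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            centroid = [sum(q[j] for q in c) / len(c) for j in range(len(p))]
            dist = sum((p[j] - centroid[j]) ** 2 for j in range(len(p))) ** 0.5
            if dist <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def phase1_baseline(profile_coefs, radius):
    """Treat the largest cluster as the in-control profiles and average its
    coefficient vectors to form the Phase I baseline estimate."""
    best = max(threshold_clusters(profile_coefs, radius), key=len)
    p = len(best[0])
    return [sum(q[j] for q in best) / len(best) for j in range(p)]
```

Excluding the out-of-control profiles from the baseline is what prevents the inflated variability and biased center line that the paper shows degrade Phase II performance.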