Technical Reports, Statistics
Browsing Technical Reports, Statistics by Author "Birch, Jeffrey B."
- Cluster-Based Bounded Influence Regression. Lawrence, David E.; Birch, Jeffrey B.; Chen, Yajuan (Virginia Tech, 2012). A regression methodology is introduced that obtains competitive, robust, efficient, high breakdown regression parameter estimates while also providing an informative summary regarding possible multiple outlier structure. The proposed method blends a cluster analysis phase with a controlled bounded influence regression phase, and is therefore referred to as cluster-based bounded influence regression, or CBI. Representing the data space via a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster “half-set” of observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point; is regression, scale, and affine equivariant; and is asymptotically normal in distribution. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures with respect to coefficient stability, scale estimation, and standard errors. The dendrogram of the clustering process and the weight plot are graphical displays available for multivariate outlier detection. Overall, the proposed methodology represents an advancement in the field of robust regression, offering a distinct philosophical viewpoint toward data analysis and the marriage of estimation with diagnostic summary.
- Cluster-Based Profile Monitoring in Phase I Analysis. Chen, Yajuan; Birch, Jeffrey B. (Virginia Tech, 2012). An innovative profile monitoring methodology is introduced for Phase I analysis. The proposed technique, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase to aid in determining whether nonconforming profiles are present in the historical data set (HDS). To cluster the profiles, the proposed method first replaces the data for each profile with an estimated profile curve, using an appropriate regression method, and clusters the profiles based on their estimated parameter vectors. This clustering phase yields a main cluster which contains more than half of the profiles. The initial estimated population average (PA) parameters are obtained by fitting a linear mixed model to the profiles in the main cluster. In-control profiles, determined using Hotelling’s T² statistic, that are not contained in the initial main cluster are iteratively added to the main cluster, and the mixed model is used to update the estimated PA parameters. A simulated example and Monte Carlo results demonstrate the performance advantage of this proposed method over a current non-cluster-based method with respect to more accurate estimates of the PA parameters and better classification performance in distinguishing profiles from an in-control process from those from an out-of-control process in Phase I.
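The entry above walks through a concrete sequence of steps (fit each profile, cluster the estimated coefficients, then iteratively grow the main cluster with a T² rule). The sketch below is a minimal illustration of that sequence under simplifying assumptions: simulated straight-line profiles, per-profile OLS in place of the linear mixed model, and a chi-square cutoff standing in for a properly designed Phase I control limit. All data, cutoffs, and variable names here are illustrative, not the authors' implementation.

```python
# Minimal sketch of the cluster-based Phase I idea for simple linear profiles.
# Simplifications vs. the report: per-profile OLS instead of a linear mixed model,
# and a chi-square cutoff for Hotelling's T^2.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import chi2

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
X = np.column_stack([np.ones_like(x), x])

# 28 in-control profiles plus 2 shifted (nonconforming) profiles
betas = np.vstack([np.tile([2.0, 1.0], (28, 1)), np.tile([3.0, 2.0], (2, 1))])
profiles = betas @ X.T + rng.normal(scale=0.2, size=(30, x.size))

# Step 1: replace each profile by its estimated coefficient vector
coefs = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y in profiles])

# Step 2: cluster the coefficient vectors; the largest cluster is the main cluster
labels = fcluster(linkage(coefs, method="ward"), t=2, criterion="maxclust")
main = labels == np.bincount(labels).argmax()

# Step 3: iteratively add profiles whose T^2 (vs. the main-cluster estimate) is in control
limit = chi2.ppf(0.999, df=coefs.shape[1])
for _ in range(coefs.shape[0]):
    mean, cov = coefs[main].mean(axis=0), np.cov(coefs[main], rowvar=False)
    d = coefs - mean
    t2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
    new_main = main | (t2 <= limit)
    if new_main.sum() == main.sum():
        break
    main = new_main

print("profiles flagged as out of control:", np.where(~main)[0])
```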
- Effect of Phase I Estimation on Phase II Control Chart Performance with Profile Data. Chen, Yajuan; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2014). This paper illustrates how Phase I estimators in statistical process control (SPC) can affect the performance of Phase II control charts. The deleterious impact of poor Phase I estimators on the performance of Phase II control charts is illustrated in the context of profile monitoring. Two types of Phase I estimators are discussed. One approach uses functional cluster analysis to initially distinguish between estimated profiles from an in-control process and those from an out-of-control process. The second approach does not use clustering to make the distinction. The Phase II control charts are established based on the two resulting types of estimates and compared across varying sizes of sustained shifts in Phase II. A simulated example and a Monte Carlo study show that the performance of the Phase II control charts can be severely distorted when they are constructed with poor Phase I estimators. The use of clustering leads to much better Phase II performance. We also illustrate that Phase II control charts based on the poor Phase I estimators not only produce more false alarms than expected but can also take much longer than expected to detect potential changes to the process.
- High Breakdown Estimation Methods for Phase I Multivariate Control Charts. Jensen, Willis A.; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2005). The goal of Phase I monitoring of multivariate data is to identify multivariate outliers and step changes so that the estimated control limits are sufficiently accurate for Phase II monitoring. High breakdown estimation methods based on the minimum volume ellipsoid (MVE) or the minimum covariance determinant (MCD) are well suited to detecting multivariate outliers in data. However, they are difficult to implement in practice due to the extensive computation required to obtain the estimates. Based on previous studies, it is not clear which of these two estimation methods is best for control chart applications. The comprehensive simulation study here gives guidance for when to use which estimator, and control limits are provided. High breakdown estimation methods such as MCD and MVE can be applied to a wide variety of multivariate quality control data.
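To make the MCD-based screening idea concrete, the sketch below builds a simple Phase I screen for individual multivariate observations using scikit-learn's MinCovDet estimator. The simulated data, support fraction, and chi-square cutoff are illustrative assumptions; the report's actual contribution is the simulation-based comparison and the control limits, which are not reproduced here.

```python
# A small sketch of an MCD-based Phase I screen for multivariate individual
# observations; sklearn's MinCovDet supplies the high breakdown estimates.
# The chi-square cutoff is a convenient stand-in for the control limits
# developed in the report.
import numpy as np
from sklearn.covariance import MinCovDet
from scipy.stats import chi2

rng = np.random.default_rng(7)
p, m = 4, 100
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=m)
X[40:45] += 4.0                      # a short run of multivariate outliers

mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
t2 = mcd.mahalanobis(X)              # squared robust distances, T^2-like statistics
ucl = chi2.ppf(1 - 0.005, df=p)      # illustrative upper control limit

print("signals at observations:", np.where(t2 > ucl)[0])
```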
- An Improved Genetic Algorithm Using a Directional Search. Wan, Wen; Birch, Jeffrey B. (Virginia Tech, 2009). The genetic algorithm (GA), a very powerful tool used in optimization, has been applied in various fields including statistics. However, the general GA is usually computationally intensive, often having to perform a large number of evaluations of an objective function. This paper presents four different versions of computationally efficient genetic algorithms that incorporate several different local directional searches into the GA process. These local searches are based on the method of steepest descent (SD), the Newton-Raphson method (NR), a derivative-free directional search method (denoted by “DFDS”), and a method that combines SD with DFDS. Several benchmark functions, such as a low-dimensional function versus a high-dimensional function and a relatively bumpy function versus a very bumpy function, are employed to illustrate the improvement offered by these proposed methods through a Monte Carlo simulation study using a split-plot design. A real problem related to multi-response optimization is also used to illustrate the improvement of these proposed methods over the traditional GA and over the method implemented in the Design-Expert statistical software package. Our results show that the GA can be improved both in accuracy and in computational efficiency in most cases by incorporating a local directional search into the GA process.
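The following sketch illustrates the general idea of mixing a steepest-descent local move into a GA generation, in the spirit of the SD variant described above. The GA operators, step sizes, and the choice to polish only the current best individual are simplifying assumptions for illustration, not the authors' exact algorithm.

```python
# Simplified illustration of hybridizing a GA with a steepest-descent (SD)
# local move on the best individual of each generation.
import numpy as np

def f(x):                       # objective: sphere function (to minimize)
    return np.sum(x**2, axis=-1)

def numerical_grad(x, h=1e-6):
    e = np.eye(x.size)
    return np.array([(f(x + h*ei) - f(x - h*ei)) / (2*h) for ei in e])

rng = np.random.default_rng(0)
pop = rng.uniform(-5, 5, size=(30, 4))
for gen in range(50):
    fit = f(pop)
    # tournament selection
    idx = rng.integers(0, len(pop), size=(len(pop), 2))
    parents = pop[np.where(fit[idx[:, 0]] < fit[idx[:, 1]], idx[:, 0], idx[:, 1])]
    # blend crossover and Gaussian mutation
    mates = parents[rng.permutation(len(parents))]
    alpha = rng.uniform(size=(len(parents), 1))
    pop = alpha*parents + (1 - alpha)*mates + rng.normal(scale=0.1, size=parents.shape)
    # SD local move on the current best individual only
    best = pop[np.argmin(f(pop))].copy()
    for step in (1.0, 0.1, 0.01):
        cand = best - step*numerical_grad(best)
        if f(cand) < f(best):
            best = cand
    pop[np.argmax(f(pop))] = best   # replace the worst with the polished best

print("best value found:", f(pop).min())
```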
- An Improved Hybrid Genetic Algorithm with a New Local Search Procedure. Wan, Wen; Birch, Jeffrey B. (Virginia Tech, 2012). A hybrid genetic algorithm (HGA) combines a genetic algorithm (GA) with an individual learning procedure. One such learning procedure is a local search technique (LS) used by the GA for refining global solutions. An HGA is also called a memetic algorithm (MA), one of the most successful and popular heuristic search methods. An important challenge for MAs is the trade-off between global and local searching, since the cost of a LS can be rather high. This paper proposes a novel, simplified, and efficient HGA with a new individual learning procedure that performs a LS only when the best offspring (solution) in the offspring population is also the best in the current parent population. Additionally, a new LS method is developed based on a three-directional search (TD), which is derivative-free and self-adaptive. The new HGA with two different LS methods (TD and the Nelder-Mead simplex) is compared with a traditional HGA. Two benchmark functions are employed to illustrate the improvement of the proposed method with the new learning procedure. The results show that the new HGA greatly reduces the number of function evaluations and converges much faster to the global optimum than a traditional HGA. The TD local search method is a good choice for helping to locate a global “mountain” (or “valley”) but may not perform as well as the Nelder-Mead method in the final fine-tuning toward the optimal solution.
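The distinctive element described above is the triggering rule: a local search is run only when the best offspring beats anything in the current parent population. The sketch below shows that rule under simplifying assumptions; Nelder-Mead (one of the two LS options compared in the report) stands in for the three-directional search, and the GA operators themselves are generic placeholders rather than the authors' implementation.

```python
# Sketch of the triggering rule: run a local search only when the best
# offspring improves on the best parent.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sum((x - 1.0)**2)     # illustrative objective

rng = np.random.default_rng(3)
parents = rng.uniform(-5, 5, size=(20, 3))
for gen in range(40):
    # offspring by uniform crossover + mutation
    a, b = parents[rng.integers(0, 20, 20)], parents[rng.integers(0, 20, 20)]
    mask = rng.random((20, 3)) < 0.5
    offspring = np.where(mask, a, b) + rng.normal(scale=0.2, size=(20, 3))

    best_par = parents[np.argmin([f(p) for p in parents])]
    best_off = offspring[np.argmin([f(o) for o in offspring])]
    if f(best_off) < f(best_par):   # the new learning-procedure trigger
        res = minimize(f, best_off, method="Nelder-Mead")
        offspring[np.argmax([f(o) for o in offspring])] = res.x

    # (mu + lambda)-style survival: keep the best 20 of parents + offspring
    pool = np.vstack([parents, offspring])
    parents = pool[np.argsort([f(p) for p in pool])[:20]]

print("best solution:", parents[0], "value:", f(parents[0]))
```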
- Interaction Analysis of Three Combination Drugs via a Modified Genetic Algorithm. Wan, Wen; Pei, Xin-Yan; Grant, Steven; Birch, Jeffrey B.; Felthousen, Jessica; Dai, Yun; Fang, Hong-Bin; Tan, Ming; Sun, Shumei (Virginia Tech, 2014). Few articles have been written on analyzing and visualizing three-way interactions between drugs. Although it may be quite straightforward to extend a statistical method from two drugs to three drugs, it is hard to visually illustrate which dose regions are synergistic, additive, or antagonistic, because plotting three-drug dose regions plus a response is a four-dimensional (4-D) problem. This problem can be addressed by displaying the dose regions of interest within the 3-D space of three-drug dose combinations. We propose to apply a modified genetic algorithm (MGA) to construct the dose regions of interest after fitting a response surface to the interaction index (II) using a semiparametric method, the model robust regression (MRR) method. A case study with three anti-cancer drugs in an in vitro experiment is employed to illustrate how to find the dose regions of interest. For example, suppose researchers are interested in visualizing where the synergistic areas with II ≤ 0.4 are in 3-D. After fitting an MRR model to the calculated II, the MGA procedure is used to collect the feasible points that satisfy the estimated values of II ≤ 0.4. All these feasible points are used to construct the approximate dose regions of interest in 3-D.
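A toy version of the feasible-point collection step is sketched below. The quadratic surface ii_hat is a hypothetical stand-in for the fitted MRR surface, and plain random sampling of the dose cube stands in for the modified genetic algorithm; only the idea of gathering all dose combinations with fitted II at or below 0.4 is taken from the entry above.

```python
# Toy illustration of collecting dose combinations with fitted interaction
# index (II) <= 0.4. ii_hat is a hypothetical fitted surface; random sampling
# replaces the MGA used in the report.
import numpy as np

def ii_hat(d):
    # hypothetical fitted II surface over three standardized doses in [0, 1]
    d1, d2, d3 = d[..., 0], d[..., 1], d[..., 2]
    return 0.2 + 0.8*(d1 - 0.7)**2 + 0.6*(d2 - 0.6)**2 + 0.9*(d3 - 0.8)**2

rng = np.random.default_rng(11)
doses = rng.uniform(0.0, 1.0, size=(200_000, 3))     # candidate dose combinations
feasible = doses[ii_hat(doses) <= 0.4]               # points defining the 3-D region

print(f"{len(feasible)} feasible dose combinations found")
print("approximate bounding box of the synergistic region:")
print("  low :", feasible.min(axis=0).round(2))
print("  high:", feasible.max(axis=0).round(2))
```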
- Linear Mixed Model Robust Regression. Waterman, Megan J.; Birch, Jeffrey B.; Schabenberger, Oliver (Virginia Tech, 2006-11-05). Mixed models are powerful tools for the analysis of clustered data, and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric models, correctness of the assumed model is critical for the validity of the ensuing inference. An incorrectly specified parametric means model may be improved by using a local, or nonparametric, model. Two local models are proposed by a pointwise weighting of the marginal and conditional variance-covariance matrices. However, nonparametric models tend to fit to irregularities in the data and provide fits with high variance. Model robust regression techniques estimate the mean response as a convex combination of parametric and nonparametric model fits to the data. This is a semiparametric method by which incompletely or incorrectly specified parametric models can be improved by adding an appropriate amount of the nonparametric fit. We compare the approximate integrated mean square error of the parametric, nonparametric, and mixed model robust methods via a simulation study, and we apply these methods to monthly wind speed data from counties in Ireland.
- Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers. Walker, Eric L.; Starnes, B. Alden; Birch, Jeffrey B.; Mays, James E. (American Institute of Aeronautics and Astronautics, 2010). This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g., polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
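The sketch below illustrates the augmentation idea described above on simulated calibration data: a parametric fit plus a portion (lambda) of a local fit to its residuals. The quadratic model, Gaussian kernel smoother, and fixed mixing proportion are illustrative assumptions; MRR itself chooses the local fit and the mixing proportion data-adaptively.

```python
# Minimal sketch of the MRR idea: a parametric calibration fit augmented by a
# portion (lambda) of a nonparametric fit to its residuals.
import numpy as np

rng = np.random.default_rng(5)
pressure = np.linspace(0.0, 1.0, 60)                      # applied (known) input
counts = 5.0 + 3.0*pressure + 0.4*np.sin(6*pressure) + rng.normal(0, 0.05, 60)

# parametric stage: quadratic calibration curve
beta = np.polyfit(pressure, counts, deg=2)
fit_par = np.polyval(beta, pressure)

# nonparametric stage: Nadaraya-Watson smooth of the residuals
def nw_smooth(x0, x, r, h=0.05):
    w = np.exp(-0.5*((x0[:, None] - x[None, :])/h)**2)
    return (w @ r) / w.sum(axis=1)

resid = counts - fit_par
lam = 0.7                                                  # illustrative mixing proportion
fit_mrr = fit_par + lam*nw_smooth(pressure, pressure, resid)

print("RSS, parametric only:", np.sum((counts - fit_par)**2).round(4))
print("RSS, model robust   :", np.sum((counts - fit_mrr)**2).round(4))
```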
- Nonparametric and Semiparametric Linear Mixed Models. Waterman, Megan J.; Birch, Jeffrey B.; Abdel-Salam, Abdel-Salam Gomaa (Virginia Tech, 2012). Mixed models are powerful tools for the analysis of clustered data and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric models, correctness of the assumed model is critical for the validity of the ensuing inference. An incorrectly specified parametric means model may be improved by using a local, or nonparametric, model. Two local models are proposed by a pointwise weighting of the marginal and conditional variance-covariance matrices. However, nonparametric models tend to fit to irregularities in the data and may provide fits with high variance. Model robust regression techniques estimate mean response as a convex combination of a parametric and a nonparametric model fit to the data. It is a semiparametric method by which incomplete or incorrectly specified parametric models can be improved by adding an appropriate amount of the nonparametric fit. We compare the approximate integrated mean square error of the parametric, nonparametric, and mixed model robust methods via a simulation study and apply these methods to two real data sets: the monthly wind speed data from counties in Ireland and the engine speed data.
- Nonparametric and Semiparametric Mixed Model Methods for Phase I Profile Monitoring. Abdel-Salam, Abdel-Salam Gomaa; Birch, Jeffrey B.; Jensen, Willis A. (Virginia Tech, 2010). Profile monitoring is an approach in quality control best used where the process data follow a profile (or curve). The majority of previous studies in profile monitoring focused on the parametric modeling of either linear or nonlinear profiles, with both fixed and random effects, under the assumption of correct model specification. Our work considers those cases where the parametric model for the family of profiles is unknown or, at least, uncertain. Consequently, we consider monitoring profiles via two methods, a nonparametric (NP) method and a semiparametric procedure that combines both parametric and NP profile fits. We refer to our semiparametric procedure as mixed model robust profile monitoring (MMRPM). Also, we incorporate a mixed model approach to both the parametric and NP model fits to account for the autocorrelation within profiles and to deal with the collection of profiles as a random sample from a common population. For each case, we propose two Hotelling’s T² statistics for use in Phase I analysis to determine unusual profiles, one based on the estimated random effects and one based on the fitted values, and we obtain the corresponding control limits. Our simulation results show that our methods are robust to the common problem of model misspecification of the user’s proposed parametric model. We also found that both the NP and the semiparametric methods result in charts with good abilities to detect changes in Phase I data, and in charts with easily calculated control limits. The proposed methods provide greater flexibility and efficiency when compared to parametric methods commonly used in profile monitoring for Phase I that rely on correct model specification, an unrealistic situation in many practical problems in industrial applications. An example using our techniques is also presented.
- On the Distribution of Hotelling's T² Statistic Based on the Successive Differences Covariance Matrix Estimator. Williams, James D.; Woodall, William H.; Birch, Jeffrey B.; Sullivan, Joe H. (Virginia Tech, 2004-09-30). In the historical (or retrospective or Phase I) multivariate data analysis, the choice of the estimator for the variance-covariance matrix is crucial to successfully detecting the presence of special causes of variation. For the case of individual multivariate observations, the choice is compounded by the lack of rational subgroups of observations with the same distribution. Other research has shown that the use of the sample covariance matrix, with all of the individual observations pooled, impairs the detection of a sustained step shift in the mean vector. For example, research has shown that, with the use of the sample covariance matrix, the probability of a signal actually decreases below the false alarm probability with a sustained step shift near the middle of the data and that the signal probability decreases with the size of the shift. An alternative estimator, based on the successive differences of the individual observations, leads to an increasing signal probability as the size of the step shift increases and has been recommended for use in Phase I analysis. However, the exact distribution for the resulting T² chart statistics has not been determined when the successive differences estimator is used. Three approximate distributions have been proposed in the literature. In this paper we demonstrate several useful properties of the T² statistics based on the successive differences estimator and give a more accurate approximate distribution for calculating the upper control limit for individual observations in a Phase I analysis.
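For readers unfamiliar with the successive-differences estimator, the short sketch below constructs it and the resulting T² statistics on simulated individual observations. The simulated shift and the number of observations are arbitrary; the report's contribution is the improved approximate distribution for the upper control limit, which is not computed here.

```python
# The successive-differences covariance estimator and the resulting T^2
# statistics for individual multivariate observations.
import numpy as np

rng = np.random.default_rng(2)
m, p = 50, 3
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=m)
X[30:] += 1.5                                   # sustained step shift in the mean vector

xbar = X.mean(axis=0)
D = np.diff(X, axis=0)                          # successive differences x_{i+1} - x_i
S_D = D.T @ D / (2.0*(m - 1))                   # successive-differences estimator

dev = X - xbar
T2 = np.einsum("ij,jk,ik->i", dev, np.linalg.inv(S_D), dev)
print("largest T^2 statistics at observations:", np.argsort(T2)[-5:])
```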
- Outlier Robust Nonlinear Mixed Model Estimation. Williams, James D.; Birch, Jeffrey B.; Abdel-Salam, Abdel-Salam Gomaa (Virginia Tech, 2014). In standard analyses of data well-modeled by a nonlinear mixed model (NLMM), an aberrant observation, either within a cluster, or an entire cluster itself, can greatly distort parameter estimates and subsequent standard errors. Consequently, inferences about the parameters are misleading. This paper proposes an outlier robust method based on linearization to estimate fixed effects parameters and variance components in the NLMM. An example is given using the 4-parameter logistic model and bioassay data, comparing the robust parameter estimates to the nonrobust estimates given by SAS®.
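As background for the example mentioned above, the sketch below fits one common parameterization of the 4-parameter logistic curve to simulated data with a single aberrant point, once by ordinary nonlinear least squares and once with a robust loss. This single-curve illustration uses scipy's soft_l1 loss as a stand-in; it is not the linearization-based robust NLMM developed in the report, which also estimates random effects and variance components.

```python
# The 4-parameter logistic (4PL) curve, fit with and without a robust loss.
import numpy as np
from scipy.optimize import least_squares

def fourpl(theta, x):
    a, b, c, d = theta                      # lower/upper asymptotes, location, slope
    return a + (b - a) / (1.0 + np.exp(-d*(x - c)))

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 25)
y = fourpl([0.1, 1.0, 0.0, 2.0], x) + rng.normal(0, 0.02, x.size)
y[5] += 0.8                                  # a single aberrant observation

resid = lambda th: fourpl(th, x) - y
theta0 = [0.0, 1.0, 0.0, 1.0]
ols = least_squares(resid, theta0)                                # ordinary nonlinear LS
rob = least_squares(resid, theta0, loss="soft_l1", f_scale=0.05)  # robust loss

print("OLS estimates   :", ols.x.round(3))
print("robust estimates:", rob.x.round(3))
```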
- A Phase I Cluster-Based Method for Analyzing Nonparametric Profiles. Chen, Yajuan; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2014). A cluster-based method was used by Chen et al.²⁴ to analyze parametric profiles in Phase I of the profile monitoring process. They showed performance advantages in using their cluster-based method of analyzing parametric profiles over a non-cluster-based method with respect to more accurate estimates of the parameters and improved classification performance criteria. However, it is known that, in many cases, profiles can be better represented using a nonparametric method. In this study, we use the cluster-based method to analyze profiles that cannot be easily represented by a parametric function. The similarity matrix used during the clustering phase is based on the fits of the individual profiles with p-spline regression. The clustering phase determines an initial main cluster set which contains more than half of the total profiles in the historical data set. The profiles with in-control T² statistics are sequentially added to the initial main cluster set and, upon completion of the algorithm, the profiles in the main cluster set are classified as in-control profiles and the profiles not in the main cluster set are classified as out-of-control profiles. A Monte Carlo study demonstrates that the cluster-based method results in superior performance over a non-cluster-based method with respect to better classification and higher power in detecting out-of-control profiles. Also, our Monte Carlo study shows that the cluster-based method has better performance than a non-cluster-based method whether the model is correctly specified or not. We illustrate the use of our method with data from the automotive industry.
- Profile Monitoring via Linear Mixed Models. Jensen, Willis A.; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2006). Profile monitoring is a relatively new technique in quality control used when the product or process quality is best represented by a profile (or a curve) at each time period. The essential idea is often to model the profile via some parametric method and then monitor the estimated parameters over time to determine if there have been changes in the profiles. Previous modeling methods have not incorporated the correlation structure within the profiles. We propose the use of linear mixed models to monitor the linear profiles in order to account for the correlation structure within a profile. We consider various data scenarios and show using simulation when the linear mixed model approach is preferable to an approach that ignores the correlation structure. Our focus is on Phase I control chart applications.
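A rough sketch of the linear mixed model route for Phase I linear profiles is shown below, under several simplifying assumptions: simulated random-intercept/random-slope profiles, statsmodels' MixedLM for the fit, a T² statistic computed on the predicted random effects, and a chi-square placeholder in place of a properly designed Phase I control limit. The chart design itself is the subject of the report and is not reproduced here.

```python
# Fit a random-intercept/random-slope linear mixed model across all profiles,
# then screen profiles with a T^2 statistic on the predicted random effects.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(6)
n_prof, n_pts = 20, 10
x = np.tile(np.arange(n_pts, dtype=float), n_prof)
groups = np.repeat(np.arange(n_prof), n_pts)
b = rng.multivariate_normal([0, 0], np.diag([0.4, 0.05]), size=n_prof)  # random effects
b[-1] += [2.0, 0.6]                                  # one out-of-control profile
y = (3.0 + b[groups, 0]) + (1.0 + b[groups, 1])*x + rng.normal(0, 0.3, x.size)

exog = sm.add_constant(x)                            # fixed effects: intercept + slope
model = sm.MixedLM(y, exog, groups=groups, exog_re=exog)
result = model.fit()

eblups = np.array([re.values for re in result.random_effects.values()])
S = np.cov(eblups, rowvar=False)
dev = eblups - eblups.mean(axis=0)
T2 = np.einsum("ij,jk,ik->i", dev, np.linalg.inv(S), dev)
print("flagged profiles:", np.where(T2 > chi2.ppf(0.99, df=2))[0])
```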
- Profile Monitoring via Nonlinear Mixed Models. Jensen, Willis A.; Birch, Jeffrey B. (Virginia Tech, 2006). Profile monitoring is a relatively new technique in quality control best used where the process data follow a profile (or curve) at each time period. Little work has been done on the monitoring of nonlinear profiles. Previous work has assumed that the measurements within a profile are uncorrelated. To relax this restriction we propose the use of nonlinear mixed models to monitor the nonlinear profiles in order to account for the correlation structure. We evaluate the effectiveness of fitting separate nonlinear regression models to each profile in Phase I control chart applications for data with uncorrelated errors and no random effects. For data with random effects, we compare the effectiveness of charts based on a separate nonlinear regression approach versus those based on a nonlinear mixed model approach. Our proposed approach uses the separate nonlinear regression model fits to obtain a nonlinear mixed model fit. The nonlinear mixed model approach results in charts with good abilities to detect changes in Phase I data and has an easily calculated control limit.
- Robust Parameter Design: A Semi-Parametric Approach. Pickle, Stephanie M.; Robinson, Timothy J.; Birch, Jeffrey B.; Anderson-Cook, Christine M. (Virginia Tech, 2005). Parameter design or robust parameter design (RPD) is an engineering methodology intended as a cost-effective approach for improving the quality of products and processes. The goal of parameter design is to choose the levels of the control variables that optimize a defined quality characteristic. An essential component of robust parameter design involves the assumption of well-estimated models for the process mean and variance. Traditionally, the modeling of the mean and variance has been done parametrically. It is often the case, particularly when modeling the variance, that nonparametric techniques are more appropriate due to the nature of the curvature in the underlying function. Most response surface experiments involve sparse data. In sparse data situations with unusual curvature in the underlying function, nonparametric techniques often result in estimates with problematic variation whereas their parametric counterparts may result in estimates with problematic bias. We propose the use of semi-parametric modeling within the robust design setting, combining parametric and nonparametric functions to improve the quality of both mean and variance model estimation. The proposed method will be illustrated with an example and simulations.
- A Semiparametric Approach to Dual Modeling. Robinson, Timothy J.; Birch, Jeffrey B.; Starnes, B. Alden (Virginia Tech, 2006). In typical normal theory regression, the assumption of homogeneity of variances is often not appropriate. When heteroscedasticity exists, instead of treating the variances as a nuisance and transforming away the heterogeneity, the structure of the variances may be of interest and it is desirable to model the variances. Modeling both the mean and variance is commonly referred to as dual modeling. In parametric dual modeling, estimation of the mean and variance parameters are interrelated. When one or both of the models (the mean or variance model) are misspecified, parametric dual modeling can lead to faulty inferences. An alternative to parametric dual modeling is nonparametric dual modeling. However, nonparametric techniques often result in estimates that are characterized by high variability and ignore important knowledge that the user may have regarding the process. We develop a dual modeling approach [Dual Model Robust Regression (DMRR)], which is robust to user misspecification of the mean and/or variance models. Numerical and asymptotic results illustrate the advantages of DMRR over several other dual model procedures.
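To make the mean/variance interplay behind dual modeling concrete, the sketch below runs a small parametric dual-modeling loop on simulated heteroscedastic data: weighted least squares for the mean and a log-linear model for the variance, iterated a few times. This is only background illustration of the interrelated estimation mentioned above; it is not the semiparametric DMRR procedure itself, and the data and models are arbitrary.

```python
# A small parametric dual-modeling illustration: iterative WLS for the mean
# model and a log-linear regression of squared residuals for the variance model.
import numpy as np

rng = np.random.default_rng(8)
x = np.linspace(0, 1, 200)
sd_true = np.exp(-1.0 + 1.5*x)                     # heteroscedastic errors
y = 1.0 + 2.0*x + rng.normal(0, sd_true)

X = np.column_stack([np.ones_like(x), x])
w = np.ones_like(x)
for _ in range(5):
    # mean model: weighted least squares with current variance weights
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    resid = y - X @ beta
    # variance model: regress log squared residuals on x
    gamma = np.linalg.lstsq(X, np.log(resid**2 + 1e-8), rcond=None)[0]
    w = 1.0 / np.exp(X @ gamma)                    # updated weights = 1 / fitted variance

print("mean model estimates    :", beta.round(3))
print("log-variance model slope:", gamma[1].round(3), "(true value 3.0)")
```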
- A Semiparametric Technique for the Multi-Response Optimization Problem. Wan, Wen; Birch, Jeffrey B. (Virginia Tech, 2009). Multi-response optimization (MRO) in response surface methodology (RSM) is quite common in applications. Before the optimization phase, appropriate fitted models for each response are required. A common problem is model misspecification, which occurs when any of the models built for the responses is misspecified, resulting in an erroneous optimal solution. The model robust regression technique, a semiparametric method, has been shown to be more robust to misspecification than either parametric or nonparametric methods. In this study, we propose the use of model robust regression to improve the quality of model estimation and adapt its fits of each response to the desirability function approach, one of the most popular MRO techniques. A case study and simulation studies are presented to illustrate the procedure and to compare the semiparametric method with the parametric and nonparametric methods. The results show that model robust regression performs much better than the other two methods in terms of model comparison criteria in most situations during the modeling stage. In addition, the simulated optimization results for model robust regression are more reliable during the optimization stage.
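The desirability function approach referenced above is standard in MRO, and the sketch below shows it in isolation: each predicted response is mapped to a [0, 1] desirability and the responses are combined by a geometric mean, which an optimizer would then maximize over the design space. The target and limit values, weights, and example responses are arbitrary placeholders, and the response predictions would come from the fitted (here, model robust) response surfaces.

```python
# Desirability functions and their geometric-mean combination for MRO.
import numpy as np

def d_larger_is_better(yhat, low, high, weight=1.0):
    # 0 below `low`, 1 above `high`, a power curve in between
    d = np.clip((yhat - low) / (high - low), 0.0, 1.0)
    return d**weight

def d_smaller_is_better(yhat, low, high, weight=1.0):
    d = np.clip((high - yhat) / (high - low), 0.0, 1.0)
    return d**weight

def overall_desirability(ds):
    return np.prod(ds, axis=0)**(1.0/len(ds))   # geometric mean across responses

# example: two predicted responses at a few candidate factor settings
yield_hat = np.array([62.0, 71.0, 78.0])        # want large (low=60, high=80)
impurity_hat = np.array([3.0, 2.1, 2.6])        # want small (low=1, high=4)
D = overall_desirability([
    d_larger_is_better(yield_hat, 60, 80),
    d_smaller_is_better(impurity_hat, 1, 4),
])
print("overall desirability at the candidate settings:", D.round(3))
```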
- Statistical Monitoring of Heteroscedastic Dose-Response Profiles from High-throughput Screening. Williams, J.D.; Birch, Jeffrey B.; Woodall, William H.; Ferry, N.M. (Virginia Tech, 2006). In pharmaceutical drug discovery and agricultural crop product discovery, in vivo bioassay experiments are used to identify promising compounds for further research. The reproducibility and accuracy of the bioassay are crucial to being able to correctly distinguish between active and inactive compounds. In the case of agricultural product discovery, a replicated dose-response of commercial crop protection products is assayed and used to monitor test quality. The activity of these compounds on the test organisms, the weeds, insects, or fungi, is characterized by a dose-response curve measured from the bioassay. These curves are used to monitor the quality of the bioassays. If undesirable conditions in the bioassay arise, such as equipment failure or problems with the test organisms, then a bioassay monitoring procedure is needed to quickly detect such issues. In this paper we illustrate a proposed nonlinear profile monitoring method to monitor the variability of multiple assays, the adequacy of the dose-response model chosen, and the estimated dose-response curves for aberrant cases in the presence of heteroscedasticity. We illustrate these methods with in vivo bioassay data collected over one year from DuPont Crop Protection.