Cluster_Based Profile Monitoring in Phase I Analysis


Virginia Tech


Profile monitoring is a well-known approach in statistical process control in which the quality of a product or process is characterized by a profile, that is, a relationship between a response variable and one or more explanatory variables. Profile monitoring is conducted over two phases, labeled Phase I and Phase II. In Phase I profile monitoring, regression methods are used to model each profile and to detect the possible presence of out-of-control profiles in the historical data set (HDS). Out-of-control profiles can be detected using the T2 statistic. However, previous methods of calculating the T2 statistic use all the data in the HDS, including data from the out-of-control process. Consequently, the ability of these methods to detect out-of-control profiles can be distorted if the HDS contains data from the out-of-control process. This work provides a new profile monitoring methodology for Phase I analysis. The proposed method, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase before calculating the T2 statistic.
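
For orientation, one common form of the T2 statistic for profile i is sketched below. The notation is assumed here for illustration and is not taken from the dissertation: \hat{\beta}_i is the estimated parameter vector of profile i, and \bar{\beta} and S are an estimated center and covariance matrix of the m parameter vectors in the HDS.

```latex
T_i^2 = (\hat{\beta}_i - \bar{\beta})^{\top} S^{-1} (\hat{\beta}_i - \bar{\beta}),
\qquad i = 1, \dots, m
```

When \bar{\beta} and S are computed from all m profiles, any out-of-control profiles contaminate both estimates, which is the distortion described above.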

Before the proposed cluster-based method is introduced for profile monitoring, the cluster-based approach is first demonstrated to work efficiently in robust regression, where it is referred to as cluster-based bounded influence regression, or CBI. It will be demonstrated that the CBI method provides a robust, efficient and high breakdown regression parameter estimator. The CBI method first represents the data space via a special set of points, referred to as anchor points. Then a collection of single-point-added ordinary least squares regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster containing at least half the observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale and affine equivariant, and is asymptotically normally distributed. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors.

The cluster-based method in Phase I profile monitoring first replaces the data from each sampled unit with an estimated profile, using some appropriate regression method. The estimated parameters for the parametric profiles are obtained from parametric models, while the estimated parameters for the nonparametric profiles are obtained from the p-spline model. The cluster phase clusters the profiles based on their estimated parameters, and this yields an initial main cluster which contains at least half the profiles. The initial estimated parameters for the population average (PA) profile are obtained by fitting a mixed model (parametric or nonparametric) to those profiles in the main cluster. Profiles that are not contained in the initial main cluster are iteratively added to the main cluster provided their T2 statistics are "small", and the mixed model (parametric or nonparametric) is used to update the estimated parameters for the PA profile. Those profiles contained in the final main cluster are considered as resulting from the in-control process, while those not included are considered as resulting from an out-of-control process. This cluster-based method has been applied to monitor both parametric and nonparametric profiles. A simulated example, a Monte Carlo study and an application to a real data set illustrate the details of the algorithm and demonstrate the performance advantage of the proposed method over a non-cluster-based method, with respect to more accurate estimates of the PA parameters and improved classification performance criteria.
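
The steps above can be sketched roughly as follows for straight-line profiles. This is a minimal illustration under loud assumptions, not the dissertation's algorithm: the function names and the `threshold` argument are hypothetical, the cluster analysis is replaced by a crude stand-in (the profiles nearest the coordinate-wise median of the estimates), and the mixed model is replaced by the mean and sample covariance of the main-cluster estimates.

```python
import numpy as np


def fit_profiles(x, Y):
    """OLS fit of each profile (one row of Y) on the shared design [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    return coef.T  # shape (m, 2): intercept and slope per profile


def t2(B, center, cov):
    """T2 distance of each row of B from `center` under covariance `cov`."""
    D = B - center
    return np.einsum("ij,jk,ik->i", D, np.linalg.inv(cov), D)


def cluster_based_phase1(B, threshold):
    """Return (in-control mask, estimated center) for parameter estimates B."""
    m = B.shape[0]
    # ASSUMPTION: stand-in for the cluster analysis. The initial main cluster
    # is the m // 2 + 1 profiles closest to the coordinate-wise median.
    dist = np.linalg.norm(B - np.median(B, axis=0), axis=1)
    main = np.zeros(m, dtype=bool)
    main[np.argsort(dist)[: m // 2 + 1]] = True
    while True:
        # ASSUMPTION: stand-in for the mixed-model fit to the main cluster.
        center = B[main].mean(axis=0)
        cov = np.cov(B[main], rowvar=False)
        stats = t2(B, center, cov)
        # Profiles outside the main cluster whose T2 is "small".
        candidates = np.where(~main & (stats < threshold))[0]
        if candidates.size == 0:
            return main, center
        # Add the closest candidate, then re-estimate on the next pass.
        main[candidates[np.argmin(stats[candidates])]] = True
```

Calling `cluster_based_phase1(fit_profiles(x, Y), threshold)` returns a boolean mask of profiles treated as in-control, plus the estimated center of their parameters. Adding one profile per pass keeps the estimates from being updated with several borderline profiles at once; the value of `threshold` is left to the user here.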

When the profiles can be represented by vectors, the profile monitoring process is equivalent to the detection of multivariate outliers. For this reason, we also compared our proposed method to a popular method used to identify outliers when dealing with a multivariate response. Our study demonstrated that when the out-of-control process corresponds to a sustained shift, the cluster-based method using the successive difference estimator is clearly the superior method, among those methods we considered, based on all performance criteria. In addition, the influence of accurate Phase I estimates on the performance of Phase II control charts is presented to show the further advantage of the proposed method. A simple example and Monte Carlo results show that more accurate estimates from Phase I would provide more efficient Phase II control charts.
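
As a point of reference, the successive difference estimator of a covariance matrix is commonly written as S_D = V'V / (2(m - 1)), where the rows of V are the differences between consecutive observation vectors. The sketch below implements that standard form; the function name is hypothetical, and the exact variant used in the dissertation is not stated in this abstract.

```python
import numpy as np


def successive_difference_cov(X):
    """Successive-difference covariance estimate.

    X has one observation vector per row, in time order.
    """
    X = np.asarray(X, dtype=float)
    V = np.diff(X, axis=0)  # (m - 1, p) differences of consecutive rows
    return V.T @ V / (2.0 * (X.shape[0] - 1))
```

Because a sustained shift affects only the single difference that spans the shift point, this estimate is inflated far less than the pooled sample covariance, which absorbs the shift into every deviation from the overall mean.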



Cluster, Mixed Model, Phase I, Phase II, Robust, T2 Statistic