Browsing by Author "Chen, Yajuan"
- Cluster-Based Bounded Influence Regression. Lawrence, David E.; Birch, Jeffrey B.; Chen, Yajuan (Virginia Tech, 2012). A regression methodology is introduced that obtains competitive, robust, efficient, high breakdown regression parameter estimates and provides an informative summary of possible multiple outlier structure. The proposed method blends a cluster analysis phase with a controlled bounded influence regression phase and is therefore referred to as cluster-based bounded influence regression, or CBI. Representing the data space via a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster “half-set” of observations, with the remaining observations comprising one or more minor clusters. An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale and affine equivariant, and is asymptotically normal. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors. The dendrogram of the clustering process and the weight plot are graphical displays available for multivariate outlier detection. Overall, the proposed methodology represents an advancement in the field of robust regression, offering a distinct philosophical viewpoint on data analysis and the marriage of estimation with diagnostic summary.
- Cluster-Based Profile Monitoring in Phase I Analysis. Chen, Yajuan; Birch, Jeffrey B. (Virginia Tech, 2012). An innovative profile monitoring methodology is introduced for Phase I analysis. The proposed technique, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase to aid in determining whether nonconforming profiles are present in the historical data set (HDS). To cluster the profiles, the proposed method first replaces the data for each profile with an estimated profile curve, using some appropriate regression method, and clusters the profiles based on their estimated parameter vectors. This cluster phase then yields a main cluster which contains more than half of the profiles. The initial estimated population average (PA) parameters are obtained by fitting a linear mixed model to those profiles in the main cluster. In-control profiles, determined using Hotelling’s T² statistic, that are not contained in the initial main cluster are iteratively added to the main cluster, and the mixed model is used to update the estimated PA parameters. A simulated example and Monte Carlo results demonstrate the performance advantage of the proposed method over a current non-cluster-based method, with more accurate estimates of the PA parameters and better classification performance in distinguishing profiles from an in-control process from those from an out-of-control process in Phase I.
- Cluster_Based Profile Monitoring in Phase I Analysis. Chen, Yajuan (Virginia Tech, 2014-03-26). Profile monitoring is a well-known approach used in statistical process control where the quality of the product or process is characterized by a profile or a relationship between a response variable and one or more explanatory variables. Profile monitoring is conducted over two phases, labeled as Phase I and Phase II. In Phase I profile monitoring, regression methods are used to model each profile and to detect the possible presence of out-of-control profiles in the historical data set (HDS). The out-of-control profiles can be detected by using the T² statistic. However, previous methods of calculating the T² statistic are based on using all the data in the HDS, including the data from the out-of-control process. Consequently, the ability of this method to detect out-of-control profiles can be distorted if the HDS contains data from the out-of-control process. This work provides a new profile monitoring methodology for Phase I analysis. The proposed method, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase before calculating the T² statistic. Before introducing our proposed cluster-based method in profile monitoring, this cluster-based method is demonstrated to work efficiently in robust regression, where it is referred to as cluster-based bounded influence regression or CBI. It will be demonstrated that the CBI method provides a robust, efficient and high breakdown regression parameter estimator. The CBI method first represents the data space via a special set of points, referred to as anchor points. Then a collection of single-point-added ordinary least squares regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster containing at least half the observations, with the remaining observations comprising one or more minor clusters.
An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression, scale and affine equivariant, and is asymptotically normal. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures regarding coefficient stability, scale estimation and standard errors. The cluster-based method in Phase I profile monitoring first replaces the data from each sampled unit with an estimated profile, using some appropriate regression method. The estimated parameters for the parametric profiles are obtained from parametric models, while the estimated parameters for the nonparametric profiles are obtained from the p-spline model. The cluster phase clusters the profiles based on their estimated parameters, and this yields an initial main cluster which contains at least half the profiles. The initial estimated parameters for the population average (PA) profile are obtained by fitting a mixed model (parametric or nonparametric) to those profiles in the main cluster. Profiles that are not contained in the initial main cluster are iteratively added to the main cluster provided their T² statistics are "small", and the mixed model (parametric or nonparametric) is used to update the estimated parameters for the PA profile. Those profiles contained in the final main cluster are considered as resulting from the in-control process, while those not included are considered as resulting from an out-of-control process. This cluster-based method has been applied to monitor both parametric and nonparametric profiles.
A simulated example, a Monte Carlo study and an application to a real data set illustrate the details of the algorithm and demonstrate the performance advantage of the proposed method over a non-cluster-based method, with more accurate estimates of the PA parameters and improved classification performance criteria. When the profiles can be represented by vectors, the profile monitoring process is equivalent to the detection of multivariate outliers. For this reason, we also compared our proposed method to a popular method used to identify outliers when dealing with a multivariate response. Our study demonstrated that when the out-of-control process corresponds to a sustained shift, the cluster-based method using the successive difference estimator is clearly the superior method, among those methods we considered, based on all performance criteria. In addition, the influence of accurate Phase I estimates on the performance of Phase II control charts is presented to show a further advantage of the proposed method. A simple example and Monte Carlo results show that more accurate estimates from Phase I provide more efficient Phase II control charts.
- Effect of Phase I Estimation on Phase II Control Chart Performance with Profile Data. Chen, Yajuan; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2014). This paper illustrates how Phase I estimators in statistical process control (SPC) can affect the performance of Phase II control charts. The deleterious impact of poor Phase I estimators on the performance of Phase II control charts is illustrated in the context of profile monitoring. Two types of Phase I estimators are discussed. One approach uses functional cluster analysis to initially distinguish between estimated profiles from an in-control process and those from an out-of-control process. The second approach does not use clustering to make the distinction. The Phase II control charts are established based on the two resulting types of estimates and compared across varying sizes of sustained shifts in Phase II. A simulated example and a Monte Carlo study show that the performance of the Phase II control charts can be severely distorted when constructed with poor Phase I estimators. The use of clustering leads to much better Phase II performance. We also illustrate that Phase II control charts based on poor Phase I estimators not only produce more false alarms than expected but can also take much longer than expected to detect potential changes to the process.
- A Phase I Cluster-Based Method for Analyzing Nonparametric Profiles. Chen, Yajuan; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2014). A cluster-based method was used by Chen et al.²⁴ to analyze parametric profiles in Phase I of the profile monitoring process. They showed performance advantages in using their cluster-based method of analyzing parametric profiles over a non-cluster-based method with respect to more accurate estimates of the parameters and improved classification performance criteria. However, it is known that, in many cases, profiles can be better represented using a nonparametric method. In this study, we use the cluster-based method to analyze profiles that cannot be easily represented by a parametric function. The similarity matrix used during the clustering phase is based on the fits of the individual profiles with p-spline regression. The clustering phase determines an initial main cluster set which contains more than half of the total profiles in the historical data set. The profiles with in-control T² statistics are sequentially added to the initial main cluster set. Upon completion of the algorithm, the profiles in the main cluster set are classified as in-control profiles and the profiles not in the main cluster set are classified as out-of-control profiles. A Monte Carlo study demonstrates that the cluster-based method outperforms a non-cluster-based method, with better classification and higher power in detecting out-of-control profiles. Our Monte Carlo study also shows that the cluster-based method performs better than a non-cluster-based method whether or not the model is correctly specified. We illustrate the use of our method with data from the automotive industry.