Technical Reports, Statistics
Browsing Technical Reports, Statistics by Issue Date
Now showing 1 - 20 of 39
- A Model to Predict the Impact of Specification Changes on Chloride-Induced Corrosion Service Life of Virginia Bridge Decks
  Trevor J. Kirkpatrick; Richard E. Weyers; Christine M. Anderson-Cook; Michael M. Sprinkel; Michael C. Brown (Virginia Center for Transportation Innovation and Research, 2002-10)
  A model to determine the time to first repair and subsequent rehabilitation of concrete bridge decks exposed to chloride deicer salts that recognizes and incorporates the statistical nature of factors affecting the corrosion process is developed. The model expands on an existing deterministic model by using statistical computing techniques, including resampling techniques such as the parametric and simple bootstrap. Emphasis was placed on the diffusion portion of the diffusion-cracking model, but advances can be readily included for the time for corrosion deterioration after corrosion initiation. Data collected from ten bridge decks built in Virginia between 1981 and 1994 were used to model the surface chloride concentration, apparent diffusion coefficient, and clear cover depth. Several ranges of the chloride corrosion initiation concentration, as determined from the available literature, were investigated. The time to first repair and subsequent rehabilitation predicted by the stochastic model is shorter than that predicted by the deterministic model. The stochastic model is believed to more accurately reflect the true nature of bridge deck deterioration because it takes into account the fact that data for each of the parameters affecting chloride diffusion and corrosion initiation are not necessarily normally distributed. The model was validated by comparing projected service lives of bridge decks built from 1981 to 1994, derived from the model, to historical service life data for 129 bridge decks built in Virginia between 1968 and 1972. The time to rehabilitation predicted for the set of bridge decks built between 1981 and 1994 by the stochastic model was approximately 13 years longer than the normalized time to rehabilitation projected for the bridge decks built between 1968 and 1972 using historical data. The time to first repair and rehabilitation predicted by the probabilistic method more closely matches the historical data than the time predicted by the average value solution. The additional service life expected for the set of bridges built between 1981 and 1994 over those constructed from 1968 to 1972 can be attributed to the decrease in w/c ratio from 0.47 to 0.45 and the slight increase in as-built cover depth from approximately 50 mm (2 in.) to 63.5 to 76 mm (2.5 to 3.0 in.).
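A minimal sketch of the stochastic idea described in this abstract: resample the chloride-diffusion inputs (simple bootstrap) and push each draw through the standard error-function solution of Fick's second law to get a distribution of times to corrosion initiation. The field values, threshold, and distributions below are hypothetical placeholders, not the report's data.

```python
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(1)

# Hypothetical field measurements (one value per sampled deck location).
C_s   = np.array([4.1, 5.6, 3.8, 6.2, 4.9])       # surface chloride, kg/m^3
D     = np.array([0.25, 0.40, 0.18, 0.33, 0.29])  # apparent diffusion coeff., cm^2/yr
cover = np.array([5.1, 6.0, 5.5, 6.8, 6.3])       # clear cover depth, cm
C_th  = 0.7                                       # assumed initiation threshold, kg/m^3

def time_to_initiation(cs, d, x, cth):
    """Years until chloride at depth x reaches cth, from
    C(x, t) = cs * (1 - erf(x / (2*sqrt(d*t))))."""
    z = erfinv(1.0 - cth / cs)
    return x**2 / (4.0 * d * z**2)

# Simple bootstrap: resample each measured parameter with replacement and
# propagate every draw through the diffusion solution.
n_boot = 10_000
t_init = time_to_initiation(
    rng.choice(C_s, n_boot), rng.choice(D, n_boot),
    rng.choice(cover, n_boot), C_th)

print("median time to initiation: %.1f yr" % np.median(t_init))
print("10th percentile (time-to-first-repair proxy): %.1f yr" % np.percentile(t_init, 10))
```

The percentiles of the simulated initiation times play the role of the stochastic service-life estimates the abstract contrasts with the deterministic (average-value) solution.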
- Construction Concepts for Continuum Regression
  Spitzner, Dan J. (Virginia Tech, 2004-08-28)
  Approaches for meaningful regressor construction in the linear prediction problem are investigated in a framework similar to partial least squares and continuum regression, but weighted to allow for intelligent specification of an evaluative scheme. A cross-validatory continuum regression procedure is proposed, and shown to compare well with ordinary continuum regression in empirical demonstrations. Similar procedures are formulated from model-based constructive criteria, but are shown to be severely limited in their potential to enhance predictive performance. By paying careful attention to the interpretability of the proposed methods, the paper addresses a long-standing criticism that the current methodology relies on arbitrary mechanisms.
- On the Distribution of Hotelling's T² Statistic Based on the Successive Differences Covariance Matrix Estimator
  Williams, James D.; Woodall, William H.; Birch, Jeffrey B.; Sullivan, Joe H. (Virginia Tech, 2004-09-30)
  In the historical (or retrospective or Phase I) multivariate data analysis, the choice of the estimator for the variance-covariance matrix is crucial to successfully detecting the presence of special causes of variation. For the case of individual multivariate observations, the choice is compounded by the lack of rational subgroups of observations with the same distribution. Other research has shown that the use of the sample covariance matrix, with all of the individual observations pooled, impairs the detection of a sustained step shift in the mean vector. For example, research has shown that, with the use of the sample covariance matrix, the probability of a signal actually decreases below the false alarm probability with a sustained step shift near the middle of the data and that the signal probability decreases with the size of the shift. An alternative estimator, based on the successive differences of the individual observations, leads to an increasing signal probability as the size of the step shift increases and has been recommended for use in Phase I analysis. However, the exact distribution for the resulting T² chart statistics has not been determined when the successive differences estimator is used. Three approximate distributions have been proposed in the literature. In this paper we demonstrate several useful properties of the T² statistics based on the successive differences estimator and give a more accurate approximate distribution for calculating the upper control limit for individual observations in a Phase I analysis.
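A small sketch of the Phase I T² statistic built on the successive-differences covariance estimator discussed in this abstract; the data below are simulated, not from the paper, and the (upper-control-limit) comparison the paper derives is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 3
X = rng.normal(size=(m, p))               # m individual multivariate observations
X[30:] += 1.0                             # a sustained step shift partway through

xbar = X.mean(axis=0)
diffs = np.diff(X, axis=0)                # successive differences v_i = x_{i+1} - x_i
S_D = diffs.T @ diffs / (2.0 * (m - 1))   # successive-differences estimator

S_D_inv = np.linalg.inv(S_D)
T2 = np.einsum('ij,jk,ik->i', X - xbar, S_D_inv, X - xbar)

print(np.round(T2, 2))   # observations after the shift give noticeably larger T^2 values
```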
- Evaluating And Interpreting Interactions
  Hinkelmann, Klaus H. (Virginia Tech, 2004-12-13)
  The notion of interaction plays an important (and sometimes frightening) role in the analysis and interpretation of results from observational and experimental studies. In general, results are much easier to explain and to implement if interaction effects are not present. It is for this reason that they are often assumed to be negligible. This may, however, lead to erroneous conclusions and poor actions. One reason why interactions are sometimes feared is limited understanding of what the word “interaction” actually means, in a practical sense and, in particular, in a statistical sense. As far as the latter is concerned, simply stating that interaction is significant is generally not sufficient. Subsequent interpretation of that finding is needed, and that brings us back to the definition and meaning of interaction within the context of the experimental setting. In the following sections we shall define and discuss various types of variables that affect the response and the types of interactions among them. These notions will be illustrated for one particular experiment to which we shall return throughout our discussion. To help us in the interpretation of interactions we take a closer look at the definitions of two-factor and three-factor interactions in terms of simple effects. This is followed by a discussion of the nature of interactions and the role they play in the context of the experiment, from the statistical point of view and with regard to the interpretation of the results. After a general overview of how to dissect interactions we return to our example and perform a detailed analysis and interpretation of the data using SAS® (SAS Institute, 2000), in particular PROC GLM and some of its options, such as SLICE. We also mention different methods for the analysis when interaction is actually present. We conclude the analytical part with a discussion of a useful graphical method when no error term is available for testing for interactions. Finally, we summarize the results with some recommendations, reminding the reader that in all of this the experimental design is of fundamental importance.
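A tiny numeric illustration of the "interaction in terms of simple effects" idea this abstract refers to, using hypothetical cell means (not data from the chapter): for two two-level factors, the simple effect of A is computed at each level of B, and the interaction is half the difference of those simple effects.

```python
a1b1, a2b1 = 10.0, 14.0   # hypothetical mean response at B = low
a1b2, a2b2 = 11.0, 21.0   # hypothetical mean response at B = high

simple_effect_A_at_b1 = a2b1 - a1b1                                  # 4.0
simple_effect_A_at_b2 = a2b2 - a1b2                                  # 10.0
interaction_AB = (simple_effect_A_at_b2 - simple_effect_A_at_b1) / 2  # 3.0

print(simple_effect_A_at_b1, simple_effect_A_at_b2, interaction_AB)
# Unequal simple effects (4 vs 10) mean the effect of A depends on the level
# of B, i.e. a nonzero A x B interaction.
```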
- A Bayesian Hierarchical Approach to Dual Response Surface Modeling
  Chen, Younan; Ye, Keying (Virginia Tech, 2005)
  In modern quality engineering, dual response surface methodology is a powerful tool to monitor an industrial process by using both the mean and the standard deviation of the measurements as the responses. The least squares method in regression is often used to estimate the coefficients in the mean and standard deviation models, and various decision criteria are proposed by researchers to find the optimal conditions. Based on the inherent hierarchical structure of the dual response problems, we propose a hierarchical Bayesian approach to model dual response surfaces. Such an approach is compared with two frequentist least squares methods by using two real data sets and simulated data.
- High Breakdown Estimation Methods for Phase I Multivariate Control Charts
  Jensen, Willis A.; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2005)
  The goal of Phase I monitoring of multivariate data is to identify multivariate outliers and step changes so that the estimated control limits are sufficiently accurate for Phase II monitoring. High breakdown estimation methods based on the minimum volume ellipsoid (MVE) or the minimum covariance determinant (MCD) are well suited to detecting multivariate outliers in data. However, they are difficult to implement in practice due to the extensive computation required to obtain the estimates. Based on previous studies, it is not clear which of these two estimation methods is best for control chart applications. The comprehensive simulation study here gives guidance for when to use which estimator, and control limits are provided. High breakdown estimation methods such as MCD and MVE can be applied to a wide variety of multivariate quality control data.
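A minimal sketch of using a high-breakdown estimator (here the MCD, via scikit-learn's MinCovDet) to screen Phase I data with robust T²-type statistics; the simulated data and the chi-square cutoff are illustrative only, whereas the paper's control limits come from its simulation study.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 4))
X[:5] += 4.0                      # a handful of multivariate outliers

mcd = MinCovDet(random_state=0).fit(X)
t2_robust = mcd.mahalanobis(X)    # squared robust distances (T^2-like statistics)

# Rough asymptotic cutoff, for illustration only; in practice the limit should
# come from simulation as in the paper.
flag = t2_robust > chi2.ppf(0.999, df=X.shape[1])
print(np.where(flag)[0])          # indices of flagged observations
```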
- Robust Parameter Design: A Semi-Parametric Approach
  Pickle, Stephanie M.; Robinson, Timothy J.; Birch, Jeffrey B.; Anderson-Cook, Christine M. (Virginia Tech, 2005)
  Parameter design or robust parameter design (RPD) is an engineering methodology intended as a cost-effective approach for improving the quality of products and processes. The goal of parameter design is to choose the levels of the control variables that optimize a defined quality characteristic. An essential component of robust parameter design involves the assumption of well estimated models for the process mean and variance. Traditionally, the modeling of the mean and variance has been done parametrically. It is often the case, particularly when modeling the variance, that nonparametric techniques are more appropriate due to the nature of the curvature in the underlying function. Most response surface experiments involve sparse data. In sparse data situations with unusual curvature in the underlying function, nonparametric techniques often result in estimates with problematic variation whereas their parametric counterparts may result in estimates with problematic bias. We propose the use of semi-parametric modeling within the robust design setting, combining parametric and nonparametric functions to improve the quality of both mean and variance model estimation. The proposed method will be illustrated with an example and simulations.
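A small sketch of the RPD optimization step this abstract builds toward: once mean and variance models have been estimated (parametrically, nonparametrically, or semi-parametrically as proposed), choose control settings that minimize a squared-error loss about a target. The fitted surfaces, target, and design region below are hypothetical stand-ins, not the paper's models.

```python
import numpy as np
from scipy.optimize import minimize

target = 50.0

def mean_hat(x):           # hypothetical fitted process-mean surface
    x1, x2 = x
    return 45.0 + 4.0 * x1 - 2.0 * x2 + 1.5 * x1 * x2

def var_hat(x):            # hypothetical fitted process-variance surface
    x1, x2 = x
    return np.exp(0.2 + 0.8 * x1**2 + 0.3 * x2**2)

def squared_error_loss(x):
    # (bias about target)^2 + variance: a common RPD optimization criterion
    return (mean_hat(x) - target) ** 2 + var_hat(x)

res = minimize(squared_error_loss, x0=[0.0, 0.0],
               bounds=[(-1, 1), (-1, 1)])   # coded design region
print(res.x, squared_error_loss(res.x))
```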
- Cost Penalized Estimation and Prediction Evaluation for Split-Plot Designs
  Liang, Li; Anderson-Cook, Christine M.; Robinson, Timothy J. (Virginia Tech, 2005-02-02)
  The use of response surface methods generally begins with a process or system involving a response y that depends on a set of k controllable input variables (factors) x₁, x₂, …, xₖ. To assess the effects of these factors on the response, an experiment is conducted in which the levels of the factors are varied and changes in the response are noted. The size of the experimental design (number of distinct level combinations of the factors as well as number of runs) depends on the complexity of the model the user wishes to fit. Limited resources due to time and/or cost constraints are inherent to most experiments, and hence, the user typically approaches experimentation with a desire to minimize the number of experimental trials while still being able to adequately estimate the underlying model.
- Speculations Concerning the First Ultraintelligent Machine
  Good, Irving John (Virginia Tech, 2005-03-05)
  The survival of man depends on the early construction of an ultraintelligent machine. In order to design an ultraintelligent machine we need to understand more about the human brain or human thought or both. In the following pages an attempt is made to take more of the magic out of the brain by means of a "subassembly" theory, which is a modification of Hebb's famous speculative cell-assembly theory. My belief is that the first ultraintelligent machine is most likely to incorporate vast artificial neural circuitry, and that its behavior will be partly explicable in terms of the subassembly theory. Later machines will all be designed by ultra-intelligent machines, and who am I to guess what principles they will devise? But probably Man will construct the deus ex machina in his own image.
- Profile Monitoring via Nonlinear Mixed Models
  Jensen, Willis A.; Birch, Jeffrey B. (Virginia Tech, 2006)
  Profile monitoring is a relatively new technique in quality control best used where the process data follow a profile (or curve) at each time period. Little work has been done on the monitoring of nonlinear profiles. Previous work has assumed that the measurements within a profile are uncorrelated. To relax this restriction we propose the use of nonlinear mixed models to monitor nonlinear profiles in order to account for the correlation structure. We evaluate the effectiveness of fitting separate nonlinear regression models to each profile in Phase I control chart applications for data with uncorrelated errors and no random effects. For data with random effects, we compare the effectiveness of charts based on a separate nonlinear regression approach versus those based on a nonlinear mixed model approach. Our proposed approach uses the separate nonlinear regression model fits to obtain a nonlinear mixed model fit. The nonlinear mixed model approach results in charts with good abilities to detect changes in Phase I data and has a simple-to-calculate control limit.
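A minimal sketch of the separate-nonlinear-fits step mentioned in this abstract: fit the same nonlinear curve to each profile with scipy's curve_fit and collect the per-profile parameter estimates, which can then be monitored or used as the starting point for a nonlinear mixed model fit. The profile function and simulated data are placeholders, not the paper's example.

```python
import numpy as np
from scipy.optimize import curve_fit

def profile_model(x, a, b):
    """A simple exponential-rise profile, used here only for illustration."""
    return a * (1.0 - np.exp(-b * x))

rng = np.random.default_rng(3)
x = np.linspace(0.1, 5.0, 20)
n_profiles = 15

estimates = []
for _ in range(n_profiles):
    a_i, b_i = rng.normal(10.0, 0.5), rng.normal(1.0, 0.1)  # profile-to-profile variation
    y = profile_model(x, a_i, b_i) + rng.normal(scale=0.2, size=x.size)
    theta_hat, _ = curve_fit(profile_model, x, y, p0=[8.0, 0.8])
    estimates.append(theta_hat)

estimates = np.array(estimates)      # one row of (a, b) estimates per profile
print(estimates.mean(axis=0), estimates.std(axis=0, ddof=1))
# Monitoring then amounts to flagging profiles whose estimated parameters fall
# far from the bulk, e.g. via a T^2 statistic computed on these rows.
```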
- A Semiparametric Approach to Dual Modeling
  Robinson, Timothy J.; Birch, Jeffrey B.; Starnes, B. Alden (Virginia Tech, 2006)
  In typical normal theory regression, the assumption of homogeneity of variances is often not appropriate. When heteroscedasticity exists, instead of treating the variances as a nuisance and transforming away the heterogeneity, the structure of the variances may be of interest and it is desirable to model the variances. Modeling both the mean and variance is commonly referred to as dual modeling. In parametric dual modeling, estimation of the mean and variance parameters is interrelated. When one or both of the models (the mean or variance model) are misspecified, parametric dual modeling can lead to faulty inferences. An alternative to parametric dual modeling is nonparametric dual modeling. However, nonparametric techniques often result in estimates that are characterized by high variability and ignore important knowledge that the user may have regarding the process. We develop a dual modeling approach [Dual Model Robust Regression (DMRR)], which is robust to user misspecification of the mean and/or variance models. Numerical and asymptotic results illustrate the advantages of DMRR over several other dual model procedures.
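A minimal sketch of the model-robust idea underlying the semiparametric approach above: combine a parametric fit with a nonparametric (kernel) fit through a convex combination. The data, bandwidth, and fixed mixing weight are illustrative choices, not the paper's estimators (DMRR also applies the idea to the variance model and chooses the weight from the data).

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 60)
# Truth that a straight-line model misspecifies:
y = 1.0 + 2.0 * x + 0.6 * np.sin(6 * x) + rng.normal(scale=0.2, size=x.size)

# Parametric piece: simple linear fit.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_par = X @ beta

# Nonparametric piece: Nadaraya-Watson kernel smoother.
def kernel_smooth(x0, x, y, h=0.08):
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

y_npar = kernel_smooth(x, x, y)

lam = 0.5                            # mixing weight; data-driven in the paper
y_robust = lam * y_par + (1 - lam) * y_npar
print(np.round(y_robust[:5], 3))
```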
- Statistical Monitoring of Heteroscedastic Dose-Response Profiles from High-throughput Screening
  Williams, J.D.; Birch, Jeffrey B.; Woodall, William H.; Ferry, N.M. (Virginia Tech, 2006)
  In pharmaceutical drug discovery and agricultural crop product discovery, in vivo bioassay experiments are used to identify promising compounds for further research. The reproducibility and accuracy of the bioassay is crucial to be able to correctly distinguish between active and inactive compounds. In the case of agricultural product discovery, a replicated dose-response of commercial crop protection products is assayed and used to monitor test quality. The activity of these compounds on the test organisms, the weeds, insects, or fungi, is characterized by a dose-response curve measured from the bioassay. These curves are used to monitor the quality of the bioassays. If undesirable conditions in the bioassay arise, such as equipment failure or problems with the test organisms, then a bioassay monitoring procedure is needed to quickly detect such issues. In this paper we illustrate a proposed nonlinear profile monitoring method to monitor the variability of multiple assays, the adequacy of the dose-response model chosen, and the estimated dose-response curves for aberrant cases in the presence of heteroscedasticity. We illustrate these methods with in vivo bioassay data collected over one year from DuPont Crop Protection.
- A Finite Mixture Approach for Identification of Geographic Regions with Distinctive Ecological Stressor-Response Relationships
  Farrar, David; Prins, Samantha C. Bates; Smith, Eric P. (Virginia Tech, 2006)
  We study a model-based clustering procedure that aims to identify geographic regions with distinctive relationships among ecological and environmental variables. We use a finite mixture model with a distinct linear regression model for each mixture component, relating a measure of environmental quality to multiple regressors. Component-specific values of regression coefficients are allowed, for a common set of regressors. We implement Bayesian inference jointly for the true partition and component regression parameters. We assume a known, prior classification of measurement locations into “clustering units,” where measurement locations belong to the same mixture component if they belong to the same clustering unit. A Metropolis algorithm, derived from a well-known Gibbs sampler, is used to sample the posterior distribution. Our approach to the label switching problem relies on constraints on cluster membership, selected based on statistics and graphical displays that do not depend upon cluster indexing. Our approach is applied to data representing streams and rivers in the state of Ohio, equating clustering units to river basins. The results appear to be interpretable given geographic features of possible ecological significance.
- Profile Monitoring via Linear Mixed Models
  Jensen, Willis A.; Birch, Jeffrey B.; Woodall, William H. (Virginia Tech, 2006)
  Profile monitoring is a relatively new technique in quality control used when the product or process quality is best represented by a profile (or a curve) at each time period. The essential idea is often to model the profile via some parametric method and then monitor the estimated parameters over time to determine if there have been changes in the profiles. Previous modeling methods have not incorporated the correlation structure within the profiles. We propose the use of linear mixed models to monitor the linear profiles in order to account for the correlation structure within a profile. We consider various data scenarios and show using simulation when the linear mixed model approach is preferable to an approach that ignores the correlation structure. Our focus is on Phase I control chart applications.
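A small sketch of fitting a linear mixed model to linear profiles, with a random intercept and slope per profile, using statsmodels' MixedLM. The simulated data and the idea of then screening the per-profile random effects are illustrative, not the paper's exact procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
rows = []
for profile in range(20):
    b0 = 2.0 + rng.normal(scale=0.3)      # profile-specific intercept
    b1 = 1.0 + rng.normal(scale=0.1)      # profile-specific slope
    for x in np.linspace(0, 10, 15):
        rows.append({"profile": profile, "x": x,
                     "y": b0 + b1 * x + rng.normal(scale=0.2)})
data = pd.DataFrame(rows)

# Random intercept and slope per profile.
model = smf.mixedlm("y ~ x", data, groups=data["profile"], re_formula="~x")
fit = model.fit()
print(fit.summary())
# fit.random_effects gives each profile's estimated deviations, which can be
# screened for profiles that do not conform to the rest (the Phase I goal).
```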
- Approaches to the Label-Switching Problem of Classification, Based on Partition-Space Relabeling and Label-Invariant Visualization
  Farrar, David (Virginia Tech, 2006-07-15)
  In the context of interest, a method of cluster analysis is used to classify a set of units into a fixed number of classes. Simulation procedures with various conceptual foundations may be used to evaluate uncertainty, stability, or sampling error of such a classification. However simulation approaches may be subject to a label-switching problem, when a likelihood function, posterior density, or some objective function is invariant under permutation of class labels. We suggest a relabeling algorithm that maximizes a simple measure of agreement among classifications. However, it is known that effective summaries and visualization tools can be based on sample concurrence fractions, which we define as sample fractions with given pairs of units falling in the same cluster, and which are invariant under permutation of class labels. We expand the study of concurrence fractions by presenting a matrix theory, which is employed in relabeling, as well as in elaboration of visualization tools. We explore an ordination approach treating concurrence fractions as similarities between pairs of units. A matrix result supports straightforward application of the method of principal coordinates, leading to ordination plots in which Euclidean distances between pairs of units have a simple relationship to concurrence fractions. The use of concurrence fractions complements relabeling, by providing an efficient initial labeling.
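A minimal sketch of two ideas from this abstract: (i) label-invariant concurrence fractions, the fraction of simulated classifications in which a pair of units shares a cluster, and (ii) relabeling one classification to agree maximally with a reference labeling, posed here as an assignment problem. The tiny simulated labelings and the use of the Hungarian algorithm are illustrative assumptions, not the paper's specific algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

labelings = np.array([          # rows: simulation draws, columns: units
    [0, 0, 1, 1, 2, 2],
    [1, 1, 2, 2, 0, 0],         # same partition as row 0, labels permuted
    [0, 0, 0, 1, 2, 2],
])
n_draws, n_units = labelings.shape
K = 3

# Concurrence fractions: invariant under any permutation of cluster labels.
concur = np.zeros((n_units, n_units))
for lab in labelings:
    concur += (lab[:, None] == lab[None, :])
concur /= n_draws
print(concur)

# Relabel draw 1 so it agrees with draw 0 as much as possible.
ref, cur = labelings[0], labelings[1]
agreement = np.zeros((K, K))
for j in range(K):
    for k in range(K):
        agreement[j, k] = np.sum((cur == j) & (ref == k))
row, col = linear_sum_assignment(-agreement)   # maximize total agreement
mapping = dict(zip(row, col))
relabeled = np.array([mapping[c] for c in cur])
print(relabeled)                               # now matches labelings[0]
```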
- Clustering Monitoring Stations Based on Two Rank-Based Criteria of Similarity of Temporal Profiles
  Farrar, David; Smith, Eric (Virginia Tech, 2006-09)
  To support evaluation of water quality trends, a water quality variable may be measured at a series of points in time, at multiple stations. Summarization of such data and detection of spatiotemporal patterns may benefit from the application of multivariate methods. We propose hierarchical cluster analysis methods that group stations according to similarities among temporal profiles, relying on standard clustering algorithms combined with two proposed, rank-based criteria of similarity. An approach complementary to standard environmental trend evaluation relies on the incremental sum of squares clustering algorithm and a criterion of similarity related to a standard test for trend heterogeneity. Relevance to the context of trend evaluation is enhanced by transforming dendrogram edge lengths to reflect cluster homogeneity according to a standard test. However, the standard homogeneity criterion may not be sensitive to patterns with possible practical significance, such as a region-specific reversal in the sign of a trend. We introduce a second criterion, based on concordance of changes in the water quality variable between pairs of stations from one measurement time to the next, which may be sensitive to a wider range of patterns. Our suggested criteria are illustrated and compared based on application to measurements of dissolved oxygen in the James River of Virginia, USA. Results from the two methods show limited similarity but agree in identifying a cluster associated with a locality characterized by pronounced negative trends at multiple stations.
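A small sketch of the concordance-based criterion described above: for each pair of stations, count how often their period-to-period changes move in the same direction, turn that into a distance, and feed it to a standard hierarchical clustering routine. The station series, the 1 - concordance distance, and the average-linkage choice are illustrative stand-ins for the paper's exact construction.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
t = np.arange(12)
stations = np.vstack([
    5 + 0.20 * t + rng.normal(scale=0.3, size=t.size),   # upward trends
    5 + 0.18 * t + rng.normal(scale=0.3, size=t.size),
    6 - 0.25 * t + rng.normal(scale=0.3, size=t.size),   # downward trends
    6 - 0.22 * t + rng.normal(scale=0.3, size=t.size),
])

changes = np.sign(np.diff(stations, axis=1))              # direction of each step
n = stations.shape[0]
concord = np.array([[np.mean(changes[i] == changes[j]) for j in range(n)]
                    for i in range(n)])
dist = 1.0 - concord                                      # high concordance -> small distance
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))             # e.g. [1 1 2 2]
```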
- Linear Mixed Model Robust Regression
  Waterman, Megan J.; Birch, Jeffrey B.; Schabenberger, Oliver (Virginia Tech, 2006-11-05)
  Mixed models are powerful tools for the analysis of clustered data, and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric models, correctness of the assumed model is critical for the validity of the ensuing inference. An incorrectly specified parametric means model may be improved by using a local, or nonparametric, model. Two local models are proposed by a pointwise weighting of the marginal and conditional variance-covariance matrices. However, nonparametric models tend to fit to irregularities in the data and provide fits with high variance. Model robust regression techniques estimate the mean response as a convex combination of parametric and nonparametric model fits to the data. It is a semiparametric method by which incompletely or incorrectly specified parametric models can be improved through adding an appropriate amount of the nonparametric fit. We compare the approximate integrated mean square error of the parametric, nonparametric, and mixed model robust methods via a simulation study, and apply these methods to monthly wind speed data from counties in Ireland.
- Statistical Monitoring of Nonlinear Product and Process Quality Profiles
  Williams, James D.; Woodall, William H.; Birch, Jeffrey B. (Virginia Tech, 2007)
  In many quality control applications, use of a single (or several distinct) quality characteristic(s) is insufficient to characterize the quality of a produced item. In an increasing number of cases, a response curve (profile) is required. Such profiles can frequently be modeled using linear or nonlinear regression models. In recent research, others have developed multivariate T² control charts and other methods for monitoring the coefficients in a simple linear regression model of a profile. However, little work has been done to address the monitoring of profiles that can be represented by a parametric nonlinear regression model. Here we extend the use of the T² control chart to monitor the coefficients resulting from a parametric nonlinear regression model fit to profile data. We give three general approaches to the formulation of the T² statistics and determination of the associated upper control limits for Phase I applications. We also consider the use of nonparametric regression methods and the use of metrics to measure deviations from a baseline profile. These approaches are illustrated using the vertical board density profile data presented in Walker and Wright [1].
- Error Models in Geographic Information Systems Vector Data Using Bayesian Methods
  Love, Kimberly R.; Ye, Keying; Smith, Eric P.; Prisley, Stephen P. (Virginia Tech, 2007)
  Geographic Information Systems (GIS) has been an evolving science since its introduction. Recently, many users have become concerned with the incorporation of error analysis into GIS map products. In particular, there is concern over the error in the location of features in vector data, which relies heavily on geographic x- and y-coordinates. Current work in the field is based on bivariate normal distributions for these points, and their extension to line and polygon features. We propose here to incorporate Bayesian methodology into this existing model, which presents multiple advantages over existing methods. Bayesian methods allow for the incorporation of expert and historical knowledge and reduce the number of observations required to perform an accurate analysis. This is essential to the field of GIS, where multiple observations are rare and outside knowledge is often very informative. Bayesian methods also provide results that are more easily understood by the average GIS user. We explore this addition and provide several examples based on our calculations. We conclude by discussing the advantages of Bayesian analysis for GIS vector data and our ongoing work, which is being conducted under a research grant from the National Geospatial-Intelligence Agency.
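A minimal sketch of the Bayesian idea described above for a single vector point: combine an informative bivariate-normal prior on a feature's true (x, y) location with a small number of coordinate observations via the conjugate normal-normal update. The prior, observation error covariance, and coordinates below are hypothetical, not from the report.

```python
import numpy as np

prior_mean = np.array([100.0, 200.0])   # e.g. from historical mapping knowledge
prior_cov  = np.diag([4.0, 4.0])        # prior location uncertainty (m^2)
obs_cov    = np.diag([1.0, 1.0])        # per-observation measurement error (m^2)

obs = np.array([[101.2, 199.1],
                [100.8, 199.6]])        # only a couple of observations
n = obs.shape[0]

# Conjugate normal-normal update with known covariances.
prior_prec = np.linalg.inv(prior_cov)
obs_prec   = np.linalg.inv(obs_cov)
post_cov  = np.linalg.inv(prior_prec + n * obs_prec)
post_mean = post_cov @ (prior_prec @ prior_mean + obs_prec @ obs.sum(axis=0))

print(post_mean)   # pulled from the prior toward the observations
print(post_cov)    # tighter than either the prior or the data alone
```

The shrinking posterior covariance illustrates the abstract's point that prior knowledge reduces the number of observations needed for an accurate location estimate.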
- Technical Report on the Evaluation of Median Rank Regression and Maximum Likelihood Estimation Techniques for a Two-Parameter Weibull Distribution
  Olteanu, Denisa; Freeman, Laura J. (Virginia Tech, 2008)
  Practitioners frequently model failure times in reliability analysis via the Weibull distribution. Often risk managers must make decisions after only a few failures. Thus, an important question is how to estimate the parameters of this distribution for small sample sizes. This study evaluates two methods: maximum likelihood estimation and median rank regression. Asymptotically, we know that maximum likelihood estimation has superior properties; however, this study seeks to evaluate these two methods for small numbers of failures and high degrees of censoring. Specifically, this paper compares the two estimation methods based on their ability to estimate the individual parameters, and the methods’ ability to predict future failures. The last section of the paper provides recommendations on which method to use based on sample size, the parameter values, and the degree of censoring present in the data.
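A minimal sketch comparing the two estimation approaches discussed above on a small simulated sample of complete (uncensored) failure times; handling of censoring, which is central to the report, is omitted here for brevity.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
shape_true, scale_true = 2.0, 100.0
x = np.sort(scale_true * rng.weibull(shape_true, size=10))   # small failure sample

# Median rank regression: regress ln(-ln(1 - F_i)) on ln(x_i), with median
# ranks from Benard's approximation F_i = (i - 0.3) / (n + 0.4).
n = x.size
i = np.arange(1, n + 1)
F = (i - 0.3) / (n + 0.4)
slope, intercept, *_ = stats.linregress(np.log(x), np.log(-np.log(1.0 - F)))
shape_mrr = slope
scale_mrr = np.exp(-intercept / slope)

# Maximum likelihood, fixing the location parameter at zero.
shape_mle, _, scale_mle = stats.weibull_min.fit(x, floc=0)

print("MRR: shape %.2f  scale %.1f" % (shape_mrr, scale_mrr))
print("MLE: shape %.2f  scale %.1f" % (shape_mle, scale_mle))
```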