### Browsing by Author "House, Leanna L."

Now showing 1 - 20 of 52

- Adapting Response Surface Methods for the Optimization of Black-Box Systems
  Zielinski, Jacob Jonathan (Virginia Tech, 2010-08-16)
Complex mathematical models are often built to describe a physical process that would otherwise be extremely difficult, too costly, or impossible to analyze. Generally, these models require solutions to many partial differential equations. As a result, the computer codes may take a considerable amount of time to complete a single evaluation. A time-tested method of analysis for such models is Monte Carlo simulation. These simulations, however, often require many model evaluations, making the approach too computationally expensive. To limit the number of experimental runs, it is common practice to model the departure as a Gaussian stochastic process (GaSP) and so develop an emulator of the computer model. One advantage of using an emulator is that once a GaSP is fit to realized outcomes, the computer model is easy to predict in unsampled regions of the input space. This is an attempt to 'characterize' the overall model of the computer code. Most of the historical work on design and analysis of computer experiments focuses on characterizing the computer model over a large region of interest. However, many practitioners seek other objectives, such as input screening (Welch et al., 1992), mapping a response surface, or optimization (Jones et al., 1998). Only recently have researchers begun to consider these topics in the design and analysis of computer experiments. In this dissertation, we explore a more traditional response surface approach (Myers, Montgomery, and Anderson-Cook, 2009) in conjunction with traditional computer experiment methods to search for the optimum response of a process. For global optimization, the Efficient Global Optimization (EGO) algorithm of Jones, Schonlau, and Welch (1998) remains a benchmark for subsequent research on computer experiments. We compare the proposed method to this leading benchmark.
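The EGO algorithm cited above selects new simulator runs by maximizing expected improvement under the Gaussian-process emulator's predictive distribution. A minimal sketch of that criterion (illustrative only, not the dissertation's code; it assumes the surrogate supplies a predictive mean and standard deviation at each candidate input):

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """EI for minimization: E[max(f_best - Y, 0)] with Y ~ N(mu, sigma^2).

    mu, sigma: surrogate predictive mean and standard deviation at a candidate.
    f_best: best (smallest) objective value observed so far.
    """
    if sigma <= 0:
        return max(f_best - mu, 0.0)  # deterministic prediction: plain improvement
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# A candidate predicted well below the incumbent scores high EI;
# one predicted well above it scores essentially zero.
ei_good = expected_improvement(mu=0.5, sigma=0.2, f_best=1.0)
ei_poor = expected_improvement(mu=3.0, sigma=0.2, f_best=1.0)
```

In an EGO loop, the candidate maximizing EI is run next, the emulator is refit, and the process repeats until EI becomes negligible.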
Our goal is to show that response surface methods can be an effective means of estimating an optimum response in the computer experiment framework.

- Andromeda in Education: Studies on Student Collaboration and Insight Generation with Interactive Dimensionality Reduction
  Taylor, Mia Rachel (Virginia Tech, 2022-10-04)
Andromeda is an interactive visualization tool that projects high-dimensional data into a scatterplot-like visualization using Weighted Multidimensional Scaling (WMDS). The visualization can be explored through surface-level interaction (viewing data values), parametric interaction (altering the underlying parameterizations), and observation-level interaction (directly interacting with projected points). This thesis presents analyses of the collaborative utility of Andromeda in a middle school class and of the insights college-level students generate when using Andromeda. The first study discusses how a middle school class collaboratively used Andromeda to explore and compare their engineering designs. The students analyzed their designs, represented as high-dimensional data, as a class. This study shows promise for introducing collaborative data analysis to middle school students in conjunction with other technical concepts, such as the engineering design process. Participants in the study of college-level students were given versions of Andromeda with access to different interactions and were asked to generate insights on a dataset. Applying a novel visualization evaluation methodology to the students' natural-language insights, the results indicate that students use different vocabulary supported by the interactions available to them, but not equally. The implications and limitations of these two studies are further discussed.

- Applying an Intrinsic Conditional Autoregressive Reference Prior for Areal Data
  Porter, Erica May (Virginia Tech, 2019-07-09)
Bayesian hierarchical models are useful for modeling spatial data because they have the flexibility to accommodate the complicated dependencies common to such data. In particular, intrinsic conditional autoregressive (ICAR) models are commonly assigned as priors for spatial random effects in hierarchical models for areal data corresponding to spatial partitions of a region. However, selecting prior distributions for these spatial parameters presents a challenge to researchers. We present and describe ref.ICAR, an R package that implements an objective Bayes intrinsic conditional autoregressive prior on a vector of spatial random effects. This model provides an objective Bayesian approach for modeling spatially correlated areal data. ref.ICAR enables analysis of spatial areal data for a specified region, given user-provided data and information about the structure of the study region. The package performs Markov chain Monte Carlo (MCMC) sampling and outputs posterior medians, intervals, and trace plots for fixed-effect and spatial parameters. Finally, its functions provide regional summaries, including medians and credible intervals for fitted values by subregion.

- Assessment of Model Validation, Calibration, and Prediction Approaches in the Presence of Uncertainty
  Whiting, Nolan Wagner (Virginia Tech, 2019-07-19)
Model validation is the process of determining the degree to which a model is an accurate representation of the true value in the real world. The results of a model validation study can be used either to quantify the model-form uncertainty or to improve/calibrate the model. However, the model validation process can become complicated if there is uncertainty in the simulation and/or experimental outcomes. These uncertainties can take the form of aleatory uncertainties due to randomness or epistemic uncertainties due to lack of knowledge. Four different approaches are used for addressing model validation and calibration: 1) the area validation metric (AVM), 2) a modified area validation metric (MAVM) with confidence intervals, 3) the standard validation uncertainty from ASME V&V 20, and 4) Bayesian updating of a model discrepancy term. Details are given for the application of the MAVM to account for small experimental sample sizes. To provide an unambiguous assessment of these different approaches, synthetic experimental values were generated from computational fluid dynamics simulations of a multi-element airfoil. A simplified model was then developed using thin airfoil theory and assessed against the synthetic experimental data. The quantities examined include the two-dimensional lift and moment coefficients for the airfoil with varying angles of attack and flap deflection angles. Each of these validation/calibration approaches is assessed for its ability to tightly encapsulate the true value in nature, both at locations where experimental results are provided and at prediction locations where no experimental data are available.
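The area-metric idea underlying the AVM is the area between the empirical distribution of simulation outputs and that of the experimental observations. A simplified, assumed implementation for illustration (not the code assessed in the dissertation):

```python
import numpy as np

def area_validation_metric(sim, obs):
    """Area between the empirical CDFs of simulation outputs and observations.

    Computed exactly by integrating |F_sim(x) - F_obs(x)| segment by segment
    over the pooled sorted sample.
    """
    sim, obs = np.sort(sim), np.sort(obs)
    grid = np.sort(np.concatenate([sim, obs]))
    area = 0.0
    for lo, hi in zip(grid[:-1], grid[1:]):
        f_sim = np.searchsorted(sim, lo, side="right") / sim.size  # F_sim at lo
        f_obs = np.searchsorted(obs, lo, side="right") / obs.size  # F_obs at lo
        area += abs(f_sim - f_obs) * (hi - lo)
    return area

# Identical samples give zero area; a pure location shift gives area equal to the shift.
x = np.array([1.0, 2.0, 3.0, 4.0])
print(area_validation_metric(x, x))        # 0.0
print(area_validation_metric(x, x + 0.5))  # 0.5
```

The metric is in the units of the response, which is what makes it convenient for stating model-form uncertainty bounds.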
Generally, the MAVM performed best in cases with sparse data and/or large extrapolations, while Bayesian calibration outperformed the others when an extensive amount of experimental data covered the application domain.

- Bayesian Model Selection for Spatial Data and Cost-constrained Applications
  Porter, Erica May (Virginia Tech, 2023-07-03)
Bayesian model selection is a useful tool for identifying an appropriate model class, dependence structure, and valuable predictors for a wide variety of applications. In this work we consider objective Bayesian model selection, where no subjective information is available to inform priors on model parameters a priori, specifically in the case of hierarchical models for spatial data, which can have complex dependence structures. We develop an approach using trained priors via fractional Bayes factors where standard Bayesian model selection methods fail to produce valid probabilities under improper reference priors. This enables researchers to concurrently determine whether spatial dependence between observations is apparent and to identify important predictors for modeling the response. In addition to model selection with objective priors on model parameters, we also consider the case where priors on the model space are used to penalize individual predictors a priori based on their costs. We propose a flexible approach that introduces a tuning parameter to cost-penalizing model priors, allowing researchers to control the level of cost penalization to meet budget constraints and accommodate increasing sample sizes.

- Bayesian Visual Analytics: Interactive Visualization for High Dimensional Data
  Han, Chao (Virginia Tech, 2012-12-07)
In light of advancements made in data collection techniques over the past two decades, data mining has become common practice for summarizing large, high dimensional datasets in hopes of discovering noteworthy data structures. However, one concern is that most data mining approaches rely upon strict criteria that may mask information in the data that analysts would find useful. We propose a new approach called Bayesian Visual Analytics (BaVA), which merges Bayesian statistics with visual analytics to address this concern. The BaVA framework enables experts to interact with the data and the feature discovery tools by modeling the "sense-making" process using Bayesian sequential updating. In this work, we use the BaVA idea to enhance high dimensional visualization techniques such as Probabilistic PCA (PPCA). For real-world datasets, however, important structures can be arbitrarily complex, and a single data projection such as PPCA may fail to provide useful insights. One way to visualize such a dataset is to characterize it by a mixture of local models. For example, Tipping and Bishop [1999] developed an algorithm called Mixture Probabilistic PCA (MPPCA) that extends PPCA to visualize data via a mixture of projectors. Building on MPPCA, we developed a new visualization algorithm called Covariance-Guided MPPCA, which groups clusters with similar covariance structures together to provide more meaningful and cleaner visualizations. Another way to visualize a very complex dataset is to use nonlinear projection methods such as the Generative Topographic Mapping (GTM) algorithm. We developed an interactive version of GTM to discover interesting local data structures. We demonstrate the performance of our approaches using both synthetic and real datasets and compare our algorithms with existing ones.

- Be the Data: Embodied Visual Analytics
  Chen, Xin (Virginia Tech, 2016-08-22)
With the rise of big data, it is becoming increasingly important to educate students about data analytics. In particular, students without a strong mathematical background usually have an unenthusiastic attitude towards high-dimensional data and find it challenging to understand relevant complex analytical methods, such as dimension reduction. In this thesis, we present an embodied approach for visual analytics designed to teach students to explore alternative 2D projections of high-dimensional data points using weighted multidimensional scaling. We propose a novel application, *Be the Data*, to explore the possibilities of using humans' embodied resources to learn from high-dimensional data. In our system, each student embodies a data point, and the positions of the students in a physical space represent a 2D projection of the high-dimensional data. Students physically move about the room relative to others to interact with alternative projections and receive visual feedback. We conducted educational workshops with students inexperienced in the relevant data analytical methods. Our findings indicate that the students were able to learn about high-dimensional data and the data analysis process despite their low level of knowledge about the complex analytical methods. We also applied the same techniques to social meetings to explain social gatherings and facilitate interactions.

- Bridging Cognitive Gaps Between User and Model in Interactive Dimension Reduction
  Wang, Ming (Virginia Tech, 2020-05-05)
High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps related to the Andromeda system: users do not realize the necessity of explicitly highlighting all relevant data points; users are unclear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all relevant data points, we introduced a foreground-and-background view and distance lines. Our user study with a group of undergraduate students revealed that the foreground-and-background view and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study, with students from various backgrounds, suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, it had only a small impact on characterizing the data distribution and on helping users understand the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in creating and deconstructing clusters, we implemented a solution utilizing random sampling, and a quantitative analysis demonstrated that the strategy improved Andromeda's capabilities in constructing and deconstructing clusters.
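The parametric interaction described for Andromeda amounts to re-weighting dimensions inside WMDS: weighted high-dimensional distances are compared against 2D layout distances through a stress function. A bare-bones sketch of those two ingredients (the function names here are invented for illustration, not Andromeda's API):

```python
import numpy as np

def weighted_dists(X, w):
    """Pairwise weighted Euclidean distances: d_ij = sqrt(sum_k w_k (x_ik - x_jk)^2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff**2).sum(axis=-1))

def wmds_stress(Y, X, w):
    """Squared-error stress between weighted high-dim distances and layout distances."""
    d_high = weighted_dists(X, w)
    d_low = weighted_dists(Y, np.ones(Y.shape[1]))
    return ((d_high - d_low) ** 2).sum() / 2.0

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Up-weighting dimension 0 stretches apart pairs separated along that dimension,
# so the optimal layout spreads those points further from each other.
d = weighted_dists(X, np.array([4.0, 1.0]))
```

Observation-level interaction runs the same machinery in reverse: given user-moved layout points Y, solve for the weights w that minimize the stress.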
We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified.

- Bridging cognitive gaps between user and model in interactive dimension reduction
  Wang, Ming; Wenskovitch, John; House, Leanna L.; Polys, Nicholas F.; North, Christopher L. (2021-06)
Interactive machine learning (ML) systems are difficult to design because of the "Two Black Boxes" problem that exists at the interface between human and machine. Many algorithms used in interactive ML systems are black boxes that are presented to users, while human cognition represents a second black box that can be difficult for the algorithm to interpret. These black boxes create cognitive gaps between the user and the interactive ML model. In this paper, we identify several cognitive gaps that exist in a previously developed interactive visual analytics (VA) system, Andromeda, but that are also representative of common problems in other VA systems. Our goal with this work is to open both black boxes and bridge these cognitive gaps by making usability improvements to the original Andromeda system. These include designing new visual features to help people better understand how Andromeda processes and interacts with data, as well as improving the underlying algorithm so that the system can better implement the intent of the user during the data exploration process. We evaluate our designs through both qualitative and quantitative analysis, and the results confirm that the improved Andromeda system outperforms the original version in a series of high-dimensional data analysis tasks.

- Community Structure and Function of Amphibian Skin Microbes: An Experiment with Bullfrogs Exposed to a Chytrid Fungus
  Walke, Jenifer B.; Becker, Matthew H.; Loftus, Stephen C.; House, Leanna L.; Teotonio, Thais L.; Minbiole, Kevin P. C.; Belden, Lisa K. (PLOS, 2015-10-07)
The vertebrate microbiome contributes to disease resistance, but few experiments have examined the link between microbiome community structure and disease resistance functions. Chytridiomycosis, a major cause of amphibian population declines, is a skin disease caused by the fungus Batrachochytrium dendrobatidis (Bd). In a factorial experiment, bullfrog skin microbiota was reduced with antibiotics, augmented with an anti-Bd bacterial isolate (Janthinobacterium lividum), or left unmanipulated, and individuals were then either exposed or not exposed to Bd. We found that the microbial community structure of individual frogs prior to Bd exposure influenced Bd infection intensity one week following exposure, which, in turn, was negatively correlated with proportional growth during the experiment. Microbial community structure and function differed among unmanipulated, antibiotic-treated, and augmented frogs only when frogs were exposed to Bd. Bd is a selective force on microbial community structure and function, and beneficial states of microbial community structure may serve to limit the impacts of infection.

- Computer Experimental Design for Gaussian Process Surrogates
  Zhang, Boya (Virginia Tech, 2020-09-01)
With the rapid development of computing power, computer experiments have gained popularity in various scientific fields, such as cosmology, ecology, and engineering. However, some computer experiments for complex processes are still computationally demanding. A surrogate model, or emulator, is often employed as a fast substitute for the simulator. Meanwhile, a common challenge in computer experiments and related fields is to efficiently explore the input space using a small number of samples, i.e., the experimental design problem. This dissertation focuses on the design problem under Gaussian process surrogates. The first work demonstrates empirically that space-filling designs disappoint when the model hyperparameterization is unknown and must be estimated from data observed at the chosen design sites. A purely random design is shown to be superior to higher-powered alternatives in many cases. Thereafter, a new family of distance-based designs is proposed, and its superior performance is illustrated in both static (one-shot design) and sequential settings. The second contribution is motivated by an agent-based model (ABM) of delta smelt conservation. The ABM was developed to assist in a study of delta smelt life cycles and to understand sensitivities to myriad natural variables and human interventions. However, the input space is high-dimensional, running the simulator is time-consuming, and its outputs change nonlinearly in both mean and variance. A batch sequential design scheme is proposed, generalizing one-at-a-time variance-based active learning, as a means of keeping multi-core cluster nodes fully engaged with expensive runs. The acquisition strategy is carefully engineered to favor selection of replicates, which boost statistical and computational efficiencies.
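For context on the distance-based designs discussed above, a common baseline is the maximin heuristic: choose points so that the smallest pairwise distance is large. A generic greedy sketch of that idea (an assumed illustration, not the dissertation's proposed design family):

```python
import numpy as np

def greedy_maximin(candidates, n):
    """Greedily pick n points from a candidate set, each new point maximizing
    its minimum distance to the points already chosen (a maximin heuristic)."""
    chosen = [0]  # start from an arbitrary candidate
    for _ in range(n - 1):
        # Distance from every candidate to each already-chosen point.
        d = np.linalg.norm(
            candidates[:, None, :] - candidates[chosen][None, :, :], axis=-1
        )
        # Pick the candidate whose nearest chosen point is farthest away.
        chosen.append(int(d.min(axis=1).argmax()))
    return candidates[chosen]

rng = np.random.default_rng(0)
cands = rng.uniform(size=(200, 2))   # random candidates in the unit square
design = greedy_maximin(cands, 10)
```

One-shot designs like this are computed before any simulator runs; sequential schemes instead interleave point selection with model refitting.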
Design performance is illustrated on a range of toy examples before embarking on a smelt simulation campaign and a downstream high-fidelity input sensitivity analysis.

- A Cost-Effective Semi-Automated Approach for Comprehensive Event Extraction
  Saraf, Parang (Virginia Tech, 2018-04-26)
Automated event extraction from free text remains an open problem, particularly when the goal is to identify all relevant events. Manual extraction is currently the only alternative for comprehensive and reliable extraction, so a system is needed that can comprehensively extract events reported in news articles (high recall) while remaining scalable enough to handle a large number of articles. In this dissertation, we explore various methods for developing an event extraction system that can mitigate these challenges. We primarily investigate three major problems related to event extraction. (i) What are the strengths and weaknesses of automated event extractors? A thorough understanding of what can be automated with high success and what leads to common pitfalls is crucial before we can develop a superior event extraction system. (ii) How can we build a hybrid event extraction system that bridges the gap between manual and automated event extraction? Hybrid extraction is a semi-automated approach that uses an ecosystem of machine learning models along with a carefully designed user interface for extracting events. Since this method is semi-automated, it also requires a meticulous understanding of user behavior in order to identify tasks that humans can perform with ease while diverting the more tedious tasks to the machine learning methods. (iii) Finally, we explore methods for displaying extracted events that could simplify the analytical and inference-generation processes for an analyst. We particularly aim to develop visualizations that would allow analysts to perform macro- and micro-level analysis of significant societal events.

- Designing and Evaluating Object-Level Interaction to Support Human-Model Communication in Data Analysis
  Self, Jessica Zeitz (Virginia Tech, 2016-05-09)
High-dimensional data appear in all domains, and they are challenging to explore. As the number of dimensions in a dataset increases, it becomes harder to discover patterns and develop insights. Data analysis and exploration is an important skill given the amount of data collection in every field of work. However, learning this skill without an understanding of high-dimensional data is challenging. Users naturally tend to characterize data in simplistic one-dimensional terms using metrics such as mean, median, and mode. Real-world data is more complex. To gain the most insight from data, users need to recognize and create high-dimensional arguments. Data exploration methods can encourage thinking beyond traditional one-dimensional insights. Dimension reduction algorithms, such as multidimensional scaling, support data exploration by reducing datasets to two dimensions for visualization. Because these algorithms rely on underlying parameterizations, they may be manipulated to assess the data from multiple perspectives. Such manipulation can be difficult for users without strong knowledge of the underlying algorithms. Visual analytics tools that afford object-level interaction (OLI) allow for the generation of more complex insights, despite inexperience with multivariate data or the underlying algorithm. The goal of this research is to develop and test variations on types of interactions for interactive visual analytic systems that enable users to tweak model parameters directly or indirectly so that they may explore high-dimensional data. To study interactive data analysis, we present an interface, Andromeda, that enables non-experts in statistical models to explore domain-specific, high-dimensional data. This application implements interactive weighted multidimensional scaling (WMDS) and allows for both parametric and observation-level interaction to provide in-depth data exploration.
We performed multiple user studies to understand how parametric and object-level interaction aid data analysis. With each study, we found usability issues and designed solutions for the next study, and with each critique we uncovered design principles for effective, interactive visual analytic tools. The final part of this research presents these principles, supported by the results of our multiple informal and formal usability studies. The established design principles focus on human-centered usability for developing interactive visual analytic systems that enable users to analyze high-dimensional data through object-level interaction.

- Detection of Latent Heteroscedasticity and Group-Based Regression Effects in Linear Models via Bayesian Model Selection
  Metzger, Thomas Anthony (Virginia Tech, 2019-08-22)
Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori, we use fractional Bayes factor methods and mixture g-priors to overcome the lack of prior information. We provide an R package, slgf, that implements our methodology and demonstrate its usage in practice.

- Dimension Reduction for Multinomial Models Via a Kolmogorov-Smirnov Measure (KSM)
  Loftus, Stephen C.; House, Leanna L.; Hughey, Myra C.; Walke, Jenifer B.; Becker, Matthew H.; Belden, Lisa K. (Virginia Tech, 2015)
Due to advances in technology and data collection techniques, the number of measurements often exceeds the number of samples in ecological datasets. As such, standard models that attempt to assess the relationship between variables and a response are inapplicable, and the number of dimensions must be reduced for the models to be estimable. Several filtering methods exist to accomplish this, including Indicator Species Analyses and Sure Independence Screening, but these techniques often have questionable asymptotic properties or are not readily applicable to data with multinomial responses. As such, we propose and validate a new metric, the Kolmogorov-Smirnov Measure (KSM), to be used for filtering variables. In this paper, we develop the KSM, investigate its asymptotic properties, and compare it to group-equalized Indicator Species Values through simulation studies and an application to a well-known biological dataset.

- Efficient computer experiment designs for Gaussian process surrogates
  Cole, David Austin (Virginia Tech, 2021-06-28)
Due to advancements in supercomputing and algorithms for finite element analysis, today's computer simulation models often contain complex calculations that can result in a wealth of knowledge. Gaussian processes (GPs) are highly desirable models for computer experiments because of their predictive accuracy and uncertainty quantification. This dissertation addresses GP modeling when data abound, as well as GP adaptive design when simulator expense severely limits the amount of collected data. For data-rich problems, I introduce a localized sparse covariance GP that preserves the flexibility and predictive accuracy of a GP's predictive surface while saving computational time. This locally induced Gaussian process (LIGP) incorporates latent design points, called inducing points, with a local Gaussian process built from a subset of the data. Various methods are introduced for the design of the inducing points. LIGP is then extended to adapt to stochastic data with replicates, estimating noise while relying upon the unique design locations for computation. I also address the goal of identifying a contour when data collection resources are limited, through entropy-based adaptive design. Unlike existing methods, the entropy-based contour locator (ECL) adaptive design promotes exploration in the design space, performing well in higher dimensions and when the contour corresponds to a high/low quantile. ECL adaptive design can be combined with importance sampling to reduce uncertainty in reliability estimation.

- Expert-Guided Generative Topographical Modeling with Visual to Parametric Interaction
  Han, Chao; House, Leanna L.; Leman, Scotland C. (PLOS, 2016-02-23)
Introduced by Bishop et al. in 1996, Generative Topographic Mapping (GTM) is a powerful nonlinear latent variable modeling approach for visualizing high-dimensional data. It has proven useful where typical linear methods fail. However, GTM still suffers from drawbacks: its complex parameterization of the data makes GTM hard to fit and sensitive to slight changes in the model. For this reason, we extend GTM to a visual analytics framework so that users may guide the parameterization and assess the data from multiple GTM perspectives. Specifically, we develop the theory and methods for Visual to Parametric Interaction (V2PI) with data using GTM visualizations. The result is a dynamic version of GTM that fosters data exploration. We refer to the new version as V2PI-GTM. In this paper, we develop V2PI-GTM in stages and demonstrate its benefits within the context of a text mining case study.

- Extensions of Weighted Multidimensional Scaling with Statistics for Data Visualization and Process Monitoring
  Kodali, Lata (Virginia Tech, 2020-09-04)
This dissertation is a compilation of two major innovations that rely on a common technique known as multidimensional scaling (MDS), a dimension-reduction method that takes high-dimensional data and creates low-dimensional versions. Project 1: Visualizations are useful when learning from high-dimensional data. However, visualizations, like any data summary, can be misleading when they do not incorporate measures of uncertainty, e.g., uncertainty from the data or from the dimension reduction algorithm used to create the visual display. We incorporate uncertainty into visualizations created by a weighted version of MDS called WMDS. Uncertainty exists in these visualizations in the variable weights, the coordinates of the display, and the fit of WMDS. We quantify these uncertainties using Bayesian models in a method we call Informative Probabilistic WMDS (IP-WMDS). Visually, we display estimated uncertainty in the form of color and ellipses, and practically, these uncertainties reflect trust in WMDS. Our results show that these displays of uncertainty highlight different aspects of the visualization, which can help inform analysts. Project 2: Analysis of network data has emerged as an active research area in statistics. Much ongoing research has focused on static networks that represent a single snapshot or aggregated historical data unchanging over time. However, most networks result from temporally evolving systems that exhibit intrinsic dynamic behavior. Monitoring such temporally varying networks to detect anomalous changes has applications in both the social and physical sciences. In this work, we simulate data from models that rely on MDS, and we evaluate the use of summary statistics for anomaly detection by incorporating principles from statistical process monitoring. In contrast to most previous studies, we deliberately incorporate temporal auto-correlation in our study.
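Statistical process monitoring of a network summary statistic can be illustrated with a standard EWMA control chart: smooth the statistic over time and flag excursions beyond limits estimated from an in-control phase. A minimal sketch under assumed simulated data (not the dissertation's models or charts):

```python
import numpy as np

def ewma_chart(series, n0=30, lam=0.2, L=3.0):
    """EWMA statistic and control limits for a stream of summary statistics.

    In-control mean and sd are estimated from the first n0 observations;
    limits use the asymptotic EWMA variance, sigma^2 * lam / (2 - lam).
    """
    mu = series[:n0].mean()
    sigma = series[:n0].std(ddof=1)
    z = np.empty(series.size)
    prev = mu
    for t, x in enumerate(series):
        prev = lam * x + (1 - lam) * prev  # exponentially weighted smoothing
        z[t] = prev
    half = L * sigma * np.sqrt(lam / (2 - lam))
    return z, mu - half, mu + half

rng = np.random.default_rng(1)
density = rng.normal(0.30, 0.01, size=50)  # e.g., edge density per time step
density[40:] += 0.05                       # injected anomaly: abrupt densification
z, lo, hi = ewma_chart(density)
flagged = np.where((z < lo) | (z > hi))[0]  # time steps signaling an anomaly
```

Because the EWMA accumulates evidence over time, it suits the temporally auto-correlated setting better than point-by-point thresholding, though the limits above assume independent in-control observations.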
Other considerations in our comprehensive assessment include the type and duration of the anomaly, the model type, and sparsity in temporally evolving networks. We conclude that summary statistics can be valuable tools for network monitoring and often perform better than more involved techniques.

- Frequentist-Bayesian Hybrid Tests in Semi-parametric and Non-parametric Models with Low/High-Dimensional Covariate
  Xu, Yangyi (Virginia Tech, 2014-12-03)
We provide a Frequentist-Bayesian hybrid test statistic in this dissertation for two testing problems. The first is a test for significant differences between non-parametric functions; the second is a test allowing any departure from constancy in the effect of high-dimensional predictors X. We also give the implementation used to construct the proposed test statistics for both problems. For the first testing problem, we consider the statistical difference among massive outcomes or signals, which is of interest in many diverse fields including neurophysiology, imaging, engineering, and related areas. However, such data often come from nonlinear systems, exhibit row/column patterns, follow non-normal distributions, and contain other hard-to-identify internal relationships, all of which make testing for significant differences between them difficult given both the unknown relationships and the high dimensionality. In this dissertation, we propose an Adaptive Bayes Sum Test capable of testing the significance of differences between two nonlinear systems based on universal non-parametric mathematical decomposition/smoothing components. Our approach adapts the Bayes sum test statistic of Hart (2009). Any internal pattern is treated through a Fourier transformation. Resampling techniques are applied to construct the empirical distribution of the test statistic and to reduce the effect of non-normality. A simulation study suggests our approach performs better than the alternative method, the Adaptive Neyman Test of Fan and Lin (1998). The usefulness of our approach is demonstrated with an application to the identification of electronic chips as well as an application to testing for changes in precipitation patterns. For the second testing problem, numerous statistical methods have been developed for analyzing high-dimensional data.
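The resampling step described for the first testing problem can be sketched as a generic permutation test: relabel the pooled signals to build the empirical null distribution of a summary statistic. The L2 statistic, simulated signals, and group sizes below are illustrative stand-ins, not the dissertation's Bayes sum statistic:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
# Two groups of 20 noisy signals; group B has a constant 0.4 offset.
sig_a = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=(20, 100))
sig_b = np.sin(2 * np.pi * t) + 0.4 + rng.normal(scale=0.3, size=(20, 100))

def stat(a, b):
    # Illustrative statistic: L2 distance between the two mean curves.
    return float(((a.mean(axis=0) - b.mean(axis=0)) ** 2).sum())

observed = stat(sig_a, sig_b)
pooled = np.vstack([sig_a, sig_b])
null = []
for _ in range(999):
    perm = rng.permutation(len(pooled))     # relabel the pooled signals
    null.append(stat(pooled[perm[:20]], pooled[perm[20:]]))
# Add-one correction keeps the p-value strictly positive.
p_value = (1 + sum(s >= observed for s in null)) / (999 + 1)
print(p_value)
```

The empirical null built this way avoids relying on normality of the statistic, which is the motivation given above for the resampling step.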
These methods mainly focus on variable selection, are limited for testing purposes with high-dimensional data, and often require explicit likelihood functions with tractable derivatives. In this dissertation, we propose a "Hybrid Omnibus Test" for testing with high-dimensional data under much weaker requirements. Our Hybrid Omnibus Test is developed in a semi-parametric framework in which a likelihood function is no longer necessary. It is a Frequentist-Bayesian hybrid score-type test for a functional generalized partial linear single index model, in which the link is a functional of the predictors through a generalized partially linear single index. We propose an efficient score based on an estimating equation to circumvent the mathematical difficulty of likelihood derivation, and we use it to construct our Hybrid Omnibus Test. We compare our approach with an empirical likelihood ratio test and with Bayesian inference based on the Bayes factor in a simulation study, in terms of false positive rate and true positive rate. Our simulation results suggest that our approach outperforms both alternatives in false positive rate, true positive rate, and computational cost, in both the low-dimensional and high-dimensional cases. The advantage of our approach is also demonstrated, consistent with published biological results, through an application to a genetic pathway data set for type II diabetes.

- Impact of Ignoring Nested Data Structures on Ability Estimation
  Shropshire, Kevin O'Neil (Virginia Tech, 2014-06-03)
The literature is clear that intentional or unintentional clustering of data elements typically inflates the estimated standard errors of fixed parameter estimates. This study is unique in that it examines the impact of multilevel data structures on subject ability estimates, which are random-effect predictions known as empirical Bayes estimates in the one-parameter IRT / Rasch model. The literature on the impact of complex survey design on latent trait models is mixed, and there is no established best practice for handling this situation. A simulation study was conducted to address two questions related to ability estimation. First, what impact does design-based clustering have on the desirable statistical properties of subject ability estimates in the one-parameter IRT / Rasch model? Second, since empirical Bayes estimators have shrinkage properties, what impact does clustering of first-stage sampling units have on measurement validity: does the first-stage sampling unit affect the ability estimate, and if so, is this desirable and equitable? Two models were fit in a factorial experimental design in which data were simulated over various conditions. The first, a Rasch model formulated as an HGLM, ignores the sample design (the incorrect model), while the second incorporates a first-stage sampling unit (the correct model). Study findings generally showed that the two models were comparable with respect to desirable statistical properties under the majority of the replicated conditions; more measurement error in ability estimation is found when the intra-class correlation is high and the item pool is small, which in practice is the exception rather than the norm. However, the empirical Bayes estimates were found to depend on the first-stage sampling unit, raising issues of equity and fairness in educational decision making. A real-world complex survey design with binary outcome data was also fit with both models.
Analysis of these data supported the simulation results, leading to the conclusion that modeling binary Rasch data may involve a policy tradeoff between desirable statistical properties and measurement validity.
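The empirical Bayes shrinkage at issue above can be illustrated with a simplified, single-level Rasch model: a normal prior on ability pulls the estimate toward the population mean. The N(0, 1) prior, simulated item difficulties, and grid approximation below are illustrative assumptions, not the dissertation's HGLM:

```python
import numpy as np

rng = np.random.default_rng(2)
b = rng.normal(size=25)                    # item difficulties
theta_true = 1.0                           # one subject's true ability
p = 1 / (1 + np.exp(-(theta_true - b)))    # Rasch response probabilities
y = rng.binomial(1, p)                     # observed 0/1 item responses

# Posterior over theta on a grid, with a N(0, 1) prior (empirical Bayes
# would instead estimate the prior variance from the full sample).
grid = np.linspace(-4, 4, 801)
logit = grid[:, None] - b[None, :]
loglik = (y * logit - np.log1p(np.exp(logit))).sum(axis=1)
logpost = loglik - 0.5 * grid ** 2         # add N(0, 1) log-prior
post = np.exp(logpost - logpost.max())
post /= post.sum()
theta_eb = float((grid * post).sum())      # shrunken ability estimate
print(round(theta_eb, 2))
```

In the multilevel setting studied above, the prior would additionally depend on the first-stage sampling unit, which is exactly why two subjects with identical response patterns can receive different empirical Bayes ability estimates.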
