Extensions of Weighted Multidimensional Scaling with Statistics for Data Visualization and Process Monitoring

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

This dissertation is the compilation of two major innovations that rely on a common technique known as multidimensional scaling (MDS). MDS is a dimension-reduction method that takes high-dimensional data and creates low-dimensional versions.

Project 1: Visualizations are useful when learning from high-dimensional data. However, visualizations, just as any data summary, can be misleading when they do not incorporate measures of uncertainty; e.g., uncertainty from the data or the dimension reduction algorithm used to create the visual display. We incorporate uncertainty into visualizations created by a weighted version of MDS called WMDS. Uncertainty exists in these visualizations on the variable weights, the coordinates of the display, and the fit of WMDS. We quantify these uncertainties using Bayesian models in a method we call Informative Probabilistic WMDS (IP-WMDS). Visually, we display estimated uncertainty in the form of color and ellipses, and practically, these uncertainties reflect trust in WMDS. Our results show that these displays of uncertainty highlight different aspects of the visualization, which can help inform analysts.

Project 2: Analysis of network data has emerged as an active research area in statistics. Much of the focus of ongoing research has been on static networks that represent a single snapshot or aggregated historical data unchanging over time. However, most networks result from temporally-evolving systems that exhibit intrinsic dynamic behavior. Monitoring such temporally-varying networks to detect anomalous changes has applications in both social and physical sciences. In this work, we simulate data from models that rely on MDS, and we perform an evaluation study of the use of summary statistics for anomaly detection by incorporating principles from statistical process monitoring. In contrast to most previous studies, we deliberately incorporate temporal auto-correlation in our study. Other considerations in our comprehensive assessment include types and duration of anomaly, model type, and sparsity in temporally-evolving networks. We conclude that the use of summary statistics can be valuable tools for network monitoring and often perform better than more involved techniques.

Uncertainty, Bayesian, Multidimensional Scaling, Visualizations, Anomaly Detection, Dynamic Networks