Semi-Supervised Anomaly Detection and Heterogeneous Covariance Estimation for Gaussian Processes
In this thesis, we propose a statistical framework for estimating correlation between sensor systems measuring diverse physical phenomenon. We consider systems that measure at different temporal frequencies and measure responses with different dimensionalities. Our goal is to provide estimates of correlation between all pairs of sensors and use this information to flag potentially anomalous readings.
Our anomaly detection method consists of two primary components: dimensionality reduction through projection and Gaussian process (GP) regression. We use non-metric multidimensional scaling to project a partially observed and potentially non-definite covariance matrix into a low dimensional manifold. The projection is estimated in such a way that positively correlated sensors are close to each other and negatively correlated sensors are distant. We then fit a Gaussian process given these positions and use it to make predictions at our observed locations. Because of the large amount of data we wish to consider, we develop methods to scale GP estimation by taking advantage of the replication structure in the data.
Finally, we introduce a semi-supervised method to incorporate expert input into a GP model. We are able to learn a probability surface defined over locations and responses based on sets of points labeled by an analyst as either anomalous or nominal. This allows us to discount the influence of points resembling anomalies without removing them based on a threshold.