Semi-Supervised Anomaly Detection and Heterogeneous Covariance Estimation for Gaussian Processes
Abstract
In this thesis, we propose a statistical framework for estimating correlation between sensor
systems that measure diverse physical phenomena. We consider systems that sample at
different temporal frequencies and record responses of different dimensionalities. Our
goal is to estimate the correlation between every pair of sensors and to use these estimates
to flag potentially anomalous readings.
Our anomaly detection method consists of two primary components: dimensionality reduction
through projection and Gaussian process (GP) regression. We use non-metric multidimensional
scaling to project a partially observed and potentially non-definite covariance
matrix into a low-dimensional manifold. The projection is estimated so that
positively correlated sensors are close to each other and negatively correlated sensors are
distant. We then fit a Gaussian process given these positions and use it to make predictions
at our observed locations. Because of the large amount of data we wish to consider, we
develop methods to scale GP estimation by taking advantage of the replication structure in
the data.
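
As an illustration of how replication can reduce the cost of GP estimation, the following
minimal sketch shows one standard identity; it is not necessarily the method developed in the
thesis. It assumes a zero-mean GP with a squared-exponential kernel and homogeneous noise
variance sigma2. Under those assumptions, replicated observations at a location can be
replaced by their average, with the noise variance divided by the replicate count, without
changing the posterior mean. All names (rbf_kernel, gp_mean, x_unique, counts) and all
data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of points (one point per row)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_mean(x_train, y_train, noise_var, x_test):
    """Zero-mean GP posterior mean with a separate noise variance per training point."""
    k_train = rbf_kernel(x_train, x_train) + np.diag(noise_var)
    k_cross = rbf_kernel(x_test, x_train)
    return k_cross @ np.linalg.solve(k_train, y_train)

rng = np.random.default_rng(0)
x_unique = rng.uniform(-3, 3, size=(5, 2))   # hypothetical projected sensor positions
counts = np.array([40, 10, 25, 3, 60])       # replicates observed at each position
sigma2 = 0.1                                 # assumed homogeneous noise variance

# Full data set: every replicate is its own row (138 rows in total).
x_full = np.repeat(x_unique, counts, axis=0)
y_full = np.sin(x_full[:, 0]) + rng.normal(0.0, np.sqrt(sigma2), size=x_full.shape[0])

# Collapsed data set: one averaged response per unique position.
starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
y_bar = np.add.reduceat(y_full, starts) / counts

x_test = rng.uniform(-3, 3, size=(4, 2))
mean_full = gp_mean(x_full, y_full, np.full(x_full.shape[0], sigma2), x_test)
mean_collapsed = gp_mean(x_unique, y_bar, sigma2 / counts, x_test)
```

The collapsed fit solves a 5 x 5 system instead of a 138 x 138 one, and the two posterior
means agree up to floating-point error.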
Finally, we introduce a semi-supervised method to incorporate expert input into a GP model.
We learn a probability surface defined over locations and responses from sets
of points labeled by an analyst as either anomalous or nominal. This allows us to discount
the influence of points resembling anomalies without removing them based on a threshold.
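
To make the idea of discounting without thresholding concrete, the sketch below assumes one
possible mechanism that is not taken from the thesis: each point's noise variance is inflated
in proportion to its anomaly probability, so suspicious points remain in the model but pull
less on the fit. The probabilities are hand-set stand-ins for a learned surface, and the names
weighted_gp_mean and p_anomaly are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of points (one point per row)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def weighted_gp_mean(x_train, y_train, p_anomaly, x_test, sigma2=0.05):
    """Zero-mean GP posterior mean where likely anomalies receive inflated noise.

    Assumption for illustration: the noise variance of point i is
    sigma2 / (1 - p_anomaly[i]), so a point with p close to 1 has little influence.
    """
    weights = np.clip(1.0 - p_anomaly, 1e-6, 1.0)
    noise_var = sigma2 / weights
    k_train = rbf_kernel(x_train, x_train) + np.diag(noise_var)
    k_cross = rbf_kernel(x_test, x_train)
    return k_cross @ np.linalg.solve(k_train, y_train)

# Toy data: a smooth signal with a single corrupted reading in the middle.
x_train = np.linspace(0.0, 5.0, 11)[:, None]
y_train = np.sin(x_train[:, 0])
y_train[5] += 3.0                      # injected anomaly at x = 2.5

p_none = np.zeros(11)                  # no expert input: every point is trusted
p_expert = np.zeros(11)
p_expert[5] = 0.99                     # stand-in for a learned anomaly probability

x_test = x_train[5:6]
truth = np.sin(x_test[:, 0])
err_unweighted = abs(weighted_gp_mean(x_train, y_train, p_none, x_test) - truth)[0]
err_weighted = abs(weighted_gp_mean(x_train, y_train, p_expert, x_test) - truth)[0]
```

In this toy setting the weighted fit stays closer to the underlying signal at the corrupted
location, even though the corrupted reading is never removed.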