Abnormal Pattern Recognition in Spatial Data
In the recent years, abnormal spatial pattern recognition has received a great deal of attention from both industry and academia, and has become an important branch of data mining. Abnormal spatial patterns, or spatial outliers, are those observations whose characteristics are markedly different from their spatial neighbors. The identification of spatial outliers can be used to reveal hidden but valuable knowledge in many applications. For example, it can help locate extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumor cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs, and detect water pollution incidents.
Numerous traditional outlier detection methods have been developed, but they cannot be directly applied to spatial data in order to extract abnormal patterns. Traditional outlier detection mainly focuses on "global comparison" and identifies deviations from the remainder of the entire data set. In contrast, spatial outlier detection concentrates on discovering neighborhood instabilities that break the spatial continuity. In recent years, a number of techniques have been proposed for spatial outlier detection. However, they have the following limitations. First, most of them focus primarily on single-attribute outlier detection. Second, they may not accurately locate outliers when multiple outliers exist in a cluster and correlate with each other. Third, the existing algorithms tend to abstract spatial objects as isolated points and do not consider their geometrical and topological properties, which may lead to inexact results.
This dissertation reports a study of the problem of abnormal spatial pattern recognition, and proposes a suite of novel algorithms. Contributions include: (1) formal definitions of various spatial outliers, including single-attribute outliers, multi-attribute outliers, and region outliers; (2) a set of algorithms for the accurate detection of single-attribute spatial outliers; (3) a systematic approach to identifying and tracking region outliers in continuous meteorological data sequences; (4) a novel Mahalanobis-distance-based algorithm to detect outliers with multiple attributes; (5) a set of graph-based algorithms to identify point outliers and region outliers; and (6) extensive analysis of experiments on several spatial data sets (e.g., West Nile virus data and NOAA meteorological data) to evaluate the effectiveness and efficiency of the proposed algorithms.