Virginia Tech
    • Log in
    View Item 
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Doctoral Dissertations
    • View Item
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Doctoral Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Efficient Algorithms for Mining Large Spatio-Temporal Data

    Thumbnail
    View/Open
    Chen_F_D_2013.pdf (2.448Mb)
    Downloads: 12314
    Date
    2013-01-21
    Author
    Chen, Feng
    Metadata
    Show full item record
    Abstract
    Knowledge discovery on spatio-temporal datasets has attracted
    growing interests. Recent advances on remote sensing technology mean
    that massive amounts of spatio-temporal data are being collected,
    and its volume keeps increasing at an ever faster pace. It becomes
    critical to design efficient algorithms for identifying novel and
    meaningful patterns from massive spatio-temporal datasets. Different
    from the other data sources, this data exhibits significant
    space-time statistical dependence, and the assumption of i.i.d. is
    no longer valid. The exact modeling of space-time dependence will
    render the exponential growth of model complexity as the data size
    increases. This research focuses on the construction of efficient
    and effective approaches using approximate inference techniques for
    three main mining tasks, including spatial outlier detection, robust
    spatio-temporal prediction, and novel applications to real world
    problems.

    Spatial novelty patterns, or spatial outliers, are those data points
    whose characteristics are markedly different from their spatial
    neighbors. There are two major branches of spatial outlier detection
    methodologies, which can be either global Kriging based or local
    Laplacian smoothing based. The former approach requires the exact
    modeling of spatial dependence, which is time extensive; and the
    latter approach requires the i.i.d. assumption of the smoothed
    observations, which is not statistically solid. These two approaches
    are constrained to numerical data, but in real world applications we
    are often faced with a variety of non-numerical data types, such as
    count, binary, nominal, and ordinal. To summarize, the main research
    challenges are: 1) how much spatial dependence can be eliminated via
    Laplace smoothing; 2) how to effectively and efficiently detect
    outliers for large numerical spatial datasets; 3) how to generalize
    numerical detection methods and develop a unified outlier detection
    framework suitable for large non-numerical datasets; 4) how to
    achieve accurate spatial prediction even when the training data has
    been contaminated by outliers; 5) how to deal with spatio-temporal
    data for the preceding problems.

    To address the first and second challenges, we mathematically
    validated the effectiveness of Laplacian smoothing on the
    elimination of spatial autocorrelations. This work provides
    fundamental support for existing Laplacian smoothing based methods.
    We also discovered a nontrivial side-effect of Laplacian smoothing,
    which ingests additional spatial variations to the data due to
    convolution effects. To capture this extra variability, we proposed
    a generalized local statistical model, and designed two fast forward
    and backward outlier detection methods that achieve a better balance
    between computational efficiency and accuracy than most existing
    methods, and are well suited to large numerical spatial datasets.

    We addressed the third challenge by mapping non-numerical variables
    to latent numerical variables via a link function, such as logit
    function used in logistic regression, and then utilizing
    error-buffer artificial variables, which follow a Student-t
    distribution, to capture the large valuations caused by outliers. We
    proposed a unified statistical framework, which integrates the
    advantages of spatial generalized linear mixed model, robust spatial
    linear model, reduced-rank dimension reduction, and Bayesian
    hierarchical model. A linear-time approximate inference algorithm
    was designed to infer the posterior distribution of the error-buffer
    artificial variables conditioned on observations. We demonstrated
    that traditional numerical outlier detection methods can be directly
    applied to the estimated artificial variables for outliers
    detection. To the best of our knowledge, this is the first
    linear-time outlier detection algorithm that supports a variety of
    spatial attribute types, such as binary, count, ordinal, and
    nominal.

    To address the fourth and fifth challenges, we proposed a robust
    version of the Spatio-Temporal Random Effects (STRE) model, namely
    the Robust STRE (R-STRE) model. The regular STRE model is a recently
    proposed statistical model for large spatio-temporal data that has a
    linear order time complexity, but is not best suited for
    non-Gaussian and contaminated datasets. This deficiency can be
    systemically addressed by increasing the robustness of the model
    using heavy-tailed distributions, such as the Huber, Laplace, or
    Student-t distribution to model the measurement error, instead of
    the traditional Gaussian. However, the resulting R-STRE model
    becomes analytical intractable, and direct application of
    approximate inferences techniques still has a cubic order time
    complexity. To address the computational challenge, we reformulated
    the prediction problem as a maximum a posterior (MAP) problem with a
    non-smooth objection function, transformed it to a equivalent
    quadratic programming problem, and developed an efficient
    interior-point numerical algorithm with a near linear order
    complexity. This work presents the first near linear time robust
    prediction approach for large spatio-temporal datasets in both
    offline and online cases.
    URI
    http://hdl.handle.net/10919/19220
    Collections
    • Doctoral Dissertations [16358]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     

    VTechWorks

    AboutPoliciesHelp

    Browse

    All of VTechWorksCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Log inRegister

    Statistics

    View Usage Statistics

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us