Contributions to Efficient Statistical Modeling of Complex Data with Temporal Structures
Files
TR Number
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
This dissertation will focus on three research projects: Neighborhood vector auto regression in multivariate time series, uncertainty quantification for agent-based modeling networked anagrams, and a scalable algorithm for multi-class classification. The first project studies the modeling of multivariate time series, with the applications in the environmental sciences and other areas. In this work, a so-called neighborhood vector autoregression (NVAR) model is proposed to efficiently analyze large-dimensional multivariate time series. The time series are assumed to have underlying distances among them based on the inherent setting of the problem. When this distance matrix is available or can be obtained, the proposed NVAR method is demonstrated to provides a computationally efficient and theoretically sound estimation of model parameters. The performance of the proposed method is compared with other existing approaches in both simulation studies and a real application of stream nitrogen study. The second project focuses on the study of group anagram games. In a group anagram game, players are provided letters to form as many words as possible. In this work, the enhanced agent behavior models for networked group anagram games are built, exercised, and evaluated under an uncertainty quantification framework. Specifically, the game data for players is clustered based on their skill levels (forming words, requesting letters, and replying to requests), the multinomial logistic regressions for transition probabilities are performed, and the uncertainty is quantified within each cluster. The result of this process is a model where players are assigned different numbers of neighbors and different skill levels in the game. Simulations of ego agents with neighbors are conducted to demonstrate the efficacy of the proposed methods. The third project aims to develop efficient and scalable algorithms for multi-class classification, which achieve a balance between prediction accuracy and computing efficiency, especially in high dimensional settings. The traditional multinomial logistic regression becomes slow in high dimensional settings where the number of classes (M) and the number of features (p) is large. Our algorithms are computing efficiently and scalable to data with even higher dimensions. The simulation and case study results demonstrate that our algorithms have huge advantage over traditional multinomial logistic regressions, and maintains comparable prediction performance.