Browsing by Author "Liu, Meimei"
Now showing 1 - 4 of 4
- Graph auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets. Liu, Meimei; Zhang, Zhengwu; Dunson, David B. (Academic Press-Elsevier, 2021-12-15). There has been a huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal components analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate huge advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.
- Nonparametric distributed learning under general designs. Liu, Meimei; Shang, Zuofeng; Cheng, Guang (2020-08-21). This paper focuses on distributed learning in a nonparametric regression framework. With sufficient computational resources, the efficiency of distributed algorithms improves as the number of machines increases. We aim to analyze how the number of machines affects statistical optimality. We establish an upper bound on the number of machines that still achieves statistical minimax optimality in two settings: nonparametric estimation and hypothesis testing. Our framework is general compared with existing work. We build a unified framework for distributed inference across various regression problems, including thin-plate splines and additive regression under random design: univariate, multivariate, and diverging-dimensional designs. The main tool to achieve this goal is a tight bound on an empirical process, obtained by introducing the Green function for equivalent kernels. Thorough numerical studies back the theoretical findings.
- Statistical learning for cyber physical system. Qian, Chen (Virginia Tech, 2024-07-29). Cyber-Physical Systems represent a critical intersection of physical infrastructure and digital technologies. Ensuring the safety and reliability of these interconnected systems is vital for mitigating risks and enhancing overall system safety. In recent decades, the transportation domain has seen significant adoption of cyber-physical systems, such as automated vehicles. This dissertation focuses on the application of cyber-physical systems in transportation. Statistical learning techniques offer a powerful approach to analyzing complex transportation data, providing insights that enhance safety measures and operational efficiencies. This dissertation underscores the pivotal role of statistical learning in advancing safety within cyber-physical transportation systems. By harnessing the power of data-driven insights, predictive modeling, and advanced analytics, this research contributes to the development of smarter, safer, and more resilient transportation systems. Chapter 2 proposes a novel stochastic jump-based model to capture the driving dynamics of safety-critical events. The identification of such events is challenging due to their complex nature and the high-frequency kinematic data generated by modern data acquisition systems. This chapter addresses these challenges by developing a model that effectively represents the stochastic nature of driving behaviors and assumes that the occurrence of a jump process leads to safety-critical situations. To tackle the issue of rarity in crash data, Chapter 3 introduces a variational inference of extremes approach based on a generalized additive neural network. This method leverages statistical learning to infer the distribution of extreme events, allowing better generalization to unseen data despite the limited availability of crash events. 
By focusing on extreme value theory, this chapter enhances statistical learning's ability to predict and understand rare but high-impact events. Chapter 4 shifts focus to the safety validation of cyber-physical transportation systems, requiring a unique approach due to their advanced and complex nature. This chapter proposes a kernel-based method that simultaneously satisfies representativeness and criticality for safety verification. This method ensures that the safety evaluation process covers a wide range of scenarios while focusing on those most likely to lead to critical outcomes. In Chapter 5, a deep generative model is proposed to identify the boundary of safety-critical events. This model uses the encoder component to reduce high-dimensional input data into lower-dimensional latent representations, which are then utilized to generate new driving scenarios and predict their associated risks. The decoder component reconstructs the original high-dimensional case parameters, allowing for a comprehensive understanding of the factors contributing to safety-critical events. The chapter also introduces an adversarial perturbation approach to efficiently determine the boundary of risk, significantly reducing computational time while maintaining precision. Overall, this dissertation demonstrates the potential of using advanced statistical learning methods to enhance the safety and reliability of cyber-physical transportation systems. By developing innovative models and methodologies, this dissertation provides valuable tools and theoretical foundations for risk prediction, safety validation, and proactive management of transportation systems in an increasingly digital and interconnected world.
- Statistical Learning for Sequential Unstructured Data. Xu, Jingbin (Virginia Tech, 2024-07-30). Unstructured data that cannot be organized into predefined structures, such as text, human behavior status, and system logs, are often presented in a sequential format with inherent dependencies. Probabilistic models are commonly used to capture these dependencies in the data generation process through latent parameters and can naturally extend into hierarchical forms. However, these models rely on the correct specification of assumptions about the sequential data generation process, which often limits their scalable learning abilities. The emergence of neural network tools has enabled scalable learning for high-dimensional sequential data. From an algorithmic perspective, efforts are directed towards reducing dimensionality and representing unstructured data units as dense vectors in low-dimensional spaces, learned from unlabeled data, a practice often referred to as numerical embedding. While these representations offer measures of similarity, automated generalizations, and semantic understanding, they frequently lack the statistical foundations required for explicit inference. This dissertation aims to develop statistical inference techniques tailored for the analysis of unstructured sequential data, with applications in the field of transportation safety. The first part of the dissertation presents a two-stage method. It adopts numerical embedding to map large-scale unannotated data into numerical vectors. Subsequently, a kernel test using maximum mean discrepancy is employed to detect abnormal segments within a given time period. Theoretical results show that learning from numerical vectors is equivalent to learning directly from the raw data. A real-world example illustrates how a driver's mismatched visual behavior occurred during a lane change. The second part of the dissertation introduces a two-sample test for comparing text generation similarity. 
The hypothesis tested is whether the probabilistic mapping measures that generate textual data are identical for two groups of documents. The proposed test compares the likelihood of text documents, estimated through neural network-based language models under the autoregressive setup. The test statistic is derived from an estimation and inference framework that first approximates data likelihood with an estimation set before performing inference on the remaining part. The theoretical result indicates that the test statistic's asymptotic behavior approximates a normal distribution under mild conditions. Additionally, a multiple data-splitting strategy is utilized, combining p-values into a unified decision to enhance the test's power. The third part of the dissertation develops a method to measure differences in text generation between a benchmark dataset and a comparison dataset, focusing on word-level generation variations. This method uses the sliced-Wasserstein distance to compute the contextual discrepancy score. A resampling method establishes a threshold to screen the scores. Crash report narratives are analyzed to compare crashes involving vehicles equipped with level 2 advanced driver assistance systems and those involving human drivers.
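
As a rough illustration of the sliced-Wasserstein distance mentioned in the last abstract, the sketch below gives a generic Monte Carlo estimator between two sets of vectors. It is not the dissertation's contextual discrepancy score: the function name `sliced_wasserstein`, its parameters, and the quantile-grid approximation are all assumptions made for this example. Each random unit direction reduces the problem to one dimension, where the Wasserstein distance can be read off from the empirical quantiles.

```python
import math
import random


def _empirical_quantiles(sorted_vals, levels):
    """Empirical quantiles (inverse CDF) of an already sorted sample."""
    n = len(sorted_vals)
    return [sorted_vals[min(int(q * n), n - 1)] for q in levels]


def sliced_wasserstein(x, y, n_projections=200, p=2, n_levels=100, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: lists of equal-length numeric vectors (sample sizes may differ).
    All parameter names are hypothetical and chosen for this sketch.
    """
    rng = random.Random(seed)
    dim = len(x[0])
    # Midpoint quantile levels shared by both samples.
    levels = [(k + 0.5) / n_levels for k in range(n_levels)]
    total = 0.0
    for _ in range(n_projections):
        # A normalized Gaussian vector is uniform on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both samples onto the direction and sort.
        px = sorted(sum(c * v for c, v in zip(direction, row)) for row in x)
        py = sorted(sum(c * v for c, v in zip(direction, row)) for row in y)
        qx = _empirical_quantiles(px, levels)
        qy = _empirical_quantiles(py, levels)
        # One-dimensional p-Wasserstein cost along this direction.
        total += sum(abs(a - b) ** p for a, b in zip(qx, qy)) / n_levels
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    gen = random.Random(1)
    benchmark = [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
    shifted = [[v + 5.0 for v in row] for row in benchmark]
    print(sliced_wasserstein(benchmark, benchmark))
    print(sliced_wasserstein(benchmark, shifted))
```

In a setting like the one described, the inputs would presumably be embedding vectors from the benchmark and comparison datasets, and a resampling step (not shown here) would supply the threshold for screening the resulting scores.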