Latent Class Model in Transportation Study
Statistics is a critical component of transportation research and has been widely used to analyze driver safety, travel time, traffic flow, and numerous other problems. Many of these popular topics can be framed as establishing statistical models for the latent structure of data. Over the past several years, interest in latent class models has continuously increased due to their great potential for solving practical problems. In this dissertation, I developed several latent class models to quantitatively analyze the hidden structure of transportation data and addressed related application issues. The first model focuses on the uncertainty of travel time, which is critical for assessing the reliability of transportation systems. Travel time is random in nature and contains substantial variability, especially under congested traffic conditions. A Bayesian mixture model, with the ability to incorporate the influence of covariates such as traffic volume, is proposed. This model advances the previous multi-state travel time reliability model, in which the relationship between response and predictors was lacking. The Bayesian mixture travel time model, however, lacks the power to accurately predict future travel time. The analysis indicates that the independence assumption, which is difficult to justify in real data, could be a potential issue. Therefore, I proposed a hidden Markov model to accommodate the dependency structure, and the modeling results were significantly improved. The second and third parts of the dissertation focus on driver safety identification. Given demographic information and crash history, the number of crashes, as a type of count data, is commonly modeled by Poisson regression. However, the over-dispersion in the data implies that a single Poisson distribution is insufficient to depict the substantial variability. A Poisson mixture model is proposed and applied to identify risky and safe drivers.
The lower bound of the estimated misclassification rate is evaluated using the concept of overlap probability. Several theoretical results regarding the overlap probability are discussed. I also introduced quantile regression based on discrete data to specifically model high-risk drivers. In summary, the major objective of my research is to develop latent class methods and explore the hidden structure within transportation data; the approaches I employed can also be applied to similar research questions in other areas.
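To make the over-dispersion argument concrete, the following is a minimal sketch rather than the dissertation's actual model. It simulates crash counts from two latent driver classes, shows that the pooled counts have a variance above their mean (which a single Poisson distribution cannot reproduce), and recovers the classes with a plain EM fit of a two-component Poisson mixture. The sample size, class share, crash rates, and all function names are assumptions chosen for illustration, and covariates such as demographic information are omitted.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_pmf(k, rate):
    """Poisson probability mass, evaluated through logs for numerical stability."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def fit_two_class_poisson(counts, n_iter=200):
    """EM for a two-component Poisson mixture.

    Returns (share of the high-rate class, low rate, high rate).
    """
    mean = sum(counts) / len(counts)
    weight, rate_lo, rate_hi = 0.5, 0.5 * mean, 1.5 * mean  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each driver belongs to the high-rate class.
        resp = []
        for k in counts:
            hi = weight * poisson_pmf(k, rate_hi)
            lo = (1.0 - weight) * poisson_pmf(k, rate_lo)
            resp.append(hi / (hi + lo))
        # M-step: update the class share and the responsibility-weighted mean counts.
        total_hi = sum(resp)
        total_lo = len(counts) - total_hi
        weight = total_hi / len(counts)
        rate_hi = sum(r * k for r, k in zip(resp, counts)) / total_hi
        rate_lo = sum((1.0 - r) * k for r, k in zip(resp, counts)) / total_lo
    return weight, rate_lo, rate_hi


# Illustrative assumptions: 2000 drivers, 20% "risky" (rate 3.0), 80% "safe" (rate 0.3).
rng = random.Random(0)
counts = [sample_poisson(3.0 if rng.random() < 0.2 else 0.3, rng) for _ in range(2000)]

sample_mean = sum(counts) / len(counts)
sample_var = sum((k - sample_mean) ** 2 for k in counts) / (len(counts) - 1)
weight, rate_lo, rate_hi = fit_two_class_poisson(counts)

print(f"mean={sample_mean:.3f}  variance={sample_var:.3f}")
print(f"risky share={weight:.3f}  safe rate={rate_lo:.3f}  risky rate={rate_hi:.3f}")
```

For a single Poisson distribution the mean and variance coincide, so a sample variance well above the sample mean is the signal that motivates the mixture. The posterior probabilities computed in the E-step are also what a classifier would threshold to label a driver as risky or safe.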
- Doctoral Dissertations