Analyzing Highway Safety Datasets: Simplifying Statistical Analyses from Sparse to Big Data

Data used for safety analyses have characteristics that are not found in other disciplines. In this research, we examine three characteristics that can negatively influence the outcome of these safety analyses: (1) crash data with many zero observations; (2) the rare occurrence of crash events (not necessarily related to many zero observations); and (3) big datasets. These characteristics can lead to biased results if inappropriate analysis tools are used. The objectives of this study are to simplify the analysis of highway safety data and develop guidelines and analysis tools for handling these unique characteristics. The research provides guidelines on when to aggregate data over time and space to reduce the number of zero observations; uses heuristics for selecting statistical models; proposes a bias adjustment method for improving the estimation of risk factors; develops a decision-adjusted modeling framework for predicting risk; and shows how cluster analyses can be used to extract relevant information from big data. The guidelines and tools were developed using simulation and observed datasets. Examples are provided to illustrate the guidelines and tools.

Keywords

safety, Big Data, sparse data, heuristics method, cluster analysis, finite sample bias adjustment, aggregated data, disaggregated data

Persistent link

http://hdl.handle.net/10919/95171

Collections

Safety through Disruption (SAFE-D) University Transportation Center (UTC)