Investigating the Applicability of Machine Learning to the Development of Safety Performance Functions

Files

Report (2.41 MB)

Date

2026-04-07

Publisher

National Surface Transportation Safety Center for Excellence

Abstract

Predicting crashes is an important aspect of road safety. This study investigates the applicability of machine learning (ML) to the development of Safety Performance Functions (SPFs). As artificial intelligence, and generative artificial intelligence in particular, becomes more ubiquitous and accessible, it is important to investigate its potential for crash prediction modeling.

To this end, we collected and preprocessed data to create an integrated crash dataset suitable for training ML models. The integrated dataset combined crash data, traffic volume data, speed limit data, and road geometric characteristic data using geographic information system (GIS) spatial indexing. Using this dataset, we built and evaluated traditional ML models, deep learning neural networks, and zero-inflated negative binomial (ZINB) statistical models. The results show that both the traditional ML models and the deep learning neural network models outperformed the traditional ZINB statistical models. Specifically, the ExtraTrees ensemble model outperformed all other models, and the deep learning neural network models outperformed all models except ExtraTrees. These results confirm the applicability of ML techniques to developing better SPFs. The ML models were significantly superior to the traditional statistical models, reducing crash prediction error (mean absolute error) by 50%. In addition, the coefficient of determination between the actual and estimated crashes increased from 0.22 to 0.90, which is very encouraging and warrants further investigation.
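The two comparison metrics quoted above can be sketched in a few lines. This is a minimal illustration, assuming the coefficient of determination is computed as 1 - SS_res / SS_tot (the report may define it differently, for example as a squared correlation). The function names and the sample numbers are hypothetical and are not taken from the report.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted crash counts."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up crash counts per road segment, for illustration only.
observed = [0, 1, 0, 3, 2, 6]
model_a = [1, 1, 1, 2, 2, 3]  # hypothetical weaker model
model_b = [0, 1, 1, 3, 2, 5]  # hypothetical stronger model

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, mean_absolute_error(observed, preds), r_squared(observed, preds))
```

A lower mean absolute error and a coefficient of determination closer to 1 both indicate that the estimated crashes track the observed crashes more closely.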

The ML models' prediction error increases with the number of crashes on a roadway segment. This indicates that road segments with a high number of crashes (which are rare) may require tailored models. This finding was only possible because the models integrated additional road characteristic features (curvature and type of road segment [weaving, merging, diverging, bridge, or reversible-lane]) with the traffic volume and road length features traditionally used in road safety.
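One way to check how prediction error varies with crash frequency is to group segments by their observed crash count and compute the error within each group. The sketch below assumes mean absolute error as the error measure; the helper name `mae_by_crash_bin`, the bin edges, and the sample numbers are hypothetical and do not come from the report.

```python
from collections import defaultdict


def mae_by_crash_bin(actual, predicted, bin_edges):
    """Return {bin_label: MAE}, grouping segments by observed crash count.

    bin_edges holds ascending inclusive upper bounds; counts above the
    last edge fall into a final open-ended bin.
    """
    errors = defaultdict(list)
    for a, p in zip(actual, predicted):
        label = f">{bin_edges[-1]}"  # default: open-ended top bin
        lower = 0
        for upper in bin_edges:
            if a <= upper:
                label = f"{lower}-{upper}"
                break
            lower = upper + 1
        errors[label].append(abs(a - p))
    return {label: sum(v) / len(v) for label, v in errors.items()}


# Made-up observed and predicted crash counts, for illustration only.
observed = [0, 0, 1, 2, 4, 5, 9, 12]
predicted = [0, 1, 1, 2, 3, 4, 6, 7]

result = mae_by_crash_bin(observed, predicted, bin_edges=[1, 5])
for label, mae in result.items():
    print(label, round(mae, 3))
```

If the error in the top bin is much larger than in the lower bins, that is the pattern the paragraph above describes, and it suggests fitting a separate model for high-crash segments.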

The key findings of this study can be summarized as follows:

  1. ML models significantly outperform traditional statistical models, reducing the mean absolute error by 50% and increasing the coefficient of determination between estimated and actual crashes from 0.22 to 0.90.
  2. Traffic volume and road length alone are insufficient explanatory variables for predicting roadway crashes. Other roadway characteristics are equally important in developing realistic crash prediction models. These more advanced models capture the correlations between the various explanatory variables.
  3. Calibrating crash prediction models on average yearly crashes computed from multiple years of data produces better models than calibrating on individual yearly crash observations, because averaging reduces the number of zero observations in the data and captures the average behavior.
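The effect described in the third finding can be illustrated with a small sketch: averaging crash counts over several years leaves fewer zero-valued observations than the raw yearly records. The record layout, segment identifiers, years, and counts below are hypothetical, and the sketch assumes every segment has one record per year, including years with zero crashes.

```python
from collections import defaultdict


def average_yearly_crashes(records):
    """Map (segment_id, year, crash_count) records to {segment_id: mean yearly count}."""
    counts = defaultdict(list)
    for segment_id, _year, crash_count in records:
        counts[segment_id].append(crash_count)
    return {seg: sum(c) / len(c) for seg, c in counts.items()}


def zero_fraction(values):
    """Share of observations that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


# Made-up yearly crash records for three segments, for illustration only.
records = [
    ("A", 2019, 0), ("A", 2020, 1), ("A", 2021, 0),
    ("B", 2019, 0), ("B", 2020, 0), ("B", 2021, 0),
    ("C", 2019, 2), ("C", 2020, 0), ("C", 2021, 4),
]

yearly_zero = zero_fraction(count for _, _, count in records)
averaged = average_yearly_crashes(records)
averaged_zero = zero_fraction(averaged.values())

print(f"zero share, yearly observations: {yearly_zero:.2f}")
print(f"zero share, multi-year averages: {averaged_zero:.2f}")
```

In this toy data, a segment counts as zero after averaging only if it had no crashes in any year, which is why the share of zeros drops.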

Keywords

machine learning, safety performance functions, artificial intelligence, AI, crash prediction
