Ebrahimvandi, Alireza; Hosseinichimeh, Niyousha; Kong, Zhenyu James
2022-07-08
2022-07-08
2022-06-25
Ebrahimvandi, A.; Hosseinichimeh, N.; Kong, Z.J. Identifying the Early Signs of Preterm Birth from U.S. Birth Records Using Machine Learning Techniques. Information 2022, 13, 310.
http://hdl.handle.net/10919/111167

Preterm birth (PTB) is the leading cause of infant mortality in the U.S. and globally. The goal of this study is to increase understanding of PTB risk factors that are present early in pregnancy by applying statistical and machine learning (ML) techniques to big data. The 2016 U.S. birth records were obtained and combined with two area-level datasets, the Area Health Resources File and the County Health Ranking. We then applied logistic regression with elastic net regularization, random forest, and gradient boosting machines to a cohort of 3.6 million singleton deliveries to identify generalizable PTB risk factors. The response variable is preterm birth, which includes both spontaneous and indicated PTB, and the task is treated as binary classification. Our results show that the most important predictors of preterm birth are gestational and chronic hypertension, interval since last live birth, and history of a previous preterm birth, which explain 10.92, 5.98, and 5.63% of the predictive power, respectively. Parents’ education is one of the influential variables in predicting PTB, explaining 7.89% of the predictive power. The relative importance of race declines when parents are more educated or have received adequate prenatal care. Gradient boosting machines outperformed the other models, with an AUC of 0.75 (sensitivity: 0.64, specificity: 0.73) on the validation dataset. We compare our results with seminal and closely related studies to demonstrate the superiority of our results. The application of ML techniques improved the performance measures in the prediction of preterm birth.
The results emphasize that socioeconomic factors such as parental education are among the most important indicators of preterm birth. More research is needed on the mechanisms through which socioeconomic factors affect biological responses.

application/pdf
en
Creative Commons Attribution 4.0 International
racial disparities; education; statistical analysis; neural networks; socioeconomic factors
Identifying the Early Signs of Preterm Birth from U.S. Birth Records Using Machine Learning Techniques
Article - Refereed
2022-07-08
Information
https://doi.org/10.3390/info13070310
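
The abstract reports model performance as AUC, sensitivity, and specificity. As a reading aid, the sketch below shows how these three measures are conventionally computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a given decision threshold.

    Sensitivity is the share of true positives (here: preterm births)
    that are flagged; specificity is the share of true negatives
    (term births) that are correctly not flagged.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    This O(n^2) form is for illustration only.
    """
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = preterm birth, 0 = term birth.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a single model can be described by one AUC and one sensitivity/specificity pair.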