Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction

dc.contributor.author: Afrose, Sharmin
dc.contributor.author: Song, Wenjia
dc.contributor.author: Nemeroff, Charles B.
dc.contributor.author: Yao, Danfeng
dc.date.accessioned: 2022-10-27T14:23:07Z
dc.date.available: 2022-10-27T14:23:07Z
dc.date.issued: 2022-09
dc.description.abstract: Background: Many clinical datasets are intrinsically imbalanced, dominated by overwhelming majority groups. Off-the-shelf machine learning models that optimize the prognosis of majority patient types (e.g., healthy class) may cause substantial errors on the minority prediction class (e.g., disease class) and demographic subgroups (e.g., Black or young patients). In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist but go unreported. In addition, some widely used whole-population metrics give misleading results. Methods: We design a double prioritized (DP) bias correction technique to mitigate representational biases in machine learning-based prognosis. Our method trains customized machine learning models for specific ethnicity or age groups, a substantial departure from the one-model-predicts-all convention. We compare with other sampling and reweighting techniques in mortality and cancer survivability prediction tasks. Results: We first provide empirical evidence showing various prediction deficiencies in a typical machine learning setting without bias correction. For example, missed death cases are 3.14 times higher than missed survival cases for mortality prediction. Then, we show DP consistently boosts the minority class recall for underrepresented groups, by up to 38.0%. DP also reduces relative disparities across race and age groups, e.g., up to 88.0% better than eight existing sampling solutions in terms of the relative disparity of minority class recall. Cross-race and cross-age-group evaluation also suggests the need for subpopulation-specific machine learning models. Conclusions: Biases exist in the widely accepted one-machine-learning-model-fits-all-population approach. We invent a bias correction method that produces specialized machine learning prognostication models for underrepresented racial and age groups. This technique may reduce potentially life-threatening prediction mistakes for minority populations.
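The abstract's core idea — replicating minority-class samples of a target demographic group so the trained model prioritizes them — can be illustrated with a minimal sketch. The function name `dp_duplicate`, the toy records, and the fixed duplication count `k` are all illustrative assumptions; per the abstract, the actual DP method also compares candidate models (e.g., over different duplication levels) and evaluates minority-class recall per subgroup, which this sketch omits.

```python
def dp_duplicate(samples, target_group, minority_label, k):
    """Return a training set where minority-class samples of the
    target demographic group are replicated k extra times.

    This is the prioritized-duplication step only; selecting the best
    k (e.g., by validation-set minority-class recall) is left out.
    """
    dup = [s for s in samples
           if s["group"] == target_group and s["label"] == minority_label]
    return samples + dup * k

# Toy patient records: label 1 is the minority class (e.g., death).
data = [
    {"group": "Black", "label": 1, "x": 0.9},
    {"group": "Black", "label": 0, "x": 0.1},
    {"group": "White", "label": 1, "x": 0.8},
    {"group": "White", "label": 0, "x": 0.2},
]

# Build a training set customized for the Black subgroup.
train = dp_duplicate(data, target_group="Black", minority_label=1, k=3)
print(len(train))  # 4 originals + 3 copies of the one matching record = 7
```

A subgroup-specific model would then be trained on `train` instead of the raw data, one model per race or age group, rather than one model for the whole population.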
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1038/s43856-022-00165-w
dc.identifier.uri: http://hdl.handle.net/10919/112296
dc.identifier.volume: 2
dc.language.iso: en
dc.publisher: Nature Research
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction
dc.title.serial: Communications Medicine
dc.type: Article - Refereed
dc.type.dcmitype: Text
Files

Original bundle (showing 1 of 1)
Name: Afrose_et_al-2022-Comm_Medicine.pdf
Size: 3.26 MB
Format: Adobe Portable Document Format
Description: Published version

License bundle (showing 1 of 1)
Name: license.txt
Size: 1.5 KB
Description: Item-specific license agreed upon at submission