Optimal Data-driven Methods for Subject Classification in Public Health Screening

Loading...
Thumbnail Image

TR Number

Date

2019-07-01

Journal Title

Journal ISSN

Volume Title

Publisher

Virginia Tech

Abstract

Biomarker testing, wherein the concentration of a biochemical marker is measured to predict the presence or absence of a certain binary characteristic (e.g., a disease) in a subject, is an essential component of public health screening. For many diseases, the concentration of disease-related biomarkers may exhibit a wide range, particularly among the disease positive subjects, in part due to variations caused by external and/or subject-specific factors. Further, a subject's actual biomarker concentration is not directly observable by the decision maker (e.g., the tester), who has access only to the test's measurement of the biomarker concentration, which can be noisy. In this setting, the decision maker needs to determine a classification scheme in order to classify each subject as test negative or test positive. However, the inherent variability in biomarker concentrations and the noisy test measurements can increase the likelihood of subject misclassification.

We develop an optimal data-driven framework, which integrates optimization and data analytics methodologies, for subject classification in disease screening, with the aim of minimizing classification errors. In particular, our framework utilizes data analytics methodologies to estimate the posterior disease risk of each subject, based on both subject-specific and external factors, coupled with robust optimization methodologies to derive an optimal robust subject classification scheme, under uncertainty on actual biomarker concentrations. We establish various key structural properties of optimal classification schemes, show that they are easily implementable, and develop key insights and principles for classification schemes in disease screening.

As one application of our framework, we study newborn screening for cystic fibrosis in the United States. Cystic fibrosis is one of the most common genetic diseases in the United States. Early diagnosis of cystic fibrosis can substantially improve health outcomes, while a delayed diagnosis can result in severe symptoms of the disease, including fatality. We demonstrate our framework on a five-year newborn screening data set from the North Carolina State Laboratory of Public Health. Our study underscores the value of optimization-based approaches to subject classification, and show that substantial reductions in classification error can be achieved through the use of the proposed framework over current practices.

Description

Keywords

Population Level Disease Screening, Risk-based Testing, Robust Optimization, Newborn Screening, Cystic Fibrosis

Citation