Data Standardization and Machine Learning Models for Histopathology

dc.contributor.author: Awaysheh, Abdullah Mamdouh
dc.contributor.committeechair: Zimmerman, Kurt L.
dc.contributor.committeemember: Wilcke, Jeffrey R.
dc.contributor.committeemember: Elvinger, Francois C.
dc.contributor.committeemember: Fan, Weiguo
dc.contributor.committeemember: Rees, Loren P.
dc.contributor.department: Veterinary Medicine
dc.date.accessioned: 2018-09-19T06:00:32Z
dc.date.available: 2018-09-19T06:00:32Z
dc.date.issued: 2017-03-27
dc.description.abstract: Machine learning can provide insight and support for a variety of decisions. In some areas of medicine, decision-support models are capable of assisting healthcare practitioners in making accurate diagnoses. In this work we explored the application of these techniques to distinguish between two diseases in veterinary medicine: inflammatory bowel disease (IBD) and alimentary lymphoma (ALA). Both disorders are common gastrointestinal (GI) diseases in humans and animals that share very similar clinical and pathological outcomes. Because of these similarities, distinguishing between these two diseases can sometimes be challenging. In order to identify patterns that may help with this differentiation, we retrospectively mined medical records from dogs and cats with histopathologically diagnosed GI diseases. Since the pathology report is the key conveyer of this information in the medical records, our first study focused on its information structure. Other groups have had a similar interest. In 2008, to help ensure consistent reporting, the World Small Animal Veterinary Association (WSAVA) GI International Standardization Group proposed standards for recording histopathological findings (HF) from GI biopsy samples. In our work, we extend WSAVA efforts and propose an information model (composed of information structure and terminology mapped to the Systematized Nomenclature of Medicine - Clinical Terms) to be used when recording histopathological diagnoses (HDX, one or more HF from one or more tissues). Next, our aim was to identify free-text HF not currently expressed in the WSAVA format that may provide evidence for distinguishing between IBD and ALA in cats. As part of this work, we hypothesized that WSAVA-based structured reports would yield higher classification accuracy for GI disorders than the unstructured free-text format. We trained machine learning models on 60 structured reports and, independently, on 60 unstructured reports.
Results show that unstructured information-based models using two machine learning algorithms achieved higher accuracy in predicting the diagnosis than the structured information-based models, and some novel free-text features were identified for possible inclusion in WSAVA reports. In our third study, we tested the use of machine learning algorithms to differentiate between IBD and ALA using complete blood count and serum chemistry data. Three models (using naïve Bayes, neural networks, and C4.5 decision trees) were trained and tested on laboratory results for 40 Normal, 40 IBD, and 40 ALA cats. Diagnostic models achieved classification sensitivity ranging between 63% and 71%, with naïve Bayes and neural networks performing best. These models can provide another non-invasive diagnostic tool to assist with differentiating between IBD and ALA, and between diseased and non-diseased cats. We believe that relying on our information model for histopathological reporting can lead to a more complete, consistent, and computable knowledgebase in which machine learning algorithms can more efficiently identify these and other disease patterns.
dc.description.abstractgeneral: Computational models play an important role in supporting the decision-making process. In some areas of medicine, decision-support models assist healthcare practitioners in making accurate diagnoses. In this work, we explored the application of computational techniques to distinguish between two diseases: inflammatory bowel disease (IBD) and alimentary lymphoma (ALA). These are common gastrointestinal (GI) diseases in humans and animals that share very similar laboratory findings. Because of these similarities, distinguishing between these two diseases can sometimes be challenging. In order to identify patterns that may help with this differentiation, we mined medical records from dogs and cats diagnosed with GI diseases. Since the pathology report is a key source of information for the diagnosis of these two diseases, in our first study we focused on its information structure. Others with similar interest have also examined reports of this type. In 2008, a group proposed standards for recording histopathological findings (HF) from GI biopsy samples. In our work, we extend the group’s efforts and propose an information model (composed of information structure and terminology) to be used when recording histopathological diagnoses (HDX, one or more HF from one or more tissues). Next, our aim was to identify free-text HF not currently expressed in the standardization group’s format that may provide evidence for distinguishing between IBD and ALA in cats. We trained computational models on 60 structured reports and, independently, on 60 unstructured reports. Results show that unstructured information-based models using two computational methods achieved higher accuracy in predicting the diagnosis than the structured information-based models. As a result, novel free-text features, which improved the performance of the structured reports, were identified.
In our third study, we tested the use of computational models to differentiate between IBD and ALA using routine laboratory results. Three models were trained and tested on laboratory results from 40 Normal, 40 IBD, and 40 ALA cats. Diagnostic models achieved classification sensitivity ranging between 63% and 71%. These models can provide another noninvasive diagnostic tool to assist with differentiating between IBD and ALA, and between diseased and non-diseased cats. We believe that relying on our information model for histopathological reporting can lead to a more complete, consistent, and computable knowledgebase for the identification of these two diseases.
dc.description.degree: Ph. D.
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:9867
dc.identifier.uri: http://hdl.handle.net/10919/85040
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Data Standardization
dc.subject: Machine learning
dc.subject: Histopathology
dc.subject: Inflammatory Bowel Disease
dc.subject: Alimentary Lymphoma
dc.title: Data Standardization and Machine Learning Models for Histopathology
dc.type: Dissertation
thesis.degree.discipline: Biomedical and Veterinary Sciences
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Ph. D.

Files

Original bundle
Name: Awaysheh_AM_D_2017.pdf
Size: 2.19 MB
Format: Adobe Portable Document Format

Name: Awaysheh_AM_D_2017_support_1.pdf
Size: 203.46 KB
Format: Adobe Portable Document Format
Description: Supporting documents