Data Standardization and Machine Learning Models for Histopathology
Awaysheh, Abdullah Mamdouh
Machine learning can provide insight and support for a variety of decisions. In some areas of medicine, decision-support models are capable of assisting healthcare practitioners in making accurate diagnoses. In this work we explored the application of these techniques to distinguish between two diseases in veterinary medicine: inflammatory bowel disease (IBD) and alimentary lymphoma (ALA). Both disorders are common gastrointestinal (GI) diseases in humans and animals, and they share very similar clinical and pathological outcomes. Because of these similarities, distinguishing between the two diseases can be challenging. To identify patterns that may help with this differentiation, we retrospectively mined medical records from dogs and cats with histopathologically diagnosed GI diseases. Since the pathology report is the key conveyor of this information in the medical record, our first study focused on its information structure. Other groups have had a similar interest. In 2008, to help ensure consistent reporting, the World Small Animal Veterinary Association (WSAVA) GI International Standardization Group proposed standards for recording histopathological findings (HF) from GI biopsy samples. In our work, we extend the WSAVA efforts and propose an information model (composed of an information structure and terminology mapped to the Systematized Nomenclature of Medicine - Clinical Terms) to be used when recording histopathological diagnoses (HDX, one or more HF from one or more tissues). Next, our aim was to identify free-text HF not currently expressed in the WSAVA format that may provide evidence for distinguishing between IBD and ALA in cats. As part of this work, we hypothesized that WSAVA-based structured reports would yield higher classification accuracy for GI disorders than reports in unstructured free-text format. We trained machine learning models on 60 structured reports and, independently, on 60 unstructured reports.
Results show that models built on unstructured information using two machine learning algorithms achieved higher accuracy in predicting the diagnosis than the models built on structured information, and some novel free-text features were identified for possible inclusion in the WSAVA reports. In our third study, we tested the use of machine learning algorithms to differentiate between IBD and ALA using complete blood count and serum chemistry data. Three models (using naïve Bayes, neural networks, and C4.5 decision trees) were trained and tested on laboratory results for 40 normal, 40 IBD, and 40 ALA cats. The diagnostic models achieved classification sensitivity ranging between 63% and 71%, with naïve Bayes and neural networks performing best. These models can provide another non-invasive diagnostic tool to assist with differentiating between IBD and ALA, and between diseased and non-diseased cats. We believe that relying on our information model for histopathological reporting can lead to a more complete, consistent, and computable knowledge base in which machine learning algorithms can more efficiently identify these and other disease patterns.
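The abstract reports classification sensitivity for a three-class problem (normal, IBD, ALA). As a rough illustration of how such a figure could be computed, the sketch below derives per-class sensitivity (true positives divided by all true cases of that class) from true and predicted labels. The function name `per_class_sensitivity` and the toy labels are hypothetical and are not taken from the dissertation; the dissertation's own evaluation procedure may differ.

```python
from collections import Counter

# Class names follow the three groups named in the abstract.
CLASSES = ("Normal", "IBD", "ALA")


def per_class_sensitivity(y_true, y_pred, classes=CLASSES):
    """Return {class: TP / (TP + FN)}; None for a class with no true cases."""
    true_pos = Counter()  # correctly predicted cases, per true class
    support = Counter()   # total true cases, per class
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            true_pos[truth] += 1
    return {
        c: (true_pos[c] / support[c] if support[c] else None)
        for c in classes
    }


# Made-up toy labels for illustration only; NOT data from the dissertation.
y_true = ["Normal", "Normal", "IBD", "IBD", "ALA", "ALA"]
y_pred = ["Normal", "IBD", "IBD", "ALA", "ALA", "ALA"]

result = per_class_sensitivity(y_true, y_pred)
print(result)
```

Reporting sensitivity per class, rather than a single overall accuracy, shows separately how often each group (for example, ALA cases) is correctly recognized.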
- Doctoral Dissertations