Biologically-Interpretable Disease Classification Based on Gene Expression Data
Classification of tissues and diseases based on gene expression data is a powerful application of DNA microarrays. Many popular classifiers, such as support vector machines, nearest-neighbour methods, and boosting, have been applied successfully to this problem. However, it is difficult to determine from these classifiers which genes are responsible for the distinctions between the diseases. We propose a novel framework for classifying gene expression data based on the notion of xMotifs, which are condition-specific clusters of co-expressed genes. Our xMotif-based classifier is biologically interpretable: we show how to detect relationships between xMotifs and gene functional annotations. Our classifier achieves high accuracy under leave-one-out cross-validation on both two-class and multi-class data. Our technique has the potential to become a method of choice for researchers interested in disease and tissue classification.
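As an illustration of the evaluation protocol named above, the following is a minimal sketch of leave-one-out cross-validation: each sample is held out in turn, a classifier is built from the remaining samples, and the held-out sample is then predicted. The nearest-centroid classifier is only a stand-in chosen for brevity; it is not the xMotif-based classifier. The function names (`nearest_centroid_predict`, `loocv_accuracy`) and the toy expression profiles are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (NOT the xMotif classifier): return the class
    whose mean expression profile is closest to the query sample."""
    groups = defaultdict(list)
    for profile, label in zip(train_x, train_y):
        groups[label].append(profile)
    # Per-class centroid: mean of each gene (column) over the class's samples.
    centroids = {
        label: [sum(col) / len(col) for col in zip(*profiles)]
        for label, profiles in groups.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], query))


def loocv_accuracy(samples, labels, predict=nearest_centroid_predict):
    """Leave-one-out cross-validation: hold out each sample once, train on
    the rest, and report the fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy data (invented for illustration): rows are samples, columns are genes.
samples = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0],
           [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = loocv_accuracy(samples, labels)
```

Because `loocv_accuracy` accepts any `predict` callable with the same three arguments, a different classifier can be evaluated under the same protocol without changing the loop, and the class labels may be two-class or multi-class.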