Using Artificial Life to Design Machine Learning Algorithms for Decoding Gene Expression Patterns from Images
Understanding the relationship between gene expression and phenotype is important in many areas of biology and medicine. However, current methods for measuring gene expression, such as microarrays, are invasive, require biopsy, and are expensive. These factors limit experiments to low-rate temporal sampling of gene expression and prevent longitudinal studies within a single subject, reducing their statistical power. Methods for non-invasive measurement of gene expression are therefore an important and active topic of research. A recently reported approach to indirect measurement of gene expression (Segal et al., Nature Biotechnology 25 (6), 2007) uses existing imaging techniques and machine learning to estimate a function mapping image features to gene expression patterns, providing an image-derived surrogate for gene expression. However, the design of machine learning methods for this purpose is hampered by the cost of training and validation.
My thesis shows that populations of artificial organisms simulating genetic variation can be used to design machine learning approaches for decoding gene expression patterns from images. If analysis of images of these artificial organisms proves successful, the same approach can be applied to real biomedical images, helping to overcome the limitations of invasive measurement methods. The results showed that the box-counting dimension was a suitable feature extraction method, yielding a classification rate of at least 90% for mutation rates up to 40%. The box-counting dimension was also robust to image distortion: the performance of classifiers using the fractal dimension as a feature appeared more sensitive to the mutation rate than to the applied distortion level.
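As an illustration of the feature named above, the box-counting dimension of a binary image can be estimated by covering the image with grids of boxes of side s, counting the number N(s) of boxes that contain at least one foreground pixel, and fitting the slope of log N(s) against log(1/s). The following Python sketch is not the implementation used in the thesis; the function name `box_counting_dimension`, the `threshold` parameter, and the choice of power-of-two box sizes are all assumptions made for the example.

```python
import numpy as np


def box_counting_dimension(image, threshold=0.5):
    """Estimate the box-counting dimension of a 2D image.

    Hypothetical helper: pixels above `threshold` count as foreground.
    """
    binary = np.asarray(image) > threshold
    if binary.ndim != 2:
        raise ValueError("expected a 2D image")

    # Zero-pad to a square whose side is a power of two, so the image
    # can be tiled exactly by boxes of side 1, 2, 4, ...
    side = 1 << (max(binary.shape) - 1).bit_length()
    padded = np.zeros((side, side), dtype=bool)
    padded[: binary.shape[0], : binary.shape[1]] = binary

    sizes, counts = [], []
    size = side
    while size >= 1:
        k = side // size  # number of boxes along each axis
        # View the image as a k x k grid of (size x size) boxes and
        # mark a box as occupied if any pixel inside it is foreground.
        boxes = padded.reshape(k, size, k, size)
        occupied = int(boxes.any(axis=(1, 3)).sum())
        if occupied > 0:
            sizes.append(size)
            counts.append(occupied)
        size //= 2

    if len(sizes) < 2:
        raise ValueError("need a non-empty image with at least two box sizes")

    # The dimension estimate is the slope of log N(s) versus log(1/s).
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return float(slope)


if __name__ == "__main__":
    filled = np.ones((64, 64))   # a filled square
    line = np.zeros((64, 64))
    line[0, :] = 1               # a single straight line
    print(box_counting_dimension(filled))
    print(box_counting_dimension(line))
```

Under these assumptions a filled square gives an estimate close to 2 and a straight line an estimate close to 1; textured or branching patterns fall in between, which is what makes the value usable as a scalar feature for a classifier.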