An Interdisciplinary Approach: Computational Sequence Motif Search and Prediction of Protein Function with Experimental Validation
Abstract
Pathogens colonize their hosts by releasing molecules that can enter host cells. Phytophthora sojae, a biotrophic oomycete plant pathogen, harbors a superfamily of effector genes whose protein products enter the cells of its host, soybean. Many of the effectors contain an RXLR-dEER motif in their N-terminus. More than 400 members of this family have previously been identified using a hidden Markov model. Despite the high level of sequence divergence among the members of this protein family, the amino acids flanking the RXLR motif have been used to identify effector proteins in the P. sojae secretome.
I present here machine learning methods to identify candidate proteins that belong to a particular class, such as the effector superfamily. Converting the amino acid sequences flanking RXLR motifs (or other candidate motifs) into numeric values that reflect their physical properties enabled the protein sequences to be analyzed with these methods. The methods evaluated include Support Vector Machines and a related spherical classification method that I developed. I also approached the effector prediction problem by building functional linkage networks, and I have produced lists of predicted P. sojae effector proteins. I tested the best candidate through gene gun bombardment assays using the beta-glucuronidase (GUS) reporter system; the results indicate a high likelihood that the candidate can enter soybean cells.
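The encoding step described above can be illustrated with a minimal sketch. It is not the method used in this work: the choice of the Kyte-Doolittle hydropathy scale as the physical property, the window size, the zero padding, and the function name encode_flanks are all assumptions made for illustration. The sketch finds each RXLR occurrence in a protein sequence and converts the residues on either side into a fixed-length numeric vector.

```python
import re

# Kyte-Doolittle hydropathy index, used here only as an example property.
# The abstract does not say which physical properties were used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Zero-width lookahead so that overlapping RXLR occurrences are all found.
RXLR = re.compile(r"(?=R.LR)")


def encode_flanks(seq, flank=5, pad=0.0):
    """Return (motif_start, vector) for every RXLR occurrence in seq.

    Hypothetical helper: each vector holds `flank` hydropathy values for
    the residues before the motif followed by `flank` values for the
    residues after it. Missing or unknown residues are set to `pad`.
    """
    results = []
    for match in RXLR.finditer(seq):
        start = match.start()
        end = start + 4  # the motif itself is four residues long
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        left_vals = [pad] * (flank - len(left)) + [KD.get(a, pad) for a in left]
        right_vals = [KD.get(a, pad) for a in right] + [pad] * (flank - len(right))
        results.append((start, left_vals + right_vals))
    return results


if __name__ == "__main__":
    # Made-up sequence for demonstration, not a real P. sojae protein.
    for position, vector in encode_flanks("MAAARSLRGGEEV", flank=3):
        print(position, vector)
```

Vectors of this kind all have the same length, so they could be passed directly to a classifier such as a Support Vector Machine. In practice several properties per residue would probably be combined rather than a single scale.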