GeneSieve: A Probe Selection Strategy for cDNA Microarrays


The DNA microarray is a powerful tool for studying the expression levels of thousands of genes simultaneously. cDNA libraries representing the expressed genes of an organism are often available, along with expressed sequence tags (ESTs), which are widely used as microarray probes. Designing custom microarrays rich in genes relevant to the experimental objectives requires selecting probes based on their sequence. We have designed a probe selection method, called GeneSieve, that selects EST probes for custom microarrays. To assign annotations to the ESTs, we first cluster them into contigs using PHRAP. The larger contig sequences are then used in a similarity search against known proteins from a model organism such as Arabidopsis thaliana. We have designed three methods for assigning annotations to the contigs: bidirectional hits (BH), bidirectional best hits (BBH), and unidirectional best hits (UBH). We apply these methods to pine and potato EST sets. The results show that UBH assigns unambiguous annotations to a large fraction of an organism's contigs, so GeneSieve uses UBH to annotate ESTs. To select a single EST from each contig, GeneSieve assigns each EST a quality score based on its protein homology (PH), cross hybridization (CH), and relative length (RL). ESTs are then ranked by seven measures: length, 3' proximity, 5' proximity, protein homology, cross hybridization, relative length, and the overall quality score. For the pine and potato EST sets, the results indicate that EST probes selected by quality score are relatively long and have better protein homology and cross-hybridization values. The results of the GeneSieve protocol are stored in a database that is linked to sequence databases and to known functional category schemes such as MIPS and GO, and the database is accessible through a web interface.
A biologist can thus select a large number of EST probes quickly and easily, based on annotations or functional categories.
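The three annotation strategies are named above but not defined in this abstract. The Python sketch below assumes their conventional meanings. UBH assigns each contig its single best-scoring protein from the forward search (contigs against proteins). BBH additionally requires that the contig be that protein's best hit in the reverse search (proteins against contigs). BH keeps every protein that hits the contig in both directions, so a contig may receive several candidate annotations. The function names, the dictionary layout, and the toy scores are hypothetical, and the hits are assumed to come from a precomputed similarity search (for example, BLAST bit scores).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: query id -> list of (subject id, score); higher score = better hit.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: List[Tuple[str, float]]) -> Optional[str]:
    """Return the highest-scoring subject, or None when there are no hits (ties keep the first)."""
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[0]


def ubh(forward: Hits) -> Dict[str, str]:
    """Unidirectional best hit: each contig takes its best protein from the forward search."""
    result = {}
    for contig, hits in forward.items():
        protein = best_hit(hits)
        if protein is not None:
            result[contig] = protein
    return result


def bbh(forward: Hits, reverse: Hits) -> Dict[str, str]:
    """Bidirectional best hit: keep a UBH pair only if the protein's best reverse hit is the same contig."""
    return {
        contig: protein
        for contig, protein in ubh(forward).items()
        if best_hit(reverse.get(protein, [])) == contig
    }


def bh(forward: Hits, reverse: Hits) -> Dict[str, List[str]]:
    """Bidirectional hits: every protein that hits the contig in both directions (may be several)."""
    result = {}
    for contig, hits in forward.items():
        mutual = [p for p, _ in hits if any(c == contig for c, _ in reverse.get(p, []))]
        if mutual:
            result[contig] = mutual
    return result


# Toy scores, invented for illustration only.
forward = {
    "contig1": [("protA", 250.0), ("protB", 180.0)],
    "contig2": [("protB", 300.0)],
    "contig3": [("protB", 150.0)],
}
reverse = {
    "protA": [("contig1", 240.0)],
    "protB": [("contig2", 310.0), ("contig1", 170.0), ("contig3", 140.0)],
}

print(ubh(forward))           # {'contig1': 'protA', 'contig2': 'protB', 'contig3': 'protB'}
print(bbh(forward, reverse))  # {'contig1': 'protA', 'contig2': 'protB'}
print(bh(forward, reverse))   # {'contig1': ['protA', 'protB'], 'contig2': ['protB'], 'contig3': ['protB']}
```

In the toy data, protB's best reverse hit is contig2, so BBH drops contig3 while UBH still annotates it, and BH returns two candidate proteins for contig1. Under these assumed definitions, this illustrates how a unidirectional rule can annotate more contigs than BBH while avoiding the multiple candidates that BH can produce.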



EST annotation, cDNA microarrays, probe selection