Microarray Approaches to Experimental Genome Annotation
This work describes the development and application of genomic DNA tiling arrays: microarrays designed to represent all of the DNA comprising a chromosome or other genomic locus, regardless of the genes that may be annotated in the region of interest. Because tiling arrays are intended for the unbiased interrogation of genomic sequence, they enable the discovery of novel functional elements beyond those described by existing gene annotation. This is of particular importance in mapping the gene structures of higher eukaryotes, where combinatorial exon usage produces rare splice variants or isoforms expressed in low abundance that may otherwise elude detection. Issues related to the design of both oligonucleotide- and amplicon-based tiling arrays are discussed; the latter technology presents distinct challenges related to the selection of suitable amplification targets from genomic DNA. Given the widespread fragmentation of mammalian genomes by repetitive elements, obtaining maximal coverage of the non-repetitive sequence with a set of fragments amenable to high-throughput polymerase chain reaction (PCR) amplification represents a non-trivial optimization problem. To address this issue, several algorithms are described for the efficient computation of optimal tile paths for the design of amplicon tiling arrays. Using these methods, it is possible to recover an optimal tile path that maximizes the coverage of non-repetitive DNA while minimizing the number of repetitive elements included in the resulting sequence fragments. Tiling arrays were constructed and used for the chromosome- and genome-wide assessment of human transcriptional activity, via hybridization to complementary DNA derived from polyadenylated RNA expressed in normal complex tissues. The approach is first demonstrated with amplicon arrays representing all of the non-repetitive DNA of human chromosome 22, then extended to the entire genome using maskless photolithographic DNA synthesis technology. 
A large-scale tiling array survey revealed the presence of over 10,000 novel transcribed regions and verified the expression of nearly 13,000 predicted genes, providing the first global transcription map of the human genome. In addition to those likely to encode protein sequences on the basis of evolutionary sequence conservation, many of the novel transcripts constitute a previously uncharacterized population of non-coding RNAs implicated in myriad structural, catalytic and regulatory functions.
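The tile path optimization mentioned above can be illustrated with a minimal dynamic-programming sketch. It is not the algorithm developed in this work, only one plausible formulation under stated assumptions: the locus is given as a per-base repeat mask, amplicons may not overlap, each amplicon length must fall within a PCR-friendly range, and a tile path is scored as the number of non-repetitive bases covered minus a penalty for each repetitive base included. The function name `optimal_tile_path` and the parameters `min_len`, `max_len` and `repeat_penalty` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def optimal_tile_path(
    is_repeat: Sequence[bool],
    min_len: int,
    max_len: int,
    repeat_penalty: float = 1.0,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Choose non-overlapping tiles (half-open intervals) maximizing the score.

    Score of a tile = non-repetitive bases covered
                      - repeat_penalty * repetitive bases covered.
    All names and the scoring scheme are illustrative assumptions.
    """
    n = len(is_repeat)

    # prefix[i] = number of repetitive bases in is_repeat[:i]
    prefix = [0] * (n + 1)
    for i, rep in enumerate(is_repeat):
        prefix[i + 1] = prefix[i] + (1 if rep else 0)

    # best[i] = best achievable score using only bases [0, i)
    best = [0.0] * (n + 1)
    # choice[i] = length of the tile ending at i, or 0 if base i-1 is skipped
    choice = [0] * (n + 1)

    for i in range(1, n + 1):
        best[i] = best[i - 1]  # option 1: leave base i-1 uncovered
        for length in range(min_len, min(max_len, i) + 1):
            start = i - length
            repeats = prefix[i] - prefix[start]
            score = best[start] + (length - repeats) - repeat_penalty * repeats
            if score > best[i]:  # option 2: end a tile of this length at i
                best[i] = score
                choice[i] = length

    # Trace back the chosen tiles from the end of the locus.
    tiles: List[Tuple[int, int]] = []
    i = n
    while i > 0:
        if choice[i] == 0:
            i -= 1
        else:
            tiles.append((i - choice[i], i))
            i -= choice[i]
    tiles.reverse()
    return best[n], tiles
```

For a toy locus of five unique bases, three repetitive bases and five more unique bases, with tile lengths restricted to 3 through 6, the sketch selects the two unique flanks and skips the repeat. The running time is proportional to the locus length times the width of the allowed length range, so a realistic design would need a coarser representation than one entry per base.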