Detection and Characterization of Multilevel Genomic Patterns

Date
2010-05-26
Publisher
Virginia Tech
Abstract

The DNA microarray has become a powerful tool in genetics, molecular biology, and biomedical research. It can be used to measure genotypes, structural changes, and gene expression in human genomes. Detecting and characterizing patterns in multilevel, high-throughput microarray genomic data poses new challenges for statistical pattern recognition and machine learning research. In this dissertation, we propose novel computational methods for analyzing DNA copy number changes and for learning trees of phenotypes from DNA microarray data.

DNA copy number change is an important form of structural variation in human genomes. The copy number signals measured by high-density DNA microarrays usually have low signal-to-noise ratios and complex patterns because tissue samples are inhomogeneous in composition. We propose a robust detection method for extracting copy number changes in a single signal profile and consensus copy number changes across the signal profiles of a population. We adapt a solution-path algorithm to efficiently solve the optimization problems associated with the proposed method. We tested the proposed method on both simulated and real CGH and SNP microarray datasets and observed improved performance relative to several widely adopted copy number change detection methods. We also propose a chromosome instability measure that summarizes the extracted copy number changes in order to assess the chromosomal instability of tumor genomes. The proposed measure shows distinct patterns across different subtypes of ovarian serous carcinomas and normal samples.
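As a rough illustration of what extracting a copy number change from a noisy signal profile involves, the following minimal sketch fits a two-segment piecewise-constant model by least squares and reports the best split. It is a deliberately simplified stand-in, not the robust method or the solution-path algorithm proposed in the dissertation, whose formulation is not given in this abstract. The function name `best_single_breakpoint` and the synthetic profile are hypothetical.

```python
import random


def best_single_breakpoint(signal):
    """Return (index, cost) of the split that minimizes the squared error
    of a two-segment piecewise-constant fit.

    The left segment is signal[:index] and the right is signal[index:].
    This is a simplified illustration, not the dissertation's method.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two probes")
    total = sum(signal)
    total_sq = sum(x * x for x in signal)
    best_idx, best_cost = None, float("inf")
    left_sum = 0.0
    left_sq = 0.0
    for i in range(1, n):
        x = signal[i - 1]
        left_sum += x
        left_sq += x * x
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Within-segment sum of squared deviations from each segment mean.
        cost = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx, best_cost


# Hypothetical synthetic log-ratio profile: 60 probes at a neutral level,
# followed by 40 probes carrying a copy number gain, with Gaussian noise.
rng = random.Random(0)
profile = [rng.gauss(0.0, 0.2) for _ in range(60)] + [rng.gauss(0.8, 0.2) for _ in range(40)]
split, cost = best_single_breakpoint(profile)
print(f"estimated change point after probe {split}, residual error {cost:.3f}")
```

Real profiles contain many change points and outliers, which is why multi-segment and robust formulations are needed in practice.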

Despite active research on complex human diseases using genomic data, little effort has been devoted to, and little progress made in, discovering the relational structural information embedded in the molecular data. We propose two stability-analysis-based methods to learn stable and highly resolved trees of phenotypes from microarray gene expression data of heterogeneous diseases. In the first method, we use a hierarchical, divisive visualization approach to explore the tree of phenotypes and leave-one-out cross-validation to select stable tree structures. In the second method, we propose a node bandwidth constraint to construct stable trees that balance the descriptive power and reproducibility of tree structures. Using a top-down merging procedure, we modify the binary tree structures learned by hierarchical group clustering methods to achieve a given node bandwidth. We then use a bootstrap-based stability analysis to select stable tree structures under different node bandwidth constraints. Experimental results on two microarray gene expression datasets of human diseases show that the proposed methods can discover stable, biologically plausible trees of phenotypes that reveal the relationships between multiple diseases.
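The general idea behind a bootstrap-based stability analysis can be sketched as follows: cluster the data once to obtain a reference structure, re-cluster many resampled versions of the data, and score how often the structure is reproduced. The sketch below is an illustration under several assumptions that are not taken from the dissertation: it resamples features (genes) with replacement, uses average-linkage agglomerative clustering cut at `k` groups instead of a tree with a node bandwidth constraint, and measures agreement with the Rand index. All function names and the synthetic data are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, k):
    """Average-linkage agglomerative clustering stopped at k clusters.
    Returns one cluster label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(sq_dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def rand_index(u, v):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def bootstrap_stability(points, k, n_boot=50, seed=0):
    """Mean Rand index between the reference partition and partitions
    obtained after resampling features with replacement (an assumption)."""
    rng = random.Random(seed)
    reference = agglomerate(points, k)
    n_feat = len(points[0])
    scores = []
    for _ in range(n_boot):
        cols = [rng.randrange(n_feat) for _ in range(n_feat)]
        resampled = [[p[c] for c in cols] for p in points]
        scores.append(rand_index(reference, agglomerate(resampled, k)))
    return sum(scores) / n_boot


# Hypothetical data: three groups of four samples, 20 features each.
data_rng = random.Random(1)
samples = [[data_rng.gauss(center, 1.0) for _ in range(20)]
           for center in (0.0, 2.0, 4.0) for _ in range(4)]
for k in (2, 3, 4):
    print(k, round(bootstrap_stability(samples, k, n_boot=20), 3))
```

Comparing the scores across candidate values of `k` mirrors, in a much simpler setting, the selection of stable structures among several candidates; a score near 1 indicates a partition that survives resampling.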

Keywords
Gene Expressions, DNA Copy Number Changes, Stability Analysis, Regression Analysis, Tree of Phenotypes