caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data

dc.contributor.author: Zhu, Yitan [en]
dc.contributor.author: Li, Huai [en]
dc.contributor.author: Miller, David J. [en]
dc.contributor.author: Wang, Zuyi [en]
dc.contributor.author: Xuan, Jianhua [en]
dc.contributor.author: Clarke, Robert [en]
dc.contributor.author: Hoffman, Eric P. [en]
dc.contributor.author: Wang, Yue [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2012-08-24T11:54:10Z [en]
dc.date.available: 2012-08-24T11:54:10Z [en]
dc.date.issued: 2008-09-18 [en]
dc.date.updated: 2012-08-24T11:54:10Z [en]
dc.description.abstract:
Background: The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables.

Results: In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering (wherein phenotype labels for samples are known), albeit with minor algorithm modifications customized to each of these tasks.

Conclusion: VISDA achieved robust and superior clustering accuracy, compared with several benchmark clustering schemes. The model order selection scheme in VISDA was shown to be effective for high dimensional genomic data clustering. On muscular dystrophy data and muscle regeneration data, VISDA identified biologically relevant co-expressed gene clusters. VISDA also captured the pathological relationships among different phenotypes revealed at the molecular level, through phenotype clustering on muscular dystrophy data and multi-category cancer data. [en]
dc.description.version: Published version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.citation: BMC Bioinformatics. 2008 Sep 18;9(1):383 [en]
dc.identifier.doi: https://doi.org/10.1186/1471-2105-9-383 [en]
dc.identifier.uri: http://hdl.handle.net/10919/18881 [en]
dc.language.iso: en [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.holder: Yitan Zhu et al.; licensee BioMed Central Ltd. [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.title: caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data [en]
dc.title.serial: BMC Bioinformatics [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]

Files

Original bundle
Name: 1471-2105-9-383.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Name: 1471-2105-9-383-S1.PDF
Size: 209.06 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission