Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues

dc.contributor.author: Wang, Niya [en]
dc.contributor.author: Hoffman, Eric P. [en]
dc.contributor.author: Chen, Lulu [en]
dc.contributor.author: Chen, Li [en]
dc.contributor.author: Zhang, Zhen [en]
dc.contributor.author: Liu, Chunyu [en]
dc.contributor.author: Yu, Guoqiang [en]
dc.contributor.author: Herrington, David M. [en]
dc.contributor.author: Clarke, Robert [en]
dc.contributor.author: Wang, Yue [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2019-01-24T15:40:35Z [en]
dc.date.available: 2019-01-24T15:40:35Z [en]
dc.date.issued: 2016-01-07 [en]
dc.description.abstract: Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations. [en]
dc.description.notes: This work was funded in part by the National Institutes of Health under Grants NS029525, CA160036, CA184902, ES024988, CA149653, and HL111362. [en]
dc.description.sponsorship: National Institutes of Health [NS029525, CA160036, CA184902, ES024988, CA149653, HL111362] [en]
dc.format.extent: 12 [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.1038/srep18909 [en]
dc.identifier.issn: 2045-2322 [en]
dc.identifier.other: 18909 [en]
dc.identifier.pmid: 26739359 [en]
dc.identifier.uri: http://hdl.handle.net/10919/86875 [en]
dc.identifier.volume: 6 [en]
dc.language.iso: en [en]
dc.publisher: Springer Nature [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: cycle-regulated genes [en]
dc.subject: cell-cycle [en]
dc.subject: expression deconvolution [en]
dc.subject: separation [en]
dc.subject: patterns [en]
dc.subject: cancer [en]
dc.subject: brain [en]
dc.subject: tool [en]
dc.title: Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues [en]
dc.title.serial: Scientific Reports [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]

Files

Original bundle
Name: srep18909.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format
Description: