BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data

dc.contributor.author: Gu, Jinghua [en]
dc.contributor.author: Wang, Xiao [en]
dc.contributor.author: Hilakivi-Clarke, Leena [en]
dc.contributor.author: Clarke, Robert [en]
dc.contributor.author: Xuan, Jianhua [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.date.accessioned: 2014-09-10T15:04:53Z [en]
dc.date.available: 2014-09-10T15:04:53Z [en]
dc.date.issued: 2014-09-10 [en]
dc.date.updated: 2014-09-10T15:04:54Z [en]
dc.description.abstract: Background: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated due to biological and technical variations in RNA-Seq data. Results: We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimate of gene level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, where sequencing counts are synthesized based on parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application on breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. Conclusions: We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated desirable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm. [en]
dc.description.version: Published version [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.citation: BMC Bioinformatics. 2014 Sep 10;15(Suppl 9):S6 [en]
dc.identifier.doi: https://doi.org/10.1186/1471-2105-15-S9-S6 [en]
dc.identifier.uri: http://hdl.handle.net/10919/50495 [en]
dc.language.iso: en [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.holder: Jinghua Gu et al.; licensee BioMed Central Ltd. [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.title: BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data [en]
dc.title.serial: BMC Bioinformatics [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]
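
Illustrative note: the abstract names a Poisson-Lognormal model for within-sample variation and describes read counts as over-dispersed. The sketch below is not the BADGE implementation and is not taken from the paper. It is a minimal, standard-library-only simulation of the general idea that multiplying a Poisson rate by a lognormal factor yields counts whose variance exceeds their mean. The function names, the rate of 20, and the sigma of 0.5 are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the moderate rates used here; not intended for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def poisson_lognormal_counts(mean_rate, sigma, n, rng):
    """Simulate n counts whose Poisson rate is scaled by a lognormal factor.

    The lognormal location is set to -sigma^2 / 2 so the factor has mean 1,
    which keeps the expected count equal to mean_rate (an illustrative choice).
    """
    counts = []
    for _ in range(n):
        factor = rng.lognormvariate(-0.5 * sigma ** 2, sigma)
        counts.append(sample_poisson(mean_rate * factor, rng))
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 for a pure Poisson sample."""
    return statistics.pvariance(counts) / statistics.mean(counts)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
plain = poisson_lognormal_counts(20.0, 0.0, 5000, rng)  # sigma = 0: pure Poisson
noisy = poisson_lognormal_counts(20.0, 0.5, 5000, rng)  # hypothetical sigma

print("dispersion index, pure Poisson:     ", round(dispersion_index(plain), 2))
print("dispersion index, Poisson-lognormal:", round(dispersion_index(noisy), 2))
```

With sigma set to 0 the lognormal factor is exactly 1 and the counts reduce to a plain Poisson sample, so the two printed ratios contrast the same mean rate with and without the extra variability.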

Files

Original bundle
Name: 1471-2105-15-S9-S6.xml; Size: 240.45 KB; Format: Extensible Markup Language
Name: 1471-2105-15-S9-S6.pdf; Size: 1.68 MB; Format: Adobe Portable Document Format
Name: 1471-2105-15-S9-S6-S1.pdf; Size: 259.17 KB; Format: Adobe Portable Document Format
License bundle
Name: license.txt; Size: 1.5 KB; Format: Item-specific license agreed upon to submission