Cosbin: cosine score-based iterative normalization of biologically diverse samples

Field | Value | Language
dc.contributor.author | Wu, Chiung-Ting | en
dc.contributor.author | Shen, Minjie | en
dc.contributor.author | Du, Dongping | en
dc.contributor.author | Cheng, Zuolin | en
dc.contributor.author | Parker, Sarah J. | en
dc.contributor.author | Lu, Yingzhou | en
dc.contributor.author | Van Eyk, Jennifer E. | en
dc.contributor.author | Yu, Guoqiang | en
dc.contributor.author | Clarke, Robert | en
dc.contributor.author | Herrington, David M. | en
dc.contributor.author | Wang, Yue | en
dc.date.accessioned | 2023-01-24T14:28:06Z | en
dc.date.available | 2023-01-24T14:28:06Z | en
dc.date.issued | 2022 | en
dc.date.updated | 2023-01-24T02:48:17Z | en
dc.description.abstract | Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, to normalize biologically diverse samples, the most commonly used reference genes exhibit striking expression variability and size-factor or distribution-based normalization methods can be problematic when the amount of asymmetry in differential expression is significant. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups. Availability and implementation: The R scripts of Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online. | en
dc.description.version | Published version | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.1093/bioadv/vbac076 | en
dc.identifier.eissn | 2635-0041 | en
dc.identifier.issn | 2635-0041 | en
dc.identifier.issue | 1 | en
dc.identifier.orcid | Yu, Guoqiang [0000-0002-6743-7413] | en
dc.identifier.orcid | Wang, Yue [0000-0002-1788-1102] | en
dc.identifier.other | PMC9614059 | en
dc.identifier.other | vbac076 (PII) | en
dc.identifier.pmid | 36330358 | en
dc.identifier.uri | http://hdl.handle.net/10919/113388 | en
dc.identifier.volume | 2 | en
dc.language.iso | en | en
dc.publisher | Oxford University Press | en
dc.relation.uri | https://www.ncbi.nlm.nih.gov/pubmed/36330358 | en
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | Biotechnology | en
dc.subject | Genetics | en
dc.title | Cosbin: cosine score-based iterative normalization of biologically diverse samples | en
dc.title.serial | Bioinformatics Advances | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
dc.type.other | brief-report | en
dc.type.other | Journal Article | en
dcterms.dateAccepted | 2022-10-18 | en
pubs.organisational-group | /Virginia Tech | en
pubs.organisational-group | /Virginia Tech/Engineering | en
pubs.organisational-group | /Virginia Tech/Engineering/Electrical and Computer Engineering | en
pubs.organisational-group | /Virginia Tech/Faculty of Health Sciences | en
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en
pubs.organisational-group | /Virginia Tech/Engineering/COE T&R Faculty | en
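
Illustrative sketch

The abstract describes the pipeline only in outline: score cross-condition expression patterns by cosine, iteratively remove asymmetric differentially expressed genes, then derive sample-wise normalization factors from the consistently expressed genes that remain. The Python sketch below is one possible reading of that outline and is not the authors' R implementation. Everything in it is an assumption made for illustration: the function names (`cosine_to_ones`, `group_means`, `scale_factors`, `cosbin_sketch`), the use of the all-ones vector as the reference pattern, total-sum scaling between iterations, and the `threshold`, `drop_fraction`, and `max_iter` parameters. It also assumes every sample has a positive total over the retained genes.

```python
import math


def cosine_to_ones(v):
    """Cosine between v and the all-ones vector (1.0 means a flat pattern)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return 0.0
    return sum(v) / (norm * math.sqrt(len(v)))


def group_means(row, groups):
    """Mean expression of one gene within each condition label."""
    labels = sorted(set(groups))
    return [
        sum(x for x, g in zip(row, groups) if g == lab) / groups.count(lab)
        for lab in labels
    ]


def scale_factors(data, genes):
    """Per-sample factors that equalize sample totals over the given genes."""
    n_samples = len(data[0])
    totals = [sum(data[i][j] for i in genes) for j in range(n_samples)]
    target = sum(totals) / n_samples
    return [target / t for t in totals]


def cosbin_sketch(data, groups, threshold=0.99, drop_fraction=0.1, max_iter=100):
    """Hypothetical iterative scheme (assumption, not the published algorithm).

    data   : list of genes, each a list of expression values per sample
    groups : condition label of each sample
    Returns (sample-wise factors, indices of genes kept as reference).
    """
    kept = list(range(len(data)))
    for _ in range(max_iter):
        # Normalize using only the genes currently believed to be stable.
        factors = scale_factors(data, kept)
        # Score each retained gene by how flat its cross-condition pattern is.
        scored = sorted(
            (
                cosine_to_ones(
                    group_means([x * f for x, f in zip(data[i], factors)], groups)
                ),
                i,
            )
            for i in kept
        )
        n_low = sum(1 for score, _ in scored if score < threshold)
        if n_low == 0 or len(kept) <= 2:
            break
        # Remove the lowest-scoring genes, always leaving at least two.
        n_drop = min(n_low, max(1, int(len(kept) * drop_fraction)), len(kept) - 2)
        kept = sorted(i for _, i in scored[n_drop:])
    return scale_factors(data, kept), kept
```

Calling `cosbin_sketch(data, groups)` with a genes-by-samples list of lists and one condition label per sample returns the scaling factors and the indices of the retained reference genes; multiplying each sample's values by its factor gives the normalized data. Dropping only the lowest-scoring genes each round is a deliberate choice in this sketch: while asymmetric genes are still present they distort the totals, so stable genes can also score below the threshold until the distortion is removed.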

Files

Original bundle
Name: Cosbin.pdf
Size: 662.21 KB
Format: Adobe Portable Document Format
Description: Published version