COT: an efficient and accurate method for detecting marker genes among many subtypes

dc.contributor.author: Lu, Yingzhou
dc.contributor.author: Wu, Chiung-Ting
dc.contributor.author: Parker, Sarah J.
dc.contributor.author: Cheng, Zuolin
dc.contributor.author: Saylor, Georgia
dc.contributor.author: Van Eyk, Jennifer E.
dc.contributor.author: Yu, Guoqiang
dc.contributor.author: Clarke, Robert
dc.contributor.author: Herrington, David M.
dc.contributor.author: Wang, Yue
dc.date.accessioned: 2023-01-24T14:20:23Z
dc.date.available: 2023-01-24T14:20:23Z
dc.date.issued: 2022
dc.date.updated: 2023-01-24T02:47:35Z
dc.description.abstract: Motivation: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others, so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. Results: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open-source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements, rather than replaces, existing methods. Availability and implementation: The Python COT software, with a detailed user's manual and a vignette, is freely available at https://github.com/MintaYLu/COT. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1093/bioadv/vbac037
dc.identifier.eissn: 2635-0041
dc.identifier.issn: 2635-0041
dc.identifier.issue: 1
dc.identifier.orcid: Yu, Guoqiang [0000-0002-6743-7413]
dc.identifier.orcid: Wang, Yue [0000-0002-1788-1102]
dc.identifier.other: PMC9163574
dc.identifier.other: vbac037 (PII)
dc.identifier.pmid: 35673616
dc.identifier.uri: http://hdl.handle.net/10919/113387
dc.identifier.volume: 2
dc.language.iso: en
dc.publisher: Oxford University Press
dc.relation.uri: https://www.ncbi.nlm.nih.gov/pubmed/35673616
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: COT: an efficient and accurate method for detecting marker genes among many subtypes
dc.title.serial: Bioinformatics Advances
dc.type: Article - Refereed
dc.type.dcmitype: Text
dc.type.other: brief-report
dc.type.other: Journal Article
dcterms.dateAccepted: 2022-05-16
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/Electrical and Computer Engineering
pubs.organisational-group: /Virginia Tech/Faculty of Health Sciences
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
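
The abstract describes a test statistic based on cosine similarity that matches the definition of an ideal marker gene (expressed in one subtype and in no others). As a rough illustration of that idea only, the sketch below scores each gene by the cosine between its subtype mean-expression vector and the nearest subtype axis. The function name `cosine_marker_scores` and the input layout are assumptions made for this example; this is not the API of the published COT software, and the one-sample significance test itself is not reproduced here.

```python
import numpy as np

def cosine_marker_scores(profiles):
    """Hypothetical helper, not part of the COT package.

    profiles: array of shape (n_genes, n_subtypes) holding non-negative
    mean expression of each gene in each subtype (assumed input layout).
    Returns (scores, subtypes): for each gene, the largest cosine between
    its expression vector and a subtype axis, and the index of that subtype.
    """
    profiles = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(profiles, axis=1)
    # Avoid division by zero for genes with no expression anywhere.
    safe_norms = np.where(norms > 0, norms, 1.0)
    # The cosine with the k-th axis is x_k / ||x||, so one division
    # yields the cosine to every subtype axis at once.
    cosines = profiles / safe_norms[:, None]
    cosines[norms == 0] = 0.0
    return cosines.max(axis=1), cosines.argmax(axis=1)

# Toy data: gene 0 is an ideal marker of subtype 0, gene 1 is expressed
# equally everywhere, gene 2 is mostly specific to subtype 2.
toy = np.array([[10.0, 0.0, 0.0],
                [5.0, 5.0, 5.0],
                [0.0, 1.0, 9.0]])
scores, subtypes = cosine_marker_scores(toy)
```

A score of 1 corresponds to a gene expressed in exactly one subtype; a gene expressed equally in all subtypes gets the lowest possible score, 1/sqrt(number of subtypes).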

Files

Original bundle
Name: COT.pdf
Size: 864.13 KB
Format: Adobe Portable Document Format
Description: Published version