Identification of new marker genes from plant single-cell RNA-seq data using interpretable machine learning methods

dc.contributor.author: Yan, Haidong
dc.contributor.author: Lee, Jiyoung
dc.contributor.author: Song, Qi
dc.contributor.author: Li, Qi
dc.contributor.author: Schiefelbein, John
dc.contributor.author: Zhao, Bingyu
dc.contributor.author: Li, Song
dc.date.accessioned: 2023-02-09T14:01:28Z
dc.date.available: 2023-02-09T14:01:28Z
dc.date.issued: 2022-02-24
dc.date.updated: 2023-02-09T03:45:50Z
dc.description.abstract: An essential step in the analysis of single-cell RNA sequencing (scRNA-seq) data is to classify cells into specific cell types using marker genes. In this study, we developed a machine learning pipeline called single-cell predictive marker (SPmarker) to identify novel cell-type marker genes in the Arabidopsis root. Unlike traditional approaches, our method uses interpretable machine learning models to select marker genes. We demonstrate that our method can: assign cell types based on cells that were labelled using published methods; project cell types identified by trajectory analysis from one data set to other data sets; and assign cell types based on internal GFP markers. Using SPmarker, we identified hundreds of previously unreported marker genes. Compared with known marker genes, the new marker genes have more orthologous genes identifiable in the corresponding rice single-cell clusters. The new root hair marker genes also include 172 genes with orthologs expressed in root hair cells in five non-Arabidopsis species, which expands the number of marker genes for this cell type by 35–154%. Our results represent a new approach to identifying cell-type marker genes from scRNA-seq data and pave the way for cross-species mapping of scRNA-seq data in plants.
dc.description.version: Published version
dc.format.extent: Pages 1507-1520
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1111/nph.18053
dc.identifier.orcid: Zhao, Bingyu [0000-0002-5392-0279]
dc.identifier.uri: http://hdl.handle.net/10919/113748
dc.identifier.volume: 234
dc.language.iso: en
dc.relation.uri: https://doi.org/10.1111/nph.18053
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: cell marker genes
dc.subject: gene expression
dc.subject: machine learning
dc.subject: root development
dc.subject: single-cell genomics
dc.subject: single-cell sequencing
dc.title: Identification of new marker genes from plant single-cell RNA-seq data using interpretable machine learning methods
dc.title.serial: New Phytologist
dc.type: Article - Refereed
dc.type.dcmitype: Text
dc.type.other: Article
dcterms.dateAccepted: 2022-01-24
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences/CALS T&R Faculty
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences/School of Plant and Environmental Sciences

Files

Original bundle
Name: New Phytologist - 2022 - Yan - Identification of new marker genes from plant single‐cell RNA‐seq data using interpretable.pdf
Size: 2.86 MB
Format: Adobe Portable Document Format
Description: Published version