Prediction of condition-specific regulatory genes using machine learning

dc.contributor.author: Song, Qi
dc.contributor.author: Lee, Jiyoung
dc.contributor.author: Akter, Shamima
dc.contributor.author: Rogers, Matthew
dc.contributor.author: Grene, Ruth
dc.contributor.author: Li, Song
dc.contributor.department: School of Plant and Environmental Sciences
dc.contributor.department: Statistics
dc.date.accessioned: 2021-09-23T12:53:55Z
dc.date.available: 2021-09-23T12:53:55Z
dc.date.issued: 2020-06-19
dc.date.updated: 2021-09-23T12:53:52Z
dc.description.abstract: Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
dc.description.version: Published version
dc.format.extent: 17 page(s)
dc.format.mimetype: application/pdf
dc.identifier: ARTN e62 (Article number)
dc.identifier.doi: https://doi.org/10.1093/nar/gkaa264
dc.identifier.eissn: 1362-4962
dc.identifier.issn: 0305-1048
dc.identifier.issue: 11
dc.identifier.orcid: Li, Song [0000-0002-8133-3944]
dc.identifier.other: 5824611 (PII)
dc.identifier.pmid: 32329779
dc.identifier.uri: http://hdl.handle.net/10919/105048
dc.identifier.volume: 48
dc.language.iso: en
dc.publisher: Oxford University Press
dc.relation.uri: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000574284500002&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=930d57c9ac61a043676db62af60056c1
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Life Sciences & Biomedicine
dc.subject: Biochemistry & Molecular Biology
dc.subject: TRANSCRIPTION FACTORS
dc.subject: CHEMICAL-COMPOSITION
dc.subject: EXPRESSION DATA
dc.subject: OPEN CHROMATIN
dc.subject: ARABIDOPSIS
dc.subject: STRESS
dc.subject: ROOT
dc.subject: NETWORK
dc.subject: DNA
dc.subject: TOLERANCE
dc.subject: 05 Environmental Sciences
dc.subject: 06 Biological Sciences
dc.subject: 08 Information and Computing Sciences
dc.subject: Developmental Biology
dc.subject.mesh: Arabidopsis
dc.subject.mesh: Arabidopsis Proteins
dc.subject.mesh: Transcription Factors
dc.subject.mesh: Gene Expression Profiling
dc.subject.mesh: Gene Expression Regulation, Plant
dc.subject.mesh: Genes, Plant
dc.subject.mesh: Genes, Regulator
dc.subject.mesh: Gene Regulatory Networks
dc.subject.mesh: Promoter Regions, Genetic
dc.subject.mesh: Stress, Physiological
dc.subject.mesh: Single-Cell Analysis
dc.subject.mesh: Datasets as Topic
dc.subject.mesh: Machine Learning
dc.subject.mesh: Deep Learning
dc.subject.mesh: RNA-Seq
dc.title: Prediction of condition-specific regulatory genes using machine learning
dc.title.serial: Nucleic Acids Research
dc.type: Article - Refereed
dc.type.dcmitype: Text
dc.type.other: Article
dc.type.other: Journal
dcterms.dateAccepted: 2020-04-20
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences
pubs.organisational-group: /Virginia Tech/University Research Institutes
pubs.organisational-group: /Virginia Tech/University Research Institutes/Fralin Life Sciences
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences/CALS T&R Faculty
pubs.organisational-group: /Virginia Tech/University Research Institutes/Fralin Life Sciences/Durelle Scott
pubs.organisational-group: /Virginia Tech/Agriculture & Life Sciences/School of Plant and Environmental Sciences

Files

Original bundle
Name: Prediction of condition-specific regulatory genes using machine learning.pdf
Size: 1.36 MB
Format: Adobe Portable Document Format
Description: Published version