Prediction of condition-specific regulatory genes using machine learning
dc.contributor.author | Song, Qi | en |
dc.contributor.author | Lee, Jiyoung | en |
dc.contributor.author | Akter, Shamima | en |
dc.contributor.author | Rogers, Matthew | en |
dc.contributor.author | Grene, Ruth | en |
dc.contributor.author | Li, Song | en |
dc.contributor.department | School of Plant and Environmental Sciences | en |
dc.contributor.department | Statistics | en |
dc.date.accessioned | 2021-09-23T12:53:55Z | en |
dc.date.available | 2021-09-23T12:53:55Z | en |
dc.date.issued | 2020-06-19 | en |
dc.date.updated | 2021-09-23T12:53:52Z | en |
dc.description.abstract | Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants. | en |
dc.description.version | Published version | en |
dc.format.extent | 17 page(s) | en |
dc.format.mimetype | application/pdf | en |
dc.identifier | ARTN e62 (Article number) | en |
dc.identifier.doi | https://doi.org/10.1093/nar/gkaa264 | en |
dc.identifier.eissn | 1362-4962 | en |
dc.identifier.issn | 0305-1048 | en |
dc.identifier.issue | 11 | en |
dc.identifier.orcid | Li, Song [0000-0002-8133-3944] | en |
dc.identifier.other | 5824611 (PII) | en |
dc.identifier.pmid | 32329779 | en |
dc.identifier.uri | http://hdl.handle.net/10919/105048 | en |
dc.identifier.volume | 48 | en |
dc.language.iso | en | en |
dc.publisher | Oxford University Press | en |
dc.relation.uri | http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000574284500002&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=930d57c9ac61a043676db62af60056c1 | en |
dc.rights | Creative Commons Attribution-NonCommercial 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | en |
dc.subject | Life Sciences & Biomedicine | en |
dc.subject | Biochemistry & Molecular Biology | en |
dc.subject | TRANSCRIPTION FACTORS | en |
dc.subject | CHEMICAL-COMPOSITION | en |
dc.subject | EXPRESSION DATA | en |
dc.subject | OPEN CHROMATIN | en |
dc.subject | ARABIDOPSIS | en |
dc.subject | STRESS | en |
dc.subject | ROOT | en |
dc.subject | NETWORK | en |
dc.subject | DNA | en |
dc.subject | TOLERANCE | en |
dc.subject | 05 Environmental Sciences | en |
dc.subject | 06 Biological Sciences | en |
dc.subject | 08 Information and Computing Sciences | en |
dc.subject | Developmental Biology | en |
dc.subject.mesh | Arabidopsis | en |
dc.subject.mesh | Arabidopsis Proteins | en |
dc.subject.mesh | Transcription Factors | en |
dc.subject.mesh | Gene Expression Profiling | en |
dc.subject.mesh | Gene Expression Regulation, Plant | en |
dc.subject.mesh | Genes, Plant | en |
dc.subject.mesh | Genes, Regulator | en |
dc.subject.mesh | Gene Regulatory Networks | en |
dc.subject.mesh | Promoter Regions, Genetic | en |
dc.subject.mesh | Stress, Physiological | en |
dc.subject.mesh | Single-Cell Analysis | en |
dc.subject.mesh | Datasets as Topic | en |
dc.subject.mesh | Machine Learning | en |
dc.subject.mesh | Deep Learning | en |
dc.subject.mesh | RNA-Seq | en |
dc.title | Prediction of condition-specific regulatory genes using machine learning | en |
dc.title.serial | Nucleic Acids Research | en |
dc.type | Article - Refereed | en |
dc.type.dcmitype | Text | en |
dc.type.other | Article | en |
dc.type.other | Journal | en |
dcterms.dateAccepted | 2020-04-20 | en |
pubs.organisational-group | /Virginia Tech | en |
pubs.organisational-group | /Virginia Tech/Agriculture & Life Sciences | en |
pubs.organisational-group | /Virginia Tech/University Research Institutes | en |
pubs.organisational-group | /Virginia Tech/University Research Institutes/Fralin Life Sciences | en |
pubs.organisational-group | /Virginia Tech/All T&R Faculty | en |
pubs.organisational-group | /Virginia Tech/Agriculture & Life Sciences/CALS T&R Faculty | en |
pubs.organisational-group | /Virginia Tech/University Research Institutes/Fralin Life Sciences/Durelle Scott | en |
pubs.organisational-group | /Virginia Tech/Agriculture & Life Sciences/School of Plant and Environmental Sciences | en |
Files
Original bundle
- Name: Prediction of condition-specific regulatory genes using machine learning.pdf
- Size: 1.36 MB
- Format: Adobe Portable Document Format
- Description: Published version