Large-scale protein function prediction using heterogeneous ensembles
dc.contributor.author | Wang, Linhua | en |
dc.contributor.author | Law, Jeffrey N. | en |
dc.contributor.author | Kale, Shiv D. | en |
dc.contributor.author | Murali, T. M. | en |
dc.contributor.author | Pandey, Gaurav | en |
dc.date.accessioned | 2020-11-03T14:26:58Z | en |
dc.date.available | 2020-11-03T14:26:58Z | en |
dc.date.issued | 2018-09-28 | en |
dc.description.abstract | Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred). | en |
dc.description.sponsorship | This work was supported in part by the National Institutes of Health [R01GM114434] and by an IBM faculty award to GP. It was also partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Army Research Office (ARO) under Cooperative Agreement Number [W911NF-17-2-0105]. | en |
dc.format.extent | 16 pages | en |
dc.format.mimetype | application/pdf | en |
dc.identifier.citation | Wang L, Law J, Kale SD, et al. Large-scale protein function prediction using heterogeneous ensembles [version 1; peer review: 2 approved]. F1000Research 2018, 7(ISCB Comm J):1577. https://doi.org/10.12688/f1000research.16415.1 | en |
dc.identifier.doi | https://doi.org/10.12688/f1000research.16415.1 | en |
dc.identifier.issue | 1577 | en |
dc.identifier.uri | http://hdl.handle.net/10919/100775 | en |
dc.identifier.volume | 7 | en |
dc.language.iso | en | en |
dc.publisher | F1000Research | en |
dc.rights | Creative Commons Attribution 4.0 International | en |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en |
dc.subject | protein function prediction | en |
dc.subject | heterogeneous ensembles | en |
dc.subject | machine learning | en |
dc.subject | high-performance computing | en |
dc.subject | performance evaluation | en |
dc.title | Large-scale protein function prediction using heterogeneous ensembles | en |
dc.title.serial | F1000Research | en |
dc.type | Article - Refereed | en |
dc.type.dcmitype | Text | en |
Files
Original bundle
- Name: linhua_wang.pdf
- Size: 2.48 MB
- Format: Adobe Portable Document Format