Large-scale protein function prediction using heterogeneous ensembles

dc.contributor.author: Wang, Linhua
dc.contributor.author: Law, Jeffrey N.
dc.contributor.author: Kale, Shiv D.
dc.contributor.author: Murali, T. M.
dc.contributor.author: Pandey, Gaurav
dc.date.accessioned: 2020-11-03T14:26:58Z
dc.date.available: 2020-11-03T14:26:58Z
dc.date.issued: 2018-09-28
dc.description.abstract: Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
dc.description.sponsorship: This work was supported in part by National Institutes of Health [R01GM114434] and by an IBM faculty award to GP. It was also partially supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the Army Research Office (ARO) under Cooperative Agreement Number [W911NF-17-2-0105].
dc.format.extent: 16 pages
dc.format.mimetype: application/pdf
dc.identifier.citation: Wang L, Law J, Kale SD et al. Large-scale protein function prediction using heterogeneous ensembles [version 1; peer review: 2 approved] F1000Research 2018, 7(ISCB Comm J):1577 https://doi.org/10.12688/f1000research.16415.1
dc.identifier.doi: https://doi.org/10.12688/f1000research.16415.1
dc.identifier.issue: 1577
dc.identifier.uri: http://hdl.handle.net/10919/100775
dc.identifier.volume: 7
dc.language.iso: en
dc.publisher: F1000Research
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: protein function prediction
dc.subject: heterogeneous ensembles
dc.subject: Machine learning
dc.subject: high-performance computing
dc.subject: performance evaluation
dc.title: Large-scale protein function prediction using heterogeneous ensembles
dc.title.serial: F1000Research
dc.type: Article - Refereed
dc.type.dcmitype: Text
Files
Original bundle
Name: linhua_wang.pdf
Size: 2.48 MB
Format: Adobe Portable Document Format