Large-scale protein function prediction using heterogeneous ensembles

Wang, Linhua; Law, Jeffrey N.; Kale, Shiv D.; Murali, T. M.; Pandey, Gaurav

Files

linhua_wang.pdf (2.48 MB)

Date

2018-09-28

Publisher

F1000Research

Abstract

Heterogeneous ensembles are an effective approach when the ideal data type and/or individual predictor for a given problem is unclear. These ensembles have shown promise for protein function prediction (PFP), but whether they improve PFP at a large scale is unclear. This study critically assesses how well a variety of heterogeneous ensemble methods improve PFP across a multitude of functional terms, proteins, and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
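
For readers unfamiliar with the Stacking approach named in the abstract, the following is a minimal sketch of the general idea, not the LargeGOPred implementation. It assumes two toy base predictors of different kinds (a centroid scorer and a k-nearest-neighbor scorer), a synthetic two-feature dataset, and a logistic regression meta-learner trained by plain gradient descent on out-of-fold base predictions. All function names, parameters, and data here are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def centroid_learner(X, y):
    """Hypothetical base predictor 1: closeness to the positive class centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos, neg = centroid(1), centroid(0)

    def score(x):
        d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
        return d_neg / (d_pos + d_neg + 1e-12)  # near 1 when x is close to pos
    return score


def knn_learner(X, y, k=3):
    """Hypothetical base predictor 2: fraction of positives among k neighbors."""
    def score(x):
        nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
        return sum(t for _, t in nearest) / k
    return score


def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Logistic regression fitted with batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_stacking(X, y, learners, n_folds=3):
    """Train the meta-learner on out-of-fold scores from the base predictors."""
    n = len(X)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, learner in enumerate(learners):
            score = learner(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = score(X[i])
    w, b = fit_logistic(Z, y)
    final = [learner(X, y) for learner in learners]  # refit on all data

    def predict_proba(x):
        z = [s(x) for s in final]
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return predict_proba


# Synthetic data (an assumption): positives near (2, 2), negatives near (0, 0).
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = i % 2
    center = 2.0 if label == 1 else 0.0
    X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
    y.append(label)

model = fit_stacking(X, y, [centroid_learner, knn_learner])
print(model([2.0, 2.0]), model([0.0, 0.0]))
```

Training the meta-learner on out-of-fold scores, rather than on scores the base predictors produce for their own training data, is the usual way to keep the second level from simply rewarding whichever base predictor overfits most.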

Keywords

protein function prediction, heterogeneous ensembles, machine learning, high-performance computing, performance evaluation

Citation

Wang L, Law J, Kale SD et al. Large-scale protein function prediction using heterogeneous ensembles [version 1; peer review: 2 approved] F1000Research 2018, 7(ISCB Comm J):1577 https://doi.org/10.12688/f1000research.16415.1

Persistent link

http://hdl.handle.net/10919/100775

Collections

Scholarly Works, Computer Science
