MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins

DC Field | Value | Language
dc.contributor.author | Amgarten, Deyvid | en
dc.contributor.author | Braga, Lucas P. P. | en
dc.contributor.author | da Silva, Aline M. | en
dc.contributor.author | Setubal, Joao C. | en
dc.date.accessioned | 2019-10-28T16:47:16Z | en
dc.date.available | 2019-10-28T16:47:16Z | en
dc.date.issued | 2018-08-07 | en
dc.description.abstract | Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences. | en
dc.description.notes | DA was supported in part by fellowship Grant No. 2014/16450-8 from the Sao Paulo State Research Foundation (FAPESP). DA and LB were supported by a fellowship from Brazilian Federal Agency CAPES. JS and AdS wish to acknowledge their respective research fellowships from CNPq. This work was supported in part by FAPESP Grant No. 2011/50870-6 and by CAPES Grant No. 3385/2013. | en
dc.description.sponsorship | Sao Paulo State Research Foundation (FAPESP) / Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) [2014/16450-8]; Brazilian Federal Agency CAPES / CAPES; CNPq / National Council for Scientific and Technological Development (CNPq); FAPESP / Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) [2011/50870-6]; CAPES [3385/2013] | en
dc.format.mimetype | application/pdf | en
dc.identifier.doi | https://doi.org/10.3389/fgene.2018.00304 | en
dc.identifier.issn | 1664-8021 | en
dc.identifier.other | 304 | en
dc.identifier.pmid | 30131825 | en
dc.identifier.uri | http://hdl.handle.net/10919/95192 | en
dc.identifier.volume | 9 | en
dc.language.iso | en | en
dc.publisher | Frontiers | en
dc.rights | Creative Commons Attribution 4.0 International | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | phage | en
dc.subject | virus | en
dc.subject | microbiome | en
dc.subject | Machine learning | en
dc.subject | random forest | en
dc.title | MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins | en
dc.title.serial | Frontiers in Genetics | en
dc.type | Article - Refereed | en
dc.type.dcmitype | Text | en
dc.type.dcmitype | StillImage | en
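
The abstract names three contig-level features that MARVEL feeds to its random forest: gene density, strand shifts, and the fraction of genes with significant hits to a viral protein database. The sketch below shows one plausible way to compute them from a list of predicted genes. It is an illustrative assumption, not MARVEL's own code: the `Gene` type, the `contig_features` helper, and the exact normalizations (genes per kilobase; strand changes between neighboring genes divided by the gene count) are hypothetical, and gene calling and the database search are assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    """Hypothetical record for one predicted gene on a contig."""
    start: int        # 1-based start coordinate on the contig
    end: int          # 1-based end coordinate (inclusive)
    strand: int       # +1 for forward strand, -1 for reverse strand
    viral_hit: bool   # True if the protein had a significant hit to a viral protein DB


def contig_features(contig_length: int, genes: List[Gene]) -> Dict[str, float]:
    """Compute three MARVEL-style features (assumed definitions) for one contig."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    n = len(genes)
    # Order genes along the contig so that "neighboring" means adjacent in position.
    ordered = sorted(genes, key=lambda g: g.start)

    # Gene density: predicted genes per kilobase of contig sequence (assumed unit).
    density = n / (contig_length / 1000)

    # Strand shifts: count adjacent gene pairs on opposite strands,
    # normalized by the number of genes (assumed normalization).
    shifts = sum(1 for a, b in zip(ordered, ordered[1:]) if a.strand != b.strand)
    shift_freq = shifts / n if n else 0.0

    # Fraction of genes whose protein had a significant viral-database hit.
    viral_frac = sum(g.viral_hit for g in genes) / n if n else 0.0

    return {
        "gene_density": density,
        "strand_shifts": shift_freq,
        "viral_hit_fraction": viral_frac,
    }


if __name__ == "__main__":
    example = [
        Gene(1, 900, +1, True),
        Gene(1000, 1900, +1, False),
        Gene(2000, 2900, -1, True),
        Gene(3000, 3900, +1, False),
    ]
    print(contig_features(5000, example))
```

For the four-gene, 5 kb example contig, this yields a density of 0.8 genes/kb, two strand changes over four genes (0.5), and a viral-hit fraction of 0.5. In a full pipeline, one feature vector per contig or bin would then be passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`); that step is omitted here.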

Files

Original bundle
Name: fgene-09-00304.pdf
Size: 3.85 MB
Format: Adobe Portable Document Format