iBLAST: Incremental BLAST of new sequences via automated e-value correction

dc.contributor.author: Dash, Sajal [en]
dc.contributor.author: Rahman, S. R. [en]
dc.contributor.author: Hines, H. M. [en]
dc.contributor.author: Feng, Wu-chun [en]
dc.contributor.department: Computer Science [en]
dc.contributor.department: Electrical and Computer Engineering [en]
dc.contributor.department: Biomedical Engineering and Mechanics [en]
dc.date.accessioned: 2021-08-25T16:10:56Z [en]
dc.date.available: 2021-08-25T16:10:56Z [en]
dc.date.issued: 2021-04-01 [en]
dc.date.updated: 2021-08-25T16:10:53Z [en]
dc.description.abstract: Search results from local alignment search tools use statistical scores that are sensitive to the size of the database to report the quality of the result. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Given the astronomical growth in genomics data throughout a genomic research investigation, sequence databases grow as new sequences are continuously being added to these databases. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of a genomic investigation. Thus, to update the results of a previously conducted BLAST search to find the best matches on an updated database, scientists must currently rerun the BLAST search against the entire updated database, which translates into irrecoverable and, in turn, wasted execution time, money, and computational resources. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to conduct the same query search but only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results that are identical to NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST can enable efficient biological discovery at a much faster speed with a substantially reduced computational cost. [en]
dc.description.version: Published version [en]
dc.format.extent: Pages e0249410 [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0249410 [en]
dc.identifier.eissn: 1932-6203 [en]
dc.identifier.issn: 1932-6203 [en]
dc.identifier.issue: 4 April 2021 [en]
dc.identifier.other: PONE-D-20-31232 (PII) [en]
dc.identifier.pmid: 33886589 [en]
dc.identifier.uri: http://hdl.handle.net/10919/104706 [en]
dc.identifier.volume: 16 [en]
dc.language.iso: en [en]
dc.publisher: PLoS [en]
dc.relation.uri: https://www.ncbi.nlm.nih.gov/pubmed/33886589 [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.title: iBLAST: Incremental BLAST of new sequences via automated e-value correction [en]
dc.title.serial: PLoS ONE [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]
dc.type.other: Journal Article [en]
dcterms.dateAccepted: 2021-03-17 [en]
pubs.organisational-group: /Virginia Tech [en]
pubs.organisational-group: /Virginia Tech/Engineering [en]
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science [en]
pubs.organisational-group: /Virginia Tech/Faculty of Health Sciences [en]
pubs.organisational-group: /Virginia Tech/All T&R Faculty [en]
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty [en]
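
The abstract in this record describes searching only the newly added part of the database, correcting e-values, and merging the results. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes that an e-value scales linearly with database size (as in the Karlin-Altschul form E = K·m·n·e^(−λS)) and ignores the effective-length adjustments that a real BLAST run applies. The names `Hit`, `rescale_evalue`, `merge_hits`, and `expected_speedup` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    """One alignment hit (hypothetical record, not a BLAST output format)."""
    subject_id: str
    bit_score: float
    evalue: float


def rescale_evalue(evalue: float, searched_size: float, total_size: float) -> float:
    """Rescale an e-value computed against `searched_size` residues to `total_size`.

    Assumption: the e-value is proportional to database size.
    """
    return evalue * total_size / searched_size


def merge_hits(old_hits, new_hits, old_size, delta_size, max_hits=10):
    """Combine hits from the old database and from the increment.

    Both sets are rescaled to the combined size, then ranked by e-value
    (ties broken by higher bit score).
    """
    total = old_size + delta_size
    merged = [
        replace(h, evalue=rescale_evalue(h.evalue, old_size, total))
        for h in old_hits
    ]
    merged += [
        replace(h, evalue=rescale_evalue(h.evalue, delta_size, total))
        for h in new_hits
    ]
    merged.sort(key=lambda h: (h.evalue, -h.bit_score))
    return merged[:max_hits]


def expected_speedup(delta: float) -> float:
    """Speedup (1 + delta) / delta from the abstract; delta is fractional growth."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (1 + delta) / delta
```

For instance, under these assumptions a database that has grown by 25% (δ = 0.25) gives `expected_speedup(0.25) == 5.0`, because only the increment is searched rather than the full updated database.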

Files

Original bundle
Name: iBLAST Incremental BLAST of new sequences via automated e-value correction.pdf
Size: 1.41 MB
Format: Adobe Portable Document Format
Description: Published version