muBLASTP: database-indexed protein sequence search on multicore CPUs

dc.contributor.author: Zhang, Jing
dc.contributor.author: Misra, Sanchit
dc.contributor.author: Wang, Hao
dc.contributor.author: Feng, Wu-chun
dc.date.accessioned: 2021-05-03T18:41:25Z
dc.date.available: 2021-05-03T18:41:25Z
dc.date.issued: 2016
dc.description.abstract: Background: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for the sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as query-indexed BLAST (i.e., NCBI BLAST) or support only nucleotide sequence search (e.g., MegaBLAST). Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. Results: muBLASTP, a novel database-indexed BLAST for protein sequence search, returns hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. Conclusions: With a newly designed index structure for the protein database and associated optimizations to the BLASTP algorithm, we re-factored BLASTP for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
dc.description.sponsorship: This research was supported in part by the NSF BIGDATA Program via IIS-1247693 and the NSF XPS Program via CCF-1337131.
dc.identifier.doi: https://doi.org/10.1186/s12859-016-1302-4
dc.identifier.issue: 443
dc.identifier.uri: http://hdl.handle.net/10919/103181
dc.identifier.volume: 17
dc.language.iso: en_US
dc.publisher: BioMed Central
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: BLAST
dc.subject: Database index
dc.subject: Local alignment
dc.subject: Multicore
dc.title: muBLASTP: database-indexed protein sequence search on multicore CPUs
dc.title.serial: BMC Bioinformatics
dc.type: Article - Refereed
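The abstract contrasts query-indexed search (NCBI BLAST) with database-indexed search (muBLASTP). The following minimal Python sketch illustrates only that distinction, at the seeding step. It is not muBLASTP's index structure: it assumes exact word matches of length 3, ignores BLASTP's neighborhood words, scoring matrix, two-hit method, and extension stages, and every function name in it is hypothetical.

```python
from collections import defaultdict

W = 3  # word length; assumption: exact matches only, no neighborhood words


def words(seq, w=W):
    """Yield (offset, word) for every length-w substring of seq."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


def query_indexed_hits(query, database):
    """Query indexing: index the query's words, then scan every database sequence."""
    table = defaultdict(list)
    for q_off, word in words(query):
        table[word].append(q_off)
    hits = []
    for seq_id, subject in enumerate(database):
        for s_off, word in words(subject):
            for q_off in table.get(word, ()):
                hits.append((seq_id, q_off, s_off))
    return sorted(hits)


def build_database_index(database):
    """Database indexing: map each word to its (sequence id, offset) positions.

    Built once, then reused for every query.
    """
    index = defaultdict(list)
    for seq_id, subject in enumerate(database):
        for s_off, word in words(subject):
            index[word].append((seq_id, s_off))
    return index


def database_indexed_hits(query, index):
    """Look up each query word in the prebuilt database index."""
    hits = []
    for q_off, word in words(query):
        for seq_id, s_off in index.get(word, ()):
            hits.append((seq_id, q_off, s_off))
    return sorted(hits)


if __name__ == "__main__":
    db = ["MKTAYIAKQR", "GAYIAKMKT"]
    index = build_database_index(db)
    # Both strategies find the same seed hits: (sequence id, query offset, subject offset)
    assert query_indexed_hits("TAYIAK", db) == database_indexed_hits("TAYIAK", index)
    print(database_indexed_hits("TAYIAK", index))
```

Both functions return the same (sequence id, query offset, subject offset) seeds; they differ only in which side is indexed. A database index is paid for once and amortized over every query in a batch, at the cost of holding the index in memory, which is the memory-footprint trade-off the abstract refers to.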

Files

Original bundle
Name: Zhang2016_Article.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission