dc.contributor.author: Zhang, Jing [en]
dc.contributor.author: Misra, Sanchit [en]
dc.contributor.author: Wang, Hao [en]
dc.contributor.author: Feng, Wu-chun [en]
dc.description.abstract: Background: The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for the sequences most similar to a query sequence. Currently, the BLAST algorithm uses a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they either cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or support only nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied to database-indexed search. Results: muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for the alignment stages and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for the alignment stages and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. Conclusions: With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index. [en]
dc.description.sponsorship: This research was supported in part by the NSF BIGDATA Program via IIS-1247693 and the NSF XPS Program via CCF-1337131. [en]
dc.publisher: BioMed Central [en]
dc.rights: Attribution 4.0 International [en]
dc.subject: Database index [en]
dc.subject: Local alignment [en]
dc.title: muBLASTP: database-indexed protein sequence search on multicore CPUs [en]
dc.type: Article - Refereed [en]
dc.title.serial: BMC Bioinformatics [en]

License: Attribution 4.0 International