Analysis and Abstraction of Parallel Sequence Search

dc.contributor.authorGoddard, Christopher Josephen
dc.contributor.committeechairFeng, Wu-chunen
dc.contributor.committeememberBack, Godmar V.en
dc.contributor.committeememberTilevich, Elien
dc.contributor.departmentComputer Scienceen
dc.date.accessioned2014-03-14T20:45:42Zen
dc.date.adate2007-10-03en
dc.date.available2014-03-14T20:45:42Zen
dc.date.issued2007-09-05en
dc.date.rdate2007-10-03en
dc.date.sdate2007-09-19en
dc.description.abstractThe ability to compare two biological sequences is extremely valuable, as matches can suggest evolutionary origins of genes or the purposes of particular amino acids. Results of such comparisons can be used in the creation of drugs, can help combat newly discovered viruses, or can assist in treating diseases. Unfortunately, the rate of sequence acquisition is outpacing our ability to compute on these data. Further, traditional dynamic programming algorithms are too slow to meet the needs of biologists, who wish to compare millions of sequences daily. While heuristic algorithms improve upon the performance of these dated applications, they still cannot keep up with the steadily expanding search space. Parallel sequence search implementations were developed to address this issue. By partitioning databases into work units for distributed computation, applications like mpiBLAST are able to achieve super-linear speedup over their sequential counterparts. However, such implementations are limited to clusters and require significant effort to work in a grid environment. Further, their parallelization strategies are typically specific to the target sequence search, so future applications require a reimplementation if they wish to run in parallel. This thesis analyzes the performance of two versions of mpiBLAST, noting trends as well as differences between them. Results suggest that these embarrassingly parallel applications are dominated by the time required to search vast amounts of data, and not by the communication necessary to support such searches. Consequently, a framework named gridRuby is introduced which alleviates two main issues with current parallel sequence search applications; namely, the requirement of a tightly knit computing environment and the specific, hand-crafted nature of parallelization. Results show that gridRuby can parallelize an application across a set of machines through minimal implementation effort, and can still exhibit super-linear speedup.en
dc.description.degreeMaster of Scienceen
dc.identifier.otheretd-09192007-155445en
dc.identifier.sourceurlhttp://scholar.lib.vt.edu/theses/available/etd-09192007-155445/en
dc.identifier.urihttp://hdl.handle.net/10919/35110en
dc.publisherVirginia Techen
dc.relation.haspartThesis.pdfen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectsequence searchen
dc.subjectBLASTen
dc.subjectparallelismen
dc.subjectgrid frameworken
dc.titleAnalysis and Abstraction of Parallel Sequence Searchen
dc.typeThesisen
thesis.degree.disciplineComputer Scienceen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameMaster of Scienceen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Thesis.pdf
Size:
668.79 KB
Format:
Adobe Portable Document Format

Collections