Analysis and Abstraction of Parallel Sequence Search

Goddard, Christopher Joseph

dc.contributor.author	Goddard, Christopher Joseph	en
dc.contributor.committeechair	Feng, Wu-chun	en
dc.contributor.committeemember	Back, Godmar V.	en
dc.contributor.committeemember	Tilevich, Eli	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2014-03-14T20:45:42Z	en
dc.date.adate	2007-10-03	en
dc.date.available	2014-03-14T20:45:42Z	en
dc.date.issued	2007-09-05	en
dc.date.rdate	2007-10-03	en
dc.date.sdate	2007-09-19	en
dc.description.abstract	The ability to compare two biological sequences is extremely valuable, as matches can suggest evolutionary origins of genes or the purposes of particular amino acids. Results of such comparisons can be used in the creation of drugs, can help combat newly discovered viruses, or can assist in treating diseases. Unfortunately, the rate of sequence acquisition is outpacing our ability to compute on these data. Further, traditional dynamic programming algorithms are too slow to meet the needs of biologists, who wish to compare millions of sequences daily. While heuristic algorithms improve upon the performance of these dated applications, they still cannot keep up with the steadily expanding search space. Parallel sequence search implementations were developed to address this issue. By partitioning databases into work units for distributed computation, applications like mpiBLAST are able to achieve super-linear speedup over their sequential counterparts. However, such implementations are limited to clusters and require significant effort to work in a grid environment. Further, their parallelization strategies are typically specific to the target sequence search, so future applications require a reimplementation if they wish to run in parallel. This thesis analyzes the performance of two versions of mpiBLAST, noting trends as well as differences between them. Results suggest that these embarrassingly parallel applications are dominated by the time required to search vast amounts of data, and not by the communication necessary to support such searches. Consequently, a framework named gridRuby is introduced which alleviates two main issues with current parallel sequence search applications; namely, the requirement of a tightly knit computing environment and the specific, hand-crafted nature of parallelization. 
Results show that gridRuby can parallelize an application across a set of machines with minimal implementation effort while still exhibiting super-linear speedup.	en
dc.description.degree	Master of Science	en
dc.identifier.other	etd-09192007-155445	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-09192007-155445/	en
dc.identifier.uri	http://hdl.handle.net/10919/35110	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	Thesis.pdf	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	sequence search	en
dc.subject	BLAST	en
dc.subject	parallelism	en
dc.subject	grid framework	en
dc.title	Analysis and Abstraction of Parallel Sequence Search	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Thesis.pdf
Size: 668.79 KB
Format: Adobe Portable Document Format

Collections

Masters Theses