Show simple item record

dc.contributor.authorXiao, Shucaien_US
dc.contributor.authorAji, Ashwin M.en_US
dc.contributor.authorFeng, Wu-chunen_US
dc.date.accessioned2013-06-19T14:35:52Z
dc.date.available2013-06-19T14:35:52Z
dc.date.issued2009
dc.identifierhttp://eprints.cs.vt.edu/archive/00001088/en_US
dc.identifier.urihttp://hdl.handle.net/10919/20206
dc.description.abstractGraphics processing units (GPUs) have been widely used to accelerate algorithms that exhibit massive data parallelism or task parallelism. When such parallelism is not inherent in an algorithm, computational scientists resort to simply replicating the algorithm on every multiprocessor of a NVIDIA GPU, for example, to create such parallelism, resulting in embarrassingly parallel ensemble runs that deliver significant aggregate speed-up. However, the fundamental issue with such ensemble runs is that the problem size to achieve this speed-up is limited to the available shared memory and cache of a GPU multiprocessor. An example of the above is dynamic programming (DP), one of the Berkeley 13 dwarfs. All known DP implementations to date use the coarse-grained approach of embarrassingly parallel ensemble runs because a finer-grained parallelization on the GPU would require extensive communication between the multiprocessors of a GPU, which could easily cripple performance as communication between multiprocessors is not natively supported in a GPU. Consequently, we address the above by proposing a fine-grained parallelization of a single instance of the DP algorithm that is mapped to the GPU. Our parallelization incorporates a set of techniques aimed to substantially improve GPU performance: matrix re-alignment, coalesced memory access, tiling, and GPU (rather than CPU) synchronization. The specific DP algorithm that we parallelize is cal led Smith-Waterman (SWat), which is an optimal local-sequence alignment algorithm. We use this SWat algorithm as a baseline to compare our GPU implementation, i.e., CUDA-SWat, to our Cell implementation, i.e., Cell-SWat.en_US
dc.format.mimetypeapplication/pdfen_US
dc.publisherDepartment of Computer Science, Virginia Polytechnic Institute & State Universityen_US
dc.subjectParallel computationen_US
dc.titleOn the Robust Mapping of Dynamic Programming onto a Graphics Processing Uniten_US
dc.typeTechnical reporten_US
dc.identifier.trnumberTR-09-20en_US
dc.type.dcmitypeTexten_US
dc.identifier.sourceurlhttp://eprints.cs.vt.edu/archive/00001088/01/TR_dynamic_programming_mapping.pdf


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record