Structure-based Optimizations for Sparse Matrix-Vector Multiply

Belgin, Mehmet

dc.contributor.author	Belgin, Mehmet	en
dc.contributor.committeecochair	Ribbens, Calvin J.	en
dc.contributor.committeecochair	Back, Godmar V.	en
dc.contributor.committeemember	Sandu, Adrian	en
dc.contributor.committeemember	Cameron, Kirk W.	en
dc.contributor.committeemember	Gugercin, Serkan	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2014-03-14T20:21:10Z	en
dc.date.adate	2011-01-16	en
dc.date.available	2014-03-14T20:21:10Z	en
dc.date.issued	2010-12-14	en
dc.date.rdate	2012-05-14	en
dc.date.sdate	2010-12-24	en
dc.description.abstract	This dissertation introduces two novel techniques, OSF and PBR, to improve the performance of Sparse Matrix-Vector Multiply (SMVM) kernels, which dominate the runtime of iterative solvers for systems of linear equations. SMVM computations that use sparse formats typically achieve only a small fraction of peak CPU speeds because they are memory bound due to their low flops:byte ratio, they access memory irregularly, and they exhibit poor instruction-level parallelism (ILP) due to inefficient pipelining. We particularly focus on improving the flops:byte ratio, which is the main limiter on performance, by exploiting recurring structures or sub-structures in matrices. Our techniques also support micro-architecture level optimizations to further improve performance. The Operation Stacking Framework (OSF) stacks problems in large ensemble computations, which run the same sparse kernel using an identical matrix structure, such that they share a single copy of the indexing information to significantly reduce memory bandwidth usage. OSF provides performance improvements of up to 1.94x on an AMD Opteron compared to the CSR method. We validate performance results using hardware event counters, which demonstrate significantly improved cache and pipeline utilization. Pattern-based Representation (PBR) exploits recurring block nonzero patterns by generating custom code for each recurring block pattern. In this way, no indexing data for individual nonzero elements are read from memory, reducing the overall size of the indices by up to 98%. Our code generator emits highly tuned codes that utilize SSE vectorization and software prefetching. PBR accurately identifies a block size that achieves optimal or near-optimal performance using a multiple linear regression performance model. On recent multicore machines, PBR provides performance improvements of up to 3.4x sequentially and 5x in parallel, compared to the CSR method.
The PBR library we provide converts matrices at runtime, allowing our method to be used as a drop-in replacement for existing methods. We compare PBR's overhead relative to its benefits and show that PBR is beneficial for many applications that repeatedly call the SMVM kernel for the same matrix structure.	en
dc.description.degree	Ph. D.	en
dc.identifier.other	etd-12242010-124006	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-12242010-124006/	en
dc.identifier.uri	http://hdl.handle.net/10919/30260	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	Belgin_Mehmet_D_2010.pdf	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Code Generators	en
dc.subject	Vectorization	en
dc.subject	Sparse	en
dc.subject	SpMV	en
dc.subject	SMVM	en
dc.subject	Matrix Vector Multiply	en
dc.subject	PBR	en
dc.subject	OSF	en
dc.subject	thread pool	en
dc.subject	parallel SpMV	en
dc.title	Structure-based Optimizations for Sparse Matrix-Vector Multiply	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en
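
Illustrative note on the abstract: the idea of stacking problems that share one matrix structure can be sketched as follows. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names (`csr_spmv`, `stacked_spmv`), the interleaved value layout, and the toy data are all hypothetical and chosen only for illustration. The first function is a plain CSR kernel; the second reads each row pointer and column index once and reuses it for all k stacked problems, which is the source of the improved flops:byte ratio described in the abstract.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for one matrix stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[p] * x[col_idx[p]]
    return y


def stacked_spmv(row_ptr, col_idx, stacked_vals, stacked_x):
    """Multiply k matrices that share one sparsity structure (assumed layout).

    stacked_vals[p][j] is the value of nonzero p in matrix j, and
    stacked_x[c][j] is entry c of input vector j. The index arrays
    row_ptr and col_idx are stored and read only once for all k problems.
    """
    n = len(row_ptr) - 1
    k = len(stacked_x[0])
    y = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            c = col_idx[p]  # one index load serves all k problems
            for j in range(k):
                y[i][j] += stacked_vals[p][j] * stacked_x[c][j]
    return y


# Toy 2x2 structure (hypothetical data): row 0 has nonzeros in
# columns 0 and 1, row 1 has a single nonzero in column 1.
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
vals_a, x_a = [1.0, 2.0, 3.0], [1.0, 1.0]
vals_b, x_b = [4.0, 5.0, 6.0], [1.0, 2.0]

# Interleave the two problems so that they share the index arrays.
stacked_vals = [list(pair) for pair in zip(vals_a, vals_b)]
stacked_x = [list(pair) for pair in zip(x_a, x_b)]
y_stacked = stacked_spmv(row_ptr, col_idx, stacked_vals, stacked_x)
```

Column j of `y_stacked` matches the result of running `csr_spmv` on problem j alone. A real kernel would keep the innermost loop over j contiguous in memory so that it can be vectorized; that layout choice is an assumption here as well.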

Files

Original bundle

Name: Belgin_Mehmet_D_2010.pdf
Size: 4.69 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations