Characterization and Exploitation of GPU Memory Systems

Lee, Kenneth Sydney

dc.contributor.author	Lee, Kenneth Sydney	en
dc.contributor.committeechair	Feng, Wu-chun	en
dc.contributor.committeemember	Cao, Yong	en
dc.contributor.committeemember	Lin, Heshan	en
dc.contributor.department	Computer Science and Applications	en
dc.date.accessioned	2014-03-14T20:42:07Z	en
dc.date.adate	2012-10-25	en
dc.date.available	2014-03-14T20:42:07Z	en
dc.date.issued	2012-07-06	en
dc.date.rdate	2012-10-25	en
dc.date.sdate	2012-07-27	en
dc.description.abstract	Graphics Processing Units (GPUs) are workhorses of modern high-performance computing due to their ability to achieve massive speedups on parallel applications. The massive number of threads that can be run concurrently on these systems allows applications with data-parallel computations to achieve better performance than on traditional CPU systems. However, the GPU is not perfect for all types of computation. The massively parallel SIMT architecture of the GPU can still be constraining in terms of achievable performance. GPU-based systems will typically only be able to achieve between 40% and 60% of their peak performance. One of the major problems affecting this efficiency is the GPU memory system, which is tailored to the needs of graphics workloads instead of general-purpose computation. This thesis intends to show the importance of memory optimizations for GPU systems. In particular, this work addresses problems of data transfer and global atomic memory contention. Using the novel AMD Fusion architecture, we gain overall performance improvements over discrete GPU systems for data-intensive applications. The fused architecture systems offer an interesting trade-off by increasing data transfer rates at the cost of some raw computational power. We characterize the performance of different memory paths that are possible because of the shared memory space present on the fused architecture. In addition, we provide a theoretical model which can be used to correctly predict the comparative performance of memory movement techniques for a given data-intensive application and system. In terms of global atomic memory contention, we show improvements in scalability and performance for global synchronization primitives by avoiding contentious global atomic memory accesses. In general, this work shows the importance of understanding the memory system of the GPU architecture to achieve better application performance.	en
dc.description.degree	Master of Science	en
dc.identifier.other	etd-07272012-152625	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-07272012-152625/	en
dc.identifier.uri	http://hdl.handle.net/10919/34215	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	Lee_KS_T_2012.pdf	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Data Transfer	en
dc.subject	Performance Modeling	en
dc.subject	GPGPU	en
dc.subject	APU	en
dc.subject	GPU	en
dc.subject	Memory Systems	en
dc.title	Characterization and Exploitation of GPU Memory Systems	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Lee_KS_T_2012.pdf
Size: 3.92 MB
Format: Adobe Portable Document Format

Collections

Masters Theses