Optimizing Data Accesses for Scaling Data-intensive Scientific Applications

Yeom, Jae-seung

Files

Yeom_J_D_2014.pdf (16.19 MB)

Date

2014-05-30

Authors

Yeom, Jae-seung

Publisher

Virginia Tech

Abstract

Data-intensive scientific applications often process enormous amounts of data, and their scalability depends critically on how data locality is managed. Our study explores two common types of applications that differ vastly in memory access pattern and workload variation. The first comprises applications with multi-stride accesses in regular nested parallel loops. The second comprises applications that process large-scale irregular social network graphs. In the former case, the memory location or data item accessed in a loop is predictable, and the load for processing a unit of work (an array element) is relatively uniform. In the latter case, the data access per unit of work (a vertex) is highly irregular in both the number of accesses and the locations accessed. This irregularity is tied to the load and presents significant challenges for scaling application performance.
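The contrast between the two access patterns can be illustrated with a minimal sketch. It is not taken from the dissertation: the helper `strided_offsets`, the array shape, and the toy adjacency list are all hypothetical, chosen only to show that the offsets touched by a regular nested loop are known before the loop runs, while the number of accesses per vertex depends on the input graph.

```python
def strided_offsets(shape, strides):
    """Flat offsets touched by a regular two-level nested loop (hypothetical helper)."""
    rows, cols = shape
    row_stride, col_stride = strides
    return [i * row_stride + j * col_stride
            for i in range(rows) for j in range(cols)]


# Regular case: a 3x4 row-major array walked in loop order.
# Two strides (4 and 1) fully determine every access ahead of time.
regular = strided_offsets((3, 4), (4, 1))

# Irregular case: a toy adjacency list (made-up data, not from the study).
# The work per vertex is the number of neighbors it has to visit.
graph = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0, 4], 4: [0, 3], 5: [0]}
accesses_per_vertex = {v: len(nbrs) for v, nbrs in graph.items()}

print(regular)
print(accesses_per_vertex)
```

In the regular case every array element costs one access at a computable offset; in the toy graph, vertex 0 performs five times as many accesses as vertex 1.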

Designing platforms to support extreme performance scaling requires understanding how application-specific information can be used to control locality and improve performance. Such insights are necessary to determine which controls and abstractions to provide at the interface between an underlying system and an application, as well as to design new systems. Our goal is to expose the common scalability requirements of data-intensive scientific applications.

For the former type of application, with regular accesses and a uniform workload, we contribute new methods that improve the temporal locality of software-managed local memories and optimize the critical path of scheduling data transfers for multi-dimensional arrays in nested loops. In particular, we provide a runtime framework allowing transparent optimization by source-to-source compilers or automatic fine tuning by programmers. Finally, we demonstrate the effectiveness of the approach by comparing it against a state-of-the-art language-based framework.

For the latter type, with irregular accesses and a non-uniform workload, we analyze how the heavy-tailed property of input graphs limits the scalability of the application. We then introduce an application-specific workload model and a decomposition method that allows us to optimize locality under the custom load-balancing constraints of the application. Finally, we demonstrate unprecedented strong scaling of a contagion simulation on two state-of-the-art high performance computing platforms.
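A rough sketch can show why a heavy-tailed input limits strong scaling. It is not the workload model or decomposition method of the dissertation. It assumes, purely for illustration, that the work of a vertex equals its degree and that a vertex cannot be split across partitions; the degree sequence and both helper functions are hypothetical.

```python
import heapq


def block_partition_max_load(weights, parts):
    """Max load when vertices are split into contiguous blocks of equal count."""
    size = -(-len(weights) // parts)  # ceiling division
    return max(sum(weights[i:i + size]) for i in range(0, len(weights), size))


def greedy_partition_max_load(weights, parts):
    """Max load when each vertex, heaviest first, joins the least-loaded part."""
    loads = [0] * parts  # a list of equal values is already a valid min-heap
    for w in sorted(weights, reverse=True):
        # Pop the smallest load and push it back with the vertex weight added.
        heapq.heapreplace(loads, loads[0] + w)
    return max(loads)


# Made-up heavy-tailed degree sequence: one hub and many low-degree vertices.
degrees = [64] + [4] * 8 + [1] * 32
total = sum(degrees)  # 128 units of work

for parts in (2, 4, 8):
    by_count = block_partition_max_load(degrees, parts)
    by_weight = greedy_partition_max_load(degrees, parts)
    print(parts, total / by_count, total / by_weight)
```

Under these assumptions, balancing by vertex count leaves the hub's partition overloaded, and even weight-aware balancing cannot push the maximum load below the hub's own weight, so the achievable speedup stops growing as partitions are added.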

Keywords

Parallel systems, Software-managed memories, Distributed memories, Data locality, Scalability, Parallel discrete event simulation, Social networks, Contagion

Persistent link

http://hdl.handle.net/10919/64180

Collections

Doctoral Dissertations
