Transforming and Optimizing Irregular Applications for Parallel Architectures
Parallel architectures, including multi-core processors, many-core processors, and multi-node systems, have become commonplace, as it is no longer feasible to improve single-core performance through increasing its operating clock frequency. Furthermore, to keep up with the exponentially growing desire for more and more computational power, the number of cores/nodes in parallel architectures has continued to dramatically increase. On the other hand, many applications in well-established and emerging fields, such as bioinformatics, social network analysis, and graph processing, exhibit increasing irregularities in memory access, control flow, and communication patterns. While multiple techniques have been introduced into modern parallel architectures to tolerate these irregularities, many irregular applications still execute poorly on current parallel architectures, as their irregularities exceed the capabilities of these techniques. Therefore, it is critical to resolve irregularities in applications for parallel architectures. However, this is a very challenging task, as the irregularities are dynamic, and hence, unknown until runtime.
To optimize irregular applications, many approaches have been proposed to improve data locality and reduce irregularities through computational and data transformations. However, there are two major drawbacks in these existing approaches that prevent them from achieving optimal performance. First, these approaches use local optimizations that exploit data locality and regularity locally within a loop or kernel. However, in many applications, there is hidden locality across loops or kernels. Second, these approaches use "one-size-fits-all'' methods that treat all irregular patterns equally and resolve them with a single method. However, many irregular applications have complex irregularities, which are mixtures of different types of irregularities and need differentiated optimizations. To overcome these two drawbacks, we propose a general methodology that includes a taxonomy of irregularities to help us analyze the irregular patterns in an application, and a set of adaptive transformations to reorder data and computation based on the characteristics of the application and architecture.
By extending our adaptive data-reordering transformation on a single node, we propose a data-partitioning framework to resolve the load imbalance problem of irregular applications on multi-node systems. Unlike existing frameworks, which use "one-size-fits-all" methods to partition the input data by a single property, our framework provides a set of operations to transform the input data by multiple properties and generates the desired data-partitioning codes by composing these operations into a workflow.