Power, Performance and Energy Models and Systems for Emergent Architectures
Massive parallelism, combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems, forms a barrier to efficient application and architecture design. The performance gains of the past must continue over the next decade to meet the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency.
A key limitation of past approaches is the lack of power-performance policies that allow users to quantitatively bound the effects of power management on the performance of their applications and systems. Existing controllers and predictors use policies fixed by a knowledgeable user to opportunistically save energy and minimize performance impact. The qualitative effects are often good, and the aggressiveness of a controller can be tuned to try to save more or less energy. The controller will save energy and limit performance loss in many cases, but the quantitative effects of tuning and setting opportunistic policies on performance and power are unknown. This makes setting power-performance policies a manual trial-and-error process for domain experts and a black art for practitioners. To improve upon past approaches to high-performance power management, we need to quantitatively understand the effects of power management on performance at scale.
In this work, I have developed theories and techniques to quantitatively understand the relationship between power and performance for high-performance systems at scale. For instance, the system-level iso-energy-efficiency model analyzes, evaluates, and predicts the performance and energy use of data-intensive parallel applications on multi-core systems. This model allows users to study the effects of machine- and application-dependent characteristics on system energy efficiency. Furthermore, it helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I have also developed methodologies that can be extended and applied to model modern heterogeneous architectures, such as GPU-based clusters, to improve their efficiency at scale.