Compute Overlap Stall (COS): Predicting Performance of Power Management for Shared Memory Codes When Throttling Processors, Memory, and Thread Concurrency
Abstract
Maximizing performance under power constraints is a priority for highly parallel scientific applications. Modern systems offer control over operating modes, including processor speed (DVFS), memory speed (DMT), and concurrency level (DCT). Throttling speed and reducing the number of active cores lowers energy consumption, possibly at the cost of performance. Accurate execution time prediction is therefore useful for choosing system configurations that run a workload efficiently. The Compute Overlap Stall (COS) model predicts the execution time of parallel applications across these operating modes. The key insight of the model is that pure compute time, pure stall time, and compute-memory overlap are each affected separately by the three operating modes. We validate and update the model on an emerging architecture and reduce the size of the training set with negligible loss in prediction accuracy. We extend the model to support performance prediction for heterogeneous multi-core processors. We apply the optimized COS model to 14 application benchmarks on three architectures. For most applications, we observe a mean prediction error within 10% for the homogeneous model and within 13% for the heterogeneous-aware model.
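As a rough illustration of the decomposition described in the abstract, the Python sketch below splits a baseline execution time into pure compute, overlap, and pure stall, and scales each part on its own when processor or memory speed changes. It is not the COS model's actual formulation: the scaling rules (compute follows the processor slowdown, stall follows the memory slowdown, overlap follows the larger of the two), the names `Breakdown`, `predict_time`, and `percent_error`, and the sample numbers are all assumptions made for illustration, and thread concurrency is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    """Hypothetical baseline time split, in seconds."""
    compute: float  # time spent only computing
    overlap: float  # time where compute and memory access overlap
    stall: float    # time spent only waiting on memory


def predict_time(base: Breakdown, cpu_slowdown: float, mem_slowdown: float) -> float:
    """Toy prediction: scale each component separately.

    cpu_slowdown and mem_slowdown are baseline_frequency / target_frequency,
    so a value of 2.0 means that component runs at half speed.
    The scaling rules here are illustrative assumptions, not the COS equations.
    """
    compute = base.compute * cpu_slowdown
    stall = base.stall * mem_slowdown
    # Assumption: overlapped time is bounded by the slower of the two resources.
    overlap = base.overlap * max(cpu_slowdown, mem_slowdown)
    return compute + overlap + stall


def percent_error(predicted: float, measured: float) -> float:
    """Absolute prediction error as a percentage of the measured time."""
    return abs(predicted - measured) / measured * 100.0


if __name__ == "__main__":
    base = Breakdown(compute=4.0, overlap=2.0, stall=4.0)  # made-up numbers
    half_cpu = predict_time(base, cpu_slowdown=2.0, mem_slowdown=1.0)
    print(f"predicted time at half processor speed: {half_cpu:.1f} s")
```

The point of the sketch is only that a single scaling factor applied to the total time would miss the different sensitivities of the three components; a memory-bound code with a large stall share loses little when the processor is throttled, while a compute-bound code loses a lot.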