Characterization and Optimization of the Fitting of Quantum Correlation Functions

Abstract

This case study presents a characterization and optimization of an application code for extracting parton distribution functions from high-energy electron-proton scattering data. Profiling this application code reveals that the phase-space density computation accounts for 93% of the overall execution time for a single iteration on a single core. When multiple iterations are executed in parallel on a multicore system, the application spends 78% of its overall execution time idling due to load imbalance. We address these issues by first porting the application code from Python to C++ and then tackling the load imbalance with a hybrid scheduling strategy that combines dynamic and static scheduling. These techniques result in a 62% reduction in CPU idle time and a 2.46x speedup in overall execution time per node. In addition, we find that the power-management mechanisms typically enabled on supercomputers (e.g., AMD Turbo Core, Intel Turbo Boost, and RAPL) can significantly degrade intra-node scalability when more than 50% of the CPU cores are used. This finding underscores the importance of understanding how applications interact with power management, since these interactions can adversely affect application performance, and highlights the necessity of intra-node scaling tests to identify performance degradation that inter-node scaling tests might otherwise overlook.

Keywords

C++, Python, parallelization, profiling, characterization, optimization, performance, power management, scalability, systems, deep inelastic scattering, quantum physics

Citation