Authors: Chuang, Pi-Yueh; Shah, Niteya; Barry, Patrick; Cloet, Ian; Constantinescu, Emil M.; Sato, Nobuo; Qiu, Jian-Wei; Feng, Wu-chun
Dates: 2026-03-13; 2026-03-13; 2024-09
ISBN: 979-8-3503-8714-8
ISSN: 2377-6943
Handle: https://hdl.handle.net/10919/142240

Abstract: This case study presents a characterization and optimization of an application code for extracting parton distribution functions from high energy electron-proton scattering data. Profiling this application code reveals that the phase-space density computation accounts for 93% of the overall execution time for a single iteration on a single core. When executing multiple iterations in parallel on a multicore system, the application spends 78% of its overall execution time idling due to load imbalance. We address these issues by first transforming the application code from Python to C++ and then tackling the application load imbalance via a hybrid scheduling strategy that combines dynamic and static scheduling. These techniques result in a 62% reduction in CPU idle time and a 2.46x speedup in overall execution time per node. In addition, the typically enabled power-management mechanisms in supercomputers (e.g., AMD Turbo Core, Intel Turbo Boost, and RAPL) can significantly impact intra-node scalability when more than 50% of the CPU cores are used. This finding underscores the importance of understanding system interactions with power management, as they can adversely impact application performance, and highlights the necessity of intra-node scaling tests to identify performance degradation that inter-node scaling tests might otherwise overlook.

Extent: 8 page(s)
Format: application/pdf
Language: en
Rights: In Copyright
Keywords: C++; Python; parallelization; profiling; characterization; optimization; performance; power management; scalability; systems; deep inelastic scattering; quantum physics
Title: Characterization and Optimization of the Fitting of Quantum Correlation Functions
Type: Conference proceeding
Venue: 2024 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: https://doi.org/10.1109/HPEC62836.2024.10938443
ORCID: Feng, Wu-Chun [0000-0002-6015-0727]