Performance Evaluation of Large Language Models for High-Performance Code Generation: A Multi-Agent Approach (MARCO)

Authors: Rahman, Asif; Cvetkovic, Veljko; Reece, Kathleen; Walters, Aidan; Hassan, Yasir; Tummeti, Aneesh; Torres, Brian; Cooney, Denise; Ellis, Margaret; Nikolopoulos, Dimitrios
Date: 2025-05-07
URI: https://hdl.handle.net/10919/129385

Abstract: Large language models (LLMs) have transformed software development through code generation capabilities, yet their effectiveness for high-performance computing (HPC) remains limited. HPC code requires specialized optimizations for parallelism, memory efficiency, and architecture-specific considerations that general-purpose LLMs often overlook. We present MARCO (Multi-Agent Reactive Code Optimizer), a novel framework that enhances LLM-generated code for HPC through a specialized multi-agent architecture. MARCO employs separate agents for code generation and performance evaluation, connected by a feedback loop that progressively refines optimizations. A key innovation is MARCO's web-search component that retrieves real-time optimization techniques from recent conference proceedings and research publications, bridging the knowledge gap in pre-trained LLMs. Our extensive evaluation on the LeetCode 75 problem set demonstrates that MARCO achieves a 14.6% average runtime reduction compared to Claude 3.5 Sonnet alone, while the integration of the web-search component yields a 30.9% performance improvement over the base MARCO system. These results highlight the potential of multi-agent systems to address the specialized requirements of high-performance code generation, offering a cost-effective alternative to domain-specific model fine-tuning.

Extent: 9 page(s)
Format: application/pdf
Language: en
Rights: In Copyright
Type: Article
ORCID: Nikolopoulos, Dimitrios [0000-0003-0217-8307]