Liu, Jinshu; Hadian, Hamid; Wang, Yuyue; Berger, Daniel; Nguyen, Marie; Jian, Xun; Noh, Sam; Li, Huaicheng
2025-04-04
2025-04-04
2025-03-30
https://hdl.handle.net/10919/125147
Compute Express Link (CXL) has emerged as a pivotal interconnect for memory expansion. Despite its potential, the performance implications of CXL across devices, latency regimes, processors, and workloads remain underexplored. We present Melody, a framework for systematic characterization and analysis of CXL memory performance. Melody builds on an extensive evaluation spanning 265 workloads, 4 real CXL devices, 7 latency levels, and 5 CPU platforms. Melody yields many insights: workload sensitivity to sub-μs CXL latencies (140-410ns), the first disclosure of CXL tail latencies, CPU tolerance to CXL latencies, a novel approach (Spa) for pinpointing CXL bottlenecks, and CPU prefetcher inefficiencies under CXL.
application/pdf
en
In Copyright
Systematic CXL Memory Characterization and Performance Analysis at Scale
Article - Refereed
2025-04-01
The author(s)
https://doi.org/10.1145/3676641.3715987