AnalyzeThis: An Analysis Workflow-Aware Storage System

Date

2014-12-17

Publisher

Virginia Tech

Abstract

Simulations run on hundreds of thousands of supercomputer cores produce vast amounts of data that must then be analyzed on smaller-scale clusters to glean insights. This process is referred to as an end-to-end workflow. Extant workflow systems are stymied by the storage wall, which results both from the disk-based parallel file system (PFS) failing to keep pace with the compute and memory subsystems and from inefficiencies in end-to-end workflow processing. In the post-petaflop era, supercomputers are provisioned with flash devices as an intermediary between compute nodes and the PFS, enabling novel paradigms not just for expediting I/O but also for in-situ analysis of the simulation output data on the flash device. An array of such active flash elements allows us to fundamentally rethink the way data analysis workflows interact with storage systems. By blending the flash storage array and data analysis together in a seamless fashion, we create an analysis workflow-aware storage system, AnalyzeThis. Our guiding principle is that analysis-awareness be deeply ingrained in every layer of the storage system (the active flash fabric, the analysis object abstraction layer, the scheduling layer within the storage, and an easy-to-use file system interface), thereby elevating data analyses to first-class citizens. Together, these concepts transform AnalyzeThis into a potent analytics-aware appliance.

Keywords

File System, Distributed System
