An Application-Attuned Framework for Optimizing HPC Storage Systems

dc.contributor.author: Paul, Arnab Kumar [en]
dc.contributor.committeechair: Butt, Ali R. [en]
dc.contributor.committeemember: Tilevich, Eli [en]
dc.contributor.committeemember: Wang, Gang Alan [en]
dc.contributor.committeemember: Foster, Ian [en]
dc.contributor.committeemember: Lee, Dongyoon [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2020-08-20T08:00:51Z [en]
dc.date.available: 2020-08-20T08:00:51Z [en]
dc.date.issued: 2020-08-19 [en]
dc.description.abstract: High performance computing (HPC) is routinely employed in diverse domains, such as the life sciences and geology, to simulate and understand the behavior of complex phenomena. Big-data-driven scientific simulations are resource intensive and require both computing and I/O capabilities at scale. There is a crucial need to revisit the HPC I/O subsystem to better optimize for and manage the increased pressure that big data processing places on the underlying storage systems. Extant HPC storage systems are designed and tuned for a specific set of applications targeting a range of workload characteristics, but they lack the flexibility to adapt to ever-changing application behaviors. The complex nature of modern HPC storage systems, along with these ever-changing application behaviors, presents unique opportunities and engineering challenges. In this dissertation, we design and develop a framework for optimizing HPC storage systems by making them application-attuned. We select three different kinds of HPC storage systems: in-memory data analytics frameworks, parallel file systems, and object storage. We first analyze HPC application I/O behavior by studying real-world I/O traces. Next, we optimize parallelism for applications running in memory; then we design data management techniques for HPC storage systems; and finally we focus on low-level I/O load balance to improve the efficiency of modern HPC storage systems. [en]
dc.description.abstractgeneral: Clusters of multiple computers connected through the internet are often deployed in industry and laboratories for large-scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPU, memory, and disks are integrated to work together. With the increasing popularity of applications that read and write tremendous amounts of data, such clusters need a large number of disks that can interact effectively. These disks form part of high performance computing (HPC) storage systems. Such HPC storage systems are used by a diverse set of applications from organizations in a vast range of domains, from earth sciences, financial services, and telecommunications to life sciences. Therefore, an HPC storage system should be efficient enough to perform well under the different read and write (I/O) requirements of all these applications. But current HPC storage systems do not cater to these varied I/O requirements. To this end, this dissertation designs and develops a framework for HPC storage systems that is application-attuned and thus provides much better performance than other state-of-the-art HPC storage systems without such optimizations. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:27129 [en]
dc.identifier.uri: http://hdl.handle.net/10919/99793 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Parallel File Systems [en]
dc.subject: Object-Based Storage [en]
dc.subject: Data Management [en]
dc.subject: Load Balancing [en]
dc.subject: File System Indexing [en]
dc.subject: Metadata Management [en]
dc.subject: High Performance Computing [en]
dc.title: An Application-Attuned Framework for Optimizing HPC Storage Systems [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Paul_A_D_2020.pdf
Size: 14.27 MB
Format: Adobe Portable Document Format