A Workload-aware Resource Management and Scheduling System for Big Data Analysis

dc.contributor.author: Xu, Luna [en]
dc.contributor.committeechair: Butt, Ali R. [en]
dc.contributor.committeemember: Ribbens, Calvin J. [en]
dc.contributor.committeemember: Cameron, Kirk W. [en]
dc.contributor.committeemember: Lee, Dongyoon [en]
dc.contributor.committeemember: Li, Min [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2019-02-06T09:00:43Z [en]
dc.date.available: 2019-02-06T09:00:43Z [en]
dc.date.issued: 2019-02-05 [en]
dc.description.abstract: The big data era has driven the need for data analysis in every aspect of our daily lives. With the rapid growth of data size and of the complexity of data analysis models, modern big data analytic applications face the challenge of providing timely results, often with limited resources. Such demand drives the growth of new hardware resources, including GPUs and FPGAs, as well as storage devices such as SSDs and NVMs. It is challenging to manage the resources available in a cost-restricted environment to best serve applications with different characteristics. Extant approaches are agnostic to such heterogeneity in both the underlying resources and the workloads, and require user knowledge and manual configuration for best performance. In this dissertation, we design and implement a series of novel techniques, algorithms, and frameworks to realize workload-aware resource management and scheduling. We demonstrate our techniques for efficient resource management across memory resources for in-memory data analytic platforms and processing resources for compute-intensive machine learning applications, and finally we design and develop a workload- and heterogeneity-aware scheduler for general big data platforms. The dissertation demonstrates that designing an effective resource manager requires effort on both the application side and the system side. The presented approach joins the efforts on both sides to provide a holistic heterogeneity-aware resource management and scheduling system. We are able to avoid task failures due to resource unavailability through workload-aware resource management, and to improve the performance of data processing frameworks by carefully scheduling tasks according to the task characteristics and the utilization and availability of the resources. [en]
dc.description.abstractgeneral: Clusters of multiple computers connected through the internet are often deployed in industry for larger-scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPU, memory, and disks are integrated to work together. It is important to manage the pool of such resources in a cluster so that they work together efficiently and provide better performance for the workloads running on top. This role is taken by a software component in the middle layer called the resource manager. The resource manager coordinates the resources in the computers and schedules tasks on them for computation. This dissertation reveals that current resource managers often partition resources statically and hence cannot capture the dynamic resource needs of workloads or the heterogeneous configurations of the underlying resources. For example, some computers in a cluster might be older than the others, with a slower CPU, less memory, etc. Workloads can also show different resource needs: watching YouTube requires a lot of network resources, while playing games demands powerful GPUs. To this end, the dissertation proposes novel approaches to resource management that capture the heterogeneity of resources and the dynamic needs of workloads, and on that basis achieve efficient resource management and schedule the right task to the right resource. [en]
dc.description.degree: Ph. D. [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:18051 [en]
dc.identifier.uri: http://hdl.handle.net/10919/87469 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: resource management [en]
dc.subject: big data [en]
dc.subject: scheduling [en]
dc.subject: heterogeneity [en]
dc.title: A Workload-aware Resource Management and Scheduling System for Big Data Analysis [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Xu_L_D_2019.pdf
Size: 1.58 MB
Format: Adobe Portable Document Format