Show simple item record

dc.contributor.authorXu, Lunaen_US
dc.date.accessioned2019-02-06T09:00:43Z
dc.date.available2019-02-06T09:00:43Z
dc.date.issued2019-02-05
dc.identifier.othervt_gsexam:18051en_US
dc.identifier.urihttp://hdl.handle.net/10919/87469
dc.description.abstractThe big data era has driven the needs for data analysis in every aspect of our daily lives. With the rapid growth of data size and complexity of data analysis models, modern big data analytic applications face the challenge to provide timely results often with limited resources. Such demand drives the growth of new hardware resources including GPUs and FPGAs, as well as storage devices such as SSDs and NVMs. It is challenging to manage the resources available in a cost restricted environment to best serve the applications with different characteristics. Extant approaches are agnostic to such heterogeneity in both underlying resources and workloads and require user knowledge and manual configuration for best performance. In this dissertation, we design, and implement a series of novel techniques, algorithms, and frameworks, to realize workload-aware resource management and scheduling. We demonstrate our techniques for efficient resource management across memory resource for in-memory data analytic platforms, processing resources for compute-intensive machine learning applications, and finally we design and develop a workload and heterogeneity-aware scheduler for general big data platforms. The dissertation demonstrates that designing an effective resource manager requires efforts from both application and system side. The presented approach makes and joins the efforts on both sides to provide a holistic heterogeneity-aware resource manage and scheduling system. We are able to avoid task failure due to resource unavailability by workload-aware resource management, and improve the performance of data processing frameworks by carefully scheduling tasks according to the task characteristics and utilization and availability of the resources.en_US
dc.format.mediumETDen_US
dc.publisherVirginia Techen_US
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectresource managementen_US
dc.subjectbig dataen_US
dc.subjectschedulingen_US
dc.subjectheterogeneityen_US
dc.titleA Workload-aware Resource Management and Scheduling System for Big Data Analysisen_US
dc.typeDissertationen_US
dc.contributor.departmentComputer Scienceen_US
dc.description.degreePh. D.en_US
thesis.degree.namePh. D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen_US
thesis.degree.disciplineComputer Science and Applicationsen_US
dc.contributor.committeechairButt, Alien_US
dc.contributor.committeememberRibbens, Calvin J.en_US
dc.contributor.committeememberCameron, Kirk W.en_US
dc.contributor.committeememberLee, Dongyoonen_US
dc.contributor.committeememberLi, Minen_US
dc.description.abstractgeneralClusters of multiple computers connected through internet are often deployed in industry for larger scale data processing or computation that cannot be handled by standalone computers. In such a cluster, resources such as CPU, memory, disks are integrated to work together. It is important to manage a pool of such resources in a cluster to efficiently work together to provide better performance for workloads running on top. This role is taken by a software component in the middle layer called resource manager. Resource manager coordinates the resources in the computers and schedule tasks to them for computation. This dissertation reveals that current resource managers often partition resources statically hence cannot capture the dynamic resource needs of workloads as well as the heterogeneous configurations of the underlying resources. For example, some computers in a clsuter might be older than the others with slower CPU, less memory, etc. Workloads can show different resource needs. Watching YouTube require a lot of network resource while playing games demands powerful GPUs. To this end, the disseration proposes novel approaches to manage resources that are able to capture the heterogeneity of resources and dynamic workload needs, based on which, it can achieve efficient resource management, and schedule the right task to the right resource.en


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record