Towards Workload-aware Efficient Machine Learning Systems

dc.contributor.author: Khan, Redwan Ibne Seraj
dc.contributor.committeechair: Butt, Ali
dc.contributor.committeemember: Cameron, Kirk W.
dc.contributor.committeemember: Ji, Bo
dc.contributor.committeemember: Jian, Xun
dc.contributor.committeemember: Cheng, Yue
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2025-03-04T09:00:13Z
dc.date.available: 2025-03-04T09:00:13Z
dc.date.issued: 2025-03-03
dc.description.abstract: Machine learning (ML) is transforming many aspects of our lives, driving the need for computing systems that efficiently support large-scale ML workloads. As models grow in size and complexity, existing systems struggle to adapt, limiting both performance and flexibility. Additionally, ML techniques can enhance traditional computing tasks, but current systems lack the adaptability to integrate these advancements effectively. Building systems for running machine learning workloads, and running workloads using machine learning, both require a careful understanding of the nature of the systems and the ML models. In this dissertation, we design and develop a series of novel storage and scheduling solutions for ML systems by focusing on the unique characteristics of workloads and the underlying system. We find that by designing ML systems that are finely tuned to workload characteristics and the underlying infrastructure, we can significantly enhance application performance and maximize resource utilization. In the first part of this dissertation (Chapter 3), we analyze popular ML models and datasets, uncovering insights that inspired SHADE, a data-importance-aware caching solution for ML. The second part (Chapter 4) leverages the system characteristics of hundreds of client devices, along with the characteristics of the samples on those clients, to design novel sampling, caching, and client scheduling mechanisms that tackle data and system heterogeneity among client devices and thereby fundamentally improve the performance of federated learning on edge devices in the cloud. The third part (Chapter 5) leverages the characteristics of multi-agent LLM applications and user requests to design an efficient request scheduling mechanism that serves clients in multi-tenant environments fairly and efficiently while preventing abuse.
My dissertation demonstrates that workload-aware strategies can significantly enhance the efficiency (e.g., reduced training time, increased throughput, lower latency) and flexibility (e.g., improved ease of use, deployment, and programmability) of machine learning systems. By accounting for workload dynamicity and heterogeneity, these principles can guide the design of next-generation ML systems, ensuring adaptability to emerging models and evolving hardware technologies.
dc.description.abstractgeneral: Machine learning (ML) has become an integral part of our daily lives, powering applications from virtual assistants to medical diagnostics. As ML models grow larger and more complex, the systems that run them must evolve to keep pace. This dissertation explores how we can build more efficient and adaptable computing systems to support large-scale ML workloads. Traditional computing systems often struggle to accommodate the ever-changing demands of ML applications. Similarly, ML techniques can be leveraged to improve the performance of non-ML workloads, but existing systems lack the flexibility to integrate these advancements seamlessly. This research tackles both challenges: designing systems optimized for ML workloads and enhancing traditional systems using ML-driven insights. By designing intelligent, workload-aware strategies, this research demonstrates substantial improvements in the speed, efficiency, and flexibility of ML systems. These principles will help shape the next generation of computing infrastructure, ensuring that future ML models and applications can be deployed smoothly, regardless of scale or complexity.
dc.description.degree: Doctor of Philosophy
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:42574
dc.identifier.uri: https://hdl.handle.net/10919/124762
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: Federated Learning
dc.subject: High Performance Computing
dc.subject: Cloud Computing
dc.subject: Storage Systems
dc.subject: Data Storage
dc.subject: Data Management
dc.subject: Machine Learning Systems
dc.subject: Job Scheduling
dc.subject: Resource Management
dc.subject: MLSys
dc.subject: SysML
dc.subject: Efficiency
dc.subject: Flexibility
dc.title: Towards Workload-aware Efficient Machine Learning Systems
dc.type: Dissertation
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Khan_R_D_2025.pdf
Size: 2.13 MB
Format: Adobe Portable Document Format